SDF Reader node - improved handling of problem records?

Please note - the original post incorrectly stated "missing the terminal $$$$ and newline" rather than "missing the terminal 'M  END' and newline". This is now corrected - apologies for any confusion!

Hi,

For some weeks now, a single record in the downloadable SDF from an external site (DrugBank) has been causing the SDF Reader node to throw an error. As a workaround, I have had to use tools outside of KNIME to interrogate the SDF, find the problem record, remove it manually, and save a local copy for my workflow to refer to. Of course, this means that the input to the workflow soon becomes out of date, rather than always referring to the latest SDF on the web server!

The problem was due to the structure block of one record (DB00638, inulin) being truncated: it simply stopped at 65535 characters and rolled straight into the next field. It was therefore missing some information but, more importantly, was also missing the terminal 'M  END' and newline. See the problem line in the molblock below, where the last bond line runs straight into "> <DRUGBANK_ID>":

638
Mrv0541 09201117152D
 
801838 0 0 1 0 999 V2000
8.4235 0.1973 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
8.6343 -1.5827 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
8.3845 1.9893 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
8.6732 -3.3748 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
...etc...
...etc...
...etc...
211239 1 0 0 0 0
211274 1 0 0 0 0
212240 1 0 0 0 0
212289 1 0 0 0 0
213241 1 0 0 0 0
213286 1 0 0 0 0
214242 1 0 0> <DRUGBANK_ID>
DB00638
 
> <DRUG_GROUPS>
approved; nutraceutical
 
> <GENERIC_NAME>
Inulin
...
...to end of record.
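
For readers who want to locate such a record with a script, here is a minimal Python sketch of the checks described above. It is not how the SDF Reader node works; `check_record` is a hypothetical helper, and it assumes V2000 records whose data headers are written as "> <NAME>" with a single space. In the V2000 counts line the first two fixed-width fields are the atom and bond counts, so "801838" in the excerpt reads as 801 atoms and 838 bonds.

```python
def check_record(record: str) -> list[str]:
    """Return the problems found in one V2000 SDF record (empty list if none)."""
    lines = record.splitlines()
    if len(lines) < 4:
        return ["record is shorter than the header plus counts line"]
    problems = []

    # Find the 'M  END' line that should close the structure block.
    end = next((i for i, line in enumerate(lines) if line.rstrip() == "M  END"), None)
    if end is None:
        problems.append("missing 'M  END'")

    # A data header ('> <NAME>') must start its own line; if it appears
    # mid-line, the structure block was cut off and ran into the data items.
    for i, line in enumerate(lines):
        if "> <" in line and not line.startswith(">"):
            problems.append(f"data header fused onto line {i + 1}")
            break

    # V2000 counts line: columns 1-3 hold the atom count, 4-6 the bond count.
    try:
        n_atoms, n_bonds = int(lines[3][0:3]), int(lines[3][3:6])
    except ValueError:
        problems.append("unreadable counts line")
    else:
        if end is not None and end < 4 + n_atoms + n_bonds:
            problems.append("fewer atom/bond lines than the counts line declares")
    return problems
```

Calling `check_record` on each record of the file (split on the "$$$$" delimiter lines) and printing the non-empty results should point at the truncated entry.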

I have contacted the DrugBank team, but have yet to hear when there will be a fix in the data; but I wondered if it would be possible to extend the parsing capabilities of the SDF Reader node to allow processing of 'corrupt' records, and passing them to the "Broken Molecules" port?  This would be very useful for working with erroneous input (and potentially correcting it!) directly in KNIME.
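
To illustrate the requested behaviour (not the node's actual implementation), the following Python sketch routes records to a "good" or "broken" list instead of failing on the first bad one, much as a "Broken Molecules" port would. The function name `split_sdf` and the rule "a record without an 'M  END' line is broken" are assumptions made for this example.

```python
def split_sdf(text: str) -> tuple[list[str], list[str]]:
    """Split SDF text into (good, broken) records rather than raising an error."""
    good, broken = [], []
    current = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            # End of record: decide which output it belongs to.
            record = "\n".join(current)
            if any(l.rstrip() == "M  END" for l in current):
                good.append(record)
            else:
                broken.append(record)
            current = []
        else:
            current.append(line)
    # Leftover lines with no closing $$$$ delimiter are treated as broken too.
    if any(l.strip() for l in current):
        broken.append("\n".join(current))
    return good, broken
```

With this approach the broken list still carries the raw text of each bad record, including data fields such as DRUGBANK_ID, so the record can be identified and possibly repaired downstream.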

Kind regards

James

I could not reproduce the problem but I see that it really happens on your side. I have changed the SD reader to handle such (weird) cases more gracefully. Please check if the next release fixes the problem on your side.

Hi Thor,

Sorry for the slow reply - I have only just looked at this problem again and, unfortunately, for me it is still there! It is still the same SDF that is giving me the issues (http://www.drugbank.ca/system/downloads/current/structures/small_molecule.sdf.zip), and I see the following in the console:

ERROR DefaultSDFReader 1289

WARN  DefaultSDFReader

ERROR SDF Reader Execute failed: String value can't be null.

ERROR DefaultSDFReader

WARN  DefaultSDFReader

ERROR SDF Reader Execute failed: String value can't be null.

and then on subsequent attempts, just:

WARN  DefaultSDFReader

ERROR SDF Reader Execute failed: String value can't be null.

I am running 64-bit KNIME 2.5.1 on Windows 7.

Kind regards

James