Just a quick question about the CDK nodes: is it possible to add a new node that can translate the datatype STRING to MOL? The reason for asking is that I have a database that contains a number of MOL structures as BLOBs. When they are returned via the KNIME database reader, the column datatype is set to STRING, which the CDK nodes cannot read. So I need a converter to change the datatype from STRING to MOL, or is there another way to do this?
We did have such a node in previous (in-house) releases, but we decided not to include it in the public release because we thought there was no way to read Strings containing line breaks into KNIME (molecule types such as SDF and Mol2 have line breaks, so the only obvious way to read those was with the appropriate SDF/Mol2 reader, which does the typing already). Unfortunately we didn't think of the database reader...
Luckily we didn't remove the implementation from the jar files but only disabled this node. You can enable it by adding the following line to the file $KNIME$/plugins/org.knime.chem.base_1.2.1/plugin.xml:
This part comes right after the definition of the String2SmilesNodeFactory node. The entire plugin.xml should then look as follows:
<?xml version="1.0" encoding="UTF-8"?>
You will find the "new" node in the repository under "Chemistry" -> "Translators". It's called "String to SMILES/SLN", though it can translate to SMILES, SLN, SDF and Mol2. That node has not been maintained for some time, but it should work nevertheless.
Let us know if it (doesn't) work(s) for you.
PS: I will discuss with the guys here in which form we want to offer this functionality in future versions, be it in a single node such as this "String to SMILES/SLN" or in many different translator nodes.
Just tried the fix you supplied. I get the convert option, but when KNIME then converts the returned MOL string to datatype MOL, it reports a '?' in the column containing the MOL string. Subsequent nodes then report the same '?'. However, if I try a SMILES column of datatype STRING, it happily converts this to a SMILES column of datatype SMI, and the structure is available for use.
I'm not sure I understand. The converter node does nothing but take the input string and put it into a typed output object. It does no parsing or any other validation (which basically means that you can convert any string to any molecule type). I assume you also get the '?' (which means 'missing value') when you (intentionally) convert the String MOL to a Smiles, for instance? Did you check that the input does not already contain missing values, i.e. '?'?
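To illustrate what "no parsing, no validation" means here, a minimal sketch in plain Java. `TypedCell` and `convert` are hypothetical names made up for this example; they are not KNIME classes and this is not the node's actual code. The point is only that a missing input stays missing, while any non-missing string is wrapped unchanged.

```java
import java.util.Optional;

public class StringToTypedSketch {

    /** Hypothetical stand-in for a typed molecule cell; NOT a KNIME class. */
    public static final class TypedCell {
        public final String type;
        public final String value;

        TypedCell(String type, String value) {
            this.type = type;
            this.value = value;
        }
    }

    /**
     * Wraps the input string unchanged in a typed cell.
     * A null input models a missing value (shown as '?') and stays missing.
     */
    public static Optional<TypedCell> convert(String input, String targetType) {
        if (input == null) {
            return Optional.empty(); // missing in, missing out
        }
        // No parsing and no validation: any string is accepted as-is.
        return Optional.of(new TypedCell(targetType, input));
    }

    public static void main(String[] args) {
        System.out.println(convert("CCO", "SMILES").isPresent());
        System.out.println(convert("not a molecule", "Mol2").isPresent());
        System.out.println(convert(null, "Mol2").isPresent());
    }
}
```

If the real node behaves like this sketch, a '?' in the output would point to a '?' already being present in the input column.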
When you say "MOL", do you mean Tripos' Mol2 file format (not that I know much about it, but that's what it should represent)?
Would you mind sending me a sample (if legally possible)? One line would be perfectly enough. You could/should use the "Table Writer" node to write it out in the KNIME format. You will find my email address on my homepage: http://www.inf.uni-konstanz.de/bioml/staff/wiswedel.html (not that it contains much information about me, but it does have my email address).
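In case it helps to answer that, here is a rough heuristic in plain Java for telling the two formats apart from the string content. It assumes that a Tripos Mol2 record contains the `@<TRIPOS>MOLECULE` tag and that an MDL molfile has its counts line as the fourth line, ending in `V2000` or `V3000`. The class and method names are made up for this example, and the check is a sketch, not a validating parser.

```java
public class MolFormatSniffer {

    public enum Format { MDL_MOLFILE, TRIPOS_MOL2, UNKNOWN }

    /** Guesses the format of a molecule string; heuristic only. */
    public static Format sniff(String text) {
        if (text == null) {
            return Format.UNKNOWN;
        }
        // Mol2 records are introduced by record type indicators such as this one.
        if (text.contains("@<TRIPOS>MOLECULE")) {
            return Format.TRIPOS_MOL2;
        }
        // MDL molfile: three header lines, then the counts line with a version tag.
        String[] lines = text.split("\r?\n", -1);
        if (lines.length >= 4) {
            String counts = lines[3].trim();
            if (counts.endsWith("V2000") || counts.endsWith("V3000")) {
                return Format.MDL_MOLFILE;
            }
        }
        return Format.UNKNOWN;
    }

    public static void main(String[] args) {
        String mol = "\n  sketch\n\n  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n";
        System.out.println(sniff(mol));
        System.out.println(sniff("@<TRIPOS>MOLECULE\nethanol\n"));
    }
}
```

Note that a molfile whose line breaks were lost on the way out of the database would come back as `UNKNOWN` here, which would itself be a useful hint.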
Sorry for not being more helpful right now.