Error reading SMILES files: C#N breaks the SMILES string

Hi Everyone,

I’ve searched Google and the KNIME forum with no luck. Hoping someone can help!

I don’t know how I’ve been working so long and never come across this, but I am losing a bunch of molecules when reading in a flat file that contains SMILES strings. Here is an example from the MUV dataset:

1255527 MUV_692_A_12 N#CCc1ccc(NC(=O)Nc2cccc(Cl)c2)cc1

When I read it in with the File Reader node, the SMILES imports as:

1255527 MUV_692_A_12 N

What is effectively happening is that the rest of the SMILES string gets commented out because of the hash (#). Any ideas how to import SMILES with native KNIME nodes without this error?
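To make the failure mode concrete, here is a minimal Python sketch (not KNIME code) of a delimited-line parser that, like a reader configured with `#` as its single-line comment character, drops everything from the first `#` onward. The function `read_line` and its `comment` parameter are hypothetical, chosen only for illustration.

```python
def read_line(line, delimiter="\t", comment=None):
    """Split one line of a delimited file into fields.

    Hypothetical helper: if `comment` is set, everything from its first
    occurrence onward is treated as a comment and discarded before the
    line is split, mimicking a single-line comment setting.
    """
    if comment is not None and comment in line:
        line = line[: line.index(comment)]
    return line.rstrip("\n").split(delimiter)


row = "1255527\tMUV_692_A_12\tN#CCc1ccc(NC(=O)Nc2cccc(Cl)c2)cc1"

# With '#' as the comment character, the SMILES is cut at the triple bond.
print(read_line(row, comment="#"))  # ['1255527', 'MUV_692_A_12', 'N']

# With no comment character, the full SMILES survives.
print(read_line(row))
```

Because `#` is the SMILES symbol for a triple bond, every molecule containing one (such as the nitrile here) is truncated at that bond, which is why whole molecules seem to go missing.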

Thanks in advance,
Jason

Update on this… When I convert the file to the following format, reordering the columns and switching from tab to space delimiters, the issue seems not to happen. Is there something special about tab delimiters causing this?

N#CCC1=CC=C(NC(=O)NC2=CC=CC(Cl)=C2)C=C1 1255527 MUV_692_A_12

If I rearrange the order differently but keep the space delimiter:

1255527 N#CCc1ccc(NC(=O)Nc2cccc(Cl)c2)cc1 MUV_692_A_12

the SMILES isn’t lost either. There must be something special about tab delimiters causing this. Can anyone clue me in?

Thanks in advance,
Jason

In the File Reader node, I have some success when I change the character that’s being used for single-line comments.
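Outside KNIME, the same idea applies: read the file with a parser that does not treat `#` as an inline comment marker. A minimal Python sketch, assuming the tab-delimited layout of the MUV example above; the header comment line is hypothetical and only shows how whole-line comments could still be skipped safely.

```python
import csv
import io

# Tab-delimited sample in the MUV layout; the first line is a
# hypothetical whole-line comment added for illustration.
data = (
    "# id\tset\tsmiles\n"
    "1255527\tMUV_692_A_12\tN#CCc1ccc(NC(=O)Nc2cccc(Cl)c2)cc1\n"
)

# Treat '#' as a comment marker only at the start of a line, so a '#'
# inside a SMILES string (a triple bond) is never cut off.
lines = (line for line in io.StringIO(data) if not line.startswith("#"))

# csv.reader has no comment-character option, so '#' in a field is kept as data.
rows = list(csv.reader(lines, delimiter="\t"))

for compound_id, label, smiles in rows:
    print(compound_id, label, smiles)
```

Restricting comments to lines that start with `#` is a design choice for this sketch; if the file has no comment lines at all, the filter can simply be dropped.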


@elsamuel Thanks so much! This is something I had never paid attention to. For some reason, when the file is space-separated, the single-line comment field is not populated with a #, but when it reads a tab-delimited file, it is.

Must be some standard convention that I’m not aware of.

Thanks for bringing the reason to my attention!
Jason

