ZINC database

I have downloaded the ZINC database and loaded it into a workflow
with the File Reader node (as a SMILES file). Each row now contains both the
ZINC ID and the SMILES string, and I want to separate them into two columns. I tried doing
this in the File Reader node and just got an error. Then I tried the Cell
Splitter node, and it worked on a small set of rows (29,000); however, on the
full set I get an empty table with 15 empty columns. It also reports the
right number of rows, but there is nothing in them. The ZINC library has 1
billion rows, and I am wondering if it is too large. Does anybody know how to fix this?

Probably some string contains your column separator many times, which would explain the 15 columns.
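To see how one value can spill into many columns, here is a small Python illustration (not KNIME code). The rows and IDs are made up, and `=` is only an example of a delimiter that also occurs inside SMILES (it marks double bonds); which separator was actually configured in the Cell Splitter is not stated in the thread. Splitting on every occurrence gives a different column count per row, while splitting once on the space between SMILES and ID keeps exactly two fields.

```python
def field_counts(lines, sep):
    """Number of fields each line yields when split on every `sep`."""
    return [len(line.split(sep)) for line in lines]


def split_once(line, sep=" "):
    """Split only at the first `sep`; any later separators stay in the second field."""
    first, _, rest = line.partition(sep)
    return first, rest


# Made-up example rows: SMILES, a space, then a hypothetical ZINC-style ID.
rows = [
    "CCO ZINC000000000001",
    "O=C=O ZINC000000000002",
]

print(field_counts(rows, "="))  # -> [1, 3]  (count depends on how many '=' the SMILES has)
print(field_counts(rows, " "))  # -> [2, 2]  (every row splits into exactly two fields)
print([split_once(r) for r in rows])
```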

Ah, OK, thank you. Would you happen to know why that output is completely empty? It does not show any of the rows. If I run it on a small subset, like 29,000 rows, all the rows appear, but on the full data set it is just empty.

Do you have any errors in the console or in knime.log after the workflow runs?

Usually SMILES files are space-separated: the SMILES string, then a space, then the molecule name, which in your case is the ZINC ID. Therefore you should easily be able to do the separation in the File Reader.
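Outside KNIME, the same space-based split can be sketched in Python. This is a minimal illustration, not what the File Reader does internally. It assumes each line is a SMILES string, whitespace, then the ID; the optional header skip is also an assumption (check whether your download starts with a header line), and `your_file.smi` is a hypothetical filename. Reading line by line means the full file never has to be held in memory, which matters for a file on the order of a billion rows.

```python
import io
from typing import Iterator, TextIO, Tuple


def parse_smi_line(line: str) -> Tuple[str, str]:
    """Split one line into (smiles, name) at the first whitespace run."""
    parts = line.split(maxsplit=1)
    smiles = parts[0] if parts else ""
    # With maxsplit, trailing whitespace/newline stays on the last part, so strip it.
    name = parts[1].strip() if len(parts) > 1 else ""
    return smiles, name


def iter_smi(handle: TextIO, skip_header: bool = False) -> Iterator[Tuple[str, str]]:
    """Yield (smiles, name) pairs one line at a time without loading the whole file."""
    for line_no, line in enumerate(handle):
        if skip_header and line_no == 0:
            continue  # assumed header row, e.g. "smiles zinc_id"
        if not line.strip():
            continue  # skip blank lines
        yield parse_smi_line(line)


# Usage with an in-memory sample; for a real file (hypothetical name):
#   with open("your_file.smi") as fh:
#       for smiles, zinc_id in iter_smi(fh, skip_header=True): ...
sample = io.StringIO("smiles zinc_id\nCCO ZINC000000000001\nO=C=O ZINC000000000002\n")
for smiles, zinc_id in iter_smi(sample, skip_header=True):
    print(smiles, zinc_id)
```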

Yes, setting a space as the separator in the File Reader node was the first thing I tried, and I just got an error and the node failed. It worked when I set no separator at all, so then I tried the Cell Splitter.

There were no errors in the console. The node was just green, but the table was empty and stated the correct number of rows.

Hi there @jen1,

Can you share a screenshot of the output? Also, if you filter to the top 1,000 rows, do you see any data?

Br,
Ivan
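Ivan's check of the first 1,000 rows can be mirrored outside KNIME with a short Python sketch (an illustration only, not a KNIME node). It reads just the first n lines with `itertools.islice` and counts how many are non-empty; the sample text is made up.

```python
import io
from itertools import islice


def head(handle, n=1000):
    """Return the first n lines of an open text file, without trailing newlines."""
    return [line.rstrip("\n") for line in islice(handle, n)]


def summarize(lines):
    """Count sampled lines that are blank versus non-blank."""
    non_empty = sum(1 for line in lines if line.strip())
    return {"sampled": len(lines), "non_empty": non_empty, "empty": len(lines) - non_empty}


# Made-up sample; for a real file: with open(path) as fh: print(summarize(head(fh)))
sample = io.StringIO("CCO ZINC000000000001\n\nO=C=O ZINC000000000002\n")
print(summarize(head(sample, n=1000)))  # -> {'sampled': 3, 'non_empty': 2, 'empty': 1}
```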
