ZINC database

jen1 · May 1, 2020, 6:41pm

I have downloaded the ZINC database and loaded it into a workflow
with the File Reader node (as a SMILES file). Each row now contains both the
ZINC ID and the SMILES string, and I want to separate them into two columns. I tried
doing this in the File Reader node and just got an error. Then I tried the Cell
Splitter node, which worked on a small set of rows (29,000). On the
full set, however, I get an empty table with 15 empty columns. It reports the
right number of rows, but there is nothing in them. The ZINC library is 1
billion rows, and I am wondering if it is too large. Does anybody know how to fix this?

pigreco · May 2, 2020, 10:46am

Probably there is a string in which your column separator occurs many times.
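One way to check this outside KNIME is to count how many fields each line would produce for a given separator. The snippet below is only a minimal sketch: the function name `field_count_histogram`, the space separator, and the sample lines (including the made-up ZINC IDs) are assumptions for illustration, not taken from the actual file.

```python
from collections import Counter
from typing import Iterable


def field_count_histogram(lines: Iterable[str], sep: str = " ") -> Counter:
    """Count how many lines would split into 1, 2, 3, ... fields on `sep`."""
    histogram = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        # A line with k separators splits into k + 1 fields.
        histogram[line.count(sep) + 1] += 1
    return histogram


# Hypothetical sample lines; the IDs are invented for illustration.
sample = [
    "CCO ZINC000000000001\n",
    "c1ccccc1 ZINC000000000002\n",
    "CC(=O)O ZINC000000000003 extra tokens here\n",
]
print(field_count_histogram(sample))
```

If most lines give 2 fields but a few give many more, those few lines would explain a splitter producing far more columns than expected. Because the function only iterates over lines, an open file handle can be passed in place of the sample list without loading the whole file into memory.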

jen1 · May 2, 2020, 11:19am

Ah OK, thank you. Would you happen to know why the output is completely empty? It does not show any of the rows. If I run it on a small subset of about 29,000 rows, all the rows appear, but on the full data set the table is just empty.

pigreco · May 2, 2020, 1:05pm

Do you see any errors in the console or in knime.log after the workflow runs?

beginner · May 4, 2020, 5:06am

Usually SMILES files are “space” separated, e.g. the SMILES string, then a space, and then the “molecule name”, which in your case is the ZINC ID. You should therefore be able to do the separation easily in the File Reader.
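For reference, the same split can be sketched in plain Python, splitting each line on the first run of whitespace only so that a name containing spaces stays in one piece. This is a rough sketch under the layout described above (SMILES first, then the name); the function name `split_smiles_lines` and the sample IDs are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def split_smiles_lines(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (smiles, name) pairs, splitting on the first whitespace run only."""
    for line in lines:
        parts = line.strip().split(maxsplit=1)
        if not parts:
            continue  # skip blank lines
        smiles = parts[0]
        # Lines without a name get an empty string instead of failing.
        name = parts[1] if len(parts) > 1 else ""
        yield smiles, name


# Hypothetical sample lines; the IDs are invented for illustration.
sample = ["CCO ZINC000000000001\n", "c1ccccc1\tZINC000000000002\n"]
for smiles, zinc_id in split_smiles_lines(sample):
    print(smiles, zinc_id, sep=" | ")
```

Limiting the split to the first whitespace run always yields exactly two values per row, so one odd line cannot inflate the number of columns.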

jen1 · May 4, 2020, 12:22pm

Yes, adding a space as the separator in the File Reader node was the first thing I tried, and I just got an error and the node failed. It worked when I removed any separator, so then I tried the Cell Splitter.

jen1 · May 4, 2020, 2:00pm

There were no errors in the console. The node was just green, but the table was empty and stated the correct number of rows.

ipazin · May 5, 2020, 3:58pm

Hi there @jen1,

Can you share a screenshot of the output? Also, if you filter the top 1000 rows, do you see any data?
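If it is easier to check outside KNIME, the first rows of the raw file can be read without loading the rest. This is only a small sketch: the helper name `head` is made up, and the file name in the comment is a placeholder.

```python
from itertools import islice
from typing import Iterable, List


def head(lines: Iterable[str], n: int = 1000) -> List[str]:
    """Return the first `n` lines with trailing newlines removed."""
    return [line.rstrip("\n") for line in islice(lines, n)]


# Usage with a real file (placeholder name), reading only the first 1000 lines:
#   with open("zinc.smi") as handle:
#       first_rows = head(handle)
print(head(iter(["CCO id1\n", "CCC id2\n", "CCCC id3\n"]), n=2))
```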

Br,
Ivan

system · November 4, 2020, 4:05am

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.