Split a database

David32 · December 6, 2019, 11:05am

Hello everyone

I have a database of molecules (500 lines of SMILES) and I would like to export these molecules to .sdf files one by one (1 molecule = 1 .sdf file).
I tried the Partitioning node, but it only offers percentages, and I do not want to output the molecules in groups.

Does anyone have an idea?

ipazin · December 6, 2019, 1:27pm

Hi there @David32,

I would say you need a loop that, in each iteration, takes one row (1 molecule) and saves it as a .sdf file. You can check the Chunk Loop Start node and this example, where you can see how to write multiple files using a loop and flow variables.

Why a second topic with the same question? I will close it if you don’t mind.

Br,
Ivan

David32 · December 6, 2019, 1:47pm

Hi ipazin

I have several .csv files, from which I display the SMILES and then export them to separate .sdf files.

I do not see how to make a KNIME workflow read and process the molecules one at a time.

ipazin · December 6, 2019, 1:53pm

Hi @David32,

Maybe you could add more information. What is your input, what do you need to perform in the KNIME workflow (some joining or converting), and what is your expected output? Adding examples helps a lot.

I’m not so much into molecules, so maybe someone else will figure out what you need from the information you presented.

Br,
Ivan

David32 · December 6, 2019, 2:04pm

I have 1 line in a .csv file

I need to convert it to SMILES via the Open Babel node.

Once it is converted to SMILES, I have to export it to a .sdf file. I would like to do this for 500 .csv files.

My workflow works and looks like this:

File Reader -> Open Babel -> Column Filter -> SDF Writer

I would like to add a node or write a script so that KNIME processes all 500 in one run.

armingrudd · December 6, 2019, 2:30pm

Hi @David32,

Use the List Files node to list all the CSV files. Then the Table Row To Variable Loop Start node. After that you can use the nodes you already mentioned (use the “Location” variable for the File Reader node) and put the Variable Loop End node at the end.
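
For readers who find code easier to follow, the loop described above can be pictured roughly as below. This is only a Python analogy, not KNIME code: `list_csv_files` and `process_each` are hypothetical names, and the `handle` callback is an assumed stand-in for the File Reader -> Open Babel -> Column Filter -> SDF Writer chain.

```python
from pathlib import Path
from typing import Callable, List


def list_csv_files(directory: Path) -> List[Path]:
    """Rough analogue of the List Files node: collect every .csv in a folder."""
    return sorted(p for p in directory.iterdir() if p.suffix.lower() == ".csv")


def process_each(files: List[Path], handle: Callable[[Path], None]) -> int:
    """Rough analogue of Table Row To Variable Loop Start ... Variable Loop End.

    Each iteration exposes one file path (the flow variable in KNIME) to the
    loop body, represented here by the hypothetical `handle` callback.
    """
    count = 0
    for path in files:
        handle(path)  # the body runs once per listed file
        count += 1
    return count
```

The point of the analogy is that the reader inside the loop must take its path from the loop variable; if it keeps a fixed path, every iteration reads the same file.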

David32 · December 6, 2019, 3:00pm

My workflow looks like this: List Files (directory containing the csv files) -> Table Row To Variable Loop Start -> File Reader -> Open Babel -> Column Filter -> SDF Writer.

I do not see where to place Variable Loop End.

In File Reader, “Use Variable: Location” does not make this node run.

I need to read 500 .csv files, pass them through the Open Babel node and then output 500 .sdf files.

David32 · December 6, 2019, 3:24pm

Hi, armingrudd

It did not work. Is there a parameter that I missed, or do you need more details?

armingrudd · December 6, 2019, 3:44pm

Here is an example where I have converted a few CSV files into XLSX:

list_files.knwf (134.6 KB)

David32 · December 6, 2019, 4:10pm

Thank you for your example, armingrudd, but I need a node that shows what is in the .csv files (as a new column) before converting them to an .xls file.

David32 · December 6, 2019, 4:37pm

Here is my workflow, which works for one molecule, with the .csv file (bcl_act.csv) attached inside. The file has to be moved so that there are no errors.

Bcl_Flow_csv_sdf.zip (51.9 KB)

armingrudd · December 6, 2019, 5:01pm

Sorry, use the “URL” variable instead. I got confused: I think I set “Location” as the variable in my example before updating to KNIME 4.1, and now, after updating, it works with URL.

However, if you use the URL values, the File Reader will work fine.

David32 · December 6, 2019, 5:23pm

Can you explain node by node? In which node should I use the URL variable option?

armingrudd · December 6, 2019, 5:27pm

The File Reader node.

David32 · December 6, 2019, 5:48pm

It helped me a lot, but I get 6 times the same file with 500 lines.

armingrudd · December 6, 2019, 7:13pm

Do you have 500 csv files, or 1 csv file with 500 lines?

If you have 500 csv files you have to follow my solution. If you have 1 csv file with 500 rows and want to write each line in a new file then follow the solution by @ipazin.

If you have several csv files and each of them has several rows and you want each line in each csv file to be written separately, then use the Chunk Loop inside the Table Row To Variable loop.

Pay attention to the file names created by flow variables.
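
To illustrate why the file names matter in the nested case, here is a small Python sketch (an analogy, not KNIME code). The names `output_name` and `plan_outputs` and the `<file>_<row>.sdf` pattern are assumptions made for this example; the idea is only that the output name must depend on both the outer loop (which CSV file) and the inner loop (which row), otherwise iterations overwrite each other.

```python
from pathlib import Path
from typing import Dict, List


def output_name(csv_path: Path, row_index: int, out_dir: Path) -> Path:
    """Build a unique .sdf path from the source file name and the row number."""
    return out_dir / f"{csv_path.stem}_{row_index:04d}.sdf"


def plan_outputs(row_counts: Dict[Path, int], out_dir: Path) -> List[Path]:
    """Outer loop over files, inner loop over rows, one output path per row."""
    targets = []
    for csv_path, n_rows in row_counts.items():  # outer: one iteration per CSV file
        for row_index in range(n_rows):          # inner: one iteration per row
            targets.append(output_name(csv_path, row_index, out_dir))
    return targets
```

If the name were built from the CSV file alone, every row of that file would be written to the same path and only the last one would survive.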

David32 · December 7, 2019, 7:25am

Hello

I have 1 .csv file of 500 molecules, which I can split into 500 .csv files of 1 molecule each using R.

armingrudd · December 7, 2019, 7:32am

You do not need to split the CSV file. Just follow the solution by @ipazin and use the Chunk Loop Start node after the File Reader node. At the end of your workflow, after the SDF Writer node, use the Variable Loop End node to close the loop (check my workflow example). And that’s it!
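
As a rough picture of what this single-file loop does, here is a minimal Python sketch (again an analogy, not KNIME code). It reads one CSV and writes one output file per row, which corresponds to a chunk size of 1. The function name, the `molecule_NNN.sdf` naming, and the `convert` callback are assumptions for illustration; `convert` stands in for the structure conversion that Open Babel performs in the thread and is not implemented here.

```python
import csv
from pathlib import Path
from typing import Callable, Dict, List


def write_one_file_per_row(
    csv_path: Path,
    out_dir: Path,
    convert: Callable[[Dict[str, str]], str],
) -> List[Path]:
    """Read one CSV and write one output file per data row (chunk size 1)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with csv_path.open(newline="") as handle:
        # Each row plays the role of one chunk in the loop.
        for index, row in enumerate(csv.DictReader(handle), start=1):
            target = out_dir / f"molecule_{index:03d}.sdf"
            target.write_text(convert(row))  # hypothetical conversion step
            written.append(target)
    return written
```

The counter in the file name plays the part of the loop's iteration variable: it is what keeps the 500 outputs from ending up in one file.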

David32 · December 7, 2019, 7:51am

Sorry, but how did you create the connection (in red) between Excel Writer and Variable Loop End?

David32 · December 7, 2019, 7:54am

I just managed to create the connection.