Row splitting tables into several tables at once

peleitor · March 14, 2019, 6:02pm

Is there any means of row splitting a table into several (not only 2) sub-tables, at once?

E.g. I have table A, and I want to split it by rows into A1, A2, … An, where “n” may or may not be known in advance, and n>2.

I want to do this in a single pass (instead of several nodes splitting into 2 tables), for performance reasons.

Thanks

Iris · March 17, 2019, 11:18am

It is not possible to have a node with an arbitrary number of outputs; the number of output ports must be fixed. An arbitrary number would also not be very helpful: which subsequent node would you connect to tables that only sometimes exist?

I would solve this either with Looping or with Streaming!

Looping: you can use a Group Loop Start node based on your split criteria and process each subset of the data in its own loop iteration.

Or you build yourself a wrapped metanode containing a lot of Row Filter nodes, then activate streaming on this wrapped metanode. It is also a good idea to read the data inside the streamed wrapped metanode already.
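For readers who want to see the idea outside KNIME, here is a minimal Python sketch of splitting a table into n sub-tables by a split criterion in a single pass. It is only an illustration of the grouping logic, not how the Group Loop Start node is implemented; the function name `split_rows_by_key` and the sample column `batch` are made up for the example.

```python
from collections import defaultdict


def split_rows_by_key(rows, key):
    """Route each row to the sub-table for its key value in one pass."""
    groups = defaultdict(list)
    for row in rows:
        # n does not need to be known in advance: a new sub-table
        # is created the first time a key value is seen.
        groups[row[key]].append(row)
    return dict(groups)


# Hypothetical sample data standing in for "table A"
table_a = [
    {"id": 1, "batch": "x"},
    {"id": 2, "batch": "y"},
    {"id": 3, "batch": "x"},
    {"id": 4, "batch": "z"},
]
sub_tables = split_rows_by_key(table_a, "batch")
```

Each value in `sub_tables` plays the role of one of A1 … An, and row order within each sub-table is preserved.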

Best wishes, Iris

peleitor · March 20, 2019, 7:04pm

Thanks Iris!

evert.homan_scilifelab.se · March 26, 2019, 7:46am

Hi,

I think a node that splits a table into several tables (>2) would be very useful, so that the split tables can be processed in parallel rather than consecutively, as is the case with looping.

I have a very large table with chemical structures which I would like to 3D optimize with the RDKit 3D Optimize node. How can I split this into, say, 10 parallel jobs?

Best/Evert

beginner · March 26, 2019, 8:04am

Hi Evert,

Most RDKit nodes are already multi-threaded internally, hence no such splitting is needed.

(This splitting would be available with the parallel chunk loop start, but be warned: especially with lots of complex data, the setup these nodes do is very time-consuming and often not worth it.)
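As a rough illustration of the chunk-and-process-in-parallel idea, here is a minimal Python sketch. It is not what the KNIME nodes do internally; `make_chunks`, `process_chunk` and `process_in_parallel` are hypothetical names, and `process_chunk` is just a placeholder for the expensive per-row work.

```python
from concurrent.futures import ThreadPoolExecutor


def make_chunks(rows, n_chunks):
    """Split rows into n_chunks contiguous chunks of near-equal size."""
    size, extra = divmod(len(rows), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks get one additional row.
        end = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks


def process_chunk(chunk):
    # Hypothetical stand-in for the real per-row computation.
    return [value * 2 for value in chunk]


def process_in_parallel(rows, n_chunks=10):
    """Process each chunk in its own worker and merge results in order."""
    chunks = make_chunks(rows, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        results = pool.map(process_chunk, chunks)  # keeps chunk order
        return [item for chunk_result in results for item in chunk_result]
```

In CPython, threads only speed things up when the work releases the GIL (I/O or native code); for pure-Python CPU-bound work, `ProcessPoolExecutor` would be the usual swap. Either way, splitting and merging has its own overhead, which is the point made above.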

To add further: if you generate the conformers with RDKit, you could use the “new” ETKDG method, which does not require optimization afterwards.

3D is slow, so the only other option is to get a faster CPU if this is a regular occurrence.

evert.homan_scilifelab.se · March 26, 2019, 9:36am

Hi,

Thanks for the swift reply. How do I choose this ETKDG method? This is not an option in the RDKit Optimize Geometry node version I have.

Cheers/Evert

beginner · March 26, 2019, 11:58am

Hi Evert,

This is an option in the Add Conformers node. As I mentioned, it hence only applies to conformers generated with RDKit, not to those from other sources. The options are under the Advanced tab in that node, and they are set by default when you add the node to the workflow.

I would uncheck the UFF clean-up for speed gains but maybe you need to experiment if that works for your molecules or not.

If you have precomputed 3D structures, then this doesn’t apply to you (except that the optimizer node is already multi-threaded and usually faster than commercial tools…)

evert.homan_scilifelab.se · March 26, 2019, 12:12pm

Thanks. Couple of questions:

I am using SMILES to start with. Is there an advantage to first generating 3D coordinates with the RDKit Generate Coords node, before running the Add conformers node?

I only want a single 3D conformation for each structure. Is it simply enough to set the number of conformers to 1?

Cheers/Evert

beginner · April 3, 2019, 5:50am

I guess by now you have answered your question. Yes, it’s possible and simple to generate only 1 conformer. However, I’m not sure how that would be useful. It will most likely not be anywhere close to the minimum unless you get very lucky.
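For anyone doing the same thing with the RDKit Python API instead of the KNIME nodes, a minimal sketch of going from SMILES to a single ETKDG conformer might look like this. It assumes RDKit is installed; the helper name `smiles_to_single_conformer` and the fixed seed are my own choices for the example, and it is not necessarily what the Add Conformers node does internally.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def smiles_to_single_conformer(smiles, seed=42):
    """Return a molecule with one ETKDG-embedded 3D conformer, or None on failure."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None  # SMILES could not be parsed
    mol = Chem.AddHs(mol)  # explicit hydrogens are needed for sensible 3D geometry
    params = AllChem.ETKDG()
    params.randomSeed = seed  # fixed seed so the embedding is reproducible
    conf_id = AllChem.EmbedMolecule(mol, params)
    if conf_id < 0:
        return None  # embedding failed
    return mol


# Ethanol as a small example input
mol = smiles_to_single_conformer("CCO")
```

No force-field clean-up is run here; if one is wanted, `AllChem.UFFOptimizeMolecule(mol)` could be called on the result at the cost of extra time.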

evert.homan_scilifelab.se · April 3, 2019, 6:34am

Hi,

It would be used as input for docking with Glide (where new conformations are generated) so only a reasonable starting conformation is required (not the global minimum anyway).

Best/Evert

denisfi · March 2, 2023, 11:13am

Hi guys, I made a simple solution with loops and file paths to do this.

For the example, I used the Chunk Loop Start node, because it can split the table into chunks with a fixed number of rows or into a fixed number of parts/groups, which makes it easier to manipulate as you wish.

Inside the loop section, I just build a string for the path with a counter, using the loop's iteration information. Before the loop end node, I insert a CSV Writer node to export the data to a file.

file_split.knwf (363.8 KB)

For the loop section, I’d like to break off a part with 1,000 rows of data per file.

Alternatively, you can say that you’d like to break it into 4 parts/chunks, as you wish.

Inside the loop, I’ll use the variables (“currentIteration” and “maxIterations”) to build the result file name and control the end of the loop.

Expression: join($${Sknime.workspace}$$,$${Sfolder}$$,$${Ssplitfile}$$,string($${IcurrentIteration}$$),".csv")

I need to save the new file name to another variable, which I called “splitfile”, and then I create a new path for it too.

With the CSV Writer node, I’ll set the file path from the workflow variables, using the splitfile path variable.

And for the end, I used the Variable Loop End node to check the maxIterations value used in this situation.
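For comparison, the same chunk-and-write logic can be sketched in plain Python. This is only an approximation of the workflow above, not an export of it; the function name `split_csv_rows` and its parameters are hypothetical, and the file name is built from a base name plus the iteration counter, like the expression above.

```python
import csv
from pathlib import Path


def split_csv_rows(header, rows, out_dir, base_name="splitfile", chunk_size=1000):
    """Write rows to numbered CSV files of at most chunk_size rows each."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # `iteration` plays the role of the loop's current iteration counter.
    for iteration, start in enumerate(range(0, len(rows), chunk_size)):
        path = out_dir / f"{base_name}{iteration}.csv"
        with path.open("w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)  # repeat the header in every file
            writer.writerows(rows[start:start + chunk_size])
        written.append(path)
    return written
```

With 2,500 rows and the default chunk size, this would write three files, the last one holding the remaining 500 rows.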

The result was this:

Can it solve your problem?

Tks,

Denis