Splitting a table by rows into several tables at once

Is there any way to split a table by rows into several (not just two) sub-tables at once?

For example, I have table A and want to split it by rows into A1, A2, … An, where n may or may not be known in advance, and n > 2.

I want to do this in a single pass (instead of chaining several nodes that each split into two tables), for performance reasons.
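Outside KNIME, the single-pass idea can be illustrated with a small Python sketch (not a KNIME node or API): each row is read once and appended to the sub-table for its key, so n does not need to be known in advance. The table is modeled as a list of dicts, and the `group` column is a hypothetical split criterion.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List

Row = dict


def split_by_key(rows: Iterable[Row], key: Callable[[Row], Hashable]) -> Dict[Hashable, List[Row]]:
    """Split rows into one sub-table per distinct key value, reading each row once."""
    parts: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in rows:
        # A new sub-table is created the first time a key value is seen.
        parts[key(row)].append(row)
    return dict(parts)


# Hypothetical example table A with a "group" column as the split criterion.
table_a = [
    {"id": 1, "group": "x"},
    {"id": 2, "group": "y"},
    {"id": 3, "group": "x"},
    {"id": 4, "group": "z"},
]
subtables = split_by_key(table_a, key=lambda r: r["group"])
for name, rows in subtables.items():
    print(name, [r["id"] for r in rows])
```

The number of resulting sub-tables (here three) follows from the data, which is also why a node with a fixed set of output ports cannot represent it directly.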

Thanks

Hi @peleitor

It is not possible to have a node with an arbitrary number of outputs; the number of ports must be fixed. It would also not be very helpful otherwise: to which downstream node would you connect tables that only exist some of the time? :slight_smile:

I would solve this with either looping or streaming!

Looping: use a Group Loop Start based on your split criteria, and process one subset of the data in each loop iteration.

Streaming: build yourself a wrapped metanode containing many Row Filter nodes, then activate streaming on this wrapped metanode. It is also a good idea to read the data inside the streamed wrapped metanode itself.
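The streaming approach amounts to applying several independent filters to each row as it passes, instead of materializing the whole table once per filter. A minimal Python sketch of that idea (not KNIME's streaming executor), assuming a hypothetical `read_rows` generator standing in for the reader and three hypothetical value ranges standing in for the Row Filter conditions. As with independent Row Filter nodes, a row can land in more than one output if several predicates match.

```python
from typing import Callable, Iterable, Iterator, List, Sequence

Row = dict


def read_rows() -> Iterator[Row]:
    """Hypothetical reader: yields rows one at a time instead of loading a full table."""
    for i in range(10):
        yield {"id": i, "value": i * 3}


def route(rows: Iterable[Row], predicates: Sequence[Callable[[Row], bool]]) -> List[List[Row]]:
    """Send each row to every output whose predicate it satisfies, in one pass over the input."""
    outputs: List[List[Row]] = [[] for _ in predicates]
    for row in rows:
        for out, pred in zip(outputs, predicates):
            if pred(row):
                out.append(row)
    return outputs


# Hypothetical filter conditions, one per "Row Filter" output.
filters = [
    lambda r: r["value"] < 10,
    lambda r: 10 <= r["value"] < 20,
    lambda r: r["value"] >= 20,
]
low, mid, high = route(read_rows(), filters)
print([r["id"] for r in low], [r["id"] for r in mid], [r["id"] for r in high])
```

Because the reader is a generator, the input is consumed only once, which mirrors the advice to read the data inside the streamed component.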

Best wishes, Iris

Thanks Iris!

Hi,

I think a node that splits a table into several tables (more than two) would be very useful, because it would allow the split tables to be processed in parallel rather than consecutively, as is the case with looping.

I have a very large table of chemical structures that I would like to 3D optimize with the RDKit 3D Optimize node. How can I split this into, say, 10 parallel jobs?

Best/Evert

Hi Evert,

most RDKit nodes are already multi-threaded internally, so no such splitting is needed.

(This kind of splitting is available with the Parallel Chunk Loop Start node, but be warned: especially with large amounts of complex data, the setup these nodes perform is very time-consuming and often not worth it.)

To add to this: if you generate the conformers with RDKit, you could use the “new” ETKDG method, which does not require optimization afterwards.

3D work is slow, so if this is a regular occurrence, the only other option is to get a faster CPU.
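What a parallel chunk loop does can be sketched in plain Python (an illustration of the idea, not how the KNIME nodes are implemented): split the rows into n roughly equal chunks, process each chunk in a worker, and concatenate the results in the original order. `process_chunk` is a hypothetical placeholder for the expensive per-row work. Threads keep the sketch simple; for CPU-bound pure-Python work, `ProcessPoolExecutor` would be needed for a real speedup, and its start-up and data-transfer cost is the kind of setup overhead the warning above refers to.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunk(rows: Sequence[T], n: int) -> List[List[T]]:
    """Split rows into n contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(rows), n)
    chunks: List[List[T]] = []
    start = 0
    for i in range(n):
        # The first `extra` chunks take one additional row each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(list(rows[start:end]))
        start = end
    return chunks


def process_chunk(rows: List[int]) -> List[int]:
    """Hypothetical stand-in for expensive work such as a 3D optimization per row."""
    return [r * r for r in rows]


def parallel_chunk_loop(rows: Sequence[T], n: int, work: Callable[[List[T]], List[R]]) -> List[R]:
    """Run `work` on n chunks concurrently and concatenate results in chunk order."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(work, chunk(rows, n)))  # map preserves input order
    return [item for part in results for item in part]


out = parallel_chunk_loop(list(range(10)), 3, process_chunk)
print(out)
```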

Hi,

Thanks for the swift reply. How do I choose the ETKDG method? It is not an option in the version of the RDKit Optimize Geometry node that I have.

Cheers/Evert

Hi Evert,

this is an option in the Add Conformers node. As I mentioned, it therefore only applies to conformers generated with RDKit, not to those from other sources. The options are on the Advanced tab of that node, and they are set by default when you add the node to the workflow.

I would uncheck the UFF clean-up for speed, but you may need to experiment to see whether that works for your molecules.

If you have precomputed 3D structures, then this doesn’t apply to you (except that the optimizer node is already multi-threaded and usually faster than commercial tools…)

Thanks. A couple of questions:

I am starting from SMILES. Is there an advantage to first generating 3D coordinates with the RDKit Generate Coords node before running the Add Conformers node?

I only want a single 3D conformation for each structure. Is it simply enough to set the number of conformers to 1?

Cheers/Evert

I guess by now you have answered your own question. Yes, it’s possible and simple to generate only one conformer. However, I’m not sure how useful that would be: it will most likely not be anywhere close to the minimum unless you get very lucky.

Hi,

It would be used as input for docking with Glide (where new conformations are generated), so only a reasonable starting conformation is required, not the global minimum.

Best/Evert