Picking diverse rows

I know about the RDKit Diversity Picker, but what about picking or sampling diverse rows based on a defined set of columns in a table? I see all the distance-based nodes, but no node specifically for this purpose.

There is no single node that serves as an alternative to the RDKit Diversity Picker, unless you count those offered by LigandScout or perhaps Schrödinger, which are not free.

What columns are in your “defined set of columns”?

There’s more than one way to approach this and you’ll have to get creative.

For example, if the columns contain fingerprints, you can calculate a distance matrix (using Tanimoto distance, as one option), then do hierarchical clustering, then assign compounds to x clusters (where x is the size of your desired diversity set), then choose 1 compound per cluster using, say, the Partitioning node.
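
To make the fingerprint route concrete, here is a rough sketch of the same idea outside KNIME, in plain Python (it could be adapted to a Python scripting node, for instance). It is only an illustration under some assumptions: fingerprints are represented as sets of on-bit indices, the clustering uses average linkage, and the one compound chosen per cluster is the medoid. The function names `tanimoto_distance` and `pick_diverse` and the toy data are made up for this example and are not part of any KNIME or RDKit API.

```python
from itertools import combinations


def tanimoto_distance(a, b):
    """Tanimoto distance between two sets of on-bit indices."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union


def pick_diverse(fingerprints, n_picks):
    """Cluster rows down to n_picks clusters, then pick one row per cluster.

    Assumptions for this sketch: average-linkage agglomerative clustering,
    and the cluster medoid is used as the representative row.
    """
    n = len(fingerprints)
    # Full pairwise distance matrix.
    dist = [[tanimoto_distance(fingerprints[i], fingerprints[j])
             for j in range(n)] for i in range(n)]

    # Start with every row in its own cluster.
    clusters = [[i] for i in range(n)]

    def linkage(c1, c2):
        # Average distance between all cross-cluster pairs.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    # Merge the two closest clusters until n_picks clusters remain.
    while len(clusters) > n_picks:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # a < b, so index a is unaffected

    # One representative per cluster: the member with the smallest
    # total distance to the other members of its cluster.
    picks = [min(members, key=lambda i: sum(dist[i][j] for j in members))
             for members in clusters]
    return sorted(picks)


# Toy fingerprints (hypothetical data): two pairs of similar rows
# and one outlier.
fps = [
    {1, 2, 3, 4},
    {1, 2, 3, 5},
    {10, 11, 12},
    {10, 11, 13},
    {20, 21},
]
picked = pick_diverse(fps, 3)
print(picked)
```

With this toy input, the two similar pairs end up in one cluster each and the outlier in its own, so one row comes back from each group. The naive merge loop is fine for a few hundred rows but gets slow for large tables.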

If the columns contain property counts, you can use the Hierarchical Clustering node to generate the distance matrix, then the Hierarchical Clustering (local) node to generate the clusters you need, then choose 1 compound per cluster. Alternatively, you could use the k-means node to perform k-means clustering, then choose 1 compound per cluster.
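
The k-means variant can be sketched the same way in plain Python. Again this is just an illustration with several assumptions of mine: the columns are standardised first so no single property dominates the Euclidean distance, the starting centroids are chosen by farthest-first traversal to keep the result deterministic, and the row nearest each final centroid is the one picked. The function name `kmeans_pick` and the property values are hypothetical.

```python
import math


def kmeans_pick(rows, k, n_iter=50):
    """Run k-means on numeric rows and return one row index per cluster."""
    # Standardise each column (assumption: equal weight per property).
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    scaled = [[(v - m) / s for v, m, s in zip(row, means, stds)]
              for row in rows]

    def assign(centroids):
        # Label each row with the index of its nearest centroid.
        return [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                for p in scaled]

    # Deterministic start: farthest-first traversal from the first row.
    centroids = [list(scaled[0])]
    while len(centroids) < k:
        far = max(scaled,
                  key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))

    for _ in range(n_iter):
        labels = assign(centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(scaled, labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated

    # Pick the row closest to each final centroid.
    labels = assign(centroids)
    picks = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            picks.append(min(members,
                             key=lambda i: math.dist(scaled[i], centroids[c])))
    return sorted(picks)


# Hypothetical property counts per row (three columns of counts).
table = [
    (1, 2, 1), (1, 3, 1), (2, 2, 1),      # rows 0-2: one group
    (8, 9, 2), (9, 9, 2), (8, 10, 3),     # rows 3-5: a second group
    (1, 2, 12), (2, 3, 11), (1, 3, 12),   # rows 6-8: a third group
]
picked_rows = kmeans_pick(table, 3)
print(picked_rows)
```

For this toy table the three groups are well separated, so one row from each group is returned. On real data k-means can settle into a local optimum, so it is worth checking the cluster sizes before trusting the picks.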
