Diversity selection

Hi All,

Is there a way in KNIME to do diversity selection using Euclidean distance or any other distance matrix? Is there a particular node for it, or a longer way to do the diversity selection?

Thank you




Input the SDF, calculate properties or fingerprints, convert the fingerprints to binary if necessary, use the cluster create and apply nodes, then a sampling node, and you are done! Just develop a workflow; it will give you much more control over the logic. You can even use dendrograms for visualization.
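For binary fingerprints, the usual distance fed into clustering is the Tanimoto (Jaccard) distance. The sketch below is a plain-Python illustration, not a KNIME node; it assumes each fingerprint is available as a string of "0"/"1" characters of equal length, and the function name is hypothetical.

```python
def tanimoto_distance(bits_a: str, bits_b: str) -> float:
    """1 - (bits on in both) / (bits on in either), for '0'/'1' bit strings."""
    if len(bits_a) != len(bits_b):
        raise ValueError("fingerprints must have the same length")
    on_both = sum(1 for a, b in zip(bits_a, bits_b) if a == "1" and b == "1")
    on_either = sum(1 for a, b in zip(bits_a, bits_b) if a == "1" or b == "1")
    if on_either == 0:
        return 0.0  # two empty fingerprints: treat them as identical
    return 1.0 - on_both / on_either


print(tanimoto_distance("1100", "1010"))  # one shared bit out of three set bits
```

A distance of 0 means identical bit patterns and 1 means no bits in common, so diverse compounds are the ones with large pairwise distances.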

Otherwise, use the RDKit diversity picker node after converting from Mol to RDKit Mol.

Thanks InsilicoConsulting.

I have already built the workflow, but I don't want to use fingerprints for my structures; I am focusing on their shape/geometry instead. I have a few properties that capture that. Here are the steps of the workflow:

1) Read an SDF file

2) Generate the best (minimum-energy) conformers and calculate some geometry descriptors.

3) Run PCA on the descriptors, keeping enough components to capture 80% of the variance.

4) Use these PCA components for k-means clustering, which generates the clusters.

5) Finally, use the sampling node to pick 1-2 samples from each cluster.
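Steps 3 and 5 can be sketched in plain Python as follows. This is an illustration, not what the KNIME nodes do internally: it assumes the per-component variances (e.g. PCA eigenvalues, sorted in descending order) and the k-means cluster label of each row are already available, and both function names are hypothetical.

```python
import random
from collections import defaultdict
from itertools import accumulate


def components_for_variance(explained, target=0.80):
    """Smallest k such that the first k components explain >= target of the variance.

    `explained` holds per-component variances (e.g. eigenvalues), sorted descending.
    """
    total = sum(explained)
    for k, cumulative in enumerate(accumulate(explained), start=1):
        if cumulative / total >= target:
            return k
    return len(explained)


def sample_per_cluster(labels, n_per_cluster=1, seed=42):
    """Return row indices: up to n_per_cluster randomly chosen rows from each cluster."""
    by_cluster = defaultdict(list)
    for row, label in enumerate(labels):
        by_cluster[label].append(row)
    rng = random.Random(seed)  # fixed seed so the pick is reproducible
    picked = []
    for label in sorted(by_cluster):
        members = by_cluster[label]
        picked.extend(rng.sample(members, min(n_per_cluster, len(members))))
    return picked


print(components_for_variance([4.0, 3.0, 2.0, 1.0]))  # 3 components reach 90%
print(sample_per_cluster(["c0", "c0", "c1", "c1", "c1", "c2"], n_per_cluster=1))
```

Note that random sampling within a cluster does not guarantee that the picked members are far apart from each other, which is the limitation raised below.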

This works, but I want to find the most diverse shapes among them rather than just random samples. Does KNIME have a MaxMin dissimilarity node? Thanks.
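The MaxMin idea itself is simple: start from one compound, then repeatedly add the candidate whose distance to its nearest already-selected compound is largest. Below is a plain-Python sketch of that greedy algorithm using Euclidean distance on descriptor or PCA-score vectors. It is not a KNIME node; it assumes the vectors have already been exported as lists of numbers (it could, for example, run inside a Python scripting node), and `maxmin_pick` and its `first` starting row are hypothetical choices.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def maxmin_pick(vectors, n_pick, first=0):
    """Greedy MaxMin selection: return indices of n_pick mutually distant rows.

    The result depends on the starting row `first`.
    """
    n = len(vectors)
    if not 0 < n_pick <= n:
        raise ValueError("n_pick must be between 1 and the number of rows")
    picked = [first]
    picked_set = {first}
    # min_dist[i] = distance from row i to its nearest already-picked row
    min_dist = [euclidean(v, vectors[first]) for v in vectors]
    while len(picked) < n_pick:
        # choose the candidate farthest from the current selection
        best = max((i for i in range(n) if i not in picked_set),
                   key=lambda i: min_dist[i])
        picked.append(best)
        picked_set.add(best)
        # update each row's distance to the nearest picked row
        for i in range(n):
            d = euclidean(vectors[i], vectors[best])
            if d < min_dist[i]:
                min_dist[i] = d
    return picked


points = [(0, 0), (0.1, 0), (10, 0), (10, 0.1), (5, 5)]
print(maxmin_pick(points, 3))  # one point per well-separated region
```

Because it keeps only one running minimum distance per row, each new pick costs one pass over the data rather than a full distance matrix.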


If you already have the shape-related descriptors as double values, why not use a loop and calculate the Euclidean distance or some other metric? Use the math expression node inside the loop.

Also see if the distance matrix node helps.
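What such a loop computes can be sketched in plain Python: a full pairwise Euclidean distance matrix over the descriptor rows. Because geometry descriptors often have very different ranges, the sketch also z-scores each column first, which is a common precaution rather than something stated in this thread. The function names are hypothetical and the rows are assumed to be lists of numbers.

```python
import math


def standardize(rows):
    """Z-score each descriptor column; constant columns are left centred only."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]


def distance_matrix(rows):
    """Symmetric matrix of pairwise Euclidean distances between descriptor rows."""
    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # fill the upper triangle and mirror it
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(rows[i], rows[j])))
            dist[i][j] = dist[j][i] = d
    return dist


print(distance_matrix([[0, 0], [3, 4], [0, 4]]))
print(standardize([[1, 10], [3, 30]]))
```

The matrix grows with the square of the number of compounds, so for large sets a picker that computes distances on the fly may be preferable.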

Hi Kuldeep,

I am very interested in your workflow. Can you give more information on the "geometry descriptors" that you use?



Lionel, check out http://users.abo.fi/mivainio/shaep/index.php