RDKit Diversity Picker


The node description mentions that the process slows down rapidly as the number of compounds to be picked increases. I am seeing this when trying to pick ~4000 compounds.

What exactly is the complexity of the algorithm? Is it along the lines of n * p^2, where n is the size of the total dataset and p is the number of compounds to be picked?



Hi Simon,

The complexity is, as you guessed, N * M^2, with N the dataset size and M the number of compounds to be picked. 
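
To make the N * M^2 behaviour concrete, here is a minimal sketch of a naive MaxMin-style picker in plain Python. It is an illustration only, not the node's actual code: the function name, the seeding with the first item, and the toy 1-D "compounds" are all assumptions made for the example. At each step it recomputes every candidate's minimum distance to everything picked so far, which is where the quadratic dependence on M comes from.

```python
def naive_maxmin_pick(points, num_picks, dist):
    """Naive MaxMin-style selection (hypothetical helper, for illustration).

    Returns (picked_indices, number_of_distance_evaluations).
    """
    picked = [0]          # assumption: seed with the first item
    picked_set = {0}
    evals = 0
    while len(picked) < num_picks:
        best_idx, best_score = None, -1.0
        for i in range(len(points)):
            if i in picked_set:
                continue
            # Minimum distance from candidate i to the picked set,
            # recomputed from scratch on every step.
            score = float("inf")
            for j in picked:
                evals += 1
                d = dist(points[i], points[j])
                if d < score:
                    score = d
            # Keep the candidate that is farthest from its nearest pick.
            if score > best_score:
                best_idx, best_score = i, score
        picked.append(best_idx)
        picked_set.add(best_idx)
    return picked, evals


def abs_dist(a, b):
    # Toy distance on 1-D points; a real run would use e.g. fingerprint distances.
    return abs(a - b)


data = list(range(200))   # N = 200 toy "compounds"
_, cost_10 = naive_maxmin_pick(data, 10, abs_dist)
_, cost_20 = naive_maxmin_pick(data, 20, abs_dist)
print(cost_10, cost_20, cost_20 / cost_10)
```

Doubling the number of picks from 10 to 20 at fixed N roughly quadruples the number of distance evaluations in this sketch, which matches the M^2 term.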


p.s. I changed notation to N and M instead of n and p because "np" has its own meaning in complexity analysis.

Thanks Greg,

That makes sense. In the end my 4000-compound pick ran to completion; it took around 48 hrs, but the results are good :-)

I am but a mere novice with complexity analysis, so thanks for the heads up on nomenclature.
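
For anyone planning a similar run, a rough back-of-envelope extrapolation from the 48 hr figure might look like the sketch below. It assumes the same dataset and hardware and that the runtime really does scale with the square of the number of picks; the function name is made up for the example, and the results are estimates, not measurements.

```python
def estimated_hours(ref_hours, ref_picks, new_picks):
    # Assumes runtime is proportional to M^2 at fixed dataset size N.
    return ref_hours * (new_picks / ref_picks) ** 2


# Reference point from this thread: 4000 picks took around 48 hours.
for picks in (1000, 2000, 8000):
    print(picks, estimated_hours(48.0, 4000, picks))
```

Under that assumption, halving the number of picks cuts the estimated runtime to about a quarter, and doubling it multiplies the estimate by about four.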