Clustering by a numerical value

Hi all,

I would like to create a clustering solution that takes experimentally measured 19F NMR shifts and assigns molecules to clusters (10 molecules per cluster) so that the chemical shifts within each cluster are as dissimilar as possible, to avoid signal overlap when the molecules are pooled.

Can anyone give me any pointers on how best to achieve this?



Hi Alastair,

If I understood the question correctly, you want to cluster the compounds into 10 clusters/pools based on their 19F shifts. I'll call these 'pools' from now on, to distinguish them from the 'clusters' that I think you want. You would then take one compound from each pool to generate a list of 10 dissimilar compounds, and that list would define your 'cluster'.

Hope that helps.
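As a rough sketch of one way to read the suggestion above: sort the compounds by shift, cut the sorted list into 10 contiguous pools, and build each group by taking one compound from every pool. The message does not say how the pools should be formed, so the equal-sized contiguous bins used here are an assumption, and the function name `bin_then_pick` and the example shifts are made up for illustration.

```python
def bin_then_pick(shifts, n_bins=10):
    """Group compounds by taking one member from each of n_bins shift ranges.

    shifts: dict mapping compound name -> measured 19F shift (ppm).
    Assumption: bins are equal-sized contiguous slices of the sorted list.
    """
    ordered = sorted(shifts.items(), key=lambda kv: kv[1])
    size = len(ordered) // n_bins  # compounds left over after n_bins * size are dropped in this sketch
    bins = [ordered[i * size:(i + 1) * size] for i in range(n_bins)]
    # The k-th group takes the k-th member of every bin, so each group
    # contains exactly one compound from each shift range.
    return [[b[k] for b in bins] for k in range(size)]


# Made-up example data: 30 compounds spaced 0.5 ppm apart.
example_shifts = {f"mol{i:02d}": -60.0 - 0.5 * i for i in range(30)}
groups = bin_then_pick(example_shifts)
for k, group in enumerate(groups):
    print(k, [name for name, _ in group])
```

With real data the bins would rarely be evenly populated, so a clustering step on the shift values might replace the equal-sized slicing.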


Hi Jon,

I was actually not worried about chemical clustering, but rather about creating compound pools whose NMR shifts don't overlap. I bodged a solution, but wonder if there is a more elegant way. I imagine there is a statistical method for grouping a set of N objects into X pools in such a way that the variance/SD within each pool is maximised.

I achieved my solution by simply sorting the list by measured 19F shift, then taking chunks of ten and assigning Row 0 --> Pool 0, Row 1 --> Pool 1, etc. As the list was sorted by chemical shift, the next molecule in Pool 0, taken from the second chunk, was therefore quite different from the previous one.
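For anyone wanting to try the same thing, the sort-and-deal approach described above could be sketched roughly as follows. The function names `deal_into_pools` and `min_gap`, and the example shifts, are made up for illustration; the minimum-gap check is an added assumption about how one might judge overlap and is not part of the original solution.

```python
def deal_into_pools(shifts, n_pools):
    """Sort compounds by shift, then deal them round-robin into n_pools pools.

    shifts: dict mapping compound name -> measured 19F shift (ppm).
    """
    ordered = sorted(shifts.items(), key=lambda kv: kv[1])
    pools = [[] for _ in range(n_pools)]
    for row, (name, shift) in enumerate(ordered):
        # Row 0 -> Pool 0, Row 1 -> Pool 1, ..., then wrap around.
        pools[row % n_pools].append((name, shift))
    return pools


def min_gap(pool):
    """Smallest difference between neighbouring shifts in one pool (ppm)."""
    values = sorted(shift for _, shift in pool)
    return min((b - a for a, b in zip(values, values[1:])), default=float("inf"))


# Made-up example data: 30 compounds spaced 0.5 ppm apart.
shifts = {f"mol{i:02d}": -60.0 - 0.5 * i for i in range(30)}
pools = deal_into_pools(shifts, n_pools=10)
for idx, pool in enumerate(pools):
    print(idx, [name for name, _ in pool], min_gap(pool))
```

Setting `n_pools` to the number of compounds divided by ten would give pools of roughly ten molecules each, as asked for in the original question. Because neighbours in the sorted list always land in different pools, members of the same pool are at least `n_pools` rows apart in the sorted order.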