Suggested modification to Fingerprint Similarity node

Hi,

I find the Fingerprint Similarity node very useful, but I wonder whether it would be possible to add an option (via a checkbox?) that appends a column (Collection type) containing the Row IDs of the reference fingerprint(s) that matched.

So, for example, if a reference list of 4 fingerprints is being used and the aggregation setting is 'maximum', then as well as returning a 'similarity' column that shows the maximum similarity, the node would return a second column containing a collection of the Row IDs of the reference(s) that gave that maximum similarity (the collection type covers the case where multiple reference fingerprints tie for maximum similarity).

The 'minimum' aggregation could behave similarly (i.e. return the list of Row IDs for the least similar reference(s)), and for consistency maybe the 'average' aggregation method would just return the full Row ID collection?
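To make the request concrete, here is a rough Python sketch of the behaviour I have in mind. It is only an illustration, not how the node is implemented: the function names are made up, fingerprints are represented as sets of on-bit indices, Tanimoto is assumed as the similarity measure, and ties are detected by exact equality of the scores.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def aggregate_with_references(
    test_fp: Set[int],
    references: Dict[str, Set[int]],
    mode: str = "maximum",
) -> Tuple[float, List[str]]:
    """Hypothetical helper: aggregated similarity plus the matching reference Row IDs."""
    if not references:
        raise ValueError("at least one reference fingerprint is required")

    # Similarity of the test fingerprint to every reference, keyed by Row ID.
    scores = {row_id: tanimoto(test_fp, fp) for row_id, fp in references.items()}

    if mode == "average":
        # Every reference contributes, so the full Row ID collection is returned.
        return sum(scores.values()) / len(scores), list(scores)
    if mode == "maximum":
        target = max(scores.values())
    elif mode == "minimum":
        target = min(scores.values())
    else:
        raise ValueError(f"unknown aggregation mode: {mode!r}")

    # Keep every reference that reaches the extreme value, so ties are preserved.
    return target, [row_id for row_id, score in scores.items() if score == target]


# Made-up reference table of 4 fingerprints and one test fingerprint.
references = {
    "Row0": {1, 2, 3},
    "Row1": {1, 2, 4},
    "Row2": {7, 8},
    "Row3": {1, 2, 3, 4},
}
test_fp = {1, 2}

max_sim, max_refs = aggregate_with_references(test_fp, references, "maximum")
min_sim, min_refs = aggregate_with_references(test_fp, references, "minimum")
avg_sim, avg_refs = aggregate_with_references(test_fp, references, "average")
print(max_sim, max_refs)
print(min_sim, min_refs)
print(avg_sim, avg_refs)
```

In this made-up example Row0 and Row1 tie for the maximum, so both Row IDs end up in the collection, while the minimum returns only Row2 and the average returns all four Row IDs.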

 

This would help if one wants to know which of the reference compounds each 'test' compound is most or least similar to, rather than knowing only the similarity value and not which reference produced it.

 

Kind regards

James

Hi James,

I believe CDK's version of this node does exactly as you describe.

Simon.

Hi Simon,

You're quite right - thanks! I'm not sure what happens in the case of similarity ties, but I'm happy to live with that unknown at this point! :)

Incidentally, it looks like the 'Reference' column that is added when the aggregation method is Maximum or Minimum is undocumented in the node help.

Kind regards

James