I have found that the CDK fingerprint similarity node works really well and is fast, even with a large set of molecules (and it uses almost none of your computer's memory!).
It would be great to have an option for the case where the first and second input tables are the same table, so that the node understands you are not interested in comparing each molecule against itself!
Right now the maximum Tanimoto obtained is always 1, because the node finds the identical molecule in the second table, so the output is not useful.
You are absolutely right. Sometimes it is better not to fish the query molecule itself out of the reference table. I will add an option to exclude those 'self-hits' from the result.
The option will appear in the next version of KNIME-CDK.
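To illustrate what excluding self-hits means in an all-against-all search, here is a minimal Python sketch. It is not the KNIME-CDK implementation: the function names, the `exclude_self` flag, and the representation of fingerprints as sets of on-bit indices are all assumptions made for this example. The self-hit is skipped by row index rather than by similarity value, so genuine duplicates elsewhere in the table would still be reported with a Tanimoto of 1.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def nearest_neighbours(fingerprints, exclude_self=True):
    """For each row, return (index, similarity) of its best match in the same table.

    Hypothetical helper for illustration only; `exclude_self` mimics the
    requested option and is not the name of any real node setting.
    """
    results = []
    for i, query in enumerate(fingerprints):
        best_idx, best_sim = None, -1.0
        for j, ref in enumerate(fingerprints):
            if exclude_self and i == j:
                continue  # skip the self-hit by row index, not by similarity
            sim = tanimoto(query, ref)
            if sim > best_sim:
                best_idx, best_sim = j, sim
        results.append((best_idx, best_sim))
    return results


# Toy fingerprints (made-up bit positions, not real molecules).
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {7, 8, 9}]

with_self = nearest_neighbours(fps, exclude_self=False)
without_self = nearest_neighbours(fps, exclude_self=True)
```

With `exclude_self=False` every row simply finds itself with a similarity of 1, which is the unhelpful behaviour described above. With `exclude_self=True` each row reports its closest other row instead.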
Following up on my previous post, I have added an option for "all against all" queries, where the input and reference tables are identical. Screenshot below.
The option will become available with the next version update.