I have a Knime workflow making use of the CDK fingerprint and fingerprint similarity nodes to generate Tanimoto similarity coefficients between two sets of molecules (using .sdf files from different sources).
For the most part it works really well, but when I enable the “all against all” option in the node’s configuration settings (for comparing a molecule set against itself while excluding identical matches), I often get the following error:
ERROR Fingerprint Similarity 3:1 Execute failed: Cell at Index 1 is null!
With small molecule sets (~1000 compounds) I have traced this error to some unknown issue in the formatting of the .sdf file: removing small sections of the .sdf file by trial and error eliminates the apparently problematic records and allows the workflow to proceed as normal. However, this is not practical with larger .sdf files (> ~100K compounds).
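The trial-and-error removal described above could in principle be automated outside Knime. The following is only a minimal sketch in plain Python, and it rests on an unconfirmed assumption: that the records causing the null cell are ones whose molecule block is truncated or declares zero atoms. The function names `split_sdf_records` and `find_suspect_records` are made up for this illustration and are not part of Knime or CDK. It relies only on two facts about the SDF format: records are separated by a `$$$$` line, and the fourth line of a V2000 record is the counts line, with the atom count in its first three columns.

```python
def split_sdf_records(text):
    """Split SDF text into records (lists of lines) on the '$$$$' delimiter."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append(current)
            current = []
        else:
            current.append(line)
    # Keep a trailing record that is missing its final '$$$$' line.
    if any(l.strip() for l in current):
        records.append(current)
    return records


def find_suspect_records(text):
    """Return (record_index, reason) pairs for records that look malformed.

    Assumption for this sketch: only truncated records and V2000 records
    with zero atoms are flagged. Other causes are not detected.
    """
    suspects = []
    for i, rec in enumerate(split_sdf_records(text)):
        if len(rec) < 4:
            suspects.append((i, "fewer than 4 lines, no counts line"))
            continue
        counts = rec[3]
        if "V3000" in counts:
            continue  # V3000 keeps its atom count elsewhere; not checked here
        try:
            n_atoms = int(counts[0:3])
        except ValueError:
            suspects.append((i, "unparseable counts line"))
            continue
        if n_atoms == 0:
            suspects.append((i, "zero atoms"))
    return suspects


# Small made-up example: one valid record, one empty molecule, one truncated record.
good = "\n".join([
    "methane", "  sketch", "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END", "$$$$",
])
empty = "\n".join([
    "empty", "  sketch", "",
    "  0  0  0  0  0  0  0  0  0  0999 V2000",
    "M  END", "$$$$",
])
broken = "\n".join(["broken", "$$$$"])
sample = "\n".join([good, empty, broken]) + "\n"

suspects = find_suspect_records(sample)
for index, reason in suspects:
    print(f"record {index}: {reason}")
```

For a real file, the text would be read from disk and passed to `find_suspect_records`; the reported indices are zero-based positions of records in the file. A record that passes these checks can still fail in the fingerprint node for other reasons, so this only narrows the search.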
I have tried many ways in Knime (chemistry-related nodes) to pre-process/clean an .sdf file (or other input file), without success.
Can anyone suggest a way to generate a consistently well-formatted input file (.sdf or other) that would prevent this error from occurring when running the “all against all” option in the CDK fingerprint similarity node?