We've noticed some unexpected behaviour in the RDKit Molecule Fragmenter node. Running the node multiple times on the same input gives the same results. However, if we shuffle the input order, we get 'different' fragments when comparing the Fragment SMILES column.
This appears to be because different SMILES strings are produced for the same fragment depending on the sort order, and those strings were what we used for the comparison.
Is it possible to get a canonical fragment identifier? A colleague was attempting to identify novel fragments between two datasets. I tried the RDKit Canon SMILES node with the Fragment as input, which significantly reduces the number of sort-dependent 'novel' fragments but does not completely resolve the issue.
Looking at the Python documentation, the MolFragmentToSmiles(...) method does have a parameter for canonicalisation.
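For illustration, here is a minimal sketch of what I mean, using the RDKit Python API rather than the KNIME node (I don't know what the node does internally, so this is not a description of its implementation). It assumes RDKit is installed; the helper names canonical_fragment_smiles and recanonicalise and the example molecule are made up for this post.

```python
from rdkit import Chem


def canonical_fragment_smiles(mol, atom_indices):
    """Write the sub-structure spanned by atom_indices as canonical SMILES."""
    # canonical=True is already the default; it is spelled out here for clarity.
    return Chem.MolFragmentToSmiles(
        mol, atomsToUse=list(atom_indices), canonical=True
    )


def recanonicalise(smiles):
    """Parse a SMILES string and write it back in RDKit's canonical form."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        # The string could not be parsed, so there is nothing to canonicalise.
        return None
    return Chem.MolToSmiles(mol)


# The same molecule (ethoxybenzene) written with two different atom orders.
mol_a = Chem.MolFromSmiles("CCOc1ccccc1")
mol_b = Chem.MolFromSmiles("c1ccccc1OCC")

# The ethoxy atoms sit at different indices in each input.
frag_a = canonical_fragment_smiles(mol_a, [0, 1, 2])
frag_b = canonical_fragment_smiles(mol_b, [6, 7, 8])
print(frag_a == frag_b)

# Round-tripping plain fragment strings through the parser.
print(recanonicalise("OCC") == recanonicalise("CCO"))
```

For a simple fragment like this I would expect both comparisons to agree. Our real fragments are more complex, which may be why the Canon SMILES node only partly helped.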
Thanks for reporting this issue. Sorry for the late answer, it is the vacation time of the year.
Would you be able to attach a simple workflow that demonstrates this problem, so that I can understand it better? Ideally it would contain both the positive and the negative case in one workflow. Thanks for your help here. I hope to find some time in the next 2 months to look into it and to provide a fix, if possible.