I would like to count how many times a certain fragment, encoded as a SMARTS query, appears in the molecules of a database. With the CDK-KNIME SMARTS Query node it is easy to split the molecules that match a given SMARTS query from those that do not. However, it is currently not possible to count how many times the SMARTS query (or queries) matches the molecules in the database.
Do you know whether it would be possible (and easy) to include such a function in a future release of the SMARTS Query node? Or is there an alternative way to do this using CDK?
I'm not aware of any alternatives in KNIME-CDK, but it was fairly straightforward to modify the existing node to accommodate that function.
You can now choose to append a column that contains the number of unique matches of a SMARTS query in the target molecule. If multiple SMARTS are provided, the sum total is calculated.
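To make "unique matches" concrete: a substructure search can report the same target atoms several times, once per symmetric way of laying the query onto them. The following is a minimal sketch in plain Python, not the node's actual code. It assumes the mappings are already available as tuples of target atom indices and that two mappings count as the same match when they cover the same set of atoms; the function names and the example mappings are hypothetical.

```python
def count_unique_matches(mappings):
    """Count mappings that cover distinct sets of target atoms.

    `mappings` is an iterable of tuples of atom indices (assumed input,
    as a substructure matcher might return them).
    """
    return len({frozenset(mapping) for mapping in mappings})


def total_unique_matches(mappings_per_query):
    """Sum the unique match counts over several queries."""
    return sum(count_unique_matches(m) for m in mappings_per_query)


# Hypothetical example: the first query hits one bond in both directions,
# the second query hits two different single atoms.
first_query = [(0, 1), (1, 0)]
second_query = [(2,), (3,)]

print(count_unique_matches(first_query))                   # 1
print(total_unique_matches([first_query, second_query]))   # 3
```

With several queries, the total alone no longer says which query contributed which matches.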
Please update your nightly build to make use of the new function. I would be grateful if you could test the node and report back whether it works correctly and does what you need.
Thank you so much for your effort in enhancing CDK. I love this tool and I think it is getting better and better.
I have installed the new nightly build (1.5.3.201506052109) and run some tests.
In general the new function seems to work properly, but I would like to discuss two points:
1. It would be great if, instead of (or in addition to) the new total unique count column, the output table of matching molecules contained a separate unique count column for each input SMARTS. That way the user would get much richer information about which SMARTS fragments appear in the molecules. I don't know whether this feature would require a major revision of the current node and hence much more effort, but in my opinion it would be worth considering for future versions.
2. I found a problem with one SMARTS, but it is not related to the change in the nightly build: the same problem appears with the version of the node from the trusted repository. As you can see from the workflow I uploaded, matching and counting the ring bridge atoms in the molecules with the SMARTS "[r;!R1]" does not work. The RDKit SMARTS node returns the expected counts, while with the CDK node no molecule seems to match that SMARTS. Does anybody know why this happens? It would probably be good to open a new forum thread for this problem.
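For reference, "[r;!R1]" reads as "an atom that is in a ring (r) and is not a member of exactly one ring (!R1)", which in practice means atoms shared by two or more rings. The sketch below only illustrates that intended meaning in plain Python. It assumes the ring set is already given as lists of atom indices (which ring set a toolkit uses is toolkit-specific and may differ), and the function name and atom numbering are hypothetical. It says nothing about why the two nodes disagree.

```python
from collections import Counter


def atoms_in_several_rings(rings):
    """Return the atoms that belong to a ring but not to exactly one ring.

    `rings` is a list of rings, each a list of atom indices (assumed input).
    """
    membership = Counter(atom for ring in rings for atom in set(ring))
    # Every atom in `membership` is a ring atom; keep those whose ring
    # count differs from 1.
    return sorted(atom for atom, n_rings in membership.items() if n_rings != 1)


# Hypothetical numbering of a naphthalene-like fused system:
# two six-membered rings sharing atoms 4 and 9.
rings = [[0, 1, 2, 3, 4, 9], [4, 5, 6, 7, 8, 9]]
print(atoms_in_several_rings(rings))       # [4, 9]
print(len(atoms_in_several_rings(rings)))  # 2
```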
Thanks for testing the update. Regarding your two points:
1) That's a great idea. I have modified the node to output a collection cell. The query SMARTS order is preserved, i.e. you can match the rownames of the SMARTS table to the column headers of the output table or vice versa.
2) I could reproduce your result. I will file a bug report with the CDK.
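The order-preserving output described in point 1 can be pictured with a small plain-Python sketch. It is an illustration only, not the node's implementation: it assumes one list of mappings per query, given in the same order as the query identifiers, and the function name, the "Row0"-style identifiers and the mappings are hypothetical.

```python
def counts_by_query(query_ids, mappings_per_query):
    """Pair each query identifier with its unique match count, keeping order."""
    if len(query_ids) != len(mappings_per_query):
        raise ValueError("need exactly one list of mappings per query")
    counts = [
        len({frozenset(mapping) for mapping in mappings})
        for mappings in mappings_per_query
    ]
    return list(zip(query_ids, counts))


# Hypothetical input: the second query does not match at all.
result = counts_by_query(
    ["Row0", "Row1", "Row2"],
    [[(0, 1), (1, 0)], [], [(2,), (3,)]],
)
print(result)  # [('Row0', 1), ('Row1', 0), ('Row2', 2)]
```

Keeping a zero for a query that does not match is what lets the counts be lined up with the query table by position.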
Thank you for your contribution to that CDK node! I tested it and it seems to work well.
Thanks also for filing the bug report with the CDK for me. Just one more question: is there a way (a mailing list or similar) to keep track of the bug's resolution and other CDK-related activities?
I see that the CDK bug tracked in this thread was fixed and closed on 2015-06-18 (https://sourceforge.net/p/cdk/bugs/1364/). This is great news! Congratulations.
Nevertheless, it seems that the patch has not yet been applied in the current CDK-KNIME nightly build (version cdk.knime_1.5.3.201509071430).
Could you let me know whether it would be possible to include it in the next nightly build, or whether some other related problem has to be fixed first?