I've prepared a small but easily extensible workflow for standardizing chemical structures. It uses some Indigo nodes and transforms molecules with tautomer SMIRKS reactions. I'm not good at SMIRKS, but some of you probably are and may be interested in expanding the feature list of this "standardizer".
What the workflow does:
converts some tautomers
removes salts, converts isotopes (with a standard Indigo node)
aromatizes the structure (with a standard Indigo node)
What the workflow lacks:
neutralizing structures (COO- -> COOH; RNH3+ -> RNH2, etc.)
unifying more tautomers
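To make the neutralization item a bit more concrete, here is a rough Python sketch of the two example rules. It is not the SMIRKS/Indigo approach the workflow uses: it is a purely textual substitution on SMILES strings, so it only catches these exact spellings (for instance a carboxylate written as `[O-]C(=O)C` would be missed). The rule table and the `neutralize` function are hypothetical names made up for illustration.

```python
# Hypothetical, purely textual rules: (substring to find, replacement).
# A real implementation would use SMIRKS reactions in a chemistry toolkit.
NEUTRALIZE_RULES = [
    ("C(=O)[O-]", "C(=O)O"),  # COO- -> COOH
    ("[NH3+]", "N"),          # RNH3+ -> RNH2
]


def neutralize(smiles):
    """Apply each textual rule and return (new_smiles, list_of_matched_patterns)."""
    applied = []
    for pattern, replacement in NEUTRALIZE_RULES:
        if pattern in smiles:
            smiles = smiles.replace(pattern, replacement)
            applied.append(pattern)
    return smiles, applied


# Glycine zwitterion written with explicit charges.
print(neutralize("[NH3+]CC(=O)[O-]"))
```

Keeping the rules in a plain list means new cases can be appended without touching the function, which is roughly how the SMIRKS list in the workflow is meant to grow.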
If you are interested in expanding the capabilities of this workflow, or have another idea for handling the problem, feel free to post it here.
Nice initiative. If you are an academic, you can get a free ChemAxon licence. However, the nodes are provided by Infocom and these still come at a cost.
It looks like a good start. Personally, I would want a report of what has been done to each structure. For modelling purposes you may want to remove some mixtures or molecules containing R and X atoms. But the process will be purpose-specific.
For this workflow I would perhaps replace the feature remove action for removing minor components with the Component Separator node.
swebb, thanks for the discussion. The advantage of such a KNIME workflow is that it can be easily tuned to specific needs (via node configuration).
Meanwhile, I'm attaching the second version of this simple workflow, with some added transformations (tautomers and neutralization). As I mentioned above, it is only a pre-pre-pre-release of something we could (hopefully, in the future) name "KNIME structure standardizer".
I wasn't criticising :). An open-source curation tool in KNIME is great!
I tried to see if I could add some reporting (create a new column describing what happened), but it doesn't look like this would be simple. For example, a new column that reports whether a mixture has been split.
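As a rough illustration of the kind of report column meant here, below is a small Python sketch written outside KNIME. It assumes structures are available as SMILES strings with components separated by ".", and it picks the "largest" component by counting letters, which is only a crude stand-in for atom count. The function name `split_and_report` and the report wording are hypothetical.

```python
def split_and_report(smiles):
    """Return (kept_component, report) for a dot-separated SMILES string."""
    components = [part for part in smiles.split(".") if part]
    if len(components) <= 1:
        return smiles, "unchanged"
    # Crude size proxy (an assumption): number of letters in the component.
    largest = max(components, key=lambda part: sum(ch.isalpha() for ch in part))
    return largest, f"mixture split: kept 1 of {len(components)} components"


# Each input row yields the kept structure plus a value for the report column.
for smiles in ["CC(=O)[O-].[Na+]", "CCO"]:
    kept, report = split_and_report(smiles)
    print(smiles, "->", kept, "|", report)
```

In a workflow the second return value would go into the extra column, so every row carries a note on whether it was split.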
Here is a new version of the "KNIME standardizer" workflow with minor improvements. A few new tautomer unification reactions were added and some bugs were fixed (and probably some others introduced ;). I've also introduced parallel node processing.
The benchmark: on an Intel Core i3 with Windows 7, the throughput is about 70-180 compounds/s.
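For a sense of scale, the throughput range can be turned into a run-time estimate. The sketch below assumes the rate stays roughly constant for larger sets, which is only an assumption, and uses a hypothetical library of one million compounds; `estimated_hours` is a name made up for this example.

```python
def estimated_hours(n_compounds, rate_per_second):
    """Hours needed to process n_compounds at a constant rate (compounds/s)."""
    return n_compounds / rate_per_second / 3600.0


n = 1_000_000          # hypothetical library size
slow, fast = 70, 180   # compounds/s, the range quoted above

print(f"{estimated_hours(n, fast):.1f} to {estimated_hours(n, slow):.1f} hours")
```

At these rates a million structures would take somewhere between about one and a half and four hours on comparable hardware.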
If you have new ideas on how to improve the workflow, and especially new tautomer generalization rules, feel free to post them here.