Inorganics Filter node

Hi Mikhail,

The brilliant new nodes that the KNIME team has implemented for loading PDF files (e.g. patents or chemistry papers) and extracting structures from them are extremely useful for pulling out final molecules, such as the pharmacologically active molecules in a patent. These can then be passed into the "Murcko Scaffolds" and "Atom Replacer" nodes to get carbon skeleton frameworks, which is really useful for understanding the key features of the pharmacologically active molecules in a patent.

The main issue, however, is that you also end up extracting inorganics such as caesium carbonate, sodium hydride, butyl lithium, sodium methoxide etc.

Is it possible to have a node which takes a set of molecules and splits them across two output ports: the top port contains organic molecules (i.e. those which contain at least C and H atoms), and the bottom port contains inorganic molecules (those which do not contain both C and H atoms)? It would also be useful to have a couple of options in the node dialog:

1.  "Interpret organic alkali metal salts as inorganic molecules", set ON by default. This would be useful for removing molecules like butyl lithium and sodium methoxide, which would normally be considered organic.

2. "Interpret any molecules containing d-block elements as inorganic molecules", set ON by default. This would be useful for removing catalysts such as Pd(PPh3)4 etc.
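
To make the intended behaviour concrete, here is a rough Python sketch of the logic I have in mind. It is not KNIME code: it assumes each molecule is already available as a flat molecular formula string (no brackets or charges), and the function names, option names and the aspirin example (C9H8O4) are made up for illustration. The d-block set only covers periods 4 to 6 and leaves out La/Lu.

```python
import re

# Assumption: alkali metals and d-block elements (periods 4-6) as plain symbol sets.
ALKALI_METALS = {"Li", "Na", "K", "Rb", "Cs", "Fr"}
D_BLOCK = {
    "Sc", "Ti", "V", "Cr", "Mn", "Fe", "Co", "Ni", "Cu", "Zn",
    "Y", "Zr", "Nb", "Mo", "Tc", "Ru", "Rh", "Pd", "Ag", "Cd",
    "Hf", "Ta", "W", "Re", "Os", "Ir", "Pt", "Au", "Hg",
}


def elements_in(formula):
    """Return the set of element symbols in a flat formula such as 'C4H9Li'."""
    return set(re.findall(r"[A-Z][a-z]?", formula))


def is_organic(formula, alkali_salts_inorganic=True, d_block_inorganic=True):
    """Hypothetical rule: organic means C and H present, minus the two opt-outs."""
    elements = elements_in(formula)
    if not {"C", "H"} <= elements:
        return False
    if alkali_salts_inorganic and elements & ALKALI_METALS:
        return False
    if d_block_inorganic and elements & D_BLOCK:
        return False
    return True


def split_organic(formulas, **options):
    """Return (top port, bottom port) as (organic, inorganic) lists."""
    organic = [f for f in formulas if is_organic(f, **options)]
    inorganic = [f for f in formulas if not is_organic(f, **options)]
    return organic, inorganic


if __name__ == "__main__":
    # aspirin, caesium carbonate, sodium hydride, butyl lithium,
    # sodium methoxide, Pd(PPh3)4
    table = ["C9H8O4", "Cs2CO3", "NaH", "C4H9Li", "CH3ONa", "C72H60P4Pd"]
    top, bottom = split_organic(table)
    print("organic:", top)
    print("inorganic:", bottom)
```

With both options ON only the aspirin row should reach the top port; switching the first option OFF would let butyl lithium and sodium methoxide through as organic.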

I hope you can consider this for a future node, as I believe this functionality is not available elsewhere in KNIME.



Hi Simon,


I needed this kind of filter in my standardization protocol, so I implemented a protocol with:

  1. node "atom replacer" : replacement of the selected atoms (e.g. C,H,O,N,Cl,I,Br) by a star (*) => creation of a new structure in a new column.
  2. node "Molecular property" : computation of the molecular formula of the new structures. Here only the compounds that contain atoms other than the star have a molecular formula.
  3. node "Row splitter" : removal of the compounds that have a non-null "molecular formula" property.

This protocol will work for your task. But clearly this is not an optimal way to do the job, and a dedicated node that can filter structures according to a user-defined list of "good" atoms (as can be done in the "atom replacer" node) would be very useful.
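
For illustration, the three steps can be mimicked outside KNIME with a few lines of Python. This is only a sketch of the idea, not what the nodes do internally: it works on flat molecular formula strings instead of structures, and the function names are hypothetical.

```python
import re

# One element symbol followed by an optional count, e.g. ('Cs', '2') or ('C', '').
ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")


def residual_formula(formula, allowed):
    """Steps 1-2: mask the allowed atoms, return the formula of what is left.

    Returns an empty string when every atom was on the allowed list.
    """
    counts = {}
    for symbol, n in ELEMENT.findall(formula):
        if symbol not in allowed:
            counts[symbol] = counts.get(symbol, 0) + int(n or 1)
    return "".join(f"{s}{c if c > 1 else ''}" for s, c in sorted(counts.items()))


def split_rows(formulas, allowed):
    """Step 3: keep rows with an empty residual formula, reject the others."""
    kept, rejected = [], []
    for formula in formulas:
        if residual_formula(formula, allowed):
            rejected.append(formula)
        else:
            kept.append(formula)
    return kept, rejected


if __name__ == "__main__":
    good_atoms = {"C", "H", "O", "N", "Cl", "I", "Br"}
    kept, rejected = split_rows(["C9H8O4", "Cs2CO3", "NaH", "C4H9Li"], good_atoms)
    print("kept:", kept)
    print("rejected:", rejected)
```

Note that this is a pure whitelist: a compound made only of "good" atoms is kept even if it contains no carbon, so the list of allowed atoms has to be chosen with that in mind.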




Many thanks Lionel,

That's a nice workaround to do the job, cheers!

A nice easy node to do it in the future would be good though, Mikhail, as it will be quite a common task to undertake :-)