RDF file reader/fixer/converter (replace or add-on to Erlwood Reactions File Reader)

For those who deal with chemical reaction files and need a reader node, the Erlwood extensions offer a suitable one, though with at least two (for me) major drawbacks.
I addressed both of the following via a Python script:

  • The Erlwood node doesn’t exist (yet) for Knime V4.3.x
  • It fails if you have RDF files with missing structures.
    This can happen e.g. with RDF exports from Scifinder or Reaxys.

See here on the KnimeHub:

This mini-workflow reads any number of RDF files, checks each one for records with missing structures and fixes the file (by simply eliminating those records). In addition, a csv file is created with the structures converted to SMILES.
The main/important portion lies within the single Python node.
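To give an idea of what the fixing step amounts to, here is a minimal sketch. It is not the script from the workflow. It assumes that every reaction record starts with a `$RFMT` line and that a record counts as "missing its structure" when it has no `$RXN` line or no `$MOL` block; the function names are made up for illustration.

```python
def split_rdf_records(text):
    """Split RDF text into (header_lines, records); each record is a list of lines."""
    header, records, current = [], [], None
    for line in text.splitlines():
        if line.startswith("$RFMT"):
            # Assumption: every reaction record begins with a $RFMT line
            current = [line]
            records.append(current)
        elif current is None:
            header.append(line)   # file header, e.g. $RDFILE / $DATM
        else:
            current.append(line)
    return header, records


def has_structure(record):
    """Assumed criterion: a usable record has a $RXN line and at least one $MOL block."""
    has_rxn = any(line.startswith("$RXN") for line in record)
    has_mol = any(line.startswith("$MOL") for line in record)
    return has_rxn and has_mol


def fix_rdf_text(text):
    """Return (fixed_text, number_of_dropped_records)."""
    header, records = split_rdf_records(text)
    kept = [record for record in records if has_structure(record)]
    lines = header + [line for record in kept for line in record]
    return "\n".join(lines) + "\n", len(records) - len(kept)
```

You would read each RDF file as text, pass it through `fix_rdf_text` and write the result to a new file next to the original.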

If you have worked with Reaxys or Scifinder RDF imports you will know that the number of resulting columns will differ. The same goes for the resulting csv files, even for the number of structure columns.
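One way to cope with the varying number of columns when writing the csv is to collect the column names across all records first and leave missing cells empty. This is only a sketch; the column names (`ID`, `Reactant_1`, ...) and the function name are invented for the example and are not the ones the workflow produces.

```python
import csv
import io


def write_rows(rows, handle):
    """Write dict rows with differing keys to csv; return the column order used."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)   # keep first-seen order of columns
    # restval fills cells for columns a given record does not have
    writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return fieldnames


# Hypothetical records: the second one has an extra structure column
rows = [
    {"ID": "1", "Reactant_1": "CCO", "Product_1": "CC=O"},
    {"ID": "2", "Reactant_1": "C", "Reactant_2": "O", "Product_1": "CO"},
]
buffer = io.StringIO()
columns = write_rows(rows, buffer)
```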

The Python script uses as few imports as possible, though it does require RDKit in your Python installation. It does not depend on a particular Knime 4.x version. It has not been tested in earlier 3.x versions.

There is certainly room for improvement in the parsing or output, but it goes a long way.
Hope it helps someone else as well.

Hello @docminus2,

Thanks for sharing this with the KNIME community. Definitely very helpful!

Cheers,
Janina

I have updated the script (and the description).
The major change is the inclusion of minimal sanitization of the molecules; without it, the module crashes on faulty molecules or SMILES.
(I would update my original post, but it seems it isn’t possible anymore)
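For anyone wondering what such a guard can look like, here is a rough sketch (not the code from the workflow). It assumes the structures are available as MOL blocks and uses RDKit's `Chem.MolFromMolBlock`, `Chem.SanitizeMol` and `Chem.MolToSmiles`; the two function names are made up. Faulty molecules end up as empty strings instead of stopping the run.

```python
def molblock_to_smiles(molblock):
    """Convert one MOL block to SMILES with RDKit; return None if it is faulty."""
    from rdkit import Chem  # imported here so the helper below works without RDKit

    mol = Chem.MolFromMolBlock(molblock, sanitize=False)
    if mol is None:          # RDKit could not parse the block at all
        return None
    try:
        Chem.SanitizeMol(mol)   # raises on e.g. bad valences
    except Exception:
        return None
    return Chem.MolToSmiles(mol)


def convert_all(molblocks, convert=molblock_to_smiles):
    """Convert many MOL blocks; a failed conversion yields '' instead of a crash."""
    results = []
    for block in molblocks:
        try:
            smiles = convert(block)
        except Exception:
            smiles = None
        results.append(smiles or "")
    return results
```

The `convert` parameter is only there so the loop can be tried out with a stand-in function on a machine without RDKit.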

Hello, I have updated this workflow today. It now contains a prefilter that checks for existing “_fixed” files and removes them, so you don’t end up with “_fixed_fixed” etc. should you run it multiple times on the same folder.
There is now also support for RDF files stemming from ICSynth.
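The prefilter idea can be sketched in a few lines of plain Python. This is an illustration only, not the node from the workflow: it assumes the fixed files sit in the same folder, carry the `.rdf` extension and have names ending in `_fixed` before the extension; the function name is made up.

```python
from pathlib import Path


def remove_fixed_files(folder, suffix="_fixed"):
    """Delete earlier '<name>_fixed.rdf' outputs; return the removed file names."""
    removed = []
    for path in Path(folder).iterdir():
        is_rdf = path.is_file() and path.suffix.lower() == ".rdf"
        if is_rdf and path.stem.endswith(suffix):
            path.unlink()               # remove the stale output file
            removed.append(path.name)
    return sorted(removed)
```

Calling it once before the reader runs leaves only the original RDF files in the folder, so a second run starts from the same input as the first.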

And another update: some ambiguity in the Python code description has been fixed, along with a few other small changes, and there is now support for Infochem Spresi RDF files.
