sdf mol_file type to rdkit mol type

Hello,

I am new to the KNIME space (just installed on Friday) . I have a sdf file (see sample below) and I try to load into postgres database via rdkit cartridge. I tried :slight_smile:

    1. Read in note check “Extract Mol block” this will convert sdf file mol field to mol or mol2. Since I only has “mol” type in postgres so I select the ‘mol block’ as ‘mol’ type on KNIME side and create a column in type ‘mol’ on database side. It does not work.
    1. Read in sdf file as type (Molfile) and set up ‘text’ type in postgres and it doesnot work either.

Can you give me some advices?

Thanks,
---------------sample sdf input-----------------
ACD/Labs05011920022D

4 3 0 0 0 0 0 0 0 0 1 V2000

1.9950    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0

1.3292   -1.1518    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0

1.9950   -2.3037    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0

0.0000   -1.1518    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0

1 2 1 0 0 0 0

2 3 2 0 0 0 0

2 4 1 0 0 0 0

M END

<DSSTox_Compound_id>

DTXCID505

<DSSTox_Substance_id>

DTXSID7020005

Hi Sue,

That’s a good question that I can, unfortunately, only start to answer.
The attached workflow shows how to read a file containing SMILES, convert those to RDKit molecules, and then add the SMILES and SDF as strings to a postgresql database.

What I don’t know how to do is to get KNIME to execute the appropriate queries to create a table with rdkit molecules based on that SDF, so I have to do those from the psql prompt:
knime_demo=# drop table if exists mols;
DROP TABLE
knime_demo=# select molregno,mol_from_ctab(sdf::cstring) into mols from raw_data;
SELECT 26

Once you’ve done that you can then query the mols table using the standard rdkit substructure/similiarty searches.

I will check with my colleagues to see if there’s a way to do this entirely inside of KNIME and update this reply if I find anything.

Best,
-greg
rdkit_cartridge.knwf (29.7 KB)

1 Like

An update to that previous answer. With help from a colleague (thanks @tobias.koetter!) I managed to extend the workflow to actually create a table with RDKit molecules in the database.
I also added a few nodes showing how to do a substructure query and retrieve the results from the database as SDF

-greg
rdkit_cartridge.knwf (51.3 KB)

3 Likes

Greg,

Thank you very much. I was able to run your work flow and have some good ideas for my workflow.

1 Like

Not a direct answer to your question but related. Since there seems to be some interest in this I just made 2 components public on the KNIME Hub.

  1. Inserting RDKit Molecule into PostgreSQL

You can insert KNIME RDKit Molecules directly in an existing PostgreSQL table with a mol column (+ any other columns). You can simply create the table outside of KNIME. The assumption here being this is for “permanent storage” eg. the table is here to store molecules for a specific application long-term hence creating the table is a one-time effort.

The component internally uses a Python Script and hence it requires that Python is setup correctly inside KNIME and that said environment has rdkit and psycopg2 installed (the later is the python module for working with PostgreSQL). On top of this, due to not using KNIME DB Nodes and hence not having Credentials available, the DB Password needs to be entered into a String configuration in plain text. So one should take great care not to share workflows containing this Component without prior removing the password.

While the DB code for insertion is optimized and should work well for large amounts of data, due to above requirements/limitations the handling of this component is non-trivial.

  1. Reading RDKit Molecules from PostgreSQL

This works much better and one can use the KNIME DB Nodes to select a RDKit molecule like this:

SELECT id, mol_to_pkl(mol) as "Molecule"
FROM test_structure

The important part being mol_to_pkl(mol).

This will return the RDKit molecule into a KNIME Binary Object. The binary object can then be converted directly to a RDKit molecule with the linked component.

This prevents form converting the molecule back and forth to mol/smiles and is very fast even for tens or hundreds of thousand of molecules.

2 Likes

What I forgot to write above is that another option to use KNIME nodes is similar to suggested solution but instead of having a “raw_table” for loading one can just add a text column to which you insert either smiles or molfile (if orientation is important obviously molfile). Then add a trigger to the table that converts the text data into RDKit molecule into the mol column.

Here the advantage is, that this works for any other connecting applications as no chemical/rdkit knowledge is needed.

  1. Generate Trigger Function (adjust accordingly)

    create function generate_rdkit_mol() returns trigger
    language plpgsql
    as
    $$
    BEGIN
    new.rdkit_mol := public.mol_from_smiles(new.smiles::cstring);
    RETURN NEW;
    END;
    $$;

  2. Generate Trigger on table

    create trigger trg_generate_rdmol
    before insert or update
    on test_structure
    for each row
    when (new.smiles IS NOT NULL)
    execute procedure generate_rdkit_mol();

2 Likes

WHat is the molfile data type in postgres or how to convert molfile to string so I can save it as text in postgres? I try to load molfile and knime give a type error.

If you are replying to my post about triggers, then it’s a bog standard text column. Eg. your table then needs a text column and mol column (rdkit molecule column). You insert into the text column from knime and the trigger generates the rdkit molecule in the database. Obviously for molfile you need to adjust above trigger as the example uses smiles.

1 Like

@kienerj when you use sdf reader node to read in a sdf file the molfile block is multiple line string (type : molfile) when I try to use DB loader load to postgres (cloumn type txt) . I got an error:


How do you handle it?

Thank you for the input. I have a large file so I may run a bash job after load all moifile instead trigger. It is nice to learn how to add trigger to db.

Ah, yes that is another issue. you need to convert the column containing the molfile into a string column. one option is to use string manipulation with string() function.

2 Likes

I use a java-snippet that returns any molecule format as string:
it’s simply one line:

return $YOUR_MOLECULE_COLUMN$;

Select String as output. Replace existing or create a new column.

1 Like

@kienerj and @docminus2 thank you all. The problem has been resolved.

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.