I am new to the KNIME space (just installed on Friday) . I have a sdf file (see sample below) and I try to load into postgres database via rdkit cartridge. I tried
Read in note check “Extract Mol block” this will convert sdf file mol field to mol or mol2. Since I only has “mol” type in postgres so I select the ‘mol block’ as ‘mol’ type on KNIME side and create a column in type ‘mol’ on database side. It does not work.
Read in sdf file as type (Molfile) and set up ‘text’ type in postgres and it doesnot work either.
That’s a good question that I can, unfortunately, only start to answer.
The attached workflow shows how to read a file containing SMILES, convert those to RDKit molecules, and then add the SMILES and SDF as strings to a postgresql database.
What I don’t know how to do is to get KNIME to execute the appropriate queries to create a table with rdkit molecules based on that SDF, so I have to do those from the psql prompt:
knime_demo=# drop table if exists mols;
DROP TABLE
knime_demo=# select molregno,mol_from_ctab(sdf::cstring) into mols from raw_data;
SELECT 26
Once you’ve done that you can then query the mols table using the standard rdkit substructure/similiarty searches.
I will check with my colleagues to see if there’s a way to do this entirely inside of KNIME and update this reply if I find anything.
An update to that previous answer. With help from a colleague (thanks @tobias.koetter!) I managed to extend the workflow to actually create a table with RDKit molecules in the database.
I also added a few nodes showing how to do a substructure query and retrieve the results from the database as SDF
You can insert KNIME RDKit Molecules directly in an existing PostgreSQL table with a mol column (+ any other columns). You can simply create the table outside of KNIME. The assumption here being this is for “permanent storage” eg. the table is here to store molecules for a specific application long-term hence creating the table is a one-time effort.
The component internally uses a Python Script and hence it requires that Python is setup correctly inside KNIME and that said environment has rdkit and psycopg2 installed (the later is the python module for working with PostgreSQL). On top of this, due to not using KNIME DB Nodes and hence not having Credentials available, the DB Password needs to be entered into a String configuration in plain text. So one should take great care not to share workflows containing this Component without prior removing the password.
While the DB code for insertion is optimized and should work well for large amounts of data, due to above requirements/limitations the handling of this component is non-trivial.
This works much better and one can use the KNIME DB Nodes to select a RDKit molecule like this:
SELECT id, mol_to_pkl(mol) as "Molecule"
FROM test_structure
The important part being mol_to_pkl(mol).
This will return the RDKit molecule into a KNIME Binary Object. The binary object can then be converted directly to a RDKit molecule with the linked component.
This prevents form converting the molecule back and forth to mol/smiles and is very fast even for tens or hundreds of thousand of molecules.
What I forgot to write above is that another option to use KNIME nodes is similar to suggested solution but instead of having a “raw_table” for loading one can just add a text column to which you insert either smiles or molfile (if orientation is important obviously molfile). Then add a trigger to the table that converts the text data into RDKit molecule into the mol column.
Here the advantage is, that this works for any other connecting applications as no chemical/rdkit knowledge is needed.
Generate Trigger Function (adjust accordingly)
create function generate_rdkit_mol() returns trigger
language plpgsql
as
$$
BEGIN
new.rdkit_mol := public.mol_from_smiles(new.smiles::cstring);
RETURN NEW;
END;
$$;
Generate Trigger on table
create trigger trg_generate_rdmol
before insert or update
on test_structure
for each row
when (new.smiles IS NOT NULL)
execute procedure generate_rdkit_mol();
WHat is the molfile data type in postgres or how to convert molfile to string so I can save it as text in postgres? I try to load molfile and knime give a type error.
If you are replying to my post about triggers, then it’s a bog standard text column. Eg. your table then needs a text column and mol column (rdkit molecule column). You insert into the text column from knime and the trigger generates the rdkit molecule in the database. Obviously for molfile you need to adjust above trigger as the example uses smiles.
@kienerj when you use sdf reader node to read in a sdf file the molfile block is multiple line string (type : molfile) when I try to use DB loader load to postgres (cloumn type txt) . I got an error:
Thank you for the input. I have a large file so I may run a bash job after load all moifile instead trigger. It is nice to learn how to add trigger to db.
Ah, yes that is another issue. you need to convert the column containing the molfile into a string column. one option is to use string manipulation with string() function.