Dear all,
We are glad to announce more details about the RDKit 4.7 Feature release compatible with the KNIME Analytics Platform 4.7.
The most notable features are:
- native read/write support for RDKit types in KNIME’s columnar backend (see “KNIME Columnar Table Backend Boosts Performance”)
- support for RDKit types in the new Python Script nodes (which are “out of labs” since KNIME AP 4.7), including RXN reactions which were not supported with the previous Python Script nodes (now Python Script (legacy))
- support for RDKit types in pure-Python KNIME Extensions
- support for Apple Silicon added by @ptosco, @manuelschwarze and @greglandrum: you can use RDKit with the Apple Silicon version of KNIME Analytics Platform
Thanks to @steffen_KNIME for support with the implementation, @manuelschwarze and @greglandrum for reviewing, and @Alice_Krebs and @greglandrum for testing!
Below are a few examples to help you get started with RDKit and the other chemistry types in KNIME’s new Python nodes.
Note: These examples require the Python Script node and do not work with the Python Script (legacy) nodes in KNIME 4.7.
1. Converting between molecule types in Python
To convert between RDKit molecules and SMILES or SMARTS in a Python Script node, you can proceed as follows:
import knime.scripting.io as knio
import knime.types.chemistry as ktchem
import rdkit.Chem
df = knio.input_tables[0].to_pandas()
# Convert SMILES to RDKit molecules, assuming there is a "Smiles" column
mols = [rdkit.Chem.MolFromSmiles(s) for s in df["Smiles"]]
df["RDKit Molecules"] = mols
# Convert RDKit molecules to Smarts.
# Note that SMILES, SMARTS, SDF and many other chemistry types are
# represented as strings in Python. To let KNIME know which type it should be,
# we create SmartsValues (or SmilesValue, SdfValue, ...) from the strings.
df["Smarts"] = [ktchem.SmartsValue(rdkit.Chem.MolToSmarts(m)) for m in mols]
knio.output_tables[0] = knio.Table.from_pandas(df)
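One detail worth keeping in mind with the conversion above: rdkit.Chem.MolFromSmiles returns None when it cannot parse a string, so a single bad input silently becomes an empty entry. The following is a minimal sketch of a helper that keeps the rows aligned and records which ones failed. The name parse_smiles_column and its parse argument are hypothetical and not part of KNIME or RDKit, and the sketch assumes that missing input cells arrive as None.

```python
def parse_smiles_column(smiles_values, parse=None):
    """Parse SMILES strings, keeping row alignment and recording failures.

    Returns (mols, failed_rows). mols has one entry per input value (None
    where the input was missing or could not be parsed). failed_rows lists
    the positions of non-missing inputs that could not be parsed.
    """
    if parse is None:
        # Imported lazily so the helper can also be exercised with a
        # stand-in parser outside of an RDKit environment.
        from rdkit.Chem import MolFromSmiles as parse

    mols, failed_rows = [], []
    for row, value in enumerate(smiles_values):
        if value is None:
            # Assumption: a missing input cell shows up as None.
            mols.append(None)
            continue
        mol = parse(str(value))
        if mol is None:
            # RDKit signals an unparsable SMILES by returning None.
            failed_rows.append(row)
        mols.append(mol)
    return mols, failed_rows
```

In a Python Script node this could be used as mols, failed_rows = parse_smiles_column(df["Smiles"]), after which failed_rows can be printed or used to drop the affected rows before the table is handed back to KNIME.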
The same works in the execute method of a pure-Python node, as we see in the next example.
2. Using RDKit in Pure-Python nodes
When writing KNIME Python Extensions, you can now work with RDKit types as well. One thing you have to do in a pure-Python node is to specify the types of all output columns in the configure method. To use RDKit or chemistry types when creating KNIME table schemas and columns, simply provide the molecule types or SMILES, SMARTS or SDF values as in the example below.
import knime.extension as knext
import knime.types.chemistry as ktchem
from rdkit import Chem

def is_smiles(col):
    """
    Check if the provided knext.Column contains SMILES values. Due to
    technical reasons, the type of the column can be either SmilesValue or
    SmilesAdapterValue, so here we check for both.
    The function knext.logical turns the value types into KNIME column types,
    and is called "logical" as opposed to "primitive" like plain numbers.
    """
    return col.ktype == knext.logical(ktchem.SmilesValue) \
        or col.ktype == knext.logical(ktchem.SmilesAdapterValue)

@knext.node(
    name="Smiles To RDKit Molecule",
    node_type=knext.NodeType.MANIPULATOR,
    icon_path="icon.png",
    category="/",
)
@knext.input_table(name="Input Data", description="Input table containing SMILES")
@knext.output_table(name="Output Data", description="Table plus a 'Molecule' column")
class SmilesToRDKitMol:
    """Smiles to RDKit molecule conversion
    This node converts Smiles to RDKit molecules
    """
    smiles_column = knext.ColumnParameter(
        label="SMILES column",
        description="Input column containing SMILES data",
        column_filter=is_smiles,
    )
    mol_column_name = knext.StringParameter(
        label="Molecule column name",
        description="Name of the newly created column",
        default_value="Molecule",
    )

    def configure(self, configure_context, input_schema):
        # Declare the output column type up front so KNIME knows it holds RDKit molecules
        new_col = knext.Column(Chem.Mol, self.mol_column_name)
        return input_schema.append(new_col)

    def execute(self, exec_context, input_table):
        df = input_table.to_pandas()
        mols = [Chem.MolFromSmiles(s) for s in df[self.smiles_column]]
        df[self.mol_column_name] = mols
        return knext.Table.from_pandas(df)

Note how in the configure method a column is created with the type Chem.Mol. This suffices to tell KNIME that the new column contains RDKit molecules, and you will be able to work with this column in KNIME with other RDKit nodes.
The input column selection parameter filters the columns such that only those containing SMILES values are available for selection. In this example, the column filter is_smiles is provided when defining the parameter smiles_column.
See Contents — KNIME Python API documentation and Create a New Python based KNIME Extension for details of how to write Python nodes.
Note: When working with RDKit data inside pure-Python nodes, the extension needs to have a dependency on RDKit, as explained in “Create a New Python based KNIME Extension”. The ID of the RDKit KNIME Integration is org.rdkit.knime.feature.
Note: To find out which types are available, you can, for example, run the following in a Python Script node:
import knime.api.schema
print(knime.api.schema.LogicalType.supported_value_types())
Sending fingerprints from Python to KNIME
To compute molecule fingerprints in a Python Script node and send them back to KNIME, you can proceed as follows:
import knime.scripting.io as knio
import rdkit.Chem
from rdkit.Chem import rdFingerprintGenerator
df = knio.input_tables[0].to_pandas()
# Generate fingerprints for all molecules
fingerprint_generator = rdFingerprintGenerator.GetMorganGenerator(fpSize=1024)
fingerprints = [fingerprint_generator.GetFingerprint(m) for m in df["RDKit Molecule"]]
# Add fingerprint column to our DataFrame
df["Fingerprint"] = fingerprints
knio.output_tables[0] = knio.Table.from_pandas(df)
Working with KNIME-provided fingerprints in Python
If you want to work with fingerprints in Python that are coming from KNIME, they will first be represented as the KNIME type DenseBitVector for normal fingerprints and DenseByteVector for count fingerprints. You can type-cast those to RDKit fingerprints directly, but for count fingerprints you need to know the specific underlying type:
import knime.scripting.io as knio
import knime.api.schema as ks
import knime.types.builtin as ktypes
import rdkit.DataStructs  # a plain "import rdkit" does not load the DataStructs subpackage
df = knio.input_tables[0].to_pandas()
fingerprint_type = ks.logical(rdkit.DataStructs.cDataStructs.ExplicitBitVect).to_pandas()
fingerprint_column = df["Fingerprint"].astype(fingerprint_type)
# could be UIntSparseIntVect, IntSparseIntVect, LongSparseIntVect or ULongSparseIntVect
count_fingerprint_type = ks.logical(rdkit.DataStructs.cDataStructs.UIntSparseIntVect).to_pandas()
count_fingerprint_column = df["CountFingerprint"].astype(count_fingerprint_type)
Working with RXN reactions
You can also send reactions back and forth between KNIME and Python:
import knime.scripting.io as knio
import knime.types.chemistry as ktchem
from rdkit.Chem import rdChemReactions
df = knio.input_tables[0].to_pandas()
rxn_column = [rdChemReactions.ReactionFromRxnBlock(str(r)) for r in df['Rxn Reaction']]
df['rxn'] = rxn_column
df['smiles'] = [ktchem.SmilesValue(rdChemReactions.ReactionToSmiles(rxn)) for rxn in rxn_column]
df['molecule'] = [rdChemReactions.ReactionToMolecule(rxn) for rxn in rxn_column]
knio.output_tables[0] = knio.Table.from_pandas(df)
Best, Carsten