I am trying to calculate Mordred descriptors within KNIME Analytics Platform. The workflow activates a conda environment containing the RDKit package and passes it to a Python Script node that calculates the Mordred descriptors. All descriptors come back as missing or NaN values.
Mordred descriptors: mordred-descriptor/mordred on GitHub (a molecular descriptor calculator)
My workflow is attached.
mordred_error.knwf (66.7 KB)
Thanks for providing an example workflow. The problem is that some of those descriptors cannot be calculated for every molecule, and in those cases Mordred returns a missing-value or error object instead of a number. When the results are converted to a pandas DataFrame, the affected columns end up with dtype object, because they contain those Mordred objects rather than plain numbers. When you pass that DataFrame to the output port, KNIME fails to interpret those objects correctly. The simplest fix is to remove those columns from the DataFrame before passing it to the output port.
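To make the dtype issue concrete, here is a minimal sketch that runs with pandas alone. The `FakeMissing` class and the `desc_*` column names are hypothetical stand-ins for Mordred's error objects and descriptor names, so this only illustrates how pandas behaves and is not Mordred's actual output.

```python
import pandas as pd


class FakeMissing:
    """Hypothetical stand-in for a Mordred error object (not part of Mordred)."""

    def __init__(self, reason):
        self.reason = reason

    def __str__(self):
        return self.reason


# Toy descriptor table: one clean column, one partly failed, one fully failed
desc_df = pd.DataFrame(
    {
        "desc_ok": [1.5, 2.0, 3.25],
        "desc_partial": [0.1, FakeMissing("calculation failed"), 0.3],
        "desc_failed": [FakeMissing("a"), FakeMissing("b"), FakeMissing("c")],
    }
)

# Any column holding a non-numeric Python object is stored with dtype object
object_cols = desc_df.select_dtypes(include=["object"]).columns.tolist()

# Count the non-numeric cells in each affected column
bad_counts = {
    col: int(desc_df[col].map(lambda v: not isinstance(v, (int, float))).sum())
    for col in object_cols
}

print(desc_df.dtypes)
print(bad_counts)
```

Running the same `select_dtypes(include=["object"])` check on the real descriptor table should show which descriptors are affected before anything is dropped.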
Here is the Python code that works for me:
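from rdkit import Chem
from mordred import Calculator, descriptors
import pandas as pd
# input_table_1 is the table KNIME passes into the Python script node
df = pd.DataFrame(data=input_table_1)
# set up the Mordred descriptors, skipping those that need 3D coordinates
calc = Calculator(descriptors, ignore_3D=True)
# parse the SMILES column into RDKit molecules
mols = [Chem.MolFromSmiles(smi) for smi in input_table_1['Canonical SMILES'].values]
# one row per molecule, one column per descriptor
desc_df = calc.pandas(mols)
# remove columns of type object (= mordred objects)
df_num = desc_df.select_dtypes(exclude=['object'])
# reset the index so both tables line up row by row in the concat
df = df.reset_index(drop=True)
output_table_1 = pd.concat([df, df_num], axis=1)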
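If dropping every object column loses too many descriptors, one possible variation is to coerce the failed cells to NaN and drop only the columns that are empty afterwards. This is a sketch under assumptions: `FakeMissing` and the `desc_*` column names are hypothetical stand-ins so the example runs without Mordred. I have not checked this against the workflow above, and it assumes KNIME treats NaN in a numeric column as a missing value.

```python
import pandas as pd


class FakeMissing:
    """Hypothetical stand-in for a Mordred error object (not part of Mordred)."""

    def __init__(self, reason):
        self.reason = reason


# Toy descriptor table: one clean column, one partly failed, one fully failed
desc_df = pd.DataFrame(
    {
        "desc_ok": [1.5, 2.0, 3.25],
        "desc_partial": [0.1, FakeMissing("calculation failed"), 0.3],
        "desc_failed": [FakeMissing("a"), FakeMissing("b"), FakeMissing("c")],
    }
)

cleaned = desc_df.copy()

# Coerce only the object columns; anything that is not a number becomes NaN
for col in cleaned.select_dtypes(include=["object"]).columns:
    cleaned[col] = pd.to_numeric(cleaned[col], errors="coerce")

# Drop descriptors that failed for every molecule
cleaned = cleaned.dropna(axis=1, how="all")

print(cleaned)
```

In the script above, `cleaned` would take the place of `df_num` in the final `pd.concat`. The trade-off is that partly failed descriptors are kept with gaps rather than removed entirely.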
I hope that helps. Let me know if you have further questions.