I am trying to calculate Mordred descriptors within KNIME Analytics Platform. The workflow activates a conda environment containing the RDKit package and feeds into a Python script node that calculates the Mordred descriptors. All descriptors come back as missing or NaN values.
Thanks for providing an example workflow. The problem is that some of those descriptors return a missing value or an error. When the results are converted to a pandas DataFrame, you end up with some columns of a Mordred-specific data type (mordred.error or mordred.missing) instead of plain numbers. When you pass that DataFrame to the output port, KNIME cannot interpret those data types correctly. The fix that works for me is to remove those columns from the pandas DataFrame before passing it to the output port.
Here is the Python code that works for me:
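from rdkit import Chem
from mordred import Calculator, descriptors
import pandas as pd
# input_table_1 is the table KNIME hands to the Python script node
df = pd.DataFrame(data=input_table_1)
# calculate 2D descriptors only
calc = Calculator(descriptors, ignore_3D=True)
# parse the SMILES column into RDKit molecules
mols = [Chem.MolFromSmiles(smi) for smi in input_table_1['Canonical SMILES'].values]
# one row per molecule, one column per descriptor
desc_df = calc.pandas(mols)
# remove columns of type object (= mordred objects)
df_num = desc_df.select_dtypes(exclude=['object'])
# reset the row index so concat pairs the rows by position
df = df.reset_index(drop=True)
output_table_1 = pd.concat([df, df_num], axis=1)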
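If you want to see what the select_dtypes step does without running Mordred or KNIME, here is a small sketch on a toy DataFrame. The FakeMissing class and the column names are made up for illustration; FakeMissing only stands in for whatever object Mordred puts in a cell when a descriptor fails.

```python
import pandas as pd


class FakeMissing:
    """Hypothetical stand-in for a Mordred missing/error object."""

    def __repr__(self):
        return "missing"


# Toy descriptor table: two numeric columns and two columns that
# contain placeholder objects instead of numbers (names are invented).
desc_df = pd.DataFrame({
    "MW": [180.16, 46.07],
    "nAtom": [21, 9],
    "BadDescriptor": [FakeMissing(), FakeMissing()],
    "PartlyBad": [1.5, FakeMissing()],
})

# Columns holding arbitrary Python objects get dtype "object",
# so excluding that dtype keeps only the plain numeric columns.
df_num = desc_df.select_dtypes(exclude=["object"])
dropped = [c for c in desc_df.columns if c not in df_num.columns]

print(list(df_num.columns))
print(dropped)
```

Note that a column is dropped as a whole even if only one of its cells holds such an object, as with PartlyBad here.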
I hope that helps. Let me know if you have further questions.