Categorical Data Type not allowed in Python Script Nodes

Hello Knimers,

I am trying to use LightGBM in one of KNIME’s Python Script nodes. This requires converting the categorical columns to the pandas categorical data type.

df[cat_cols] = df[cat_cols].astype('category')

This works without issue in a standalone Python script, but it fails when the same script runs inside one of KNIME’s Python Script nodes.

So I tested a simple scenario in the “Python Source” Node:

import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': ['a', 'b', 'a', 'b', 'c']})

df['col2'] = df['col2'].astype('category')

output_table = df

This gives the error:
ERROR Python Source: Cannot setitem on a Categorical with a new category, set the categories first

Can someone confirm whether categorical data types can be handled in Python Script nodes?
Is there a workaround, or a plan to fix this issue?

Thanks!

Hi @bradmc2.

Welcome to the KNIME Community forum!

I can confirm that columns with a categorical data type cannot be converted into a column of a KNIME table within the Python Script node.

There is a plan to address this. The current workaround is to convert categorical variables to simple types such as strings, integers, or doubles before writing the dataframe to a KNIME table, as shown below.

df['col2'] = df['col2'].astype('str')
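
If several columns are categorical, the same cast can be applied to all of them in one pass. Here is a minimal sketch, assuming pandas is available in the node's Python environment. to_knime_safe is a hypothetical helper name, not part of KNIME's API, and output_table is the output variable from the Python Source example above. Because it works on a copy, the original dataframe keeps its category dtype for LightGBM.

```python
import pandas as pd


def to_knime_safe(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with every categorical column cast to plain strings.

    Hypothetical helper for illustration only; not part of KNIME's API.
    """
    out = df.copy()
    # Find all columns that currently use the pandas category dtype.
    cat_cols = out.select_dtypes(include="category").columns
    for col in cat_cols:
        out[col] = out[col].astype(str)
    return out


df = pd.DataFrame({"col1": [1, 2, 3, 4, 5], "col2": ["a", "b", "a", "b", "c"]})
df["col2"] = df["col2"].astype("category")  # category dtype kept for model training

# Hand only the converted copy back to KNIME.
output_table = to_knime_safe(df)
```

If the categories are numeric rather than text, casting to int or float instead of str may suit the downstream nodes better.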

We will update you here in this thread once this feature is available.

Regards,
Temesgen
