Porting the KNIME One to Many node to Python?

Durkweed · March 12, 2025, 2:29pm

Background: I’m working on some Kaggle challenges.

Problem: The One to Many node in KNIME labels columns differently than the one-hot encoder in Kaggle’s Python/Jupyter notebooks.

As a result, the models I built and trained in KNIME don’t work correctly.

What’s the best way to address this issue? I’m slowly picking up Python, so the KNIME environment is easier for me to work in.

ActionAndi · March 12, 2025, 3:47pm

Hi,
Are you sure that there’s a difference? I just checked with a simple example and the results are very similar (KNIME uses integer instead of double values).
Table Creator: [screenshot]

import knime.scripting.io as knio
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Get data from KNIME; reset the index so the concat below aligns row by row
df = knio.input_tables[0].to_pandas()
df = df.reset_index(drop=True)

# EDIT HERE: categorical columns to encode
cols_to_convert = ['Color', 'Size']

# Initialize the encoder: dense output; drop='first' omits the first category of each column
encoder = OneHotEncoder(sparse_output=False, drop='first')

# Encode the categorical columns
encoded_data = encoder.fit_transform(df[cols_to_convert])

# Wrap the result in a DataFrame with names such as 'Color_green' and 'Size_M'
encoded_df = pd.DataFrame(
    encoded_data,
    columns=encoder.get_feature_names_out(cols_to_convert),
    index=df.index,
)

# Append the encoded columns to the original table (the source columns are kept)
final_df = pd.concat([df, encoded_df], axis=1)

# Hand the result back to KNIME
knio.output_tables[0] = knio.Table.from_pandas(final_df)

I used the “Math Formula (Multi Column)” Node to convert the integer columns to double.

Results:
Python: [screenshot]

KNIME One to Many: [screenshot]

PS: ignore the red sign at my Python Script, I was testing something else…

Durkweed · March 12, 2025, 4:07pm

Thanks for the quick reply!

As you can see from your screenshots of Python and KNIME One to Many, the column names are still different.

E.g., Color_green vs. green, Size_M vs. M.

The KNIME-trained models expect exactly matching column names, and when I use the Python encoder, the different column names throw everything off.

ActionAndi · March 12, 2025, 4:23pm

How about renaming them?

Durkweed · March 12, 2025, 5:32pm

Hoping for a reusable solution.

My use case has 60-something columns, and I’m trying to avoid having to do that for every future case where I need one-hot encoding.
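
One reusable option in Python, sketched here under assumptions: pandas’ `get_dummies` takes `prefix` and `prefix_sep` arguments, and setting both to empty strings yields bare category names (`green`, `M`) like those in the One to Many output, for any number of columns. The helper name `one_hot_bare_names` and the sample data are hypothetical, and the sketch does not handle two columns that share a value.

```python
import pandas as pd

def one_hot_bare_names(df: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
    """One-hot encode `cols`, naming each new column by its category value only."""
    dummies = pd.get_dummies(
        df[cols],
        columns=cols,   # encode exactly these columns, whatever their dtype
        prefix="",      # no column-name prefix ...
        prefix_sep="",  # ... and no separator, so 'Color_green' becomes 'green'
        dtype=int,      # integer 0/1 values instead of booleans
    )
    # Keep the source columns and append the encoded ones
    return pd.concat([df, dummies], axis=1)

# Made-up sample data for illustration
sample = pd.DataFrame({"Color": ["red", "green", "red"], "Size": ["S", "M", "M"]})
encoded = one_hot_bare_names(sample, ["Color", "Size"])
print(list(encoded.columns))
```

The same call works unchanged for 60 columns; only the list passed as `cols` changes.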

mlauber71 · March 13, 2025, 2:28am

@Durkweed it seems you can just add the column name with an underscore. You would have to construct a loop.
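
In Python, such a loop might look roughly like the sketch below. It assumes that the bare column names are exactly the category values of each source column; `add_column_prefix` is a hypothetical helper, not a KNIME or pandas function, and the sample table is made up.

```python
import pandas as pd

def add_column_prefix(df: pd.DataFrame, source_cols: list[str], sep: str = "_") -> pd.DataFrame:
    """Rename bare one-hot columns such as 'green' to 'Color_green'."""
    mapping = {}
    for col in source_cols:
        for value in df[col].dropna().unique():
            name = str(value)
            # Only rename columns that exist and are not source columns themselves
            if name in df.columns and name not in source_cols:
                mapping[name] = f"{col}{sep}{name}"
    return df.rename(columns=mapping)

# Made-up table shaped like a One to Many result with bare column names
knime_style = pd.DataFrame({
    "Color": ["red", "green"], "Size": ["S", "M"],
    "red": [1, 0], "green": [0, 1], "S": [1, 0], "M": [0, 1],
})
renamed = add_column_prefix(knime_style, ["Color", "Size"])
print(list(renamed.columns))
```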

ActionAndi · March 13, 2025, 5:44am

Hi,

Yeah, you are right, that’s a bit tricky for 60 columns.
Here is a solution that might help.

In addition, I found some strange behaviour in the “One To Many” node.
By default it does not add the source column name to the new columns and just uses the entries, e.g. “red” instead of “color_red” or “red_color”.
If two columns have similar entries, it adds the column name automatically to avoid duplicate column names. For example:

[screenshot]

leads to columns with:

[screenshot]

But:

[screenshot]

leads to:

[screenshot]

I will add this topic to the “Bugs” section.
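
To reproduce that naming rule in Python, a minimal sketch could look like this. It only encodes the behaviour described in this post: bare values by default, with the column name added when a value occurs in more than one column. The exact format KNIME uses for the disambiguated names is not confirmed here, so the `fmt` default is an assumption, as is renaming every colliding column rather than only some of them. `one_to_many_names` is a hypothetical helper.

```python
from collections import Counter

import pandas as pd

def one_to_many_names(df: pd.DataFrame, cols: list[str], fmt: str = "{value}_{column}") -> dict:
    """Map (column, value) to a new column name.

    A value keeps its bare name unless it occurs in more than one column;
    in that case `fmt` (an assumed format) adds the column name.
    """
    values = {col: sorted(df[col].dropna().astype(str).unique()) for col in cols}
    # Count in how many columns each value appears
    counts = Counter(v for vals in values.values() for v in vals)
    return {
        (col, v): v if counts[v] == 1 else fmt.format(value=v, column=col)
        for col, vals in values.items()
        for v in vals
    }

# Made-up data: 'green' occurs in both columns
sample = pd.DataFrame({"Color": ["red", "green"], "Mood": ["blue", "green"]})
names = one_to_many_names(sample, ["Color", "Mood"])
print(names)
```

Passing `fmt="{column}_{value}"` instead would give the scikit-learn style names for the colliding values.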

Durkweed · March 13, 2025, 7:32am

@mlauber71 @ActionAndi

Thanks for the suggestions.

To be more precise, here’s a screenshot of the issue.

The left column is the original. The middle column is what I’m getting with the One to Many node in KNIME. The right column is what Python outputs.

Looking at it with fresh eyes, it seems that KNIME appends the column names to the end instead of the beginning and, as ActionAndi found, only if there are similar names? I’m not sure of KNIME’s exact behavior.

ActionAndi · March 13, 2025, 7:54am

Yeah.
Look at the workflow I’ve shared. I’ve tweaked the naming so it matches the Python names.
If you need other naming conventions, just change the “String Manipulation” node within the loop.

gonhaddock · March 13, 2025, 7:54am

Hello @Durkweed, and welcome to the KNIME community.

You can take a look at the following post: the component in its Community Hub workflow handles the encoding of the columns, which should be helpful for your use case.

Be aware, as explained in the post, that the workflow drops one column from each set to avoid the ‘dummy variable trap’.
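
For illustration only, dropping one column per set corresponds to `drop_first=True` in pandas’ `get_dummies` (or `drop='first'` in scikit-learn’s `OneHotEncoder`). The sample data below is made up and is not taken from the linked workflow.

```python
import pandas as pd

# Made-up sample data
sample = pd.DataFrame({"Color": ["red", "green", "blue"]})

# Full encoding: one column per category; every row sums to 1,
# so any one column is fully determined by the others
full = pd.get_dummies(sample, columns=["Color"], dtype=int)

# Reduced encoding: the first category (alphabetically, 'blue') is dropped
reduced = pd.get_dummies(sample, columns=["Color"], drop_first=True, dtype=int)

print(list(full.columns))
print(list(reduced.columns))
```

A model trained on the full set of columns would not match the reduced set, so the same choice has to be made for training and prediction.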

BR

nan · March 13, 2025, 8:26am

Agree, we plan to tackle this when transitioning the node to the new dialog (internal ticket reference UIEXT-1904).
