Sorting Columns based on Field Value - Assign values by Column Name

Hello Knime Community,

currently I’m working on my dataset for my thesis.
From the raw data (JSON) I extract one field which contains several values seperated by a comma.
At the moment I use the Group By Node to create a sorted list and the Ungroup Node to delist. Afterwards I use the Cell Splitter Node to create a Column for each value as you can see in the screenshot.

The issue is now, that Columns are not really sorted. What I try to achive is that every “Tagfahrlicht”, “Tempomat” and so on is alligned in one Column (ideal with Column Name “Tempomat”).

Is there a ways to do this?

Thanks and have a good day,
Sven

Hi @sven-abx

The Column Resorter node can you help out. If you resort the columns and then apply the aggregation of the columns, you can control in which order the columns are merged/concatenated.

image

1 Like

Hi @ArjenEX ,

thanks for your reply.
The thing is I have a field where all the values are combined which looks like this: Tagfahrlicht, Tempomat, Zentralverreiegelung - one field and the values are seperated by a “,”.

My goal is now to have a column not named Sorted Array 1 and different values but a column called Tagfahrlicht and then every value “Tagfahrlicht” in this column.

Thanks and have a nice day,
Sven

To be more clear:
Source Column and values:

Now I use the Group By Node to sort the values

Then Ungroup the values and delist them

Then I use the Cell Splitter Node to seperate the values.

Update:

I was able to create the correct Column Names by doing it this way:

Now I just need to fill in the values in the corresponding column name.
edit: this can be the value or a 1 yes this value is in the list or a 0 for no, this value is not in the list. Maybe with a Python Script?!

Thanks,
Sven

1 Like

Update:
I implemented a Python Script (legacy) Node:

import pandas as pd

df = pd.DataFrame(input_table_1)
df_2 = pd.DataFrame(input_table_1)

df = pd.DataFrame(df.equipment.str.split(','))
df = pd.get_dummies(df.explode('equipment')).groupby(level=0).agg(sum)

output_table_1 = pd.merge(df_2, df, left_index=True, right_index=True)

Input Table:

Output after the Python Node:

I don’t know why the first ABS isn’t recognized.

BR,
Sven

Do you have a sample file regarding your last input table? Certainly easier for the KNIME community to help then
But here some sample idea

import pandas as pd

# Sample dataset

data = {"col1": ["a,b,c", "a,d,e", "f,b,c"]}

# Create a DataFrame from the dataset

df = pd.DataFrame(data)

# Split col1 by "," delimiter and create new columns

split_cols = df["col1"].str.split(",", expand=True)

# Perform one-hot encoding

one_hot_encoded = pd.get_dummies(split_cols, prefix="col")

# Concatenate the one-hot encoded columns with the original DataFrame

result_df = pd.concat([df, one_hot_encoded], axis=1)

# Drop the original 'col1' column

result_df.drop("col1", axis=1, inplace=True)

# Print the final DataFrame

print(result_df)

br

Hi @Daniel_Weikert,

thanks for your reply.
I edited your Code to have it in KNIME as follows:

import pandas as pd

df = pd.DataFrame(input_table_1)

# Split col1 by "," delimiter and create new columns
split_cols = df["equipment"].str.split(",", expand=True)

# Perform one-hot encoding
one_hot_encoded = pd.get_dummies(split_cols, prefix="equip")

# Concatenate the one-hot encoded columns with the original DataFrame
result_df = pd.concat([df, one_hot_encoded], axis=1)


# Drop the original 'col1' column
#result_df.drop("equipment", axis=1, inplace=True)

# Print the final DataFrame
#print(result_df)

output_table_1 = result_df

My problem is now, that the table is expanding.

My goal is one Column ABS, one Android and so on.

BR,
Sven

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.