Pandas .unique() not working

mwildor · June 5, 2019, 2:32pm

Hi, I’m using the Python Learner and Predictor nodes to create an XGBoost model. I want to extract a list of the unique values in a column of my dataframe using:
cols = list(df['Outcome'].unique().astype(int))
but .unique() does not seem to work (nor does .nunique(), which returns a count of the unique values). Is there a particular reason for this?

mlauber71 · June 5, 2019, 5:54pm

How about this approach:

import pandas as pd

# input_table is provided by the KNIME Python node
df = input_table.copy()

# number of distinct values in the Species column
cols = len(list(df['Species'].unique()))

output_table = pd.DataFrame(
    {"no_of_unique_species": [cols]}
)

kn_example_python_iris_unique_values.knwf (16.6 KB)

mwildor · June 6, 2019, 7:42am

Hi, thank you for the quick reply. I’ll give more detail, as my first question was written in haste. I want a list of all the unique values in a column, but cols = list(df['Outcome'].unique().astype(int)) was returning only one value.
I checked the values by using groupby immediately before the Python node and all are there as I expected. I tested the line of Python code on a different column and it works fine there. I then applied the same Python snippet at earlier points in the workflow to find where it starts to fail, and traced it back to the step where the Outcome is mapped to the rows (Outcome is based on 2 variables, numbered 1-25 as strings).
The mapping is done in a loop using the Table Row to Variable Loop Start, Rule Based Row Filter, Rule Engine and Loop End nodes. I’ll try writing the loop in Python instead to see if that resolves the issue.
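The kind of check described above can also be done inside the Python node itself. The following is only a sketch with hypothetical data (the real Outcome column is not shown in this thread): it looks at the dtype and the per-value counts before converting, and drops missing values first, since astype(int) would fail on them.

```python
import pandas as pd

# Hypothetical data: Outcome stored as strings, as described in the post
df = pd.DataFrame({"Outcome": ["1", "2", "2", "25", "1"]})

# Inspect what actually arrives in the node
print(df["Outcome"].dtype)
counts = df["Outcome"].value_counts(dropna=False)
print(counts)

# Drop missing values, convert to int, then collect the distinct values
unique_outcomes = sorted(
    df["Outcome"].dropna().astype(int).unique().tolist()
)
print(unique_outcomes)
```

If value_counts() inside the node shows fewer values than the groupby check before it, the data reaching the node is probably not the table that was inspected.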

Update: Removing the KNIME loop and replacing it with a Python snippet to map the values has resolved the issue.
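The replacement snippet itself is not shown in this thread. One possible way to map an outcome from two variables in pandas, assuming a lookup table with one row per combination, is a left merge. The column names var_a and var_b and the lookup values below are hypothetical.

```python
import pandas as pd

# Hypothetical rows: two variables that together determine the outcome
df = pd.DataFrame(
    {"var_a": ["x", "x", "y", "y"], "var_b": ["p", "q", "p", "q"]}
)

# Hypothetical lookup table: one row per (var_a, var_b) combination,
# with Outcome numbered as strings as in the original workflow
lookup = pd.DataFrame(
    {
        "var_a": ["x", "x", "y"],
        "var_b": ["p", "q", "p"],
        "Outcome": ["1", "2", "3"],
    }
)

# A left merge keeps every input row; unmatched rows get a missing Outcome
mapped = df.merge(lookup, on=["var_a", "var_b"], how="left")
unmapped = mapped[mapped["Outcome"].isna()]

# Distinct outcomes as integers, ignoring rows that could not be mapped
outcomes = sorted(mapped["Outcome"].dropna().astype(int).unique().tolist())
print(outcomes)
```

Keeping the unmapped rows visible makes it easier to spot combinations that are missing from the lookup table.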
