Python Script Error and KNIME Workflow Guidance

bluecar007 · October 19, 2024, 9:32am

I am trying to use Python to manipulate my Excel file. When I run the Python script on my Excel file in Spyder, it works perfectly with no errors. However, when I move the script to the Python Script node in KNIME, I encounter an error: ‘Execute failed: Row key checking: Duplicate key detected: “Row10”.’ I’m not sure what this means. I’ve attached the workflow and the output file I get when running it in Spyder for your reference.

I suspect that the KNIME node is detecting duplicate row keys (identifiers for rows). That would be correct: my output does contain duplicate row keys at this step of the workflow, and I will remove the duplicates at a later stage. How do I overcome this error?

Additionally, how can I view the output table after my Python script has run successfully? Will I be able to see it under the table output section of the Python Script node? I just want to confirm in the console that the data was processed successfully.

Also, can I connect a Row Filter node directly to the Python Script node to continue filtering my table? Or do I need to connect the Python Script node to a Table Writer node first and then to a Row Filter node?

Thanks!
KNIME_project_Python_test1.knwf (75.8 KB)
BEFORE_IP_Asset_Data_v1.xlsx (14.5 KB)
AFTER_Processed_IP_Asset_Data_v1.xlsx (12.1 KB)

mlauber71 · October 19, 2024, 9:50am

@bluecar007 I think the Python integration in KNIME does not like duplicate indexes in pandas dataframes. You could try resetting them before or within the Python node.

df.reset_index(inplace=True)
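
To illustrate the idea, here is a minimal pandas-only sketch. The table and its "Row0"/"Row1" labels are made up; they only mimic KNIME RowIDs, which, as far as I understand, end up as the pandas index after to_pandas(). It shows that repeating rows repeats their labels, and that reset_index(drop=True) replaces them with a fresh unique index.

```python
import pandas as pd

# Hypothetical table; the labels mimic KNIME RowIDs carried in the pandas index.
df = pd.DataFrame({"value": [1, 2]}, index=["Row0", "Row1"])

# Repeating a row repeats its label, so the index is no longer unique.
dup = df.loc[["Row0", "Row0", "Row1"]]
print(dup.index.is_unique)  # False

# drop=True discards the old labels; without it they are kept as an extra column.
fixed = dup.reset_index(drop=True)
print(fixed.index.is_unique)  # True
```

Note that reset_index has to be called on the dataframe that is actually handed back to KNIME, and before it is converted to an output table.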

Concerning the other points: you can edit Excel files using openpyxl and then open them through the Excel Reader node.

bluecar007 · October 19, 2024, 10:07am

# Apply the function to the dataframe

expanded_df = split_ip_addresses(df)

knio.output_tables[0] = knio.Table.from_pandas(expanded_df)

df.reset_index(inplace=True)

I placed “df.reset_index(inplace=True)” at the end of my code in the Python Script node, but it still gives me the same error: ‘Execute failed: Row key checking: Duplicate key detected: “Row10”.’

bluecar007 · October 19, 2024, 10:12am

import knime.scripting.io as knio
import pandas as pd

df = knio.input_tables[0].to_pandas()

# Function to clean and split IP addresses into individual rows

def split_ip_addresses(df):
    # Create a list to store the expanded rows
    rows = []
    for _, row in df.iterrows():
        ip_addresses = str(row['IP Addresses']).split(',') if pd.notna(row['IP Addresses']) else []
        # Clean up any leading/trailing spaces for IP addresses
        ip_addresses = [ip.strip() for ip in ip_addresses if ip.strip()]

        if len(ip_addresses) > 1:
            # Duplicate the row for each IP address and assign a single IP per row
            for ip in ip_addresses:
                new_row = row.copy()
                new_row['IP Addresses'] = ip
                rows.append(new_row)
        else:
            # If only one IP or no IP, just append the row as it is
            rows.append(row)

    # Create a new DataFrame from the list of rows
    new_df = pd.DataFrame(rows)
    return new_df

# Apply the function to the dataframe

expanded_df = split_ip_addresses(df)

knio.output_tables[0] = knio.Table.from_pandas(expanded_df)

This is my Python code. Any advice on where I should put “df.reset_index(inplace=True)” as suggested? There are several dataframes in my code.

mlauber71 · October 19, 2024, 11:10am

That statement resets the index of the dataframe “df”. You would want to reset the index of the one you are sending back to KNIME as the output table.

You have several dataframes, so make sure each operation is applied to the right one.
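
As a rough sketch of what that could look like, here is an alternative version of the splitting function that swaps the row loop for pandas' str.split and explode, and resets the index on the dataframe it returns. This is not a drop-in replacement for the posted code: unlike the original, it does not remove blank entries between commas, and the sample table is invented for illustration. The knio lines are left as comments because they only run inside the KNIME Python Script node.

```python
import pandas as pd

def split_ip_addresses(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Turn "a, b" into ["a", " b"]; missing values stay missing.
    out["IP Addresses"] = out["IP Addresses"].str.split(",")
    # One row per list element; exploded rows inherit the original index label.
    out = out.explode("IP Addresses")
    out["IP Addresses"] = out["IP Addresses"].str.strip()
    # Reset the index of the frame that is returned, so every row key is unique.
    return out.reset_index(drop=True)

# Hypothetical input standing in for knio.input_tables[0].to_pandas()
df = pd.DataFrame(
    {"Asset": ["A", "B", "C"],
     "IP Addresses": ["10.0.0.1, 10.0.0.2", "10.0.0.3", None]},
    index=["Row0", "Row1", "Row2"],
)

result = split_ip_addresses(df)
print(result.index.is_unique)  # True

# Inside KNIME, the output line would then be:
# knio.output_tables[0] = knio.Table.from_pandas(result)
```

With the original loop-based function, the equivalent change would be to call reset_index(drop=True) on expanded_df before passing it to knio.Table.from_pandas.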
