I am trying to use Python to manipulate my Excel file. When I run the Python script on my Excel file in Spyder, it works perfectly with no errors. However, when I move the code into the Python Script node in KNIME, I get an error: ‘Execute failed: Row key checking: Duplicate key detected: “Row10”.’ I’m not sure what this means. I’ve attached the workflow and the output file I get when running it in Spyder for your reference.
I suspect that the KNIME node is detecting duplicate row keys (the identifiers for rows). That would be expected: at this step of the workflow my output does contain duplicate row keys, and I remove the duplicates at a later stage of the workflow. How do I get around this error?
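To see which keys collide before KNIME rejects them, a quick pandas check can help. This is a sketch assuming, as the error text suggests, that the Python Script node turns KNIME RowIDs such as "Row10" into the DataFrame index and back; the helper name `duplicated_keys` and the sample data are hypothetical.

```python
import pandas as pd


def duplicated_keys(df: pd.DataFrame) -> list:
    """Return each index label that appears more than once, in first-seen order."""
    # duplicated(keep="first") flags every occurrence after the first one
    repeats = df.index[df.index.duplicated(keep="first")]
    # dict.fromkeys collapses labels repeated three or more times, keeping order
    return list(dict.fromkeys(repeats))


# Made-up example: the row labelled "Row10" held two IP addresses and was split in two
df = pd.DataFrame(
    {"IP Addresses": ["10.0.0.1", "10.0.0.2", "10.0.0.3"]},
    index=["Row9", "Row10", "Row10"],
)
print(duplicated_keys(df))  # ['Row10']
```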
Additionally, how can I view the output of my Excel table after my Python script has processed it successfully? Will I be able to see it under the table output section of the Python Script node? I just want to see in the console whether the data was processed successfully.
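One option is to `print()` a short summary at the end of the script, assuming the newer Python Script node (KNIME 4.7 and later), whose script editor shows `print()` output in its console. In that node the frame is typically handed to the output port with `knio.output_tables[0] = knio.Table.from_pandas(new_df)` (with `import knime.scripting.io as knio`); once the node has executed, the result table can be opened from that output port. The helper below is a hypothetical pandas-only sketch:

```python
import pandas as pd


def console_summary(df: pd.DataFrame, n: int = 5) -> str:
    """Print and return a short summary: size, index uniqueness, first n rows."""
    summary = "\n".join(
        [
            f"rows={len(df)}, columns={len(df.columns)}",
            f"unique index: {df.index.is_unique}",
            df.head(n).to_string(),
        ]
    )
    print(summary)
    return summary


console_summary(pd.DataFrame({"IP Addresses": ["10.0.0.1", "10.0.0.2"]}))
```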
Also, can I connect a Row Filter node directly to the Python Script node to keep filtering my Excel table after the Python processing, or do I need to connect the Python Script node to a Table Writer node first and then connect that to a Row Filter node?
@bluecar007 I think the Python integration in KNIME does not like duplicate indexes in pandas DataFrames. You could try resetting them before or within the Python node:
df.reset_index(inplace=True)
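With its default `drop=False`, `reset_index()` keeps the old labels as a new column named `index`, which would then presumably show up as an extra column in the KNIME output; `drop=True` discards them. A small pandas-only illustration with made-up data:

```python
import pandas as pd

# A frame whose index labels repeat, like the output of a row-splitting step
df = pd.DataFrame(
    {"IP Addresses": ["10.0.0.1", "10.0.0.2", "10.0.0.3"]},
    index=[0, 0, 1],
)
print(df.index.is_unique)  # False

# Default reset_index() moves the old labels into a new "index" column
kept = df.reset_index()
print(list(kept.columns))  # ['index', 'IP Addresses']

# drop=True throws the old labels away and leaves a clean 0..n-1 index
clean = df.reset_index(drop=True)
print(list(clean.index))  # [0, 1, 2]
```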
Concerning your other questions: you can edit Excel files using openpyxl and then open them through the Excel Reader node.
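A minimal openpyxl sketch of that route, assuming the data sit on the first sheet under a header row that contains an `IP Addresses` column; the function name `split_ip_rows_xlsx` and the file paths are hypothetical. It copies cell values only (no formatting) into a new workbook, which an Excel Reader node could then read.

```python
from openpyxl import Workbook, load_workbook


def split_ip_rows_xlsx(src: str, dst: str, column: str = "IP Addresses") -> int:
    """Copy the first sheet of `src` to `dst`, one row per comma-separated IP.

    Returns the number of data rows written (header excluded).
    """
    ws_in = load_workbook(src).active
    rows = list(ws_in.iter_rows(values_only=True))
    header, body = list(rows[0]), rows[1:]
    col = header.index(column)

    wb_out = Workbook()
    ws_out = wb_out.active
    ws_out.append(header)
    written = 0
    for row in body:
        value = row[col]
        ips = [ip.strip() for ip in str(value).split(",") if ip.strip()] if value is not None else []
        # Rows with zero or one IP are copied unchanged, mirroring the pandas version
        for ip in (ips if len(ips) > 1 else [value]):
            new_row = list(row)
            new_row[col] = ip
            ws_out.append(new_row)
            written += 1
    wb_out.save(dst)
    return written
```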
I placed “df.reset_index(inplace=True)” at the end of the code in my Python Script node, but it still gives me the same error: “Execute failed: Row key checking: Duplicate key detected: "Row10"”.
import pandas as pd

# Function to clean and split IP addresses into individual rows
def split_ip_addresses(df):
    # Create a list to store the expanded rows
    rows = []
    for _, row in df.iterrows():
        ip_addresses = str(row['IP Addresses']).split(',') if pd.notna(row['IP Addresses']) else []
        # Clean up any leading/trailing spaces for IP addresses
        ip_addresses = [ip.strip() for ip in ip_addresses if ip.strip()]
        if len(ip_addresses) > 1:
            # Duplicate the row for each IP address and assign a single IP per row
            for ip in ip_addresses:
                new_row = row.copy()
                new_row['IP Addresses'] = ip
                rows.append(new_row)
        else:
            # If only one IP or no IP, just append the row as it is
            rows.append(row)
    # Create a new DataFrame from the list of rows
    new_df = pd.DataFrame(rows)
    return new_df
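A likely explanation, assuming the node turns the incoming RowIDs into the DataFrame index and `new_df` is what ends up in the output table: `iterrows()` yields each row as a Series whose `name` is its index label, `row.copy()` keeps that name, and `pd.DataFrame(rows)` uses those names as the new index. Every copy of the row labelled "Row10" therefore keeps "Row10". If the reset runs on `df` rather than on the frame returned by `split_ip_addresses`, or after that frame has already been sent to the output, the repeated labels remain. A minimal sketch of one fix is to reset the index of the returned frame, with `drop=True` so no extra `index` column appears:

```python
import pandas as pd


def split_ip_addresses(df: pd.DataFrame) -> pd.DataFrame:
    """Same splitting logic as above, returning a frame with a unique 0..n-1 index."""
    rows = []
    for _, row in df.iterrows():
        raw = row["IP Addresses"]
        ips = [ip.strip() for ip in str(raw).split(",") if ip.strip()] if pd.notna(raw) else []
        if len(ips) > 1:
            for ip in ips:
                new_row = row.copy()  # keeps row.name, i.e. the original index label
                new_row["IP Addresses"] = ip
                rows.append(new_row)
        else:
            rows.append(row)
    new_df = pd.DataFrame(rows)  # index = the row names, now repeated for split rows
    return new_df.reset_index(drop=True)  # replace them with fresh, unique labels


# Made-up input whose labels mimic KNIME RowIDs
df = pd.DataFrame(
    {"Host": ["a", "b"], "IP Addresses": ["10.0.0.1, 10.0.0.2", "10.0.0.3"]},
    index=["Row9", "Row10"],
)
print(split_ip_addresses(df))
```

Because the duplicate rows themselves are kept, a later duplicate-removal step in the workflow still sees the same data; only the row keys change.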