Error in Python script node

I am using a Python Script node in my KNIME workflow. In the configuration window the code runs successfully, but the node throws an error when I actually execute it: Execute failed: Output DataFrame contains duplicate values in its index: ‘0’, ‘1’, ‘2’, and others. This is not supported. Please make sure that each entry in the index (i.e., each row key) is unique. Can you please help me with a solution?

Hi,
your DataFrame probably has duplicate index entries.
Try
df.reset_index(inplace=True)
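
To illustrate, here is a minimal sketch of how such duplicates can arise and how a reset removes them. The two small tables are made up for the example; concatenation is only one possible cause and may not be what happens in the original script. Note that `reset_index` without `drop=True` keeps the old labels as an extra column, so the sketch passes `drop=True` to discard them.

```python
import pandas as pd

# Two hypothetical tables, each with the default 0..n-1 index.
a = pd.DataFrame({"value": [10, 20, 30]})
b = pd.DataFrame({"value": [40, 50, 60]})

# pd.concat keeps each frame's own labels, so 0, 1, 2 now appear twice.
df = pd.concat([a, b])
has_duplicates_before = not df.index.is_unique

# drop=True discards the old labels instead of adding them as a column.
df = df.reset_index(drop=True)
has_duplicates_after = not df.index.is_unique
```

Checking `df.index.is_unique` at the end of the script is a cheap way to catch the problem before the node is executed.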

Are your row keys unique?

@mohini1329 welcome to the KNIME forum. You might want to store a unique RowID before using the (new) Python node.

You should maybe also store this RowID so that you can later check that you have done the right transformations.

I just noticed that when using the Python nodes you might run into problems if you do joins without resetting the index inside the Python node, since KNIME seemingly uses the RowID as the index.

Hi @mohini1329,

as the others have already suggested, you should make sure that each row in the Pandas DataFrame has a unique index.

Dropping/resetting the index is not the best option though, because KNIME uses the RowID (which is equal to the DataFrame index) to identify rows when you select / highlight individual cells in your data. If you reset the index, then this row identification will no longer work.
The best solution would be to leave the index of all rows from your input table as it is, but assign new unique index values (which are actually strings) to any new rows that you add.
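
A rough sketch of that idea, assuming the input rows carry string RowIDs such as "Row0" and that the script appends a couple of new rows. The table contents and the "New" prefix are hypothetical and only chosen for illustration; any scheme works as long as the resulting keys are unique.

```python
import pandas as pd

# Hypothetical input table; RowIDs arrive as string index labels.
input_df = pd.DataFrame({"value": [1, 2, 3]}, index=["Row0", "Row1", "Row2"])

# Rows created inside the script get pandas' default integer index (0, 1).
new_rows = pd.DataFrame({"value": [4, 5]})

# Assign string keys with a prefix that does not occur in the input keys.
# "New" is an arbitrary choice for this sketch.
new_rows.index = [f"New{i}" for i in range(len(new_rows))]

# The original RowIDs are untouched; only the added rows get new keys.
output_df = pd.concat([input_df, new_rows])

# Guard against accidental clashes before returning the table.
assert output_df.index.is_unique
```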

@mlauber71: the RowID node that you mentioned has the same problem if “enable hiliting” is disabled.

If you do not care about selection/hiliting, then dropping the index is fine, though :slight_smile:

Hope that helps,
Carsten

Hello @mohini1329 and welcome to the KNIME forum.

Have you tried resetting the index of your DataFrame?

df = df.reset_index(drop=True)

BR

@carstenhaubold, @mohini1329 I have created an example where the RowID is stored, reset, and re-applied from within the pandas DataFrame.
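
The general pattern could look roughly like the following. This is only a sketch and not the linked example itself: the data, the sort step, and the column name `orig_rowid` are assumptions, and reading from and writing to the KNIME node is left out.

```python
import pandas as pd

# Hypothetical table with string RowIDs as the index.
df = pd.DataFrame({"value": [3, 1, 2]}, index=["Row0", "Row1", "Row2"])

# Keep the original RowID as an ordinary column ("orig_rowid" is a made-up name).
df["orig_rowid"] = df.index

# Work with a plain 0..n-1 index while transforming the data.
df = df.reset_index(drop=True)
df = df.sort_values("value").reset_index(drop=True)

# Re-apply the stored RowIDs as the index before handing the table back.
df = df.set_index("orig_rowid")
df.index.name = None
restored_ids = list(df.index)
```

Because the RowID travels with each row as a column, it stays attached to the right row even when the order changes.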

I think it would be good to mention the handling of RowIDs somewhere in the Python documentation. One should also be aware that if you copy a DataFrame, the index might be reset as well, so we should be careful with that.

What would be great is a Counter node that supports Long (integers) to handle very large data sets.
