Regression Predictor (sklearn) node does not retain the original Row ID


Here is my workflow.

I have a very small dataset, so my random forest (RF) model overfits severely. I am therefore trying a Lasso regression learner instead and evaluating its generalization with cross-validation.

As shown in the workflow above, execution fails with the error “Execute failed: Encountered duplicate row ID ‘Row0’”. I tried removing ‘Row0’ with a Row Filter, but then ‘Row1’ was reported as a duplicate too, and likewise 3, 4, 5…

On closer inspection, I found the cause. Here is the output of the Random Forest Predictor:
[Screenshot: Random Forest Predictor output]

And here is the output of the Regression Predictor (sklearn):
[Screenshot: Regression Predictor (sklearn) output]

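It appears that the Regression Predictor (sklearn) node does not retain the original Row IDs of its input data. Its output rows seem to be renumbered starting from ‘Row0’, so when the X-Aggregator collects the predictions from every cross-validation fold, the Row IDs collide.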

Can anyone help me solve this issue?

I solved this problem with the RowID node: I added an extra column containing alternate (unique) IDs, and then used the RowID node to replace the duplicated Row IDs with the values from that column.
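To make the mechanism concrete, here is a minimal, self-contained Python simulation. It is plain Python, not KNIME code, and it assumes (as the error message suggests) that the predictor renumbers its output rows from Row0 in every fold. All function names (`predictor_renumbering`, `aggregate`, `restore_ids`) and the `orig_id` column are hypothetical; `restore_ids` plays the role of the RowID node in the workaround described above.

```python
# Hypothetical simulation of the Row ID collision; not KNIME's actual implementation.

def make_table(n):
    """Build a toy table: a list of (row_id, row_dict) pairs with unique IDs."""
    return [(f"Row{i}", {"x": float(i), "y": 2.0 * i}) for i in range(n)]


def k_fold_test_sets(table, k):
    """Split rows into k contiguous test partitions (no shuffling, for clarity)."""
    size = len(table) // k
    return [table[i * size:(i + 1) * size] for i in range(k)]


def predictor_renumbering(test_rows):
    """Mimic a predictor that drops incoming Row IDs and renumbers from Row0."""
    return [(f"Row{i}", dict(row, prediction=row["x"] * 2.0))
            for i, (_, row) in enumerate(test_rows)]


def aggregate(fold_outputs):
    """Concatenate fold results, failing on duplicate Row IDs like the X-Aggregator."""
    seen = {}
    for fold in fold_outputs:
        for row_id, row in fold:
            if row_id in seen:
                raise ValueError(f"Encountered duplicate row ID '{row_id}'")
            seen[row_id] = row
    return seen


def restore_ids(fold_output, id_column="orig_id"):
    """Workaround: take each row's ID from a column that was carried along."""
    return [(row[id_column], row) for _, row in fold_output]


table = make_table(8)
folds = k_fold_test_sets(table, k=4)

# Without the workaround, the second fold's 'Row0' collides with the first fold's.
try:
    aggregate([predictor_renumbering(f) for f in folds])
except ValueError as err:
    print(err)  # Encountered duplicate row ID 'Row0'

# With the workaround: copy the original ID into a column before predicting,
# then rebuild the Row IDs from that column afterwards.
tagged = [[(rid, dict(row, orig_id=rid)) for rid, row in f] for f in folds]
result = aggregate([restore_ids(predictor_renumbering(f)) for f in tagged])
print(sorted(result))
```

The ID column only needs to hold values that are unique across all folds; copying the original Row ID is the simplest way to guarantee that.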


Nice workaround, and thanks for posting it for everyone's benefit.

And welcome to the KNIME forum! :slight_smile:

Hi @7112051138,

thank you for bringing this issue to our attention, and for sharing your workaround!
This will be fixed in the next release. Until then, the fix is available from the nightly community trunk update site: https://update.knime.com/community-contributions/trunk
Sorry for the inconvenience, and welcome to the KNIME forum :slight_smile:

Best,
Seray

