Regression Predictor (sklearn) node can't retain the original Row ID

Here is my workflow.

I have a very small dataset, and as a result my RF model overfits severely. I am therefore trying a Lasso regression learner instead and evaluating its generalization ability with cross-validation.
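For reference, here is a minimal scikit-learn sketch of what the Lasso plus cross-validation setup computes. It assumes a synthetic 30-row dataset in place of the real one, and the `alpha=0.1` penalty and 5 folds are illustrative choices, not values taken from the workflow:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, cross_val_score

# Hypothetical small dataset: 30 rows, 5 features (stand-in for the real data)
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=30)

# alpha sets the strength of the L1 penalty; 0.1 is an illustrative value
model = Lasso(alpha=0.1)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# One R^2 score per held-out fold
scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
print(scores.mean(), scores.std())
```

A small spread of fold scores suggests the model generalizes more consistently than an overfit one would.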

As shown in the workflow above, the error message “Execute failed: Encountered duplicate row ID ‘Row0’” is displayed. I tried filtering out ‘Row0’ with a Row Filter, but then the error reported that ‘Row1’ was duplicated too, and likewise 3, 4, 5…

Upon further observation, I have identified the issue. The following is the result of the random forest predictor:

The following is the result of the regression predictor (sklearn):

It appears that the Regression Predictor (sklearn) does not retain the original Row IDs of the data, which causes the duplicate Row ID error in the X-Aggregator during cross-validation.
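The mechanism can be illustrated outside KNIME with a small pandas sketch. It assumes the predictor renumbers each fold's output from `Row0` (the behaviour described above); `predict_fold` and the doubling "model" are hypothetical stand-ins, and `pd.concat` plays the role of the X-Aggregator collecting all folds:

```python
import pandas as pd

# Hypothetical table with KNIME-style row IDs as the index
data = pd.DataFrame({"x": range(6)}, index=[f"Row{i}" for i in range(6)])

def predict_fold(fold: pd.DataFrame, keep_ids: bool) -> pd.DataFrame:
    out = fold.assign(prediction=fold["x"] * 2.0)  # stand-in for a real model
    if not keep_ids:
        # Simulates a predictor that renumbers its output starting at Row0
        out.index = [f"Row{i}" for i in range(len(out))]
    return out

folds = [data.iloc[0:3], data.iloc[3:6]]

# Collect every fold's predictions, as the X-Aggregator does
lost = pd.concat([predict_fold(f, keep_ids=False) for f in folds])
kept = pd.concat([predict_fold(f, keep_ids=True) for f in folds])

print(lost.index.is_unique)  # False: Row0..Row2 occur once per fold
print(kept.index.is_unique)  # True: original IDs never collide
```

With renumbered IDs, every fold contributes its own `Row0`, so removing one `Row0` only exposes the next collision, which matches the behaviour seen with the Row Filter.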

Can any expert help me solve this issue?

I solved this problem by using the RowID node. I added an additional column with alternate IDs, and the RowID node replaced the duplicated IDs with the alternate IDs.
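The same workaround can be sketched in pandas, assuming the alternate ID column is simply a copy of the original Row IDs made before the loop. The column name `orig_id` and the helper `predict_fold` are hypothetical, and `set_index` stands in for the RowID node's "replace Row ID with column" step:

```python
import pandas as pd

data = pd.DataFrame({"x": range(6)}, index=[f"Row{i}" for i in range(6)])
# Step 1: copy the original row IDs into an ordinary column before the loop
data["orig_id"] = data.index

def predict_fold(fold: pd.DataFrame) -> pd.DataFrame:
    out = fold.assign(prediction=fold["x"] * 2.0)  # stand-in model
    out.index = [f"Row{i}" for i in range(len(out))]  # predictor renumbers rows
    return out

fixed_folds = []
for fold in (data.iloc[0:3], data.iloc[3:6]):
    out = predict_fold(fold)
    # Step 2: restore the row IDs from the saved column (the RowID-node step)
    out = out.set_index("orig_id")
    out.index.name = None
    fixed_folds.append(out)

# Step 3: aggregation now succeeds; verify_integrity raises on duplicate labels
result = pd.concat(fixed_folds, verify_integrity=True)
print(result.index.tolist())
```

Because the saved IDs are unique across the whole table, each fold's restored IDs are disjoint from the others and the aggregation no longer fails.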


Nice workaround - thanks for posting it to benefit everyone.

And welcome to the KNIME forum! :slight_smile:

Hi @7112051138,

thank you for bringing this issue to our attention, and for sharing your workaround!
This will be fixed in the next release. As of now, the fix is available via the nightly community trunk update site.
Sorry for the inconvenience, and welcome to the KNIME forum :slight_smile:

