Here is my workflow.
I have a very small dataset, and as a result my random forest (RF) model overfits severely. I am therefore trying a Lasso regression learner instead and evaluating its generalization ability with cross-validation.
As shown in the workflow above, the error message “Execute failed: Encountered duplicate row ID ‘Row0’” is displayed. I tried to filter out ‘Row0’ with a row filter, but the error then reports that ‘Row1’ is also duplicated, and then 3, 4, 5…
Upon further observation, I have identified the issue. The following is the result of the random forest predictor:
The following is the result of the regression predictor (sklearn):
It appears that the regression predictor (sklearn) does not retain the original Row IDs of the data, which causes the X-aggregator to fail during cross-validation.
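To make my guess concrete, here is a minimal sketch in plain Python of how I think the failure arises. This is not KNIME or scikit-learn code: the function names, the two-fold split, and the dummy "prediction" (value times two) are all hypothetical and only stand in for the predictor and the aggregator. It assumes the predictor renumbers its output from Row0 in every fold.

```python
# Hypothetical illustration only; none of these names are KNIME or scikit-learn APIs.

def predict_resetting_ids(test_rows):
    """Mimic a predictor that discards the incoming Row IDs and renumbers from Row0."""
    return {f"Row{i}": value * 2.0 for i, value in enumerate(test_rows.values())}


def predict_keeping_ids(test_rows):
    """Mimic a predictor that passes the incoming Row IDs through unchanged."""
    return {row_id: value * 2.0 for row_id, value in test_rows.items()}


def aggregate(fold_outputs):
    """Collect the per-fold outputs, rejecting any Row ID that appears twice."""
    collected = {}
    for output in fold_outputs:
        for row_id, prediction in output.items():
            if row_id in collected:
                raise ValueError(f"Encountered duplicate row ID '{row_id}'")
            collected[row_id] = prediction
    return collected


# Six rows split into two folds of three, as a stand-in for cross-validation.
data = {f"Row{i}": float(i) for i in range(6)}
folds = [
    {k: data[k] for k in ("Row0", "Row1", "Row2")},
    {k: data[k] for k in ("Row3", "Row4", "Row5")},
]

# Case 1: IDs are reset in every fold, so the second fold collides with the first.
try:
    aggregate([predict_resetting_ids(fold) for fold in folds])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Case 2: IDs are preserved, so every row is unique across folds.
combined = aggregate([predict_keeping_ids(fold) for fold in folds])

print(error_message)
print(sorted(combined))
```

If this is really what happens, it would also explain why filtering out ‘Row0’ only moved the error to the next Row ID: every renumbered ID repeats across folds, not just the first one.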
Can any expert help me solve this issue?