Linear Regression Predictor fails

themetzlisa · March 22, 2019, 12:58pm

Hey everybody,

My workflow looks perfectly fine, but it fails in a Backward Feature Elimination when it comes to the Regression Predictor. It says there is not enough data for the learner, which is not true, and then there is just the error "no matches".

lisovyi · March 22, 2019, 2:42pm

Would it be possible to share a minimal workflow example? Ideally, with some small open dataset such that one can reproduce the problem.

There are several nice examples of the Feature Selection Loop node. For example, here is one using a Naive Bayes classifier.

themetzlisa · March 25, 2019, 9:13am

I tried to upload the workflow, but it did not work, so I am posting a picture. I don’t think I can share the whole data set.

themetzlisa · March 25, 2019, 9:38am

I played around a bit and now I can only include one variable in the Regression learner. The other ones are not shown and I don’t know why.

lisovyi · March 25, 2019, 10:06am

Your workflow as well as learner setup look good. So it might be something tricky. Could it be the same issue as reported here? There was a solution found in that case.

If not, I would try to run the workflow on some public dataset to identify whether the problem is with the workflow or with the data. If you get the same problem with a random public dataset, then it is likely to be something with the workflow, and you can share it to facilitate debugging. If it runs on random data, then the problem likely has to do with the data at hand, and debugging will be difficult.

Regarding your last reported issue: could it be due to the particular subset of features that was used in the last feature selection loop?

themetzlisa · March 25, 2019, 11:15am

I already tried that, but it says no special characters were found. I also tried to use the Backward Elimination metanode provided by KNIME.
I also tried the suggestion with a public data set, and the error did not occur.
The issue where only one variable appeared when I tried to configure the regression also did not occur again.
I included the data set; maybe someone can tell me where I need to change it, or whether there is something really strange in it that made the errors occur?
Thanks so much in advance!
Backward.xlsx (175.8 KB)

lisovyi · March 25, 2019, 1:10pm

The problem seems to be the same as in the earlier post. There are new-line characters in the column names, e.g. “Ersatzteilart:\nWerkzeug A”. One can see this by saving the data in CSV format directly from Excel and then opening the file with a text viewer.
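As an alternative to the text viewer, a small Python sketch along these lines could flag column names that contain hidden control characters. The helper name `find_control_chars` and the sample header list are made up for illustration; only the first column name comes from the file discussed here.

```python
import unicodedata


def find_control_chars(column_names):
    """Return {column name: [(index, code point), ...]} for names
    that contain control characters such as line feeds or tabs."""
    findings = {}
    for name in column_names:
        hits = [
            # Control characters have no Unicode name, so fall back to U+XXXX.
            (i, unicodedata.name(ch, f"U+{ord(ch):04X}"))
            for i, ch in enumerate(name)
            if unicodedata.category(ch) == "Cc"  # "Cc" = control character
        ]
        if hits:
            findings[name] = hits
    return findings


# Hypothetical header row; only the first name is taken from the thread.
columns = ["Ersatzteilart:\nWerkzeug A", "Menge", "Preis"]
for name, hits in find_control_chars(columns).items():
    print(repr(name), hits)
```

Printing the names with `repr` makes the line break visible as `\n` instead of silently wrapping the output.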

If I read in the sample file that you provided with the Excel Reader and then rename the columns with the solution suggested by @nemad here, I can successfully run the Backward Feature Elimination.
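If the renaming were done in a script instead, it might look roughly like the sketch below. This is not the solution by @nemad referenced above; it is an illustration that assumes collapsing every run of whitespace (including line breaks) into a single space is acceptable. `clean_column_names` is a hypothetical helper, and the second sample name is invented to show what happens when two names collapse to the same result.

```python
import re


def clean_column_names(column_names):
    """Collapse whitespace runs (including line breaks) into one space
    and keep the cleaned names distinct by appending a counter."""
    cleaned, seen = [], {}
    for name in column_names:
        new_name = re.sub(r"\s+", " ", name).strip()
        count = seen.get(new_name, 0)
        seen[new_name] = count + 1
        # Two different raw names may collapse to the same clean name.
        cleaned.append(new_name if count == 0 else f"{new_name} ({count})")
    return cleaned


# Hypothetical header row; only the first name is taken from the thread.
columns = ["Ersatzteilart:\nWerkzeug A", "Ersatzteilart: Werkzeug A", "Menge"]
print(clean_column_names(columns))
```

The counter suffix is a simplification: it does not check whether a name such as "X (1)" already exists in the table.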

themetzlisa · March 25, 2019, 2:59pm

Thanks, it worked. I still have only one variable, though, if I try to run a Forward Feature Elimination. Do you know why that could be?

lisovyi · March 25, 2019, 3:43pm

Do you mean Backward Feature Elimination? If so, the feature selection loop works by stripping off features one by one, each time removing the feature with the least contribution to the selected metric. So if you do not limit the number of features in the configuration of the loop start, you’ll end up with a single feature on the last iteration of the loop, and that’s what will be displayed after the loop has ended.
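To illustrate the idea outside of KNIME, here is a minimal Python sketch of greedy backward elimination. It is not how the loop nodes are implemented; `backward_elimination`, `toy_score`, and the usefulness values are all made up, and the toy score merely stands in for "train a model on this subset and evaluate it".

```python
def backward_elimination(features, score, min_features=1):
    """Greedy backward elimination.

    `score(subset)` must return a number where higher is better.
    Returns a list of (feature subset, score), one entry per iteration.
    """
    current = list(features)
    history = [(tuple(current), score(current))]
    while len(current) > min_features:
        # Try dropping each remaining feature and keep the best subset.
        candidates = [
            [f for f in current if f != dropped] for dropped in current
        ]
        current = max(candidates, key=score)
        history.append((tuple(current), score(current)))
    return history


# Toy stand-in for model training and evaluation: made-up usefulness
# values per feature, with a small penalty per feature used.
USEFULNESS = {"a": 0.50, "b": 0.30, "c": 0.02, "d": 0.01}


def toy_score(subset):
    return sum(USEFULNESS[f] for f in subset) - 0.05 * len(subset)


history = backward_elimination(["a", "b", "c", "d"], toy_score)
for subset, value in history:
    print(len(subset), subset, round(value, 3))

# The loop ends on a single feature, but the best subset is found earlier.
best_subset, best_value = max(history, key=lambda item: item[1])
print("best:", best_subset)
```

In this sketch the last iteration always holds `min_features` features, so the result worth keeping is the best-scoring entry in the history rather than the final one. Passing a larger `min_features` stops earlier, which is loosely analogous to limiting the number of features in the loop start.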

Side note: I noticed that you use the Scorer node, but you are in fact building a regression model. If you assume your target to be a continuous number, you should use the Numeric Scorer instead. If you assume it to be categorical (which in a particular case might be encoded with integer numbers), then you should convert it into strings and use the Logistic Regression Learner (and Predictor).
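A rough Python sketch of why the distinction matters: a classification-style score counts exact matches, which is rarely meaningful for continuous predictions, whereas regression metrics measure how close the predictions are. The function names and all numbers below are invented for illustration and are not output from KNIME.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, mean absolute error and root mean squared error."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "r2": 1 - ss_res / ss_tot,
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "rmse": math.sqrt(ss_res / n),
    }


def accuracy(y_true, y_pred):
    """Share of exact matches, as a classification-style score would count."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up values: the predictions are close, but never exactly equal.
y_true = [10.0, 12.0, 15.0, 20.0]
y_pred = [10.4, 11.7, 15.2, 19.5]

print(regression_metrics(y_true, y_pred))
print(accuracy(y_true, y_pred))
```

With these numbers the exact-match accuracy is zero even though the predictions are close to the targets, while the regression metrics reflect the small errors.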