I have a quite challenging problem. I have to build a Regression Tree model and get the feature importance, BUT it’s not as simple as using the Spark node and view. I have to predict the column x0 (type: double) by using 6 columns: x1 (type: string), x2 (type: string), and x3, x4, x5, x6 (type: double). The point is that I need the importance not of each column, but of each combination of column values; I mean:
x1 classes: a,b,c,d
x2 classes: w,x,y,z
x3 classes: a range from 0 to 10 with a step of 1
The point: I need to know the importance of each split (example: the importance of a split given by a + y + (7-8)).
As I understand your question, this isn’t currently supported with our Regression Tree nodes (either Spark-based or not). These nodes are generally just going to give you an importance for each feature, but not at the level of each split.
Have you seen this type of calculation done elsewhere?
I have seen this done here, and I know it’s possible to get it done in Python and similar tools, but I found no information on doing it in KNIME.
I’ve tried to solve the problem by adding new columns, each one representing a feature with a boolean value, but the matrix becomes too large to perform any regression…
If you have any idea about how to achieve this, I’d be grateful.
If you have a working example in Python, it should be possible to put that into a Python node in KNIME. Extracting the information might be a small challenge, but it should be doable.
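As a rough sketch of what such a Python node could compute: the importance of a single split is commonly taken as its weighted impurity decrease, and the conditions on the path from the root give the combination that leads to it. The function below assumes the tree is available as flat arrays in the layout scikit-learn exposes on `model.tree_` (`children_left`, `children_right`, `feature`, `threshold`, `impurity`, `weighted_n_node_samples`). The function name, the column names, and the numbers in the toy tree are made up for illustration.

```python
def split_importances(left, right, feature, threshold, impurity, n_samples, names):
    """Score every internal node of a binary tree.

    Returns one record per split, sorted by weighted impurity decrease,
    together with the conditions on the path from the root to that split.
    """
    total = n_samples[0]  # number of samples at the root
    records = []
    stack = [(0, [])]  # (node id, conditions collected on the way down)
    while stack:
        node, path = stack.pop()
        l, r = left[node], right[node]
        if l == -1:  # -1 marks a leaf in this layout: nothing to score
            continue
        # Weighted impurity decrease of this split, relative to the whole data set
        gain = (n_samples[node] * impurity[node]
                - n_samples[l] * impurity[l]
                - n_samples[r] * impurity[r]) / total
        name = names[feature[node]]
        goes_left = f"{name} <= {threshold[node]}"
        goes_right = f"{name} > {threshold[node]}"
        records.append({"node": node, "path": path, "split": goes_left, "gain": gain})
        stack.append((l, path + [goes_left]))
        stack.append((r, path + [goes_right]))
    return sorted(records, key=lambda rec: rec["gain"], reverse=True)


# Toy tree with made-up numbers: the root splits on x3, its right child on a
# hypothetical 0/1 column "x1_is_a"; nodes 1, 3 and 4 are leaves.
left = [1, -1, 3, -1, -1]
right = [2, -1, 4, -1, -1]
feature = [0, -2, 1, -2, -2]
threshold = [7.5, -2.0, 0.5, -2.0, -2.0]
impurity = [4.0, 1.0, 2.0, 0.5, 0.4]
n_samples = [100, 60, 40, 25, 15]
names = ["x3", "x1_is_a"]

ranked = split_importances(left, right, feature, threshold, impurity, n_samples, names)
for rec in ranked:
    print(" AND ".join(rec["path"] + [rec["split"]]), round(rec["gain"], 4))
```

Slicing the result with `ranked[:10]` would give the 10 highest-scoring splits. String columns such as x1 and x2 would have to be encoded as numbers before a scikit-learn tree can use them, which this sketch does not cover.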
Hi @miguel
Since I am very much a novice with KNIME and data work in general, I would approach it this way.
Step 1: Bin the columns x3, x4, x5, x6.
Once that is done, the decision tree can show enough information to understand the significant combinations.
Let me know how wrong I am in my understanding of your problem.
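For illustration only, the binning step could look like this in plain Python. The bin edges, the helper name `bin_label`, and the two sample rows are assumptions; the idea is just that each numeric value is replaced by an interval label, so a row turns into a combination such as class + class + range.

```python
from bisect import bisect_right


def bin_label(value, edges):
    """Map a number to an interval label such as '[10, 20)'."""
    i = bisect_right(edges, value) - 1
    # Values outside the edges are clamped into the first or last bin
    i = max(0, min(i, len(edges) - 2))
    return f"[{edges[i]}, {edges[i + 1]})"


edges = [0, 10, 20, 30]  # hypothetical bin edges

# (x1, x2, x3) for two made-up rows
rows = [("a", "y", 7), ("b", "w", 12)]

# Each row becomes one combination of two classes and one binned range
combos = [(x1, x2, bin_label(x3, edges)) for x1, x2, x3 in rows]
for combo in combos:
    print(" + ".join(combo))
```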
Hi @mlauber71
The point was to get it done using just KNIME; nevertheless, I’ll have to do what you say if I can’t find a way to achieve the goal.
Unfortunately I can’t publish my data. However, I can give you an example:
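| Identifier | Reaction Type | Reactor | Energy 1 | Energy 2 | Energy 3 | Target |
|---|---|---|---|---|---|---|
| 1 | A | Carbon | 1 | 11 | 21 | 0,006 |
| 2 | B | Iron | 2 | 12 | 22 | 0,007 |
| 3 | C | Hydrogen | 3 | 13 | 23 | 0,001 |
| 4 | D | Gold | 4 | 14 | 24 | 0,00009 |
| 5 | B | Lithium | 5 | 15 | 25 | 0,00004 |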
This is just a short template.
I need to build a regression tree to get its 10 most important features according to something similar to this… Thanks a lot for your interest.