Feature Importance (Not Simple)

Hello there!

I have quite a challenging problem. I have to build a Regression Tree model and get the feature importance, BUT it's not as simple as using the Spark node and viewing the results. I have to predict the column x0 (type: double) using six columns: x1 (type: string), x2 (type: string), and x3, x4, x5, x6 (type: double). The point is that I need the importance not of each column, but of each combination of columns. I mean:

  • x1 classes: a,b,c,d
  • x2 classes: w,x,y,z
  • x3 classes: the range 0 to 10 in steps of 1

The point: I need to know the importance of each split (for example, the importance of the split given by a+y+(7-8)).

If it isn't clear, please contact me.

Best regards!
Miguel

Hi @miguel -

As I understand your question, this isn’t currently supported with our Regression Tree nodes (either Spark-based or not). These nodes are generally just going to give you an importance for each feature, but not at the level of each split.
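For context, a tree's per-feature importance is a sum over its splits. Below is a minimal sketch, assuming "importance of a split" means the weighted impurity (MSE) decrease at that node, which is the quantity scikit-learn adds up per feature for `feature_importances_`. It uses scikit-learn rather than a KNIME node, synthetic data, and a hypothetical helper named `node_importances`:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def node_importances(model):
    """Share of the total weighted impurity (MSE) decrease contributed by each node.

    Leaves make no split, so their entry is 0. Hypothetical helper, not a library API.
    """
    t = model.tree_
    w = t.weighted_n_node_samples
    gain = np.zeros(t.node_count)
    for node in range(t.node_count):
        left, right = t.children_left[node], t.children_right[node]
        if left == -1:  # leaf node
            continue
        gain[node] = (w[node] * t.impurity[node]
                      - w[left] * t.impurity[left]
                      - w[right] * t.impurity[right])
    total = gain.sum()
    return gain / total if total > 0 else gain


# Synthetic stand-in data: x1 in {a,b,c,d} and x2 in {w,x,y,z} (one-hot), x3 in [0, 10].
rng = np.random.default_rng(0)
n = 500
x1 = rng.integers(0, 4, n)
x2 = rng.integers(0, 4, n)
x3 = rng.uniform(0, 10, n)
y = 5.0 * ((x1 == 0) & (x2 == 2)) + 0.5 * x3 + rng.normal(0, 0.1, n)
X = np.column_stack([x1 == k for k in range(4)] + [x2 == k for k in range(4)] + [x3]).astype(float)

model = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)
node_imp = node_importances(model)

# Summing the node shares per split feature reproduces scikit-learn's per-feature importances.
internal = model.tree_.children_left != -1
per_feature = np.zeros(X.shape[1])
np.add.at(per_feature, model.tree_.feature[internal], node_imp[internal])
print(np.allclose(per_feature, model.feature_importances_))
```

Each internal node corresponds to one combination of conditions (its path from the root), so ranking `node_imp` ranks splits rather than columns.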

Have you seen this type of calculation done elsewhere?

Hi ScottF

I have seen this done here, and I know it's possible in Python and similar tools, but I can't find any information about doing it in KNIME.
I've tried to solve the problem by adding new columns, each one a boolean indicator for one feature value, but the matrix becomes far too large to perform any regression…
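The size problem can be checked with a little arithmetic. Assuming x4 to x6 are binned into ten classes like x3, one boolean column per full combination needs the product of the class counts, while one column per class of each column needs only their sum:

```python
from math import prod

# Number of classes per column: x1 and x2 from the question, x3 binned 0-10 in steps of 1.
# Binning x4..x6 the same way is an assumption for illustration.
levels = {"x1": 4, "x2": 4, "x3": 10, "x4": 10, "x5": 10, "x6": 10}

# One boolean column per full combination (e.g. a+y+(7-8)+...): the product of the counts.
combination_columns = prod(levels.values())

# Only x1, x2 and x3, as in the a+y+(7-8) example.
first_three = prod(levels[c] for c in ("x1", "x2", "x3"))

# One boolean column per class of each column (plain one-hot): the sum of the counts.
one_hot_columns = sum(levels.values())

print(combination_columns, first_three, one_hot_columns)  # 160000 160 48
```

With the per-class encoding a tree still represents combinations, because each root-to-leaf path is a conjunction of conditions on several columns.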
If you have any idea how to achieve this, I'd be grateful.

Best regards!
Miguel

If you have a working example in Python, it should be possible to put it into a Python node in KNIME. Extracting the information might be a small challenge, but it should be doable.
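A sketch of the extraction step that could run inside a Python node, assuming scikit-learn and pandas are available in that Python environment. `split_table` is a hypothetical helper that walks the fitted tree and lists, for every split, the path of conditions leading to it and its share of the total impurity decrease; the data is synthetic. Getting the KNIME table into and out of the node (in recent versions via the `knime.scripting.io` module of the Python Script node) is not shown and depends on the KNIME version:

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor


def split_table(model, feature_names):
    """One row per internal node: the conditions on the path from the root (the
    'combination'), the split made at the node, and its share of the total
    weighted impurity decrease. Hypothetical helper, not a KNIME or sklearn API."""
    t = model.tree_
    w = t.weighted_n_node_samples
    rows = []
    stack = [(0, [])]  # (node id, conditions leading to that node)
    while stack:
        node, path = stack.pop()
        left, right = t.children_left[node], t.children_right[node]
        if left == -1:  # leaves do not split
            continue
        name, thr = feature_names[t.feature[node]], t.threshold[node]
        gain = (w[node] * t.impurity[node]
                - w[left] * t.impurity[left]
                - w[right] * t.impurity[right])
        rows.append({"node": node,
                     "path": " AND ".join(path) or "(root)",
                     "split": f"{name} <= {thr:.2f}",
                     "gain": gain})
        stack.append((left, path + [f"{name} <= {thr:.2f}"]))
        stack.append((right, path + [f"{name} > {thr:.2f}"]))
    table = pd.DataFrame(rows)
    table["importance"] = table["gain"] / table["gain"].sum()
    return table.sort_values("importance", ascending=False).reset_index(drop=True)


# Synthetic stand-in for the real table (columns named as in the question).
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({"x1": rng.choice(list("abcd"), n),
                   "x2": rng.choice(list("wxyz"), n),
                   "x3": rng.uniform(0, 10, n)})
df["x0"] = (5.0 * ((df["x1"] == "a") & (df["x2"] == "y"))
            + 0.5 * df["x3"] + rng.normal(0, 0.1, n))

# One-hot encode the string columns; a 0/1 condition like "x1_a > 0.50" reads as x1 == a.
X = pd.get_dummies(df[["x1", "x2"]], dtype=float).join(df["x3"])
model = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, df["x0"])
table = split_table(model, list(X.columns))
print(table[["path", "split", "importance"]].head(10).to_string(index=False))
```

A row whose path reads like `x1_a > 0.50 AND x2_y > 0.50` describes a combination such as a+y, and its `importance` is that split's share rather than a whole column's.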

Hi @miguel
Since I am very much a novice in KNIME and data science, I would approach it this way.

Step 1: bin the columns x3, x4, x5, x6.
Once that is done, the decision tree can show the information needed to understand which combinations are significant.
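Step 1 could look like this in pandas. This is a sketch with made-up values; the labels follow the (7-8) notation from the question:

```python
import pandas as pd

# Hypothetical values for the continuous columns (the real data is not shown in the thread).
df = pd.DataFrame({"x3": [0.0, 7.5, 10.0],
                   "x4": [3.2, 8.0, 0.4],
                   "x5": [5.0, 5.01, 9.99],
                   "x6": [1.0, 2.5, 6.7]})

edges = list(range(11))  # 0, 1, ..., 10: ten bins of width 1
labels = [f"{lo}-{lo + 1}" for lo in range(10)]  # "0-1", ..., "9-10"

# Bins are right-closed, e.g. (7, 8], and include_lowest puts 0.0 into the first bin.
binned = pd.DataFrame({c: pd.cut(df[c], bins=edges, labels=labels, include_lowest=True)
                       for c in df.columns})
print(binned)
```

The binned columns are categorical, so they would be one-hot encoded (or used as ordered codes) before fitting. Unit-width bins discard the variation inside each bin, which may cost accuracy.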
Let me know how far off I am in my understanding of your problem.

regards,

Hector

Hi @hsrb
I have already tried your solution, but the result is quite poor; it's too inaccurate… Thanks for your time!
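To put a number on "too inaccurate" (and on a target such as 70, 80 or 90 %), one option is the cross-validated R² on held-out folds. A sketch with scikit-learn on synthetic data, comparing a tree on a raw continuous column with the same tree on unit-width bins of that column; the data, the fold setup and `max_depth=6` are assumptions:

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (the real data is not public): smooth dependence on x3.
rng = np.random.default_rng(0)
x3 = rng.uniform(0, 10, 400)
y = np.sin(x3) + 0.1 * rng.normal(size=400)

X_raw = x3.reshape(-1, 1)
# Unit-width bins 0-1, ..., 9-10 encoded by their lower edge (0..9).
X_binned = np.clip(np.floor(x3), 0, 9).reshape(-1, 1)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
model = DecisionTreeRegressor(max_depth=6, random_state=0)
r2_raw = cross_val_score(model, X_raw, y, cv=cv, scoring="r2").mean()
r2_binned = cross_val_score(model, X_binned, y, cv=cv, scoring="r2").mean()
print(f"R^2 raw: {r2_raw:.3f}  binned: {r2_binned:.3f}")
```

Running the same comparison on the real columns would show how much of the loss comes from the binning itself.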

Best Regards,
Miguel

Hi @mlauber71
The point was to get it done using only KNIME; nevertheless, I'll have to do what you suggest if I can't achieve the goal that way.

Best Regards,
Miguel

Hi @miguel,
What accuracy are you aiming for: 70, 80, 90…?
If you post a sample of your database, I could give it a try.
Regards, and good luck

Hector

Hi @hsrb

Unfortunately I can't publish my data. However, I can give you an example:

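| Identifier | Reaction Type | Reactor | Energy 1 | Energy 2 | Energy 3 | Target |
|---|---|---|---|---|---|---|
| 1 | A | Carbon | 1 | 11 | 21 | 0,006 |
| 2 | B | Iron | 2 | 12 | 22 | 0,007 |
| 3 | C | Hydrogen | 3 | 13 | 23 | 0,001 |
| 4 | D | Gold | 4 | 14 | 24 | 0,00009 |
| 5 | B | Lithium | 5 | 15 | 25 | 0,00004 |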

This is just a short template.
I need to build a regression tree and get its 10 most important features according to something similar to this. Thanks a lot for your interest!
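A sketch of turning the template into regression-tree input with pandas. It assumes the data is stored as semicolon-separated text with decimal commas (as in the Target values); Identifier is dropped as a row key, and the two string columns are one-hot encoded:

```python
import io

import pandas as pd

# The template rows from the post, as semicolon-separated text with decimal commas.
csv_text = """Identifier;Reaction Type;Reactor;Energy 1;Energy 2;Energy 3;Target
1;A;Carbon;1;11;21;0,006
2;B;Iron;2;12;22;0,007
3;C;Hydrogen;3;13;23;0,001
4;D;Gold;4;14;24;0,00009
5;B;Lithium;5;15;25;0,00004
"""
df = pd.read_csv(io.StringIO(csv_text), sep=";", decimal=",")

# Identifier is a row key, not a predictor; Target is what the tree predicts.
y = df["Target"]
X = pd.get_dummies(df[["Reaction Type", "Reactor", "Energy 1", "Energy 2", "Energy 3"]],
                   columns=["Reaction Type", "Reactor"], dtype=float)
print(list(X.columns))
```

Five rows are only enough to check the encoding; the resulting `X` and `y` would go to a regression tree fitted on the real data.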

Best Regards,
Miguel

OK @miguel,
I will give it a try and let you know.
regards
Hector

