Effect of input variables in prediction

Hello,

I am using several classification models for prediction (naive Bayes, decision tree, neural networks, and logistic regression).

I would like to find out which input variables in my classification models improve the prediction and what kind of effect they have.

Is that possible?

Thank you

Andra

Dear Andra,

In this post, https://tech.knime.org/forum/knime-general/variable-importance-in-prediction-classification-or-regression-molels, I attached a similar workflow that calculates variable importance. The basic idea is: to calculate the importance of a variable x, you leave it out of the prediction. If leaving the variable out decreases the accuracy, you know it is important.

By repeating this for all columns, you get an indicator of how important each column is.
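Outside of KNIME, the leave-one-column-out idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the attached workflow: the nearest-centroid classifier, the synthetic data, and all function names are hypothetical stand-ins for whatever learner, predictor, and table you actually use.

```python
import random

def fit_centroids(X, y):
    """Toy stand-in for a learner node: one mean vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict(model, X):
    """Toy stand-in for a predictor node: assign the closest class centroid."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(model, key=lambda c: dist(row, model[c])) for row in X]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def select(X, cols):
    """Keep only the given column indices."""
    return [[row[j] for j in cols] for row in X]

def drop_column_importance(X_train, y_train, X_test, y_test):
    """Importance of column j = accuracy with all columns - accuracy without j."""
    all_cols = list(range(len(X_train[0])))

    def score(cols):
        model = fit_centroids(select(X_train, cols), y_train)
        return accuracy(y_test, predict(model, select(X_test, cols)))

    baseline = score(all_cols)
    return {j: baseline - score([c for c in all_cols if c != j]) for j in all_cols}

# Hypothetical synthetic data: column 0 carries the class signal, column 1 is noise.
rng = random.Random(0)

def make_data(n):
    X, y = [], []
    for _ in range(n):
        label = rng.choice([0, 1])
        X.append([label * 4.0 + rng.gauss(0, 1), rng.gauss(0, 1)])
        y.append(label)
    return X, y

X_train, y_train = make_data(200)
X_test, y_test = make_data(200)
importance = drop_column_importance(X_train, y_train, X_test, y_test)
print(importance)
```

A large positive value means the accuracy dropped noticeably when that column was left out. Values near zero, or slightly negative, suggest the column adds little for this particular model.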

Best regards, Iris

Hi Iris,

thank you for your reply and the workflow! I will try that.

However, this seems quite complicated for something that is probably needed by a lot of people. I also assume that the model itself is already looping through the variables, so I found it odd that the effect of the variables is not automatically integrated into a node like the Scorer node.

Are you planning to integrate that in the future?

Thank you!

Andra

 

You cannot integrate it into the Scorer, because variable importance is model dependent. You would need to integrate it into each of our model nodes, and this won't happen anytime soon, especially as KNIME nodes are meant to be modular: they should be flexible and usable in different scenarios.

We do have our tree ensemble nodes, which also provide you with a measure of variable importance.

Best, Iris 

Hi Iris,

I see.

Do you also have an example of how to use the tree ensemble node to measure variable importance?

Best,

Andra

Hi Iris,

if I understand your example workflow correctly, you are predicting with each input variable independently in the loop, right?

I am now trying to use the backward feature elimination to leave out variables and see whether the error rate increases without a given variable as input. However, I am actually more interested in the decrease of Cohen's kappa. Do you know of something similar to the backward feature elimination that uses Cohen's kappa for evaluation?

Thank you!

Andra

Hi,

Does anyone have an idea?

Thanks

Andra

Hi Andra,

The example I uploaded can be modified to fit your needs. Just replace the decision tree learner and predictor with the tree ensemble ones, and the importance is calculated for a tree ensemble instead.

About your second question: you can build this yourself with the loops in KNIME Analytics Platform. The backward feature elimination is just a simpler way of doing it. You would need two loops. The inner loop calculates, for all remaining features, which one to remove next. The outer loop removes the feature whose removal gives the lowest error and sends back the features you can still work on. The outer loop is then controlled with a recursive loop.
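To make the two-loop structure concrete, here is a minimal Python sketch of backward elimination that scores each candidate with Cohen's kappa instead of the error rate, so the feature to remove is the one whose removal leaves the highest kappa. It is an assumption-laden illustration rather than a KNIME workflow: the toy nearest-centroid classifier, the synthetic data, and all function names are hypothetical, and in practice you would plug in your own learner and a proper validation scheme.

```python
import random
from collections import Counter

def cohen_kappa(y_true, y_pred):
    """kappa = (p_o - p_e) / (1 - p_e), observed vs. chance agreement."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return 0.0  # kappa is undefined here; this sketch returns 0.0
    return (p_o - p_e) / (1.0 - p_e)

def fit_predict(X_train, y_train, X_test, cols):
    """Toy nearest-centroid classifier restricted to the columns in cols."""
    counts, sums = Counter(y_train), {}
    for row, label in zip(X_train, y_train):
        acc = sums.setdefault(label, [0.0] * len(cols))
        for k, j in enumerate(cols):
            acc[k] += row[j]
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def nearest(row):
        return min(centroids, key=lambda c: sum(
            (row[j] - m) ** 2 for j, m in zip(cols, centroids[c])))

    return [nearest(row) for row in X_test]

def backward_elimination(X_train, y_train, X_test, y_test):
    remaining = list(range(len(X_train[0])))
    history = []  # (removed column, kappa after its removal)
    # Outer loop: shrink the feature set until a single feature is left.
    while len(remaining) > 1:
        # Inner loop: score the model with each remaining feature left out.
        trials = []
        for j in remaining:
            cols = [c for c in remaining if c != j]
            y_pred = fit_predict(X_train, y_train, X_test, cols)
            trials.append((cohen_kappa(y_test, y_pred), j))
        # Remove the feature whose absence leaves the highest kappa.
        best_kappa, drop = max(trials)
        remaining.remove(drop)
        history.append((drop, best_kappa))
    return history, remaining

# Hypothetical data: column 0 strong signal, column 1 weak signal, column 2 noise.
rng = random.Random(1)

def make_data(n):
    X, y = [], []
    for _ in range(n):
        label = rng.choice([0, 1])
        X.append([label * 4.0 + rng.gauss(0, 1),
                  label * 1.0 + rng.gauss(0, 1),
                  rng.gauss(0, 1)])
        y.append(label)
    return X, y

X_train, y_train = make_data(200)
X_test, y_test = make_data(200)
history, remaining = backward_elimination(X_train, y_train, X_test, y_test)
print(history, remaining)
```

Reading `history` from top to bottom shows how kappa develops as features are removed. A sharp drop at some step indicates that the feature removed there was carrying real signal.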

Best, Iris