Feature optimization - precision and recall

zarniak · June 24, 2019, 8:04pm

Hi,

I am doing simple feature optimization as in the standard example (https://hub.knime.com/knime/workflows/Examples/04_Analytics/01_Preprocessing/03_Perform_Feature_Selection*YfW-zcG7PCSWUPMR), with a slight adjustment because I am trying to maximize recall. No problem with that.

While the loop is optimizing features, the outcome (Loop End node) contains only recall and the feature names, but I also want to see precision so I can find a reasonable equilibrium: which features to use to get the best recall at a given precision.

Do you know how to get additional data from the Scorer stored while looping and optimizing features? How should the example be modified to show recall, features, and precision at the end, so I can make a proper decision?

I would be grateful for some hints in this matter…

stelfrich · June 27, 2019, 9:27am

Hi @zarniak,

The Feature Selection Loop is limited to optimizing a single score. To integrate precision into the feature selection, you will have to come up with a mathematical formulation of the optimization goal. Implement this with a Math Formula (Variable) node and feed the result to the Feature Selection Loop End node. I would suggest starting with recall + precision and developing your formula further from there.
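To make the suggested starting formula concrete, here is a minimal Python sketch of the arithmetic only. It is not a KNIME node configuration; the function names and the optional weights are assumptions added for illustration, and the same expression would have to be entered in the Math Formula (Variable) node in that node's own syntax.

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from confusion-matrix counts.

    tp, fp, fn: true positives, false positives, false negatives.
    Returns 0.0 for a metric whose denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def combined_score(tp, fp, fn, recall_weight=1.0, precision_weight=1.0):
    """Single score to maximize: weighted sum of recall and precision.

    With both weights at 1.0 this is the plain 'recall + precision'
    starting point; the weights are a hypothetical extension.
    """
    precision, recall = precision_recall(tp, fp, fn)
    return recall_weight * recall + precision_weight * precision


if __name__ == "__main__":
    # Illustrative counts only, not results from the thread.
    print(precision_recall(tp=8, fp=2, fn=2))
    print(combined_score(tp=8, fp=2, fn=2))
```

Raising `recall_weight` above `precision_weight` would keep recall as the main target while still penalizing feature sets whose precision collapses.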

Best,
Stefan

zarniak · June 30, 2019, 6:31pm

@stelfrich thanks for the answer, but I think I haven’t stated my problem clearly. I am optimizing recall (only one parameter), and this is my main target. But while looping over the feature selection, I want to know how precision is behaving.

Low precision in my case will trigger a lot of work later on, and I need to come up with a good compromise between recall and precision. That’s why I want to know the level of precision for a given best feature mix during feature selection.
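The decision described here, best recall at a given precision, can be sketched in a few lines of Python once both metrics have been recorded for each candidate feature set. This assumes the per-iteration results are already available as a table; the field names and all numbers below are made up for illustration.

```python
def best_recall_at_precision(results, min_precision):
    """Return the result with the highest recall among those whose
    precision is at least min_precision, or None if none qualifies.

    Ties on recall are broken in favor of higher precision.
    """
    eligible = [r for r in results if r["precision"] >= min_precision]
    if not eligible:
        return None
    return max(eligible, key=lambda r: (r["recall"], r["precision"]))


# Hypothetical per-iteration results (illustrative values only).
results = [
    {"features": ["a", "b", "c", "d"], "recall": 0.95, "precision": 0.40},
    {"features": ["a", "b", "c"], "recall": 0.90, "precision": 0.60},
    {"features": ["a", "c"], "recall": 0.85, "precision": 0.55},
    {"features": ["a"], "recall": 0.80, "precision": 0.75},
]

if __name__ == "__main__":
    print(best_recall_at_precision(results, min_precision=0.5))
```

With a precision floor of 0.5, the feature set with the highest raw recall is skipped because its precision is too low, and the next best one is chosen.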

beginner · July 1, 2019, 6:08am

Isn’t that why stelfrich is right with his suggestion? You optimize directly for a good balance between recall and precision.

As a general comment, be wary of forward/backward feature selection. First, you should always use cross-validation (with a fixed seed) within the loop rather than just one split, and even then the method is questionable at best. It will often select irrelevant features.
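The point about cross-validation with a fixed seed can be illustrated with a small standard-library Python sketch. It only shows the splitting and averaging; `score_fold` is a hypothetical callback standing in for whatever trains the model and computes the metric, and the default fold count and seed are arbitrary assumptions.

```python
import random


def kfold_indices(n_rows, n_folds, seed):
    """Split row indices 0..n_rows-1 into n_folds shuffled, disjoint folds."""
    indices = list(range(n_rows))
    # A fixed seed gives the same shuffle on every call, so every
    # candidate feature set is scored on identical folds.
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def cross_validated_mean(score_fold, n_rows, n_folds=5, seed=42):
    """Average a metric over all folds.

    score_fold(train_idx, test_idx) is a hypothetical callback that
    trains on train_idx and returns a metric (for example recall)
    measured on test_idx.
    """
    folds = kfold_indices(n_rows, n_folds, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Training rows are all rows outside the current test fold.
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        scores.append(score_fold(train_idx, test_idx))
    return sum(scores) / len(scores)
```

Because the folds depend only on the seed, differences in the averaged score between two feature sets come from the features rather than from a lucky or unlucky split.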

stelfrich · July 3, 2019, 6:21am

The Wikipedia article on multi-objective optimization is an interesting read on the topic, @zarniak.
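One central idea from multi-objective optimization is the Pareto front: the candidates that no other candidate beats on both objectives at once. A rough Python sketch for recall and precision follows, assuming both metrics were recorded per feature set; the field names and numbers are invented for illustration.

```python
def pareto_front(results):
    """Keep only results that are not dominated on recall and precision."""

    def dominates(a, b):
        # a dominates b if it is at least as good on both metrics
        # and strictly better on at least one of them.
        return (
            a["recall"] >= b["recall"]
            and a["precision"] >= b["precision"]
            and (a["recall"] > b["recall"] or a["precision"] > b["precision"])
        )

    return [r for r in results if not any(dominates(o, r) for o in results)]


# Hypothetical recorded results (illustrative values only).
candidates = [
    {"name": "A", "recall": 0.95, "precision": 0.40},
    {"name": "B", "recall": 0.90, "precision": 0.60},
    {"name": "C", "recall": 0.85, "precision": 0.55},
    {"name": "D", "recall": 0.80, "precision": 0.75},
]

if __name__ == "__main__":
    print([r["name"] for r in pareto_front(candidates)])
```

Only the feature sets on the front are worth comparing by hand, since every other set is worse on both recall and precision than some alternative.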
