Feature importance formula from Random Forest Learner

Cole_Kingsbury · May 6, 2025, 3:47am

Hello everyone! I finally got the nodes in my random forest workflow arranged so that data leakage no longer occurs. The model returns 93% training/validation accuracy and 91% test accuracy. Now I need to calculate feature (variable) importance. Is there a method to convert the split and candidate counts into importance as a percentage? Screenshot below for reference.

If so, what is the formula? I remember seeing a screenshot someone posted that does just that, but I can't seem to locate it. Any help would be much appreciated.

Many thanks in advance!

~Cole K.

mlauber71 · May 6, 2025, 7:42am

@Cole_Kingsbury you can create your own formula for handling these splits and candidates. One example is here:

Metanode “knime_ranfor” (also variable importance)

The Metanode “export_rank_of_variables”

Here are some suggestions for how to build the formula:

These formulas are not rocket science, just a way to aggregate the number of splits used.
You could use only the very first split (level 0) as an indicator, you could add up the splits across levels, or you could weight them so that a first split is more 'valuable' than a second one. It's your choice.

$#splits (level 0)$+$#splits (level 1)$+$#splits (level 2)$

(($#splits (level 0)$)*3)+(($#splits (level 1)$)*2)+(($#splits (level 2)$)*1)
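To see how much the choice matters, here is a minimal Python sketch of the three options above: first split only, plain sum, and the 3-2-1 depth weighting. The variable names `var_A`/`var_B` and all counts are made up for illustration; in KNIME the counts would come from the Random Forest Learner's attribute statistics table, and the dictionary keys simply mirror the column names used in the formulas.

```python
# Sketch of the split-count aggregations described above.
# ASSUMPTION: var_A / var_B and their counts are made-up illustration values;
# in KNIME they would come from the Random Forest Learner attribute statistics.
stats = {
    "var_A": {"#splits (level 0)": 40, "#splits (level 1)": 25, "#splits (level 2)": 10},
    "var_B": {"#splits (level 0)": 5, "#splits (level 1)": 30, "#splits (level 2)": 45},
}


def first_split_only(row):
    """Use only the level-0 (first) splits as the indicator."""
    return row["#splits (level 0)"]


def plain_sum(row):
    """Unweighted: #splits (level 0) + #splits (level 1) + #splits (level 2)."""
    return sum(row[f"#splits (level {lvl})"] for lvl in range(3))


def depth_weighted(row, weights=(3, 2, 1)):
    """Level-0 splits count 3x, level-1 splits 2x, level-2 splits 1x."""
    return sum(w * row[f"#splits (level {lvl})"] for lvl, w in enumerate(weights))


for name, row in stats.items():
    print(name, first_split_only(row), plain_sum(row), depth_weighted(row))
```

With these invented counts the plain sum puts `var_B` slightly ahead (80 vs 75), while first-split-only (40 vs 5) and the weighted version (180 vs 120) put `var_A` ahead, so the weighting is a real modelling decision that should be stated explicitly.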

Or you can weight everything, including the candidate counts:
(($#splits (level 0)$)*6)
+(($#splits (level 1)$)*5)
+(($#splits (level 2)$)*4)
+($#candidates (level 0)$*3)
+($#candidates (level 1)$*2)
+($#candidates (level 2)$*1)
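Since the original question asked for importance as a percentage, one option is to compute the six-term weighted score above for every variable and then divide each score by the sum over all variables. The Python sketch below does exactly that. The percentage step is an assumption added here, not part of the formula itself; the variable names and counts are hypothetical, and the keys mirror the column names in the formula.

```python
# Six-term weighted score (splits and candidates) plus normalization to percent.
# ASSUMPTION: var_A / var_B / var_C and all counts are made-up illustration values.
stats = {
    "var_A": {"#splits (level 0)": 40, "#splits (level 1)": 25, "#splits (level 2)": 10,
              "#candidates (level 0)": 100, "#candidates (level 1)": 90, "#candidates (level 2)": 80},
    "var_B": {"#splits (level 0)": 5, "#splits (level 1)": 30, "#splits (level 2)": 45,
              "#candidates (level 0)": 100, "#candidates (level 1)": 95, "#candidates (level 2)": 90},
    "var_C": {"#splits (level 0)": 0, "#splits (level 1)": 2, "#splits (level 2)": 4,
              "#candidates (level 0)": 100, "#candidates (level 1)": 92, "#candidates (level 2)": 85},
}

# Weights taken from the six-term formula above.
WEIGHTS = {
    "#splits (level 0)": 6,
    "#splits (level 1)": 5,
    "#splits (level 2)": 4,
    "#candidates (level 0)": 3,
    "#candidates (level 1)": 2,
    "#candidates (level 2)": 1,
}


def weighted_score(row):
    """Weighted sum of split and candidate counts for one variable."""
    return sum(w * row[col] for col, w in WEIGHTS.items())


def to_percent(scores):
    """Express each score as its share of the total over all variables (sums to 100)."""
    total = sum(scores.values())
    if total == 0:
        raise ValueError("all scores are zero; cannot normalize")
    return {name: 100.0 * s / total for name, s in scores.items()}


scores = {name: weighted_score(row) for name, row in stats.items()}
for name, pct in to_percent(scores).items():
    print(f"{name}: score={scores[name]}, importance={pct:.1f}%")
```

With these invented counts the scores are 965, 940 and 595 (total 2500), i.e. 38.6%, 37.6% and 23.8%. Note that such a percentage is only a relative share within this one model and this one choice of weights.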

The result can look something like this:

Cole_Kingsbury · May 6, 2025, 9:21am

Thank you for this information, I do appreciate it. I'm wondering if there is any documentation available that details this process.

I intend for this study (on magma fertility based on zircon geochemistry) to appear in an academic journal, so I would like to make sure that this methodology, i.e. using splits and candidates as a measure of feature importance, will not raise major concerns among peer reviewers. Are you aware of any publicly available full-text documentation or studies that discuss this in the context of feature importance? Unfortunately, Breiman et al. (1984) appears to be behind a paywall.

Many thanks again for your assistance so far!