Parameter Optimization for Non-Numerical Variables (Gini vs. Gain): Workflow Example

My apologies, as this seems like a pretty straightforward problem, but after watching several YouTube videos and working through workflows and self-paced courses, I've come to this last option. I'm trying to set up the Optimizer Start/Start Loop for a Tree Ensemble model, and as long as the variable is an integer I'm good. However, I'm interested in including Split Criteria. When I add a "new parameter" to the Start Loop, I make sure to uncheck "integer" and have attempted to fill in what I think the node is looking for, but it treats the parameter as an integer no matter what. I see "Split Criteria" as a Flow Variable option, so I'm thinking this is possible, but I also read a post suggesting that including check boxes and strings in the Optimizer is pretty complicated.

Any workflows showing how others have achieved this would be greatly appreciated (including check boxes, as I'm sure I may need this someday too). All the videos and workflows I've reviewed only included numerical variables. Thanks!

Hi @rinaldiinjapan,

Here I have an example workflow to utilize the parameter optimization loop for split criterion in the Tree Ensemble Learner node (or any non-numeric parameter in general):

I have commented each step in the workflow, but as a quick note: you can convert the Int value from the loop start to your desired string value by using the Rule Engine Variable node.
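
To illustrate the idea outside KNIME, here is a minimal Python sketch, assuming the loop parameter is an integer running from 0 to 2 with step size 1. The optimizer only ever varies that integer, and a lookup (the role the Rule Engine Variable node plays inside the loop) turns it into the string the learner's split criterion setting expects. The criterion strings and the `score_fn` callback are assumptions for illustration only; the strings must match exactly what the Tree Ensemble Learner's flow variable accepts.

```python
from typing import Callable, Optional, Tuple

# Hypothetical integer -> string mapping. The strings are assumptions: they must
# match exactly what the learner's split criterion flow variable accepts.
SPLIT_CRITERIA = {
    0: "InformationGain",
    1: "InformationGainRatio",
    2: "Gini",
}


def int_to_split_criterion(step: int) -> str:
    """Translate the integer loop value into a string, one rule per value,
    similar to what a Rule Engine Variable rule set does."""
    if step not in SPLIT_CRITERIA:
        raise ValueError(f"no split criterion defined for step {step}")
    return SPLIT_CRITERIA[step]


def brute_force_search(
    start: int, stop: int, score_fn: Callable[[str], float]
) -> Tuple[int, str, float]:
    """Try every integer step from start to stop (inclusive), convert it to a
    criterion string, and keep the step with the highest score.

    score_fn is a hypothetical stand-in for "train and score the model".
    """
    best: Optional[Tuple[int, str, float]] = None
    for step in range(start, stop + 1):
        criterion = int_to_split_criterion(step)
        score = score_fn(criterion)
        if best is None or score > best[2]:
            best = (step, criterion, score)
    if best is None:
        raise ValueError("empty search range")
    return best


if __name__ == "__main__":
    # Show which string each integer step of the loop would produce.
    for step in range(0, 3):
        print(step, "->", int_to_split_criterion(step))
```

The design point is that the optimizer never needs to understand strings: it searches over integers, and the translation to a non-numeric setting happens in one place inside the loop body.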

@armingrudd WOW! Thank you so much. Safe to say I would never have figured this out, and of all the dozens of workflows and examples I found, this is the first time I've seen this. Thanks again.

While the topic is hot, would it be too much to ask how you would handle a setting that is selected via a check box, such as "Use binary splits for nominal columns"?

@armingrudd Actually, upon closer inspection, I THINK I can figure it out, but I'm curious: when there is a binary option like the one in my last question, would the checked box be considered step 1 or step 2?

Same thing if you were to try to use force split: would step 1 be unchecked, and then step 2 onward be the different strings?

Thanks again. So excited to use this.

For what it's worth, I haven't been able to figure out how to create a flow variable for "Nominal Splits", and thus haven't attempted "Force Split". I pretty much copied the node for "Split Criteria" but with two steps; I tried 0 and 1, and 1 and 2, and neither worked. I get the following error message:

eclipse.buildId=unknown
java.version=1.8.0_152
java.vendor=Oracle Corporation
BootLoader constants: OS=win32, ARCH=x86_64, WS=win32, NL=en_US
Command-line arguments: -os win32 -ws win32 -arch x86_64

org.eclipse.ui
Warning
Tue Aug 22 11:26:18 PDT 2023
Warnings while parsing the key bindings from the ‘org.eclipse.ui.commands’ and ‘org.eclipse.ui.bindings’ extension point

eclipse.buildId=unknown
java.version=1.8.0_152
java.vendor=Oracle Corporation
BootLoader constants: OS=win32, ARCH=x86_64, WS=win32, NL=en_US
Command-line arguments: -os win32 -ws win32 -arch x86_64

org.eclipse.ui
Warning
Tue Aug 22 11:26:18 PDT 2023
Cannot bind to an undefined command: plug-in=‘org.knime.workbench.editor’, id=‘knime.commands.editor.gridSettings’

Dear @rinaldiinjapan,

I updated the shared workflow and added another Rule Engine Variable node to generate a new flow variable with a "true" or "false" value, used for boolean settings (here, "use binary splits for nominal columns").
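
The same trick extends to check boxes. Below is a minimal Python sketch, assuming a second integer loop parameter running from 0 to 1. Which integer means "checked" is purely a convention set by the rule you write (here 1 becomes "true"), so there is no fixed answer to whether the checked state is step 1 or step 2. The criterion strings and the dictionary keys are hypothetical labels, not KNIME setting names.

```python
from itertools import product
from typing import Dict, List

# Hypothetical mappings; the strings must match what the node's flow variables accept.
SPLIT_CRITERIA = {0: "InformationGain", 1: "InformationGainRatio", 2: "Gini"}
# Which integer stands for "checked" is only a convention chosen in the rule;
# here 1 -> "true" (box checked) and 0 -> "false" (box unchecked).
BINARY_SPLITS = {0: "false", 1: "true"}


def int_to_bool_string(step: int) -> str:
    """Convert a 0/1 loop value into the "true"/"false" string for a check box."""
    if step not in BINARY_SPLITS:
        raise ValueError(f"expected 0 or 1, got {step}")
    return BINARY_SPLITS[step]


def settings_for(split_step: int, binary_step: int) -> Dict[str, str]:
    """Build one iteration's settings from two integer loop parameters.
    The dictionary keys are hypothetical labels, not KNIME setting names."""
    return {
        "split_criterion": SPLIT_CRITERIA[split_step],
        "binary_nominal_splits": int_to_bool_string(binary_step),
    }


def all_combinations() -> List[Dict[str, str]]:
    """Enumerate every combination a brute-force loop over both parameters visits."""
    return [
        settings_for(s, b)
        for s, b in product(sorted(SPLIT_CRITERIA), sorted(BINARY_SPLITS))
    ]


if __name__ == "__main__":
    for settings in all_combinations():
        print(settings)
```

With three criterion values and two check-box values, a brute-force search visits 3 × 2 = 6 combinations, each translated to strings before it reaches the learner.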

Amazing stuff, @armingrudd. By far the best example I've seen, and again, I would never have figured this out. Thank you so much.