Parameter optimization and dialog options in learning algorithms

Hi everyone,

Is it possible to include the different dialog options of learning algorithms in parameter optimization?

For example, I want to do a parameter optimization with the decision tree ensemble algorithm. I use the parameter optimization loop nodes for parameters like number of models, minimum split node size, etc., which require integers. However, there are other options in the dialog, such as the split criterion (a drop-down menu selection) and use mid point splits (a checkbox). Is it possible to loop over them in the optimization process as well?

I tried the Table Creator, Table Row to Variable Loop Start and Loop End nodes to include these options, but I get the following error: "Errors loading flow variables into node: Unable to parse split criterion "Information Gain""

Any ideas and suggestions will be highly appreciated.

Thanks in advance.

Bora

Some tips:

  • I have read somewhere in the forum -so my memory might not serve me well- that boolean values can be specified with String-type flow variables holding TRUE or FALSE values. That would solve the checkbox problem.
  • You can find out the proper values of a selection if you specify a name for the output (Flow Variables tab, text field in the row of the flow variable) and check that flow variable's content. By changing the selection in the dialog to each possible value, you can discover the valid strings.

Cheers, gabor

 

Hi Gabor,

I saw the same message, which claims that drop-down menu selections are easier to deal with than checkboxes (boolean input) and that both can be solved through the Rule Engine Variable node, but I cannot see how.

I tried several things, unfortunately none of them worked.

I know it is as easy as it has been said, but I am clearly missing something.

 

Bora

Hi Bora,

I hope the following screenshot helps show how to reveal the possible values:

As much as I like the Rule Engine Variable node, I think you will need just a plain Table Creator node with a Table Row to Variable node (or its Loop Start version) for the iteration.

Cheers, gabor

PS: It seems I was wrong regarding the boolean values, sorry about that: they should be lower case. For quality, "Gain ratio" and "Gini index" are the possible values (without the quotes). For pruning: "No pruning" and "MDL".

I have not tried the looping, but that should work.

Thanks Gabor,

I will give your suggestion a try. Hoping to succeed!

 

Bora

Hi Gabor,

there is not the slightest difference in the results between the different "dialog options". Do you have any idea what I am doing wrong?

Hi Gabor,

Should I use the exact strings defined at https://tech.knime.org/docs/api/constant-values.html#org.knime.base.node.mine.decisiontree2.learner2.DecisionTreeLearnerNodeModel2.SPLIT_QUALITY_GINI

Can you show your Flow Variables dialog? Mine was just to help find out the possible values. For setting the values, you should use the drop-down lists as usual.

Yes, these are the constants used.

Thanks Gabor,

this works fine.

 

Regards

Bora

Hi Gabor,

there is one more thing: in my example, there are two settings, each with two options (pruning: "No pruning" vs. "MDL", and reduced error pruning (REP): true vs. false).

There should be 2 x 2 = 4 combinations just for these settings:

No pruning - REP (true)

No pruning - REP (false)

MDL - REP (true)

MDL - REP (false)

However, the Table Row to Variable Loop Start node just reads every single row and takes only those into account. So there are only two combinations, because there are two rows in the table:

No pruning - REP (true)

MDL - REP (false)

How can I make the loop take the other combinations into account? I tried a recursive loop, but it doesn't seem to be working properly.

 

Bora

Bora,

the Table Row To Variable Loop Start reads a single row per iteration, turning every column into a variable. Every row is one combination to try.

Hence, your input table has to contain every combination of possible values in a dedicated row, like this:

Pruning      REP
No pruning   true
No pruning   false
MDL          true
MDL          false

If you want to add one more variable, you have to combine every value of that variable with every one of these rows. Example: if you have an integer and want to try 22 different values of it with every combination of pruning, you will need 22 * 4 = 88 rows.
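To make the counting concrete, here is a small Python sketch (not KNIME code, just an illustration of the enumeration; the pruning strings are the ones discussed above, the integer range is made up) that builds the full grid of rows such a table would need:

```python
from itertools import product

# Possible values per setting; min_node_sizes is a hypothetical example.
pruning_options = ["No pruning", "MDL"]
rep_options = [True, False]
min_node_sizes = range(1, 23)  # 22 example integer values

# Every element of the Cartesian product is one row of the input table,
# i.e. one full combination of settings per loop iteration.
rows = list(product(pruning_options, rep_options, min_node_sizes))

print(len(rows))  # 2 * 2 * 22 = 88 combinations
for pruning, rep, size in rows[:3]:
    print(pruning, rep, size)
```

Each tuple corresponds to one row that a Table Row to Variable Loop Start would turn into flow variables for one iteration.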

Hi Marlin,

Is there a better way than manually combining all possible values with one another?

I would like to optimize the decision tree learner not just over the integer/real values (via the optimization nodes), but over all the string and boolean options in the dialog as well; doing this manually is neither practical nor feasible.

Do you have any ideas on this matter? I am really stuck at this point.

If you create a collection of the possible values in a row and then Ungroup them one by one, you will get the direct product of the values. (I think the scripting nodes for the usual programming languages can do this too, and my HiTS project also had a node for that.) Using Ungroup is not that bad; creating the collection cells might be a bit harder.

You could also create a separate table for every variable, containing only the possible values of that variable. You could use data generation nodes for some of these, and Table Creators for others, depending on the type. You could even reuse them via renaming. And then just combine all these tables with a bunch of Cross Joiners, filter rows you don't want to test, and there you go!
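In plain code, a chain of Cross Joiners corresponds to taking the Cartesian product of the per-setting tables. A minimal Python sketch (the setting names and values here are only illustrative, taken from the discussion above):

```python
from itertools import product

# One small "table" per setting, as in the separate Table Creator nodes
# suggested above (the setting names are just illustrative).
tables = {
    "pruning": ["No pruning", "MDL"],
    "use_rep": ["true", "false"],   # booleans as lower-case strings
    "split_quality": ["Gain ratio", "Gini index"],
}

# Chaining Cross Joiners amounts to the Cartesian product of the row
# sets, one table at a time.
names = list(tables)
rows = [dict(zip(names, combo)) for combo in product(*tables.values())]
print(len(rows))  # 2 * 2 * 2 = 8 combinations

# A row filter can then drop combinations you do not want to test,
# e.g. skipping REP when there is no pruning at all:
kept = [r for r in rows
        if not (r["pruning"] == "No pruning" and r["use_rep"] == "true")]
print(len(kept))  # 6 combinations remain
```

The filter at the end plays the role of the row filtering step mentioned above, pruning the grid before the loop runs.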

Ah, I remembered there was something for the direct product in recent KNIME versions, but I could not find it among the row manipulators, nor the matrix nodes. I should have checked the column manipulators too. Cross Joiner is a good name, just not what I was searching for.

Thanks Marlin.

Hi Marlin and Gabor,

The Cross Joiner node works fine, and it is certainly much better than manually creating all the combinations needed.

Thanks for your help and suggestions.

Regards

Bora