Cross Validation Workflow and Parameter Optimization Loop

SIngpaore_knime · June 15, 2023, 12:38am

I am new to KNIME. I created a workflow with cross validation, and in that workflow I used the Parameter Optimization Loop nodes to find optimal parameters for my decision tree (minimum number of records per node, number of threads). But I get the following error in my workflow.
"
WARN X-Partitioner 3:45 Unable to merge flow object stacks: Conflicting FlowObjects: <Loop Context (Head 3:46, Tail unassigned)> - iteration 0 vs. <Loop Context (Head 3:45, Tail 3:43)> - iteration 4 (loops/scopes not properly nested?)"

AlexanderFillbrunn · June 15, 2023, 12:32pm

Hi,
You cannot have the connection from the first X-Partitioner to the second one. That is because when you trace your workflow, every loop start must be closed by its corresponding loop end, no matter which branch you follow. But your bottom X-Partitioner's loop end node (the X-Aggregator on the bottom right) is not part of the flow at the top. Why do you have two X-Partitioners anyway?
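As an analogy only, the nesting rule behaves like bracket matching: walking along any path, each loop end has to close the most recently opened loop start. The sketch below is not how KNIME implements this check; the function `check_nesting` and the loop names `A` and `B` are hypothetical.

```python
def check_nesting(path):
    """Check one path through a workflow for properly nested loops.

    `path` is a list of (kind, loop_id) tuples, where kind is "start" or "end".
    Returns None if the loops are properly nested, otherwise an error message.
    """
    open_loops = []  # stack of loop ids that have been started but not ended
    for kind, loop_id in path:
        if kind == "start":
            open_loops.append(loop_id)
        elif kind == "end":
            # A loop end must close the innermost loop that is still open.
            if not open_loops or open_loops[-1] != loop_id:
                return f"loop end for {loop_id} does not close the innermost open loop"
            open_loops.pop()
    if open_loops:
        return f"loop {open_loops[-1]} is never closed on this path"
    return None


# Properly nested: B starts and ends inside A.
print(check_nesting([("start", "A"), ("start", "B"), ("end", "B"), ("end", "A")]))

# Not properly nested: B is opened inside A, but this path reaches A's end first.
print(check_nesting([("start", "A"), ("start", "B"), ("end", "A")]))
```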
Kind regards,
Alexander

SIngpaore_knime · June 16, 2023, 2:46am

Hi Alexander,
Thank you for your prompt reply. I intend to use the top X-Partitioner to find the optimal parameters of the decision tree (number of threads and minimum number of records per node). That is why the upper loop splits the data further (80/20). I would like to ask: is there any more material or guidance on how to build a loop workflow? Thanks

Best Regards,
CK Lau

SIngpaore_knime · June 16, 2023, 2:50am

Are there any comments on this workflow?
From Node 40, I get the best parameter, which I then pass into Node 41 (decision tree).
I have a question: why do I need to connect Node 45 (variable) to Node 46 (Parameter Optimization Loop Start)? Thanks

AlexanderFillbrunn · June 16, 2023, 9:16am

Hi,
You can learn more about loops in our self-paced training. I do not think you need nested cross validation. The Parameter Optimization Loop iterates over the different hyperparameter values, and the X-Partitioner splits the data into training and test sets. So just keep the top part and connect the input of the top X-Partitioner to the Column Filter at the bottom left.
By the way: the number of threads is not a hyperparameter of the decision tree. It just tells the algorithm how many threads to use for the computations. It does not matter what you enter there: the tree will always be the same, only the speed with which it is created may change. So trying out different values here does not make any sense in the context of cross validation.
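A rough Python sketch of the structure described above, with the parameter loop on the outside and cross validation on the inside. This is not KNIME code: `fit_stump` is a hypothetical one-split stand-in for the Decision Tree Learner, and assigning folds by row index is an assumption made for brevity.

```python
def k_folds(n_rows, k):
    """Yield (train_idx, test_idx); every row lands in the test set exactly once."""
    for fold in range(k):
        test = [i for i in range(n_rows) if i % k == fold]
        train = [i for i in range(n_rows) if i % k != fold]
        yield train, test


def majority(labels):
    # Most frequent label; ties are resolved by sorted order for determinism.
    return max(sorted(set(labels)), key=labels.count)


def fit_stump(xs, ys, min_records):
    """Hypothetical stand-in for a tree learner: a single split on one numeric
    feature, allowed only if both children keep at least `min_records` rows."""
    needed = max(1, min_records)
    fallback = majority(ys)
    best = {"threshold": None, "left": fallback, "right": fallback}
    best_hits = sum(y == fallback for y in ys)
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if len(left) < needed or len(right) < needed:
            continue  # split would violate the minimum-records constraint
        l, r = majority(left), majority(right)
        hits = sum(y == l for y in left) + sum(y == r for y in right)
        if hits > best_hits:
            best, best_hits = {"threshold": t, "left": l, "right": r}, hits
    return best


def predict(model, x):
    if model["threshold"] is None or x <= model["threshold"]:
        return model["left"]
    return model["right"]


def cv_accuracy(xs, ys, min_records, k=5):
    """Inner loop: mean test accuracy over k folds for one parameter value."""
    fold_scores = []
    for train, test in k_folds(len(xs), k):
        model = fit_stump([xs[i] for i in train], [ys[i] for i in train], min_records)
        correct = sum(predict(model, xs[i]) == ys[i] for i in test)
        fold_scores.append(correct / len(test))
    return sum(fold_scores) / len(fold_scores)


def optimize(xs, ys, candidates, k=5):
    """Outer loop: score every candidate value and keep the best one."""
    scores = {m: cv_accuracy(xs, ys, m, k) for m in candidates}
    return max(scores, key=scores.get), scores


xs = list(range(20))
ys = [0] * 10 + [1] * 10
best, scores = optimize(xs, ys, candidates=[1, 2, 50])
print(best, scores)
```

The number of threads never appears in this loop, because it changes only how fast the model is built, not the model itself.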
Kind regards,
Alexander

SIngpaore_knime · June 19, 2023, 12:30am

Thanks, Alexander, for reviewing my workflow. Thank you very much.

ajisafeadeoye · June 20, 2023, 2:58pm

You are very correct

ajisafeadeoye · June 20, 2023, 3:06pm

Also, may I ask why you are using the loop start? You don't need that. Just use the Partitioning node after the Column Filter, remove the X-Aggregator in the optimization loop, and remove the X-Partitioner after the loop start. You can also add a Decision Tree View node to view your decision tree.

Hope that helps

SIngpaore_knime · June 21, 2023, 1:55am

Hi,
Actually, I want to create a workflow that optimizes the minimum number of records per node (a decision tree hyperparameter) by cross validation. Therefore, I am using the loop start with the X-Partitioner and X-Aggregator in my workflow.

Best Regards,
Lau

SIngpaore_knime · June 23, 2023, 4:06am

Dear all,

I have a further question. I want 80% of my raw data to go to training (Decision Tree Learner, Node 36) and 20% to testing (Node 37). Do I need to add a Partitioning node (Node 63) between the Column Filter (Node 57) and the K-Partitioner (Node 33)?

What is the difference between the K-Partitioner and Partitioning?

Thanks

AlexanderFillbrunn · June 23, 2023, 7:48am

Hi,
You do not need the Partitioning node, because Cross Validation is already doing Partitioning (in the K-Partitioner). In Cross Validation, you split your training data into k partitions. Then you train your model on k-1 partitions and use the other one for testing. In every iteration, another partition is the test set. The K-Partitioner takes care of this for you and outputs a different training and test set in every iteration.
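To make the iteration pattern concrete, here is a minimal Python sketch of k-fold splitting. It is an illustration rather than the node's actual implementation; the function names and the seeded random shuffle are assumptions, and the node's own sampling settings may differ.

```python
import random


def k_fold_partitions(rows, k, seed=0):
    """Split `rows` into k disjoint partitions of near-equal size."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # assumption: random sampling
    return [shuffled[i::k] for i in range(k)]


def cross_validation_splits(rows, k, seed=0):
    """Yield one (training, test) pair per iteration.

    In iteration i, partition i is the test set and the other k-1
    partitions together form the training set.
    """
    partitions = k_fold_partitions(rows, k, seed)
    for i, test in enumerate(partitions):
        training = [row for j, part in enumerate(partitions) if j != i for row in part]
        yield training, test


for iteration, (training, test) in enumerate(cross_validation_splits(range(10), k=5)):
    print(iteration, "train:", sorted(training), "test:", sorted(test))
```

With k = 5, every iteration trains on four of the five partitions and tests on the remaining one, so each iteration is already an 80/20 split.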
Kind regards,
Alexander

SIngpaore_knime · June 23, 2023, 11:45am

Thanks Alexander for the prompt reply
