Collecting Results of Cross Validation

violaw · October 24, 2014, 8:36am

Hi, I have a simple dataset on which I apply M5Rules. To figure out which rules might be best, I tried to use Cross Validation consisting of 10 rules. I attached a picture of my workflow. Is there a way to collect the rules as well as the R^2 (errors) of all loops? Thanks a lot!!

workflow2.png

Ergonomist · October 24, 2014, 10:21am

Hi Violaw,

Below is a snapshot showing my solution for R randomForests, the best I could come up with so far. The "Java Edit Variable" node appends the iteration number to the filename. Hope this helps.

Cheers
E

thor · October 24, 2014, 11:23am

Actually, we could add an optional input port to the X-Aggregator that collects the model in each iteration. A third output port could then provide a data table with the model from each iteration in its own row. The models can then be extracted with the "Cell to Model" node later on. I will open a feature request.

Aaron_Hart · October 24, 2014, 1:01pm

And in the meantime, here is a workaround based on a counting loop start for k-fold cross validation, with whatever scoring statistics you want.

violaw · October 24, 2014, 1:33pm

Hi, thanks a lot for the fast response!

The workflow above looks really complicated. I am quite new to KNIME, so please forgive my stupid questions, but:

- Why do you extract variables before starting the loop?

- What did you write into the Java Snippet?

Thanks, Viola

violaw · October 24, 2014, 1:35pm

And thanks for the workaround! However, does it collect information about the model itself or only about the model evaluation?

Aaron_Hart · October 24, 2014, 2:15pm

Just the evaluation, but there is no reason not to take the model too. Just add a Model to Cell node, and a Column Appender to join the model to its scoring results.
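
For readers who think more easily in code than in nodes, here is a rough plain-Java sketch of the same idea: loop over the folds, and for each fold keep the fitted model and its score side by side in one row. This is not KNIME API and not M5Rules. The class FoldCollector, the FoldResult row type, and the simple least-squares line used as a stand-in model are all assumptions made up for illustration, as is the fold assignment (row i goes to fold i % k).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not KNIME code, and a straight-line fit
// stands in for the real learner (e.g. M5Rules).
final class FoldCollector {

    /** One collected row: fold index, the fitted model, and its held-out R^2. */
    static final class FoldResult {
        final int fold;
        final double slope;
        final double intercept;
        final double rSquared;

        FoldResult(int fold, double slope, double intercept, double rSquared) {
            this.fold = fold;
            this.slope = slope;
            this.intercept = intercept;
            this.rSquared = rSquared;
        }
    }

    /** Runs k-fold cross validation and keeps model and score together per fold. */
    static List<FoldResult> crossValidate(double[] x, double[] y, int k) {
        List<FoldResult> rows = new ArrayList<>();
        for (int fold = 0; fold < k; fold++) {
            // Train: least-squares line on every row outside the current fold.
            double sx = 0, sy = 0, sxx = 0, sxy = 0;
            int n = 0;
            for (int i = 0; i < x.length; i++) {
                if (i % k == fold) continue;
                sx += x[i];
                sy += y[i];
                sxx += x[i] * x[i];
                sxy += x[i] * y[i];
                n++;
            }
            double slope = (n * sxy - sx * sy) / (n * sxx - sx * sx);
            double intercept = (sy - slope * sx) / n;

            // Score: R^2 = 1 - SS_res / SS_tot on the held-out rows.
            // (Assumes the held-out targets are not all identical.)
            double mean = 0;
            int m = 0;
            for (int i = fold; i < x.length; i += k) {
                mean += y[i];
                m++;
            }
            mean /= m;
            double ssRes = 0, ssTot = 0;
            for (int i = fold; i < x.length; i += k) {
                double pred = slope * x[i] + intercept;
                ssRes += (y[i] - pred) * (y[i] - pred);
                ssTot += (y[i] - mean) * (y[i] - mean);
            }
            rows.add(new FoldResult(fold, slope, intercept, 1.0 - ssRes / ssTot));
        }
        return rows;
    }

    public static void main(String[] args) {
        double[] x = new double[20];
        double[] y = new double[20];
        for (int i = 0; i < x.length; i++) {
            x[i] = i;
            y[i] = 2.0 * i + 1.0; // made-up demo data
        }
        for (FoldResult r : crossValidate(x, y, 5)) {
            System.out.println(r.fold + ": y = " + r.slope + " * x + " + r.intercept
                    + ", R^2 = " + r.rSquared);
        }
    }
}
```

The point is only the shape of the result: one row per fold with the model and its evaluation next to each other, which is what Model to Cell plus Column Appender gives you inside the loop.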

Ergonomist · October 24, 2014, 2:50pm

Hi Viola,

As for the questions about my version of the workaround:

Extracting variables is just to keep the temp dir creation in sync with the Xval start node. Aaron does something similar with "extract table dimensions" inside the metanode.
The "Java Edit Variable" node only numbers the model files kept on disk with the following one-liner:
return $${Stemp_path}$$ + "\\" + $${IcurrentIteration}$$ + ".model";
Finally, I noticed that I didn't document my metanode - expanded below for clarity.
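
In case the flow-variable syntax in the one-liner above is unfamiliar, here is roughly what it does, written as standalone Java. The class and method names are made up for illustration, and the two parameters stand in for the temp_path and currentIteration flow variables. I use Paths.get here instead of the hard-coded "\\" so the sketch also works outside Windows; that is my substitution, not what the node in the workflow does.

```java
import java.nio.file.Paths;

// Hypothetical standalone equivalent of the "Java Edit Variable" one-liner.
final class ModelFileNames {

    /** Builds "<tempPath>/<currentIteration>.model" with the platform separator. */
    static String modelFile(String tempPath, int currentIteration) {
        return Paths.get(tempPath, currentIteration + ".model").toString();
    }

    public static void main(String[] args) {
        // "knime_tmp" is a placeholder directory name, not a real KNIME path.
        for (int iteration = 0; iteration < 3; iteration++) {
            System.out.println(modelFile("knime_tmp", iteration));
        }
    }
}
```

Each loop iteration therefore writes its model to a differently numbered file instead of overwriting the previous one.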

I dislike having to use the file system in my workaround, but I like that it's minimalistic on the code side. Thorsten, making this a feature request is a great idea, thanks! :-)

Cheers,
E

violaw · October 24, 2014, 2:53pm

Hi, thanks for the comments! I now have a working version, which I attached below :-)

finalworkflow.png