Feature request: parallelized cross validation metanode

gcincilla · October 29, 2014, 8:24pm

Dear KNIME people,

The cross validation metanode is a great tool for both model assessment and model selection. All the options it provides (e.g. linear, random and stratified sampling) are really well designed for this purpose. The only downside I have noticed is that it does not currently run in parallel, so cross validation with computationally intensive learners is quite slow.

As the cross validation procedure mainly consists of independent model learning and prediction steps, it would be great to have this parallelization implemented inside the cross validation nodes, with the degree of parallelism either set manually or derived automatically from the system's processor count. Do you think that would be possible?
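Since each fold's learn/predict step only reads the shared data and produces its own result, the iterations can be fanned out to a worker pool. Below is a minimal Python sketch of the idea, assuming a toy 1-nearest-neighbour learner and hypothetical helper names (`make_folds`, `run_fold`, `parallel_cv`); it is not how KNIME implements cross validation. The `workers` parameter mirrors the "manually settable or processor-count based" option. Threads keep the sketch short; in CPython a pure-Python learner would need a process pool for a real speed-up, while learners in native code that release the GIL can benefit from threads.

```python
import os
import random
from concurrent.futures import ThreadPoolExecutor


def make_folds(n_rows, k, seed=0):
    """Randomly assign row indices to k disjoint folds (random, not stratified)."""
    if not 1 < k <= n_rows:
        raise ValueError("need 1 < k <= n_rows")
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def run_fold(args):
    """Learn on every row outside the test fold, predict the test fold, return its error rate."""
    xs, ys, test_idx = args
    test_set = set(test_idx)
    train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i not in test_set]

    # Placeholder learner (hypothetical): 1-nearest-neighbour on one numeric feature.
    def predict(x):
        return min(train, key=lambda t: abs(t[0] - x))[1]

    errors = sum(predict(xs[i]) != ys[i] for i in test_idx)
    return errors / len(test_idx)


def parallel_cv(xs, ys, k=5, workers=None):
    """Run the k independent learn/predict iterations concurrently."""
    folds = make_folds(len(xs), k)
    # Use the user-chosen worker count, otherwise one worker per logical CPU.
    workers = workers or os.cpu_count()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in fold order, so aggregation matches a sequential run.
        return list(pool.map(run_fold, [(xs, ys, f) for f in folds]))
```

Because the folds are fixed before the pool starts, the per-fold errors are the same as in a sequential loop; only the wall-clock time changes.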

Nico1990 · April 21, 2016, 4:02pm

Dear all,

I agree! Such parallelization would greatly speed up cross-validation.

ferry.abt · April 24, 2016, 8:01pm

Hello gcincilla,

Thank you for your suggestion! I have created a feature request and will post updates in this thread.

Best,
Ferry

gcincilla · April 25, 2016, 8:53am

That would be great.

Thanks Ferry!

Ergonomist · April 28, 2016, 1:33pm

\o/

Docminus · May 2, 2016, 9:16am

How about enclosing that part of the workflow in parallel chunks?

gcincilla · May 5, 2016, 9:41am

Docminus, in my opinion that does not seem feasible. In fact, the current parallel chunk node linearly splits the workflow data into several chunks and runs those in parallel. In a cross validation (CV) procedure, the input data is already split into n training and n test sets, so the parallelization should reuse the CV split rather than create a new one. Does this make sense, or am I misunderstanding something?
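To illustrate the difference, here is a hypothetical Python sketch (function names are illustrative and this is not KNIME's actual implementation). A linear chunking scheme hands each worker a contiguous block of rows, whereas a CV task built from an existing fold assignment needs every row: its own fold as the test set and all other folds as the training set.

```python
def linear_chunks(n_rows, n_chunks):
    """Contiguous row blocks, as a linear chunking scheme would produce."""
    size, rem = divmod(n_rows, n_chunks)
    chunks, start = [], 0
    for c in range(n_chunks):
        # The first `rem` chunks get one extra row so all rows are covered.
        end = start + size + (1 if c < rem else 0)
        chunks.append(list(range(start, end)))
        start = end
    return chunks


def cv_tasks(fold_of_row, k):
    """One (train, test) pair per fold, reusing an existing fold assignment."""
    tasks = []
    for f in range(k):
        test = [i for i, g in enumerate(fold_of_row) if g == f]
        train = [i for i, g in enumerate(fold_of_row) if g != f]
        tasks.append((train, test))
    return tasks
```

For example, `linear_chunks(10, 3)` yields `[[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]`, while `cv_tasks([0, 1, 0, 1], 2)` yields `[([1, 3], [0, 2]), ([0, 2], [1, 3])]`: each CV task spans the whole data set, so the natural unit of parallel work is the fold, not a chunk of rows.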

Nico1990 · May 9, 2016, 2:04pm

Good news!!