Feature request: parallelized cross validation metanode

Dear KNIME people,

The Cross Validation metanode is a great tool for both model assessment and model selection. All the options it provides (e.g. linear, random and stratified sampling) are really well designed for its purpose. The only downside I have noticed is that it currently does not run in parallel. As a result, cross validation with the most computationally intensive learners is quite slow.

Since the cross validation procedure mainly consists of independent model learning and prediction steps, it would be great to have parallelization (with the number of parallel jobs either set manually or derived automatically from the system's processor count) implemented inside the cross validation nodes. Do you think this would be possible?
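
To illustrate the idea outside KNIME, here is a minimal Python sketch of what I mean: each fold is an independent learn-and-predict task, so the folds can be handed to a pool of workers. Everything in it is an assumption made for illustration (the names `make_folds`, `run_fold` and `cross_validate`, the toy majority-label "learner", and the default of one worker per processor); it is not how the KNIME nodes are implemented.

```python
from concurrent.futures import ThreadPoolExecutor
import os


def make_folds(n_rows, n_folds):
    """Linear sampling: fold k is tested on the k-th contiguous block of rows."""
    bounds = [n_rows * k // n_folds for k in range(n_folds + 1)]
    folds = []
    for k in range(n_folds):
        test = list(range(bounds[k], bounds[k + 1]))
        train = list(range(0, bounds[k])) + list(range(bounds[k + 1], n_rows))
        folds.append((train, test))
    return folds


def run_fold(labels, train_idx, test_idx):
    """Hypothetical learner + predictor: always predict the majority training label.

    Returns the error rate on the test rows of this fold.
    """
    train_labels = [labels[i] for i in train_idx]
    majority = max(set(train_labels), key=train_labels.count)
    errors = sum(1 for i in test_idx if labels[i] != majority)
    return errors / len(test_idx)


def cross_validate(labels, n_folds, workers=None):
    """Run all folds concurrently; workers defaults to the processor count."""
    workers = workers or os.cpu_count() or 1
    folds = make_folds(len(labels), n_folds)
    # A thread pool keeps the sketch simple. ProcessPoolExecutor offers the
    # same submit()/result() interface and is the usual choice when the
    # learner is CPU-bound pure Python.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_fold, labels, tr, te) for tr, te in folds]
        # Collect results in fold order, not completion order.
        return [f.result() for f in futures]


labels = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]
error_rates = cross_validate(labels, n_folds=5)
print(error_rates)
```

The `workers` argument corresponds to the "manually settable" option, and leaving it unset corresponds to the automatic, processor-count-based one.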

Dear all,

I agree! Such a parallelization would greatly speed up cross validation.

Hello gcincilla,

Thank you for your suggestion! I have created a feature request and will post updates in this thread.


That would be great.

Thanks Ferry!


How about using parallel chunks enclosing that part of the workflow?

Docminus, in my opinion that does not seem feasible. In fact, what the current parallel chunk node does is split the workflow data linearly into several chunks and run those in parallel. In a cross validation (CV) procedure, the input data is already split in a useful way into n training sets and n test sets. The parallelization should use the CV split rather than create a new one. Does this make sense, or am I misunderstanding something?
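
To make the difference concrete, here is a small Python sketch comparing the two kinds of split, based only on my description above. The function names `linear_chunks` and `cv_splits` are hypothetical and this is not KNIME code: a linear chunk is a single disjoint slice of the rows, whereas each CV iteration needs a (training, test) pair in which the training set covers all the other rows.

```python
def linear_chunks(rows, n):
    """Split rows linearly into n disjoint, contiguous chunks."""
    bounds = [len(rows) * k // n for k in range(n + 1)]
    return [rows[bounds[k]:bounds[k + 1]] for k in range(n)]


def cv_splits(rows, n):
    """Build n (train, test) pairs: test is one chunk, train is all the others."""
    chunks = linear_chunks(rows, n)
    splits = []
    for k in range(n):
        test = chunks[k]
        train = [r for j, chunk in enumerate(chunks) if j != k for r in chunk]
        splits.append((train, test))
    return splits


rows = list(range(6))
chunks = linear_chunks(rows, 3)   # each parallel branch would see one chunk only
splits = cv_splits(rows, 3)       # each CV iteration needs train AND test rows

for chunk, (train, test) in zip(chunks, splits):
    print("chunk:", chunk, "| CV train:", train, "CV test:", test)
```

A branch that receives only its own chunk has no access to the remaining rows it would need for training, which is why I think the parallel unit should be the CV iteration itself.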

Good news!!