Create mass models without loops

Kaymar · September 10, 2015, 12:29pm

Hi guys,

I have a workflow that generates lots of linear regressions (one model per series). It both learns and predicts.

I currently use a loop with a variable-controlled Row Filter to pass only the data for the i-th series to the learner and collect the results.

The Row Filter takes about 20 seconds per loop step to scan the initial dataset (4 million rows filtered down to ~3000), so the full process takes 20 hours.

Is there a clever way to generate one model per series without pre-filtering the data before the learning node?

Thanks for any tips or pointers,

Best regards,

Nicolas

swebb · September 10, 2015, 6:18pm

Could you use a Group Loop Start node and group on the column containing the series identifier?

The splitting of the data is handled by the loop start node and you wouldn't need another row filter. I assume this will be quicker.
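
To illustrate why grouping once tends to beat filtering the full table on every iteration, here is a minimal sketch in plain Python rather than KNIME. It is only an analogy for the idea, not how the Group Loop Start node is implemented. The row layout (series_id, x, y) and the names group_rows, fit_line and fit_all are made up for the example.

```python
from collections import defaultdict


def group_rows(rows):
    """Bucket (series_id, x, y) rows by series_id in a single pass.

    Each row is visited once, instead of rescanning the whole
    table for every series as a per-iteration filter would.
    """
    groups = defaultdict(list)
    for series_id, x, y in rows:
        groups[series_id].append((x, y))
    return groups


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        # All x values identical: the slope is undefined.
        raise ValueError("cannot fit a line when all x values are equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fit_all(rows):
    """Return one (slope, intercept) model per series."""
    return {sid: fit_line(pts) for sid, pts in group_rows(rows).items()}


# Hypothetical toy data: series "a" follows y = 2x + 1, series "b" follows y = -x.
rows = [
    ("a", 0.0, 1.0), ("b", 0.0, 0.0), ("a", 1.0, 3.0),
    ("b", 1.0, -1.0), ("a", 2.0, 5.0), ("b", 2.0, -2.0),
]
models = fit_all(rows)
```

With the loop plus Row Filter approach, the work grows with (number of series) x (total rows); with a single grouping pass, each row is touched once before the per-series fits run.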