A common use case is to run a model on chunks of data (e.g. one model per country / product group), and we often have a large number of such chunks to handle. The straightforward implementation uses the Group Loop nodes, which split the data into chunks and run the model on each one sequentially.
We have now tried using Parallel Chunk Loop to parallelize the execution.
That seems to work - but here are a few observations:
- RowIDs are not made unique in the Parallel Chunk Loop End:
I have branches where I add new rows to my dataset, which results in RowIDs being used multiple times - and that makes the Parallel Chunk Loop End node fail when combining the data. This can be fixed by using a RowID node to append the chunk_index flow variable to each RowID. However - it would be nice if the Parallel Chunk Loop End node were smart enough to do that by default...
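To illustrate the workaround: a minimal Python sketch (not KNIME code - the function name and the `#` separator are my own choices) of what a RowID node does when you append the chunk_index flow variable, turning per-chunk duplicate RowIDs into globally unique ones before the chunks are concatenated:

```python
def uniquify_row_ids(chunks):
    """chunks: one list of (row_id, row_data) pairs per parallel chunk.
    Returns a single combined list where each RowID gets the chunk
    index appended, so IDs that repeat across chunks become unique."""
    combined = []
    for chunk_index, chunk in enumerate(chunks):
        for row_id, row in chunk:
            combined.append((f"{row_id}#{chunk_index}", row))
    return combined

# Two chunks that both emit "Row0"/"Row1" - exactly the situation that
# makes the loop-end node fail when branches add extra rows.
chunks = [
    [("Row0", "a"), ("Row1", "b")],   # chunk 0
    [("Row0", "c"), ("Row1", "d")],   # chunk 1 reuses the same RowIDs
]
rows = uniquify_row_ids(chunks)
ids = [row_id for row_id, _ in rows]
assert len(ids) == len(set(ids))  # no duplicates after appending the index
```

In other words, the loop-end node would only need to apply this kind of suffixing itself when it detects colliding RowIDs.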
- Parallelization of R executions: is it actually working?
We often run models in R Snippet nodes, and these can be quite time-consuming.
With the Parallel Chunk Loop, however, I do not see the expected speed-up in this case. I would have expected that running R Snippet nodes in the parallel branches launches multiple RTerm / R processes, but I can only see one (on Windows), running on a single core. So it feels as if parallelization is not working with the R integration. Is there a way to improve that?