Splitting out a data set for groups of rows but values have to net to zero

Hi @taylorpeter55 , thank you for reviewing the output, and giving valuable feedback. Always nice to see somebody actively involved rather than simply firing off questions and then seemingly walking away. :slight_smile:

With regard to the journal splitting, this was down to the way the code processes “chunks” of journal lines in one go. I was trying to find a trade-off between performance, in terms of the number of records it could handle at once, and optimal filling of the pages. I was limiting the maximum size of any one subset, which left the pages less well filled than they could have been.

The good news is that after some tweaking, and discovering an area that I hadn’t optimised the way I thought I had, I can now set the “chunk size” the same as the “page size”. As a result, when I re-ran the test with the same input file, I got this result, which I think you will agree is much closer to the ideal of 900 per page:

image
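
For anyone curious about the page-filling idea in general, here is a minimal sketch in plain Java. It is not the code from the workflow: it assumes each journal is an indivisible block of lines that already nets to zero, and it uses a simple first-fit-decreasing heuristic rather than the recursive approach in the Java Snippet. The class name `PagePacker` and the method `pack` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration only; not part of the uploaded workflow. */
public class PagePacker {

    /**
     * Packs whole journals into pages using first-fit decreasing.
     * journalSizes[i] is the line count of journal i (assumed to net to zero on its own).
     * Returns one list of journal sizes per page.
     */
    public static List<List<Integer>> pack(int[] journalSizes, int pageSize) {
        int[] sorted = journalSizes.clone();
        Arrays.sort(sorted); // ascending; walked from the end to place largest first

        List<List<Integer>> pages = new ArrayList<>();
        List<Integer> used = new ArrayList<>(); // lines already placed on each page

        for (int i = sorted.length - 1; i >= 0; i--) {
            int size = sorted[i];
            if (size > pageSize) {
                throw new IllegalArgumentException(
                        "Journal of " + size + " lines exceeds page size " + pageSize);
            }

            // Find the first existing page with enough room for the whole journal.
            int target = -1;
            for (int p = 0; p < pages.size(); p++) {
                if (used.get(p) + size <= pageSize) {
                    target = p;
                    break;
                }
            }

            // No page has room, so open a new one.
            if (target < 0) {
                pages.add(new ArrayList<>());
                used.add(0);
                target = pages.size() - 1;
            }

            pages.get(target).add(size);
            used.set(target, used.get(target) + size);
        }
        return pages;
    }
}
```

Calling `PagePacker.pack(new int[]{300, 500, 300, 400, 300}, 900)` puts the 500 and 400 line journals on one page and the three 300 line journals on another, so both pages are full. A heuristic like this will not always find the best possible fill, which is where a smarter search earns its keep.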

I have uploaded this to the hub here, as I think it may be something useful to others, and also will allow me to revise it at a later date (and improve the documentation).

It’s not for the faint-hearted :rofl:… it uses recursion in a Java Snippet along with a Recursive Loop in KNIME.

Please take a look. I’ve added annotations to the flow. Give it a try and please get back to me if you have any questions, or find any problems.

It was an interesting challenge. I hope it works for you… Enjoy :wink:

When I find time, I may turn a portion of this into a component as I can see it potentially being useful to a wider audience.
