I have a set of data that I need to split many, many times. My data set is approximately 372 million rows of transaction data for products and sub-products that need to be split into individual sets.
At the moment I have had to create a “splitter” tree with about 35 rule-based row splitters, and I will need another 10 or 20. Just keeping track of what is going where, and trying to balance the data flow through the tree, is a nightmare. Ideally, I would like a single rule-based splitter that splits everything once, or maybe twice (product, then sub-product).
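To illustrate what I mean by splitting everything once: conceptually it is just bucketing each row by a (product, sub-product) key in a single pass, rather than routing rows through a cascade of splitters. A minimal Python sketch, with made-up column names and toy data:

```python
from collections import defaultdict

# Toy rows of (product, sub_product, amount) — stand-ins for the real columns.
rows = [
    ("A", "A1", 10),
    ("A", "A2", 20),
    ("B", "B1", 30),
    ("B", "B1", 40),
    ("B", "B2", 50),
]

# Single pass: each row lands in the bucket for its (product, sub_product) key,
# so one "splitter" produces every partition at once.
splits = defaultdict(list)
for product, sub_product, amount in rows:
    splits[(product, sub_product)].append((product, sub_product, amount))
```

Each entry in `splits` is then one of the individual sets I need, and the number of output partitions is just the number of distinct (product, sub-product) pairs.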
Is there a more efficient way to do this disaggregation of the data?