Hello!
Does anyone have advice on removing groups that do not meet a certain criteria in a single loop? (I.e. total rows of the group are < 5)
I have a file that has observations of store/product/price/sales. I need to group by store/product in order to calculate price elasticity (via linear regression). However, in order to complete my linear regression, I need to remove the store/product group if they do not have enough observations to calculate an accurate coefficient.
The only way I can think to do this is to have 2 Group Loop Starts. The first Group Loop Start will group by Store/Product, Rule Engine will Count Rows, [loop end], then Row Filter will filter those that do not meet minimum observations criteria. The second loop then groups again by Store/Product, and runs the regression… [loop end].
Example of my current iteration: grouping_example.knar (19.7 KB)
This iteration works well, however I’m just working with a proof of concept right now. When I scale this workflow up, I will have tens of thousands of stores, and a thousand products, with each product in each store having up to 250 observations. I’m a bit worried that the multiple group solution is not the most efficient, and wondering if there’s a better way.
Thanks!