Best Way To Removing Groups That Don't Meet Criteria?

Hello!
Does anyone have advice on removing groups that do not meet a certain criteria in a single loop? (I.e. total rows of the group are < 5)

I have a file that has observations of store/product/price/sales. I need to group by store/product in order to calculate price elasticity (via linear regression). However, in order to complete my linear regression, I need to remove the store/product group if they do not have enough observations to calculate an accurate coefficient.

The only way I can think to do this is to have 2 Group Loop Starts. The first Group Loop Start will group by Store/Product, Rule Engine will Count Rows, [loop end], then Row Filter will filter those that do not meet minimum observations criteria. The second loop then groups again by Store/Product, and runs the regression… [loop end].

Example of my current iteration: grouping_example.knar (19.7 KB)

This iteration works well, however I’m just working with a proof of concept right now. When I scale this workflow up, I will have tens of thousands of stores, and a thousand products, with each product in each store having up to 250 observations. I’m a bit worried that the multiple group solution is not the most efficient, and wondering if there’s a better way.

Thanks!

Hi there @Snowy,

if I got you instead of Rule Engine filter inside first loop. Something like this:

In Rule-based Row Filter this is syntax:
$$ROWCOUNT$$ < 5 => TRUE

with option exclude TRUE matches.

Hope this helps.

Br,
Ivan

3 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.