Feature Suggestion: Flexible max unique values in GroupBy and Pivoting nodes

Hi there,

I have some workflows that used to work just fine, but recently my input data has been scaling up massively, and my GroupBy nodes were the first to notice because of the default limit of 10,000 maximum unique values per group. At a workshop last year I asked for a workaround and was told to simply enter a high number, but that suggestion hurts performance when there are far fewer unique values and doesn’t scale when there are more.

My current solution is to use an Extract Table Dimension node in parallel and feed the row count in as a flow variable for the maximum unique values setting. Since a column can never contain more unique values than the table has rows, that cap can never be exceeded. I’m sure I’m not the only one who has been caught out by GroupBy and Pivoting nodes failing when the data scales up, so maybe it would be a good idea to add a “use number of rows as max unique values” option to the GroupBy and Pivoting nodes? In the worst case the node might take longer than it should, but even that is far less disruptive than discovering after a weekend that the workflow’s output is useless because the very first GroupBy node failed to count properly.
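To make the logic concrete outside of KNIME, here is a rough pandas sketch of the same idea (the function and column names are made up for illustration, not any KNIME API):

```python
import pandas as pd

def grouped_sum(df, group_col, value_col, max_unique=None):
    # The row count is a guaranteed upper bound on the number of unique
    # values in any column, so deriving the cap from it can never fail,
    # no matter how much the input grows.
    if max_unique is None:
        max_unique = len(df)
    n_unique = df[group_col].nunique()
    if n_unique > max_unique:
        raise ValueError(
            f"{n_unique} unique values in '{group_col}' exceed the cap of {max_unique}"
        )
    return df.groupby(group_col)[value_col].sum()
```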

Thoughts?

Would it be possible in your case to load your data into a DB like SQLite and do the grouping at the DB level?

Oh, I wasn’t asking for help; it was just an idea I wanted to share. But thanks anyway. Performance isn’t an issue in the cases I mentioned, and the big stuff is already done at the DB level, but it’s true that I could use SQLite for medium-heavy work more often, so I do appreciate your suggestion 🙂
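For anyone who finds this thread later, the DB-level grouping could look roughly like this in plain Python with sqlite3 (the file, table, and column names are invented for the example):

```python
import sqlite3
import pandas as pd

# Stage the table in an on-disk SQLite database and aggregate there,
# so the grouping is handled by the DB engine rather than in memory.
df = pd.read_csv("input.csv")  # hypothetical input file

con = sqlite3.connect("staging.db")
df.to_sql("data", con, if_exists="replace", index=False)

grouped = pd.read_sql_query(
    "SELECT group_col, COUNT(*) AS n, SUM(value_col) AS total "
    "FROM data GROUP BY group_col",
    con,
)
con.close()
```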

