I am using a GroupBy node on a large data set to calculate the median across different dimension combinations. I have noticed that on a large data set (~20M rows) the median aggregation is returned as null, even though the table contains raw values.
Is there some kind of calculation limit on the GroupBy node?
Details: ~20M rows, metric is in Long format. If the metric is converted to a Double, the median can be calculated. The Statistics node is able to calculate the median when the number is in Long format.
Hi @San_Diego_Web_Analyst,
Welcome to the KNIME Community Forum!
This could be due to different reasons.
Do you see any error or warning message on the console when you execute the node?
For example, if there is a warning message about "Maximum unique values per group", as shown in the screenshot below, you can increase the corresponding parameter in the node configuration.
No, before executing I had already increased the number of unique values allowed to be greater than the number of rows in the dataset. And again, the same settings work on the exact same data and calculate a median if I change the number format from Long to Double. Is this a limitation with Long number formats?
There is no long number format specific limitation I am aware of. It could be a bug.
Are you able to share the workflow with the data? That would make the effort to reproduce easier.
I tried reproducing your issue with a toy dataset, to no avail. For me the GroupBy node works just fine in getting a median of Long values, even with 20M rows.
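If it helps to cross-check the expected values outside KNIME, here is a minimal Python sketch of the same comparison: a grouped median over integer values versus the same values converted to floating point. This is not how the GroupBy node is implemented; the `grouped_median` helper, the group labels, and the randomly generated toy data are all assumptions made for illustration.

```python
import random
from collections import defaultdict
from statistics import median


def grouped_median(rows):
    """Return {group_key: median} for an iterable of (group_key, value) pairs."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: median(values) for key, values in groups.items()}


random.seed(0)
# Hypothetical toy data: (dimension, metric) pairs with large integer metrics.
rows = [(random.choice("ABC"), random.randrange(10**12)) for _ in range(100_000)]

as_long = grouped_median(rows)                                # integer input
as_double = grouped_median((k, float(v)) for k, v in rows)    # same data as floats

# Print both results side by side; they should agree for every group.
for key in sorted(as_long):
    print(key, as_long[key], as_double[key])
```

If the medians computed this way differ from what the node returns for the Long column, or the node returns missing values where this returns numbers, that would support the bug theory.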