GroupBy node - aggregate by type option does not follow the settings

gcincilla · May 23, 2019, 9:44am

Hi guys,

I think I found a potential bug in the GroupBy node. As you can see in the attached workflow, I group by a certain string column and then aggregate by type using the following settings:

string type: concatenate
integer type: sum
double type: min, max, mean and SD

Nevertheless, the integer column in the input table, apart from being aggregated using sum, is also aggregated using min, max, mean and SD, methods that in theory are reserved for columns of double type.

Do you know why this happens?

Thanks in advance,

Gio

groupby_node_type-based_aggregation_problem.knwf (14.3 KB)

Aswin · May 23, 2019, 10:26am

When you select “double” it includes all numbers, including ints; this is by design.

From the GroupBy help text:
"The “Type Based Aggregation” tab allows to select an aggregation method for all columns that are compatible with the selected data type. For example to apply an operation on all numeric columns simply select DoubleCell. This will include all numeric cells that are compatible with DoubleCell such as IntCell and LongCell. "

Edit: I think many people are surprised by this behavior. Feature request: instead of writing “Number (double)” in the Data Types field, write “Number (double, long, int)”
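
The compatibility rule quoted above can be illustrated with a small Python sketch. This is not KNIME code: the `COMPATIBLE` table and the `methods_for` helper are hypothetical stand-ins, and the model is simplified to what the help text says, namely that int and long columns are compatible with double and therefore also receive the methods selected for double.

```python
# Hypothetical model of type-based aggregation; not the KNIME API.
# Each column type lists the types it is assumed to be compatible with
# (itself included), following the help text quoted above.
COMPATIBLE = {
    "string": {"string"},
    "int": {"int", "double"},    # an int column can be read as double
    "long": {"long", "double"},
    "double": {"double"},
}

# Settings from the original question: selected type -> aggregation methods.
SETTINGS = {
    "string": ["concatenate"],
    "int": ["sum"],
    "double": ["min", "max", "mean", "sd"],
}


def methods_for(column_type):
    """Collect the methods of every selected type the column is compatible with."""
    methods = []
    for selected_type, selected_methods in SETTINGS.items():
        if selected_type in COMPATIBLE[column_type]:
            methods.extend(selected_methods)
    return methods


print(methods_for("int"))     # ['sum', 'min', 'max', 'mean', 'sd']
print(methods_for("double"))  # ['min', 'max', 'mean', 'sd']
print(methods_for("string"))  # ['concatenate']
```

Under this model the integer column ends up with both its own `sum` and the four double methods, which matches what the original workflow showed.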

gcincilla · May 23, 2019, 10:45am

Hi Aswin,
Oops… sorry, I didn’t read that part. Thank you for the explanation!
Anyway, yes… writing “Number (double, long, int)” instead of “Number (double)”, as you suggested, would be more exact and less prone to misunderstanding.
Best

ipazin · June 12, 2019, 1:52pm

Hi there,

That would be a reasonable proposal, but because KNIME Analytics Platform is open source software, someone could develop another numeric data type that is compatible with double. The label would then no longer be accurate, so this can’t work.

Anyway, some solution for this would be welcome, as this is a recurring question…

Br,
Ivan

Aswin · July 30, 2019, 1:17pm

Hi everyone,

Today I noticed that in the new KNIME 4.0 the GroupBy - Aggregate By Type configuration window has a “Type matching” drop-down menu, where you can choose between “Strict” (default) and “Include sub-types” (the old behavior). Because “Strict” is now the default, it can lead to unexpected results if one does not notice this new feature and is still used to the old behavior (this is what happened to me). Nevertheless, I think this is a welcome change!
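
As a rough illustration of the difference between the two modes, here is a minimal Python sketch. It is not KNIME code: `SUPER_TYPES`, `matches`, and the mode strings are hypothetical names that only model the assumption that int and long count as sub-types of double.

```python
# Hypothetical model of the "Type matching" option; not the KNIME API.
# Super-types each column type is assumed to be compatible with.
SUPER_TYPES = {"int": {"double"}, "long": {"double"}}


def matches(column_type, selected_type, type_matching="strict"):
    """Return True if the column should receive the selected type's methods."""
    if column_type == selected_type:
        return True  # an exact match applies in both modes
    if type_matching == "include sub-types":
        # old behavior: compatible sub-types are aggregated as well
        return selected_type in SUPER_TYPES.get(column_type, set())
    return False     # strict: only exact type matches


print(matches("int", "double"))                       # False
print(matches("int", "double", "include sub-types"))  # True
print(matches("double", "double"))                    # True
```

With the strict default, selecting double no longer touches integer columns; switching the mode restores the earlier behavior.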

Best,
Aswin

ipazin · August 1, 2019, 11:46am

Hi Aswin,

Yep. The default has changed and might confuse anyone who is used to the old behavior. That’s why this is covered in the node description.

At least choosing Double will no longer cause confusion by aggregating all numeric columns.

Br,
Ivan
