I think I found a potential bug in the GroupBy node. As you can see in the attached workflow, I group by a certain string column and then aggregate by type using the following settings:
- string type: concatenate
- integer type: sum
- double type: min, max, mean and SD
Nevertheless, the integer column in the input table, apart from being aggregated using sum, is also aggregated using min, max, mean and SD, methods that in theory are reserved for columns of double type.
Do you know why this happens?
Thanks in advance,
groupby_node_type-based_aggregation_problem.knwf (14.3 KB)
When you select “double” it includes all numbers, including ints; this is by design.
From the GroupBy help text:
"The “Type Based Aggregation” tab allows to select an aggregation method for all columns that are compatible with the selected data type. For example to apply an operation on all numeric columns simply select DoubleCell. This will include all numeric cells that are compatible with DoubleCell such as IntCell and LongCell. "
Edit: I think many people are surprised by this behavior. Feature request: instead of writing “Number (double)” in the Data Types field, write “Number (double, long, int)”
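To illustrate the compatibility rule quoted from the help text, here is a minimal sketch in Python. It is a toy model, not KNIME's actual API: the `COMPATIBLE_WITH` table and the `columns_matching` helper are hypothetical names made up for this example, and the column names are invented.

```python
# Toy model of type compatibility; this is NOT KNIME's real API.
# Each cell type lists the value types it can be read as (an assumption
# based on the help text: IntCell and LongCell are compatible with double).
COMPATIBLE_WITH = {
    "IntCell": {"int", "long", "double"},
    "LongCell": {"long", "double"},
    "DoubleCell": {"double"},
    "StringCell": {"string"},
}


def columns_matching(columns, selected_type):
    """Return the names of columns whose cell type is compatible with selected_type."""
    return [
        name
        for name, cell_type in columns.items()
        if selected_type in COMPATIBLE_WITH[cell_type]
    ]


# Hypothetical input table: column name -> cell type.
columns = {"group": "StringCell", "count": "IntCell", "price": "DoubleCell"}

double_columns = columns_matching(columns, "double")  # ['count', 'price']
int_columns = columns_matching(columns, "int")        # ['count']
```

Under this model, selecting double picks up the integer column as well, which is why the integer column in the question received min, max, mean and SD in addition to sum.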
Oops… sorry, I didn’t read that part. Thank you for the explanation!
Anyway, yes… writing “Number (double, long, int)” instead of “Number (double)”, as you suggested, would be more exact and less prone to misunderstanding.
This would be a reasonable proposal, but since KNIME Analytics Platform is open source software, another numeric data type that is compatible with double could be developed at any time. The label would then no longer be accurate, so this can’t work.
Still, some solution would be welcome, as this is a recurring question…
Today I noticed that in the new KNIME 4.0 the GroupBy - Aggregate By Type configuration window has a “Type matching” drop-down menu, where you can choose between “Strict” (default) and “Include sub-types” (the old behavior). Because “Strict” is now the default, it can lead to unexpected results if one does not notice this new feature and is still used to the old behavior (this is what happened to me). Nevertheless, I think this is a welcome change!
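The difference between the two “Type matching” options can be sketched roughly as follows. This is only an illustration in Python under stated assumptions, not how the node is implemented: `RULES`, `ALSO_MATCHES`, `aggregate` and the sample columns are all hypothetical, and the rules simply mirror the settings from the original question.

```python
from statistics import mean, stdev

# Hypothetical type-based rules mirroring the settings in the original question.
RULES = {"int": ["sum"], "double": ["min", "max", "mean", "sd"]}
# Toy sub-type relation (an assumption): an int column also counts as double.
ALSO_MATCHES = {"int": {"double"}, "double": set(), "string": set()}
FUNCS = {"sum": sum, "min": min, "max": max, "mean": mean, "sd": stdev}


def aggregate(columns, strict):
    """Apply each type rule to the columns it matches; returns {output_name: value}."""
    result = {}
    for name, (col_type, values) in columns.items():
        for rule_type, methods in RULES.items():
            exact = col_type == rule_type
            sub_type = rule_type in ALSO_MATCHES[col_type]
            # Strict: exact type only. Otherwise: sub-types are included too.
            if exact or (not strict and sub_type):
                for method in methods:
                    result[f"{method}({name})"] = FUNCS[method](values)
    return result


# Hypothetical columns of a single group: name -> (type, values).
columns = {"count": ("int", [1, 2, 3]), "price": ("double", [0.5, 1.5])}

strict_result = aggregate(columns, strict=True)   # count gets sum only
legacy_result = aggregate(columns, strict=False)  # count also gets min, max, mean, sd
```

With `strict=True` the integer column is only summed; with `strict=False` it also receives the double aggregations, which corresponds to the pre-4.0 behavior described at the top of this thread.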
Yep. The default has changed and might confuse anyone who is used to the old behavior. That is why this is covered in the node description.
At least it will no longer lead to confusion when choosing Double and unintentionally grouping all numeric columns.