I’m grouping by a double column. The following 2 rows are not aggregated into the same group.
1,
1,
My first thought was a floating point situation where one of the floats is not 1, but 0.999999999 or so. However, that’s not the case. When I copy the values from the table view, I get 1 and 1.0 respectively. Obviously, they should be grouped.
I managed to create a tiny example to reproduce the bug. The cause is the rule engine, which inserts 1 into the double column, which subsequently is considered to be different from 1.0.
I took a look at your workflow. Glad to see you’ve found & shared your own workaround on this.
Just a tiny comment: it’s best practice to avoid the Rule Engine when dealing with double types. The Expression Node or the Column Expression Node is more suitable for if-else logic on doubles.
I just had a look at the rule engine rule you input.
When I change the => 1 into => 1.0, the grouping issue disappears. I guess there is a distinction between 1 (interpreted as an int) and 1.0 (interpreted as a double), and KNIME does not seem to implicitly cast the int to a double, hence the 2 groups in the GroupBy.
It is also common in other languages to write a whole-number double with an explicit .0 to avoid any type confusion.
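To illustrate the guess above, here is a minimal Python sketch of how a type-sensitive grouping could split 1 and 1.0 into two groups. It is not KNIME's actual implementation. The `group_counts` helper and the `(type, value)` key are hypothetical and only mimic a comparison that distinguishes int from double.

```python
from collections import defaultdict


def group_counts(values, normalize=False):
    """Count rows per group key (hypothetical helper, not a KNIME API).

    normalize=False: the key includes the value's type, mimicking a
    type-sensitive comparison where int 1 and double 1.0 differ.
    normalize=True: every value is cast to float before grouping.
    """
    counts = defaultdict(int)
    for v in values:
        key = float(v) if normalize else (type(v).__name__, v)
        counts[key] += 1
    return dict(counts)


# One int written by the rule (=> 1) and one genuine double.
column = [1, 1.0]

strict = group_counts(column)                  # two groups
merged = group_counts(column, normalize=True)  # one group

print(strict)
print(merged)
```

Under this assumption, writing => 1.0 in the rule has the same effect as normalizing the key: every value in the column ends up with the same type before it is compared.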
I agree with your analysis, Florent, but I do not agree that this is an acceptable behavior. It is a float column. Therefore, 1 and 1.0 must be equal. This is especially true for a tool with a broad target audience (including not-so-techy persons).
Thank you! I will try these next time.
I feel that the rule engine should be patched to either fix the issue or throw an exception.
What is this nonsense with not answering too quickly? I wanted to reply to 2 comments.
I think the problem is more complex than that. 1.0 stored as a float has a completely different bit representation than 1 stored as an int, and conversion between integers and floats is not always exact (see IEEE 754 for more headaches).
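To make the representation point concrete, the following Python sketch packs the integer 1 and the double 1.0 into 8 bytes each and compares the results. It also checks when an int-to-double conversion is exact: a small value such as 1 converts without loss, while a value just above 2^53 does not. The 64-bit integer width is an assumption chosen for the comparison, not something stated in the thread.

```python
import struct

# 64-bit signed integer vs. IEEE 754 binary64, both big-endian.
int_bytes = struct.pack(">q", 1)
dbl_bytes = struct.pack(">d", 1.0)

print(int_bytes.hex())  # 0000000000000001
print(dbl_bytes.hex())  # 3ff0000000000000

# Converting a small integer to a double is exact.
small_exact = float(1) == 1.0

# Above 2**53 a double can no longer represent every integer.
big = 2**53 + 1
big_exact = int(float(big)) == big

print(small_exact, big_exact)  # True False
```

So the two values differ in memory, but for the value 1 a cast to double loses nothing. The mismatch in the GroupBy looks like a typing issue rather than a rounding one.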
In the rule engine, KNIME tries to guess the output type: we can see that 1 is an int and 1.0 is a double (or a float). But when writing the result to your column, KNIME does something (or fails to do something) that leaves 1 and 1.0 stored differently in memory.
In general, it is not recommended to compare raw floating-point numbers for equality (because of the way they are stored in memory).
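As a short illustration of that general advice, this Python sketch shows an exact comparison failing on a computed value and two common alternatives: a tolerance-based comparison and a rounded grouping key. The tolerance, the number of digits, and the `group_key` helper are arbitrary choices for the example.

```python
import math

a = 0.1 + 0.2
b = 0.3

# Exact equality fails because 0.1 + 0.2 is not stored as exactly 0.3.
exact_equal = (a == b)

# Comparing within a relative tolerance treats the two as equal.
close_equal = math.isclose(a, b, rel_tol=1e-9)


def group_key(x, digits=9):
    """Round a float so nearly equal values share a grouping key
    (hypothetical helper; digits=9 is an arbitrary choice)."""
    return round(x, digits)


print(exact_equal, close_equal)        # False True
print(group_key(a) == group_key(b))    # True
```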