Gruop By Mean does not match with Mean in Excel

Hello,
I am trying to migrate an excel base to knime using famous nodes (Joiner, Group By, Pivoting) and it is working. But when I downloaded the base from Knime to check the values, I figured out that the Mean calculated using Group By does not match with the simple mean from the Excel. See example attached:

In Excel the mean is zero. But using Group By from Knime the mean is 2,54426109809932E-17.

I believe it is a format issue, but I don’t know how to configure at knime. I need the same number from excel.
Could you help me please?

Thanks
KNIME MEAN.xlsx (11.2 KB)

Hi @felipemelnik ,

When I take the individual values from your excel and use a groupby node to perform (mean) in KNIME I get the result 0.0

However, I am of course having to transpose the data your xlsx into a form that makes it suitable for the groupby node to operate, and therefore I am obviously not performing this in the same way that you are. As I cannot see your original data, nor the specifics of what you are doing, I cannot know what is happening in your case.

My suspicion is that the original numbers that you believe to be 0 that you are passing to KNIME are not actually exactly zero but simply appear to be when you look at them in Excel.

It would be better if you could share a demonstration workflow that shows the actual data in the form that you are giving to KNIME’s groupby node that causes it to return the result you are seeing. thanks.

3 Likes

Hello @felipemelnik,

For some reason the data wasn’t showing as it is in excel file

So you should select following option in Excel Reader node
By Position 1, Which should give proper data

Following is the workflow attached
Calculate mean.knwf (76.4 KB)

Regards,
Yogesh

Hi, I can not share the database because there is confidential data. But I will share prints of the workflow to explain better:


image

When I see the output of the Group By inside Knime, the value is zero:
image

But when I save in the excel, the value is different:
image

Should I use the node Round Double?

Thanjks,

Hi @felipemelnik

What do you see when you right-click on the ColumnValues column and go for Available Renderes → Full Precision?

gr. Hans

1 Like

It was selected Standard Double, using Full Precision the value is :
image

But the mean should be zero !

I belive it is a configuration of the Mean using the Gropu By node, because when I calculate the Sum using the the dataset the results is correct (zero).

Hi @felipemelnik,

If I apply the group by node, and calculate mean using just the numbers, taken at face value, that we can see in your screenshot it will return zero and I am confident it will return zero if anybody else tries it. :wink:

Whilst I appreciate that you have confidential data that you cannot share, we will need more to go on, or we can only guess.

I have one guess though, thanks to @HansS 's suggestion…

My suspicion is that you either have other data for “NOVO SKU” 5444d345 that is not showing in your screenshot, or else not all of the zeroes in your data are actually zero!

Can you perhaps share a screenshot of what this input data looks like if you apply @HansS 's suggestion of displaying using the full precision renderer to it…

image

Or alternatively output just the data for rows Row7620 to Row7629 in your screenshot to Excel, using the Excel Writer (not copy/paste) and upload that. I think the “error” will already be present before the GroupBy node.

i.e. My suspicion is that at least one of those zeroes isn’t actually a zero, and that the error is not the GroupBy node at all, but in fact occurs earlier… probably in your source data since I do not see how any of the nodes in your screenshot prior to GroupBy could introduce these rounding errors.

btw, which version of KNIME are you using?

another question, if you open preferences, what is your “preferred renderer” for Double values?

1 Like

Hi @takbb

I have check the the data using the full precision data and there is not “weirds zeros”:
image

See attached the output in excel as well.

OUTPUT felipe.xlsx (8.7 KB)

My Knime version is 4.7.7 and my preferred renderer is:
image

See below the steps that came before the Group By Node:

I did the same workflow using a smaller data base and the mean was zero. I believe is some format issue but I can not figure out.

1 Like

I think this is just floating point math at work.

Computers don’t do math in decimal, they do math in binary. Most numbers that have digits after the decimal point can’t be represented exactly in binary, and so there’s some error introduced during each operation - every number is going to be slightly off from what you intended. String enough of these math operations together and the error starts accumulating.

2 Likes

Hi @elsamuel I think is exactly this what is happening. The issue happens after a Pivot Node. Is there any recommendation to resolve this? Maybe use Round Double Node? Thanks

I have just tried using “Process in Memory” and now it is working.

3 Likes