Odd (buggy) behavior of Auto-Binner node

Hi guys,

I'm experiencing a very odd problem with the Auto-Binner node. Sometimes (I still haven't worked out exactly when or why) the Auto-Binner refuses to bin the data into the desired number of equal-width bins, even though this should be possible.

In the attached workflow you can see that, with the same data set, the Auto-Binner node in one case does not produce the desired number of equal-width bins, while in another case (after exporting the data set and re-importing it) it does.

Does anybody have an idea why this happens? Can you confirm this is a BUG?

Thanks for any help.

Gio

Hi Gio,

I can confirm this strange behaviour of the Auto-Binner node; I posted about it yesterday:

https://tech.knime.org/forum/knime-general/auto-binning-issue-with-row-filter-knime-31

Now let's wait for a possible solution or fix.

regards

Ema

OK, I'm sorry, I didn't see your post.

Yes, it's very strange to get different behaviour with the same data set. Let's hope the KNIMErs come up with a solution soon.

Thanks for answering,

Gio

The behaviour is correct. If you read the Auto Binner's node description, it tells you that bins are determined "over the domain range". The domain (e.g. min and max values) of a dataset is determined when reading it from an external source. Filtering rows will not change the domain, even if the table is completely empty in the end (you can inspect the domain in the second tab of any table output view). If you write your data to a file and then read it back in again, the domain is adjusted to the actual data and is therefore different from the original domain. This leads to different binning results. If you want to re-calculate the domain values based on the actual data, use the Domain Calculator node.
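To make the effect concrete, here is a minimal Python sketch of equal-width binning over a stored domain versus a recalculated one. It is only an illustration, not KNIME's actual implementation: the helper names `equal_width_edges` and `count_occupied_bins`, the sample values, and the choice to close the last bin on the right are all assumptions made for this example.

```python
def equal_width_edges(lower, upper, n_bins):
    """Return the n_bins + 1 edges of equal-width bins spanning [lower, upper]."""
    width = (upper - lower) / n_bins
    return [lower + i * width for i in range(n_bins + 1)]


def count_occupied_bins(values, edges):
    """Count how many bins contain at least one value."""
    occupied = set()
    last = len(edges) - 2  # index of the last bin
    for v in values:
        for i in range(len(edges) - 1):
            # Bins are half-open [lo, hi); the last bin also includes its
            # upper edge so that the maximum value is not dropped.
            if edges[i] <= v < edges[i + 1] or (i == last and v == edges[-1]):
                occupied.add(i)
                break
    return len(occupied)


# Hypothetical source table: values 0, 5, ..., 100, so the domain is [0, 100].
source_values = list(range(0, 101, 5))
stale_domain = (min(source_values), max(source_values))

# A row filter keeps only values up to 20; the stored domain is NOT updated.
filtered = [v for v in source_values if v <= 20]

# Binning over the stale domain: 5 bins of width 20 across [0, 100].
stale_edges = equal_width_edges(*stale_domain, 5)
stale_bins_used = count_occupied_bins(filtered, stale_edges)

# Recalculating the domain from the remaining rows (what a domain
# recalculation step would do): 5 bins of width 4 across [0, 20].
fresh_domain = (min(filtered), max(filtered))
fresh_edges = equal_width_edges(*fresh_domain, 5)
fresh_bins_used = count_occupied_bins(filtered, fresh_edges)

print(stale_bins_used, fresh_bins_used)
```

With the stale domain the filtered rows land in only 2 of the 5 requested bins, while with the recalculated domain all 5 bins are used, which mirrors the difference between the two branches of the workflow.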


Hello Thor,

Thank you so much for your quick and detailed answer. I'm sorry, I was not aware that the stored domain range (the one shown in the second tab of any table output view) can differ from the actual domain of a table (meaning the range determined only by the entries the table currently contains). I had assumed the domain was re-calculated by each node. Now that you have explained this behavior, everything is clear and reasonable.

Indeed, as you suggested, if you place a Domain Calculator node just before the Auto Binner one, the desired number of bins is generated.

This confirms to me that you have a great team and a solid product. Congrats!

Gio

P.S.: I hope this forum thread will also be useful to other users.
