Why GroupBy and Pivoting nodes don't count zero?

Hi,
When I use count or percent aggregation on any data and the number of the instances in the group I’m using aggregation on is zero, GroupBy skips them and pivoting produces missing value instead of the number 0 (or skips if there is only one group). Is there any option to change the behavior of the nodes?

I’m not too sure whether I got you right, but is the checkbox for in- /excluding missing values in the aggregation functions the option you are looking for?

Cheers,
Marten

Hi Marten,
No. My problem is that when I use “count” or “percent” aggregations on a dataset in Pivoting node, the output contains missing value instead of zero (e.g. when using pivoting on a column which has 2 values, if the count in one value is zero for a group, it produces missing not zero). I expect to have zeros instead of missing values.
And in GroupBy node when count for the group is zero or in Pivoting node if the count for all pivoting columns is zero, the group will be skipped, I prefer to have the option for the behavior of the nodes to make them produce zero value if there is no records or skip the groups.

1 Like

@armingrudd did you figure out a solution to this problem? I am having the same issue.

Hi @SebM and welcome to the KNIME forum,

Following our conversation via Linkedin, please provide an example here.

I can reproduce the issue in Pivoting node where count is missing instead of zero when a category (pivoting column) does not exist in a group.

But I cannot remember what exactly was the situation in GroupBy node. I do remember there was an issue but cannot reproduce it. So your example can help me.

Thanks.

:blush:

Thank you @SebM for reminding me this issue.

I just remembered the issue and could reproduce the problem.

I noticed that the issue with counting 0 in GroupBy node has been fixed. But there is still another issue with the node.

And about the pivoting node, the problem is still there.

Here is an explanation of the problems:

GroupBy node:
If I’m not missing something, when I posted this topic, the groups with count zero were skipped (removed from the output) instead of having value “0” (using count method and unchecking the missing value option where all the values for the group are missing). It seems this problem is already fixed. But, in the same situation, when I want to calculate the percent (missing unchecked), another problem occurs. The total percent is not 100. Check the example workflow below.

Pivoting node:
When a category (pivoting value) is missing in a group, the count is missing instead of 0 and when all the pivoting values are missing in a group, the group is removed from the output (unchecking “Ignore missing values” option in the “Pivots” tab will bring the group back but adds a new category as missing). Check the example workflow.

groupby_pivoting_issue.knwf (22.4 KB)

:blush:
@ScottF @ipazin

Hi there @armingrudd, @SebM,

sry for a delay on this one. I have checked example and seems there can be some improvements regarding missing values handling.

Will come back to you. Tnx for engaging!

Br,
Ivan

1 Like

Hi @armingrudd Is there any way to replace ? with zero or null? When I export to html and add into email, it also keeps the ‘?’ which will confuse people why question mark. Do you know how to fix that? Thanks!

I found the answer in another post - using Missing Value node to replace with 0.

2 Likes

Possible alternative for more complex cases with additional clean-up needs: Rule Engine with rule

MISSING $colname$ => 0

Regards,
E

2 Likes