GroupBy Pattern Based Aggregation

Hello Everyone,
I want to use the GroupBy node with two different aggregations methods. Every column starting with a # should be aggregated to a sum, the columns not starting with a # should be aggregated to the mean.
I am able to do the first task by just typing #* to sum all elements. But I do not know how to make the not condition.
I also tried doing it with Regex, but must admit that this looks complicated for me. I never used this so far.

I am happy about every support :slight_smile:
Greetings Alex

Hello @schalex

Logical Indexing column True/False with ‘Rule Engine’ node => $LOGIC$ :

$reference_column$ LIKE "#*" => TRUE
TRUE => FALSE

Then group by $LOGIC$
BR

1 Like

I’m afraid this won’t do it since @schalex wants to group based on the column characteristics and the Rule Engine is not capable of evaluating column headers directly.

Given

image

I would use pattern based aggregation on (^#.*) (start with hash) and ^(?!#).* (start without hash.)

6 Likes

Hi @schalex , the most efficient way is what @ArjenEX has proposed.

However, if you or those who are not comfortable working with Regex, here’s an alternative solution.

You can just do a Column Split to separate the columns into 2 groups, 1 with # and the other without. Then each group can do their own groupby (sum on 1, mean on the other), and then use Column Appender to append the columns together.

Something like this:
image

I purposely used the same input as @ArjenEX so we can compare the results.
Input:
image

Results:
image

As you can see, I get the same results.

Here’s the workflow: GroupBy Pattern Based Aggregation.knwf (12.0 KB)

2 Likes

Dear @ArjenEX, @gonhaddock & @bruno29a,
Thank you very much for your suggestions and solutions.
The Regex example of @ArjenEX was the most applicable for me, because I did not need additional blocks and it was very convenient.
I appreciate all of your suggestions, that were really helpful and supporting.

I wish you a nice rest of the day. Kind regards, Alex

1 Like

Hi @schalex ,

I would not choose something to be most applicable simply because it uses less “blocks” (nodes). :slight_smile:
I would choose something that’s most applicable simply if it’s most applicable to start with, and if the solution is the most efficient one.

In some cases, using less nodes could actually not be better, so I’m just making sure that in the future, you do not necessarily think that using less nodes is automatically better. :slight_smile:

But in this case, as I have already mentioned, what @ArjenEX has proposed is the most efficient way. :smiley:

As it is with Knime, there are always different ways to achieve what we need, and what I’m suggesting is an alternative that does not use regex, because while you “did not need additional blocks”, you needed proper knowledge of regex, which you did not seem to have:

So, that example will be for your next challenge where you won’t necessarily need to be dependent on regex, or where you’ll need a quick answer to your regex question but it is not always that you might get someone available to help in time with regex - though it would be better to learn regex in the end.

Cheers, and happy Knime’ing :smiley:

2 Likes

Good evening @bruno29a,
This is a very valid point. Just built a new workflow with the similar challenge and wanted to check again the regex from @ArjenEX. I agree to your statement and just learned right now how true it is :slight_smile:

Thank you very much for this many learnings and great support in this community! I really like it a lot!

I wish you a nice rest of the day. Kind regards, Alex

2 Likes

No problem @schalex , happy to help, and I’m glad that you appreciate things from a different point of view :smiley:

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.