Aggregation on distinct cell values (GroupBy)

Dear KNIMErs,

Does somebody know how I can aggregate on distinct cells using the GroupBy node (or in a different way)? The rows containing a particular cell value can only appear in the first group, after this, KNIME has to choose the first distinct cell, so the second one.

I have this dataset

Task - Date - Employee - combined string (date&employee)
1 - 4/1 - 12 - β€œ4/1, 12”
1 - 5/1 - 13 - β€œ5/1, 13”
1 - 6/1 - 14 - β€œ6/1, 14”

2 - 4/1 - 12 - β€œ4/1, 12”
2 - 5/1 - 13 - β€œ5/1, 13”
2 - 6/1 - 14 - β€œ6/1, 14”

Now I would like knime to choose the first available Employee, but when the combined string is already used it cannot be used again.

I used the GroupBy node to aggregate on the first combined string, but I got duplicates, of course.

I would be very happy if somebody could help me.

Thanks in advance!

Could you provide a little more information on what you want to achieve. I don’t really understand what you want to achieve. Maybe you could show us how the resulting data should look like.

1 Like

So the purpose is to group by task and pick the first most efficient option. But there is one restriction, an employee can only work once a day, so the combined string can only appear the first time.

I would like to group by a distinct cell value of the last column. I tried to count it and rank it, but this didn’t work.

Thanks for your help!

So the resulting data should be

1 - 4/1 - 12 - β€œ4/1, 12”
2 - 5/1 - 13 - β€œ5/1, 13” -> second choice because the cell β€œ4/1, 12” is already in use

I am not exactly sure what you want to do. I attached a workflow that:

  • aggregates the two main items Tasks and combination of date and employees
  • sorts the whole thing according to the needs
  • creates artificial IDs that reflects that setting
  • joins the two files together

So you have an artificial rank that does what you want. You have to be sure the combination of date and employee is unique so you might create that after your aggregation.

kn_example_agg_distinct_cell_values.knwf (42.4 KB)


Yes! That’s it! Thanks for your time sir, I were thinking all day about this miniscule problem. I didn’t see the power of combining two GroupBy nodes with a joiner.


Have a nice evening!

1 Like