Multi row calculation

Assume I have this table:

Period ID Date Taken Cat
1/02/2022 1 1/02/2022 H
1/02/2022 1 2/02/2022 H
1/02/2022 1 3/02/2022 H
1/02/2022 1 4/02/2022 P
1/02/2022 2 4/02/2022 P
1/02/2022 2 5/02/2022 S
1/03/2023 1 30/03/2023 S
1/03/2023 1 31/03/2023 H
1/04/2023 1 1/04/2023 H
1/04/2023 1 2/04/2023 H
1/04/2023 1 3/04/2023 H

I would like to count how many consecutive leave per id. When Leave Categories (Cat) are H or P I would count toward the consecutive leave, when S, it will not be considered.

‘Days’ column is for how many consecutive leaves are taken.
‘Mark’ column is for, if cat S happened 1 day before cat H or P, will be marked ‘Y’.

Desired output:

Period ID Date Taken Cat Days Mark
1/02/2022 1 1/02/2022 H 1
1/02/2022 1 2/02/2022 H 2
1/02/2022 1 3/02/2022 H 3
1/02/2022 1 4/02/2022 P 4
1/02/2022 2 4/02/2022 P 1
1/02/2022 2 5/02/2022 S 0 Y
1/03/2023 1 30/03/2023 S 0 Y
1/03/2023 1 31/03/2023 H 1
1/04/2023 1 1/04/2023 H 2
1/04/2023 1 2/04/2023 H 3
1/04/2023 1 3/04/2023 H 4

As you can see, from 1 Feb to 4 Feb, ID 1 took 4 days (Cat are H or P), so ‘days’ range from 1 to 4. ID 2 took 1 on the 4 of Feb, and S on the 5 of Feb, so he has 1 and 0 (S won’t be counted toward ‘days’). But on the period Mar 2023, ID 1 took leave (cat C) on the 30th Mar and leave (cat H) from 31 to 3 of Apr. I would like to count only from the 31 to the 4 together, skip the 30 of Mar as it was cat S.

And the column ‘Mark’ the logic will be, that if cat S happened 1 day right before or after cat P or H will be marked ‘Y’. You can see ID 2 has 1 cat S right after the 4th of Feb, and ID 1 has 1 cat S right before cat H on the 30 of Mar

Note:
H - Holiday
P - P. Holiday
S - Sick

Hello @4nak1n
This $Days$ column is a product, that you can achieve with ‘Rank’ node; by using ‘Grouping Attributes’ configuration + Ordinal ‘Ranking Mode’ and Retaining Row Order.

The zero values for S Cat, an Marks; they can be built with the help of ‘Rule Engine’ nodes.

BR

4 Likes

Hi, thank you for your pointer. I am coming from Alteryx, and so overwhelmed by how many tools Knime is offering.

3 Likes

Could you please share how would configure the Rank node?

1 Like

Hello @rfeigel
It’s already described in the post:

BR

I’ve tried practically every possible configuration and can’t produce the desired outcome directly from the input table.

Hello @rfeigel
I gave just an answer solution without testing it. Now, I’m trying to address your question and I think that you are right.

The Rank Node would be valid only for sample data, however it will fail on change $Period$ consecutive $Date Taken$. Furthermore, it will further fail for a worker with more than one group of consecutive $Date Taken$ in the same period.

Said so. The logical solution would be more complex… requiring a complex conditional workflow, based in ‘Column Expressions’ with Offset function. Something similar to the discussion in the following post:

I can work out a custom solution workflow, I’ll try to deliver along the day (I can’t promise it). You can also share, if you have a valid solution ready.

BR

1 Like

Hello @4nak1n , @rfeigel
I’ve found a possible solution based in ‘Rank’ node. Then, no ‘Column Expressions’ coding is needed.

This solution generates ranged counts based in consecutive $Date Taken$, with the help of ‘Moving Aggregation’ node. It doesn’t look up to $Period$ anymore.

20240508_multi_row_calculation_v0.knwf (65.6 KB)

Any comments are welcome.

BR

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.