Multi row calculation

4nak1n · May 5, 2024, 2:32pm

Assume I have this table:

Period	ID	Date Taken	Cat
1/02/2022	1	1/02/2022	H
1/02/2022	1	2/02/2022	H
1/02/2022	1	3/02/2022	H
1/02/2022	1	4/02/2022	P
1/02/2022	2	4/02/2022	P
1/02/2022	2	5/02/2022	S
1/03/2023	1	30/03/2023	S
1/03/2023	1	31/03/2023	H
1/04/2023	1	1/04/2023	H
1/04/2023	1	2/04/2023	H
1/04/2023	1	3/04/2023	H

I would like to count how many consecutive leave per id. When Leave Categories (Cat) are H or P I would count toward the consecutive leave, when S, it will not be considered.

‘Days’ column is for how many consecutive leaves are taken.
‘Mark’ column is for, if cat S happened 1 day before cat H or P, will be marked ‘Y’.

Desired output:

Period	ID	Date Taken	Cat	Days	Mark
1/02/2022	1	1/02/2022	H	1
1/02/2022	1	2/02/2022	H	2
1/02/2022	1	3/02/2022	H	3
1/02/2022	1	4/02/2022	P	4
1/02/2022	2	4/02/2022	P	1
1/02/2022	2	5/02/2022	S	0	Y
1/03/2023	1	30/03/2023	S	0	Y
1/03/2023	1	31/03/2023	H	1
1/04/2023	1	1/04/2023	H	2
1/04/2023	1	2/04/2023	H	3
1/04/2023	1	3/04/2023	H	4

As you can see, from 1 Feb to 4 Feb, ID 1 took 4 days (Cat are H or P), so ‘days’ range from 1 to 4. ID 2 took 1 on the 4 of Feb, and S on the 5 of Feb, so he has 1 and 0 (S won’t be counted toward ‘days’). But on the period Mar 2023, ID 1 took leave (cat C) on the 30th Mar and leave (cat H) from 31 to 3 of Apr. I would like to count only from the 31 to the 4 together, skip the 30 of Mar as it was cat S.

And the column ‘Mark’ the logic will be, that if cat S happened 1 day right before or after cat P or H will be marked ‘Y’. You can see ID 2 has 1 cat S right after the 4th of Feb, and ID 1 has 1 cat S right before cat H on the 30 of Mar

Note:
H - Holiday
P - P. Holiday
S - Sick

gonhaddock · May 6, 2024, 8:45am

Hello @4nak1n
This $Days$ column is a product, that you can achieve with ‘Rank’ node; by using ‘Grouping Attributes’ configuration + Ordinal ‘Ranking Mode’ and Retaining Row Order.

The zero values for S Cat, an Marks; they can be built with the help of ‘Rule Engine’ nodes.

BR

4nak1n · May 6, 2024, 9:13am

Hi, thank you for your pointer. I am coming from Alteryx, and so overwhelmed by how many tools Knime is offering.

rfeigel · May 7, 2024, 1:23am

Could you please share how would configure the Rank node?

gonhaddock · May 7, 2024, 8:56am

Hello @rfeigel
It’s already described in the post:

BR

rfeigel · May 7, 2024, 2:00pm

I’ve tried practically every possible configuration and can’t produce the desired outcome directly from the input table.

gonhaddock · May 8, 2024, 10:00am

Hello @rfeigel
I gave just an answer solution without testing it. Now, I’m trying to address your question and I think that you are right.

The Rank Node would be valid only for sample data, however it will fail on change $Period$ consecutive $Date Taken$. Furthermore, it will further fail for a worker with more than one group of consecutive $Date Taken$ in the same period.

Said so. The logical solution would be more complex… requiring a complex conditional workflow, based in ‘Column Expressions’ with Offset function. Something similar to the discussion in the following post:

I can work out a custom solution workflow, I’ll try to deliver along the day (I can’t promise it). You can also share, if you have a valid solution ready.

BR

gonhaddock · May 8, 2024, 2:39pm

Hello @4nak1n , @rfeigel
I’ve found a possible solution based in ‘Rank’ node. Then, no ‘Column Expressions’ coding is needed.

This solution generates ranged counts based in consecutive $Date Taken$, with the help of ‘Moving Aggregation’ node. It doesn’t look up to $Period$ anymore.

20240508_multi_row_calculation_v0.knwf (65.6 KB)

Any comments are welcome.

BR

system · May 15, 2024, 2:39pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.