Grouping dates by periods

Hi all,

I hope you are doing great! :sunny:

I am currently facing an issue with grouping dates by periods. In the attached example, columns A and B contain raw data, which can vary each time, while columns E, F, G, and H display the desired result. I have tried using the GroupBy function with the Worker column, applying First and Last, but this approach misses the gaps in between. Even if the Name and Surname are the same, any gap between the dates should result in two separate rows for the same Worker, as shown in the example.

I have also tried using the Lag function, but it lags into another name and surname, which disrupts my data accuracy. I am hopeful that KNIME can address this issue and would greatly appreciate any assistance you can provide.

Thanks so much! :pray:

Hi @businesssun , and welcome to the KNIME Community.

Would you be to attach that sample data as a file so that people can use it to try out ideas. Generally you will get a quicker response on the forum if you can upload actual sample data in addition to a screenshot. :wink:

2 Likes

Sorry, it is my first time in this forum :see_no_evil:

Example1.xlsx (9.3 KB)

1 Like

Hi @businesssun

Welcome to KNIME Forum.
See this wf grouping_dates.knwf (55.6 KB)

It processes every worker in a GroupLoop. With the (first) Date&Time Difference node in combination with the Rule Engine the “breakpoints” (gaps in the dates) are identified. The you are able to do a GroupBy.

gr. Hans

2 Likes

Hi @businesssun , that’s not a problem, and welcome once again. Thanks for uploading the data file.

Attached is a workflow that gives one approach for what you request. You were on the right lines with using lag columns, but there are some additional tricks. Firstly you need to lag both the Time Off Date and the Worker name. [edit: see my additional post below. It appears you don’t need to lag the Time Off Date since Date&Time Difference can be configured to handle it without! :-)]

You can then calc the date difference between each row, and treat only those rows were the date difference is 1 and the worker name is unchanged as part of the same grouping.

There is a “trick” that can then be applied (which I haven’t had to use in a while :wink: ) where you can use Rule Engine to mark the first row in each “grouping” with a 1, and leave all other rows either “missing” or mark with a zero. You can then cumulatively sum these 1s with the Moving Aggregator, and this has the effect of marking each row in the groups with a unique number for the group.

After that, you can use that “group number” along with the worker name to return min and max dates for the group, and perform the calculations to give the “units”

Calculate Date groupings for workers.knwf (99.0 KB)

3 Likes

@businesssun … lol, one hour on the forum, and already two solutions :wink:

@HansS , why did I never notice that the Date&Time Difference node has an option to compare with previous row? :rofl: Nice… today I learned! So I can remove one of my lag columns, and modify the config on my Date&Time Difference node!

1 Like

@takbb , To be honest, until today I also did not notice the option to compare with the previous row :slight_smile:

gr. Hans

1 Like

@HansS @takbb I see this was a win-win situation for both of you :partying_face:

Also a huge thanks for you two!! It worked, everything is amazing. Thanks again! :robot:

2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.