Summarising on Role to get the start date and end date.

Hi Everyone,

I am new to KNIME. I trying to build a workflow around evaluating the delay in payment of invoices. The Raw data depicts the way the data is in the log file. I would like to grab the first start date associated with the Role and end date associated with the role. For instance, for requisition the start date and time would be 2024-01-01T05:14:02 and end date will be 2024-02-07T20:15:34 showing that the invoice was sitting with requisitioner for 38 days. Then, it went to AP Processor with start date of 2024-02-08T10:14:05 and stayed with him for 2024-03-09T07:21:46 another 28 days. so that I can add them at the end and figure out for how long invoice was sitting with a particular role.

Raw Log

Invoice Number Role Start Date & Time End Date & Time

123 Requisitioner 2024-01-01T05:14:02 2024-01-01T05:14:02

123 Requisitioner 2024-02-06T20:15:30 2024-02-06T20:15:31

123 Requisitioner 2024-02-07T20:15:34 2024-02-07T20:15:34

123 AP Processer 2024-02-08T10:14:05 2024-02-08T10:14:21

123 AP Processer 2024-03-09T07:21:39 2024-03-09T07:21:46

123 Requisitioner 2024-03-25T09:35:19 2024-03-25T09:35:29

123 Requisitioner 2024-04-09T22:22:40 2024-04-09T22:22:43

123 Requisitioner 2024-04-10T07:38:14 2024-04-10T07:38:14

123 AP Processer 2024-05-10T08:23:44 2024-05-10T08:23:46

123 AP Processer 2024-05-10T14:39:12 2024-05-10T14:39:15

Expected output

123 Requisitioner 2024-01-01T05:14:02 2024-02-07T20:15:34

123 AP Processer 2024-02-08T10:14:05 2024-03-09T07:21:46

123 Requisitioner 2024-03-25T09:35:19 2024-04-10T07:38:14

123 AP Processer 2024-05-10T08:23:44 2024-05-10T14:39:15

Appreciated your assistance with the query.

Hey there and welcome to the forum,

thanks for providing some sample data - next time I’m sure everyone would appreciate to get that in some sort of file format to avoid having to create it in order to help you out :-).

I’ve build a prototype here:

InvoicingCase.knwf (86.7 KB)

Overview

The challenge was to identify the different “process steps” that you can use to group - I used a bit of a work around to avoid having to use recursion. Using column expressions we create a new column that checks if the role in the current row is the same as in the previous row. If that is not the case the column shows the current row index to indicate that a new role is working on the invoice. For the first row we initialise with 1.
Then we use Missing Value node to fill in the missing values (where role and role-1 are equal) with the value of the previous row.

After that you can group by Invoice, Role and that new column and find Min of Start Date and Max of End date…

If you have multiple invoices you have to nest this prototype inside a group loop to pass the data for each invoice through separately.

2 Likes

Thanks Martin,

Appreciate your help :blush:. I will try to recreate what you have done on my end. For some reason I was not able to upload a spreadsheet. Maybe I need to spend more time on the forum to figure thing out. Due to time constraints, I just created data in excel and when I pasted it, it came this way.

Hi Martin,

I implemented the logic, and it works fine. Many thanks to you. As suggested, I will try to provide you the sample file next time. Just FYI, I have over 390,000 lines in the log file. My computer is currently struggling to run through the loop to fetch the final result :slight_smile: .

Glad it worked out. If there are that many rows in your data than any loop will take a while.

One idea that could be faster: combine columns Invoice Number & Role and then apply the logic to that…

see “bottom” workflow in this prototype:

InvoicingCase.knwf (102.9 KB)

For some reason I was not able to upload a spreadsheet.

Not a problem - it is probably easier than you think:) :You can just drag and drop into the window where you are typing your message or alternatively click on this button and browse to the file:

image

2 Likes

Thanks, The second option was way quicker. :slight_smile:

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.