Determine no. of days by comparing current row's value to the next row's value

Hi there,

I’m new to Knime and I was wondering if someone can give me some hints on how to achieve the below:

image

So I have the ID column which can have multiple records of the same ID, representing the statuses that it’s been through. I am trying to measure the time spent by each ID in each status (number of days).

There are a couple of conditions:

  • For each occurrence of an ID, determine the no. of days by calculating the date difference using next row: Value[Index + 1] - Value[Index].
    For example 1st occurrence of ID 1111: [Date Entered (row 4)] - [Date Entered (row 3)]

  • If it’s only 1 ID record (e.g. row 2) or it’s the last occurrence of an ID (e.g. row 5), check if Cleared Date exists and calculate [Cleared Date] - [Date Entered]. If the Cleared Date is missing, use the current date for calculation instead.

This is part of a longer analysis which I’ve made in Python… and now I’m required to convert it all to a Knime workflow so I just got stuck on this part.

Programmatically it seems easy, but I can’t wrap my head around Knime nodes to achieve this, so I’d be thankful for any suggestions.

Welcome to the forum @Jugger .

This is an interesting problem. Since you already did this in Python, you could try using a Python script node for the implementation.

I threw together a quick Python-free workflow that reproduces the data in the Output column:

image

You can download it from here: Calculate Number of Days by comparing successive rows – KNIME Hub

There are probably more elegant ways to do this, but here’s what I did step by step:

  1. Use a Date&Time Input node to generate the current date as a flow variable
  2. Read in data
  3. Convert Dates to Date&Time format
  4. Replace any empty cells in the Cleared Date column with the current date, make sure that the resulting column is in Date&Time format
  5. Start a Group loop, grouping on ID
  6. Invert the order of the data, which allows for use of the previous row for calculating the difference between the values in the Date Entered using a Date&Time difference node. I couldn’t find a way to access a the next row for the calculation, so this was a workaround
  7. If there’s only a single row, or if it’s the first row (which was originally the last row before the inversion), then the result of the Date difference calculation will be a missing value. I use another Date&Time difference node that calculates Cleared Date - Date Entered for each row
  8. Merge the columns from the previous 2 steps, giving the results of Step 6 priority.
  9. Clean up any negative signs
  10. End the loop, concatenate all the results, and re-sort
3 Likes

Hi @elsamuel

I’ve tested the workflow now and it works wonderfully. Thank you very much for your time and effort.

Regarding point 6,

I was digging meanwhile and I’ve seen that it’s doable with a Java snippet node, but I’m required to avoid scripting in this (no python scripting nodes, no nothing ugh…) so your approach is clever, I like it. :slight_smile:

As for elegance, I couldn’t care less haha the way you did it is more than fine, I’m frustrated enough by the requirements on this, so elegance is my last problem right now.

Thanks again, saved me some headache!

2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.