Using Data from Previous Row in Calculation (3.3.1)

Background: have a dataset with values taken at different times and am looking to standardise in order to compare with other datasets. This is work that is easy with Excel, but the data set is massive and I am wanting to use a more powerful to solve my problem.

Problem: I’m looking to interpolate values based on the data in the previous row unless a set value is already fixed (see example table below). I have shown the relevant Excel formula for clarity.

RowID Time/Date Value Gradient TimeDiff_sec InterpolatedValue
5 2019-04-25T00:04:17 51.5 N/A 17 =IF(ISNUMBER(C5),C5,F4+(E5*D5/60))
6 2019-04-25T00:05:00 -0.134085779 43 =IF(ISNUMBER(C6),C6,F5+(E6*D6/60))
7 2019-04-25T00:06:00 -0.134085779 60 =IF(ISNUMBER(C7),C7,F6+(E7*D7/60))
8 2019-04-25T00:07:00 -0.134085779 60 =IF(ISNUMBER(C8),C8,F7+(E8*D8/60))
9 2019-04-25T00:08:00 -0.134085779 60 =IF(ISNUMBER(C9),C9,F8+(E9*D9/60))
10 2019-04-25T00:09:00 -0.134085779 60 =IF(ISNUMBER(C10),C10,F9+(E10*D10/60))
11 2019-04-25T00:10:00 -0.134085779 60 =IF(ISNUMBER(C11),C11,F10+(E11*D11/60))
12 2019-04-25T00:11:00 -0.134085779 60 =IF(ISNUMBER(C12),C12,F11+(E12*D12/60))
13 2019-04-25T00:11:40 50.51 -0.134085779 40 =IF(ISNUMBER(C13),C13,F12+(E13*D13/60))
14 2019-04-25T00:12:00 -0.500900901 20 =IF(ISNUMBER(C14),C14,F13+(E14*D14/60))
15 2019-04-25T00:13:00 -0.500900901 60 =IF(ISNUMBER(C15),C15,F14+(E15*D15/60))
16 2019-04-25T00:14:00 -0.500900901 60 =IF(ISNUMBER(C16),C16,F15+(E16*D16/60))
17 2019-04-25T00:15:00 -0.500900901 60 =IF(ISNUMBER(C17),C17,F16+(E17*D17/60))
18 2019-04-25T00:16:00 -0.500900901 60 =IF(ISNUMBER(C18),C18,F17+(E18*D18/60))
19 2019-04-25T00:17:00 -0.500900901 60 =IF(ISNUMBER(C19),C19,F18+(E19*D19/60))
20 2019-04-25T00:17:13 47.73 -0.500900901 13 =IF(ISNUMBER(C20),C20,F19+(E20*D20/60))

Any advise would be appreciated :slight_smile:

Hi @Benjamin_B,
welcome to the KNIME Forum! KNIME has the Lag Column node, which is exactly what you need. Let me know if you need any help with it.
Kind regards
Alexander

1 Like

Thanks for the quick reply! I don’t understand how the lag column can help me. The “InterpolatedValue” should be the column where this is calculated and the subsequent row should be calculated using the value in the cell above (i.e. same column). The Lag Column only allows me to copy and offset an existing column. In terms of implementation, I see this as only plausible if n-columns are generated for n-rows of data. This would generate a file that again becomes too large (I’m working with 500k+ rows of data).

An alternative calculation method would be a fill down of the “Value” column and a sum of “TimeDiff” since the last time at which a value was recorded in the “Value” column. My gut-feeling though is that this would be at least as computationally expensive as the previous suggestion.

1 Like

Hi,
of course after the lag column you would need another node, e.g. a Math Formula, to make the actual computation. I don’t think it is possible to do it in a single step.
Kind regards
Alexander

Is it possible then to break the data up into rule-based blocks?

e.g. New block start = where NaN(“Value”) == 0

This would definitely require a looping function of sorts, but with the date broken up into blocks, it should be easier to apply your above solution as I would only need to generate (and subsequently delete) the new appended lagging columns for a block instead of the entire data set…

This looks more like a task for a Recursive Loop…

1 Like

Ah, sorry. I did not notice right away that column F corresponds to the Interpolated value. For this you really need a recursive loop start or you can use the Java Snippet, if you know Java. There you can save the previous cell’s content in a global variable and access it in the next row. If you want I can help you with that.
Kind regards
Alexander

1 Like

Hi Alexander,
I realised from your comment that it wasn’t so clear. My apologies for that.
I am not a programmer, so any help you can offer on Java/Recursive Loops would be definitely appreciated!

Dear Benjamin,

In Knime you have to think a bit harder about what depends on what in your data, as compared to Excel. In this case, every row depends on the current row and the previous row. In one loop of the calculation the workflow needs to consider these two rows, do the calculation, update the current row. Then, in the next loop, the current row becomes the previous row and the next row becomes the current row. This looks a bit complex:

Note that before the actual calculation I merge two subsequent rows into one row with a Cross Joiner. This is because Knime cannot easily refer to values in other rows (unless you want to use an R or Python snippet). This is annoying sometimes, but it allows Knime to be very efficient in other cases.

This workflow results in:

image

Workflow:
KNIME_project7.knwf (24.8 KB)

2 Likes

Hi,
please see attached workflow. It’s a very simple Java Snippet, which you should be able to adjust to your needs.
Java_lag.knwf (9.5 KB)

Kind regards
Alexander

4 Likes

Interesting, I didn’t know one could use the Java Snippet system variables like that.

Thank you to both of you!
@Aswin, unfortunately I couldn’t trial your version immediately as my computer is lacking Column Expressions and I am lacking the admin rights. I have requested them so I can update my version and the node library - I will make sure to get back to you regarding that.
@AlexanderFillbrunn, this version works a treat! It’s relatively fast and is a strong basis for subsequent data transformations. I will try to incorporate it in my larger workflow.

1 Like

FYI if speed is important, Alexander’s version will definitely be faster!

2 Likes

Here is an alternative version that uses a simpler Recursive Loop construct. Still slower than @AlexanderFillbrunn’s method, but I think it shows off some typical Knime concepts quite nicely.

  • Most loop constructs in Knime do not allow you to use the results of previous iterations (loops) in the next iteration (for example Chunk Loop, Column List Loop), but the Recursive Loop does allow you to do this. In this case, I divide the table in three parts: (1) the previous row, (2) the current row, and (3) the rest of the table. The previous row and the current row are used for the calculation (this is obvious from the Excel example in @Benjamin_B’s post). I insert the result of the calculation in the current row. I save the current row with the result, “forget” the previous row, and use the combination of the current row and the rest of table for the next iteration of the loop. In this next iteration the current row becomes the previous row, and the next row becomes the current row. Each iteration “eats” away the top row of the table, and the process continues until there are no more rows.
  • Calculations are easiest to perform on values on a single row; if you want to do calculations that uses values from multiple rows you are limited to the functions offered by the GroupBy node, or you have to reshape the table in such a way that multiple rows are condensed in a single row. Here I combine two rows into one with a Cross Joiner, but for more complex reshaping operations you can use the Pivoting node.

KNIME_project7.knwf (28.5 KB)

3 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.