Updating Values inside a Loop, then continue loop

Hello to the Forum,
i am a Newbie and i am really having trouble to get this done. The Table looks like this:
Table

I want to check now whether the Timestamp is between the “Lower” and “Upper Bound”. Here some pseudocode from a rule engine:

Since its hard to read:

$Timestamp$ < $UpperBound$ => $LowerBound$
$Timestamp$ >= $LowerBound$, then update variables:
//Now update Variables
LowerBound=LowerBound+300
UpperBound=UpperBound+300
now go back to line 5 and loop through the column ‘Timestamp’ till the end of the table.

So what i want to do is find all values that lie in between the first 5min Interval, write for them in a new Column NewTimestamp the beginning of the interval (This is for one the Timestamp for that interval but acts as a marker, so that i can later use GroupBy for this column to aggregate the data for the interval in the next step). Then create the next interval by adding 300 seconds to the lower und upper border of the interval and continue to find all the values that belong int hat interval, then updaten interval again… and so on…

Upper and LowerBound don’t need to be actually in the table they could be flow variables too, i just wasn’t able to update them and continue the loop with the updated values. The column Timestamp is in ascending order.

My trouble was that i could check for the condition and write lowerBound in a new column, but the part with updating 2 Variables inside the loop and continue the loop with new values i was not able to do.

Thanks!

Rule Engine does not support math operations. Consider


for the task. You may need to install it first.

1 Like

If here is some testdata:
Input.txt (430.5 KB)

Hi izaychik63, could you (or someone else) post an example of this? I would be really thankful. Problem is i don’t get it how this would be used. Would i then still need a loop start and end? Or would i write the loop inside of this? - and i don’t really can code in java. I mean i would understand it if see something. If someone could post a quick example it would help a lot. But thanks for the tipp…

Hi @TotalDataLoss

Take a look at this wf updating values in loop.knwf (232.9 KB)


Does this help to solve your problem?
gr. Hans

1 Like

Look here

Hi Hans,
thx for that workflow. But the worfklow doesn’t build an time interval. Maybe it was a bit unclear what i wanted to do:

  1. I give the process a LowerBound and an Upperbound as a unixtimestamp. The Input in the worfklow could be via FlowVariable or via Create Table.This would be for the file:
    LowerBound= 1442824500
    UpperBound= 1442824800 (its alsways 300 higher than LowerBound)

  2. Then collect all the rows where the timestamp is between Lower and UpperBound as long as timestamp<UpperBound:
    For timestamp (col0)<UpperBound
    timestamp(col0) >= LowerBound AND timestamp <UpperBound
    then write the timestamp of the LowerBound in an NewColumn for each row

  3. Now increase LowerBound and UpperBound by 300 each.
    LowerBound=LowerBound+300
    Upper Bound=UpperBound+300
    and continue with 2.

For example the file starts at Timestamp:1442824531
which is more readable:
21.09.2015 08:35:31

it would now start with the first given intervall (the unix timestamps above):
LowerBound:21.09.2015 08:35:00
UpperBound: 21.09.2015 08:40:00
Collect all rows that lie within these boundaries.
Write the Timestamp of the LowerBound in a new column then set the new interval:
LowerBound: 21.09.2015 08:40:00
UpperBound: 21.09.2015 08:45:00
and so on…

So that in the next step i can then can use GroupBy with Unixtimestamps of the beginning of the interval in the new column and make the Sum of Col2 (which is what i actually want).

Have a good day!

Hi izaychick63,
thx for the link!
It got me to the point with the column expression:

timestamp=column(“timestamp”)
LowerBound=1442824500
UpperBound=1442824800

if(timestamp<UpperBound) LowerBound
else if (timestamp>=UpperBound)
LowerBound=LowerBound+300
UpperBound=UpperBound+300
LowerBound

My problem is now that it only calculates the first 2 Intervals correctly (last row are the values)


But then it always keeps 1442824800

The last 5 Lines with the If-Statement would need to run inside a loop for the whole table.
I would now need the Syntax for a loop, unfortunately i don’t know how to do it??
Does someone here know how to write a loop in the column expression node?

Hi @TotalDataLoss

I made some changes to the wf. I hope I understood your question right this time .updating values in loop 2.knwf (519.5 KB)


gr. Hans

2 Likes

Hi there @TotalDataLoss and others,

sometimes when developing (and expected solution is not working - Window Loop Start) you can try to start from the back or from end of requirement. In your case you need to perform (simple) grouping operation. Easiest thing to do is to use GroupBy node. So you need to go from your timestamp into timestamp you can as grouping criteria (column) meaning following:

Timestamp LowerBoundTimestamp
2011-09-13T13:53:36 2011-09-13T13:50
2011-09-13T13:53:44 2011-09-13T13:50
2011-09-13T13:53:49 2011-09-13T13:50
2011-09-13T13:53:54 2011-09-13T13:50
2011-09-13T14:32:53 2011-09-13T14:30
2011-09-13T14:35:04 2011-09-13T14:35
2011-09-13T14:36:54 2011-09-13T14:35
2011-09-13T14:54:23 2011-09-13T14:50
2011-09-13T15:31:38 2011-09-13T15:30
2011-09-13T19:32:59 2011-09-13T19:30

Now to do above operation in KNIME (create LowerBoundTimepstamp column from Timestamp column) you can take couple of approaches/different nodes.

One is to extract minutes (Extract Date&Time Fields) and then in Rule Engine node create new minutes.

$Minute$ <  5 => "00"
$Minute$ < 10 => "05"
$Minute$ < 15 => "10"
$Minute$ < 20 => "15"
...

which you will then join in String Manipulation node with original timestamp with appropriate usage of substr() functions. From there you have it as you can now properly group.

Another similar way is to only look at single minute digit or calculate duration and then utilizing Date&Time Shift node (that wasn’t polish till the end but is interesting one…).

Workflow is attached.
Tickdata to OHLC_V2_ipazin.knwf (625.2 KB)

Notice: I tested my solution only on part of your data and workflow is not reset, UpperBound is not necessary and regarding time execution all three nodes I used are streamable so you can try streaming functionality to speed things up once you are done developing.

Hope this helps!

Br,
Ivan

2 Likes

Hi Ipazin fellow traveler, i just stumbled across your post. Well that looks interesting especially since it uses streaming, something i want to try.
To the problem I found a solution now that is similar to yours but integrates all in one formular and is computational very efficient. One just writes in a column expression node, and saves as a new integer column:

x= (column(“timestamp”)-variable(“start”))/300
toInt(x)

The variable “start” ist just flow variable with the desired startdate as a unixtimestamp e. g. =
1442824500

With this variable in the new column one can just GroupBy.
(I don’t want to post the whole workflow since it contains a lot of adjustments, column renames etc. that are user specific) In core its just that formula.
Since its very easy for computers to calculate integers, the Formula finished in just 26min compared to 2:45h min for Window Start.

Right now there are still some differences between this Data and the old data, so i don’t want to scream Juchu to early. But i think that those differences are due to an errenous rounding in Access…
But i wanted to check back.

@izaychik63 thanks for the tip with the rule engine - it made everything possible!
@HansS: Thanks for your version 2. I played quite intensively with it - lots of interesting application of principles didn’t understand everything. The problem was a bit speed it was simply not fast enough i think the Recursive Loops slow it to much down…

So thank all of you without you i wouldn’t have made it this far…

3 Likes

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.