sum of values of a previous day or more days

example_sum_price

I am new to knime,
help please, which nodes can I use to solve the problem of the previous image.
the column id, datetime, price are input values, sum_price is output data.
I must add the prices of a previous day of each row and assign in the new column (sum_price). “?” is assigned because the date 01/01/2000 has no previous day.

could you provide an example and say again what should happen with the sum_price. What does get aggregated in which way? If you want to do things with previous or following rows there are several principal options:

There is the LAG node to access earlier lines.

You can use some sort of artificial ID or even do a Restart within a rank/ID.

Also the Missing value node allows to access previous and following rows. One could combine that. Cf. this discussion.

Hi @Yonatan,

Welcome to KNIME community!

What I have figured out from your print screen is following. First appearance of day (for example row with id 5) you have to sum everything from previous day (81), every other appearance of same day (rows with id 6 and 7) you sum everything prior in that day in addition with sum of previous day. So for row with id 6 you have 81+34 (=115) and for next you got 81+34+12 (=127).

Here is solution if I got it right:

and here is workflow attached:
2019_03_14_Special_Sum.knwf (28.7 KB)

Basically you want to get sum of each day and cumulative sum in each day. Lag both and then sum it. Explore workflow, play a bit with it using different configurations and if any questions feel free to ask.

Happy KNIMEing!

Br,
Ivan

2 Likes

Hi @ipazin Ivan, thanks you very much for the help.

The workflow is correct, but the previous day should be considered including the hour, minute and second, in the image I share the sum_price output column.

train_dataset_sum_price :price_previous.xlsx (9.7 KB)

knime_sum_price

Thanks in advance.

I’m confused. It seems that you are interested to get the sum of rows within the last 24 hours. Is that correct?
If it is so, then it contradicts the example that you have provided: row 5 should include only row 4 in the sum (thus outcome should be 23 instead of 81), as row 3 is more than 24 hours in the past (same as row 11 vs 14). It seems to me, that the previous answer by @ipazin fully reproduces your original request (other then outputting missing values for ALL rows of the first day)

Hi @lisovyi
it is correct, sum of rows within te last 24 hours.
It is correct row 5 of the column sum_price should be 23 (typing error), the example @ipazin is fine only that sum by dates and does not consider the time of 24 hours.

thanks in advance

1 Like

Hi there!

so sum of rows within the last 24 hours is what is needed.

Here is workflow:
2019_03_15_Sum_Last_24_Hours.knwf (46.2 KB)

This was interesting. Maybe someone could do it in a more cool way :slight_smile:

Br,
Ivan

3 Likes

Hello Ivan, you’re excellent.
Thank you very much, this model is the solution to the proposed example.
Now I am testing my data for 24h, 48h, 72h, 96h, … previous.

thank you very much for your great help :wink: :

2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.