Find frist date with non zero value in a time series

goodvirus · January 14, 2021, 9:00am

Hi fellow knime users,

I have a problem that I just can’t wrap my head around.

Given the following Time Series Data:
Location Date Usage
H1 01.01.2021 0
H1 02.01.2021 0
H1 03.01.2021 0
H1 04.01.2021 1
H1 05.01.2021 2
H2 01.01.2021 0
H2 02.01.2021 0
H2 03.01.2021 1
H2 04.01.2021 0
H2 05.01.2021 2

There are different Locations, all with the same start and end date, but some start with 0 and I want to find the first (not all, just the first) non zero value per location and want to remove the leading zero rows.

So the transformed data would look like this:
Location Date Usage
H1 04.01.2021 1
H1 05.01.2021 2
H2 03.01.2021 1
H2 04.01.2021 0
H2 05.01.2021 2

Extra points for a solution where I can define the window of how many leading values should be removed.

The only solution I can think of is a dopple loop, but that not a very elegant solution.

Thanks very much in advanced

ipazin · January 14, 2021, 10:46am

Hello @goodvirus,

here is one approach. Use GroupBy node and group by Location while concatinating Usage without delimiter. Then use String To Number node which will drop all leading zeroes. For extra points you can use String Manipulation node instead of String to Number node together with regex in order to determine window of how many leading values should be removed. (here window is two - regexReplace($column$, "^0{0,2}(.*)", "$1")). The tricky part is now to get back to original data format with dates matching usage. One way is to calculate difference between string lengths (one produced by GroupBy node and one produced by String Manipulation node) which will tell you how many rows to filter for each Location. And for this task you go back to Loops

Give it a try and if any questions feel free to ask.

Br,
Ivan

goodvirus · January 14, 2021, 2:27pm

Hi @ipazin,

thank you very much for the idea. I implemented it and it works (quit a big workflow for a small problem).
Probably that can be done much quicker and leaner, but who cares? (I do, if you have a better solution please post it).

If someone has the same probleme, see my workflow attached.

Thanks,

Paulremove_start_values.knwf (39.1 KB)

goodvirus · January 14, 2021, 3:11pm

Hi again,

now I applied the solution to my big workflow (with a couple of millions of entries) and now the problem ist, that the values go so big that not even long numbers can hold them… Any suggestions?

goodvirus · January 14, 2021, 4:01pm

I think the knime version is to difficult and so I used a python note within a group loop (that could also be done in python).

The python code:
output_table_1 = output_table_1.loc[output_table_1['auslastung'].ne(0).idxmax():]

ipazin · January 15, 2021, 9:54am

Hello @goodvirus,

glad you found a way. Scripting is probably better in this scenario.

And you are right, when values are too big idea with conversion won’t work. However regex from String Manipulation will still do the trick. Check modified workflow:
remove_start_values_ipazin.knwf (51.5 KB)

Br,
Ivan

system · January 22, 2021, 9:54am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.