Irregular to regular time series

Dalmatino16 · February 17, 2022, 8:39am

Hi, does anyone have an idea how to resample time series data so that it has a regular step? I have daily data, and for periods with no measurements the rows are missing. I would like to have a row for each day across a span of 30 years. I have time series data for multiple gauging stations, so the columns for the stations with missing data are shorter. I would like them all to be the same length, with empty cells where there is no data instead of skipped rows.

aworker · February 17, 2022, 8:53am

Hi @Dalmatino16

Could you post a bit of your data here (if it is not confidential)? We will try to provide a workflow solution from there.

Best
Ael

morpheusTM · February 17, 2022, 9:02am

Your request is to have a table with one row per day, with missing values on the days for which you have no data.

You can generate a regular daily series with the Create Date&Time Range node, appropriately configured, and then join it to your table on the date columns (the output column of the range generator and the date column of your table). In the Joiner, enable the left unmatched rows option (assuming the generated column goes into the left input of the Joiner).
At the end you will have the table you requested.

I hope this helps.
Giuseppe
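
For readers working outside KNIME, the same idea (generate every day, then left join the measurements onto it) can be sketched in plain Python. This is only an illustration under assumptions: the sample readings, the dates, and the `daily_range` and `regularize` helpers are hypothetical and not taken from the attached files.

```python
from datetime import date, timedelta


def daily_range(start, end):
    """Return every calendar day from start to end, inclusive."""
    n_days = (end - start).days + 1
    return [start + timedelta(days=i) for i in range(n_days)]


def regularize(measurements, start, end):
    """Left join: one row per day, with None where no record exists."""
    return [(day, measurements.get(day)) for day in daily_range(start, end)]


# Hypothetical readings with a gap on 3 and 4 January.
station = {
    date(1992, 1, 1): 112,
    date(1992, 1, 2): 115,
    date(1992, 1, 5): 109,
}

rows = regularize(station, date(1992, 1, 1), date(1992, 1, 5))
for day, stage in rows:
    # Print an empty cell instead of None for the missing days.
    print(day.isoformat(), "" if stage is None else stage)
```

Because the complete date list drives the loop, the result is already in chronological order and always has one row per day, whether or not the station reported a value.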

Dalmatino16 · February 17, 2022, 9:25am

H_daily_3012.xls (224.8 KB)

H_daily_3155.xls (188.7 KB)

Dalmatino16 · February 17, 2022, 9:26am

Hi, my data is river water stages. Filling in the missing data is the next step, which I will probably also need help with.
But I am trying to do it one step at a time. The first step is to fill in only the dates, which are discrete. One row represents one day. So the first column would be the days, one by one, without skipping, over a 30-year period.

It should look like this:

But sometimes I have missing data, which looks like this - take a look at the middle column

Originally I have gauged water stages with dates. One .csv file for each station.
In my other post you can see two example files attached. Station 3012 has all the dates and rows, while station 3155 has missing data, so its column is shorter because the dates are missing as well.

I have 100 files and stations like that. The final result I need is something that looks like this:

I already did this with KNIME by using this workflow:

But I didn't get the results I need, because not all stations have columns of the same length, and therefore the dates in the first column are valid only for the stations with complete columns. I hope it is clear what the problem is.

So, wherever there is skipped data, like in file 3155, I need the value cell to be left empty.

Filling in the missing data based on correlation would be the next step, but I need to resolve the first one first. So if you have any tips, I would appreciate it.
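
The target layout described above (one date column, one column per station, empty cells where a station has no record) could be sketched in plain Python as follows. The station values, the `wide_table` helper, and the three-day range are made up for illustration; only the station numbers 3012 and 3155 come from the post.

```python
import csv
import io
from datetime import date, timedelta


def wide_table(stations, start, end):
    """Build one row per day and one column per station; gaps become ''."""
    names = sorted(stations)
    header = ["date"] + names
    rows = []
    day = start
    while day <= end:
        # dict.get returns '' when this station has no record for the day.
        rows.append([day.isoformat()] + [stations[n].get(day, "") for n in names])
        day += timedelta(days=1)
    return header, rows


# Hypothetical stages: station 3155 has no record on 2 January.
stations = {
    "3012": {date(1992, 1, 1): 210, date(1992, 1, 2): 212, date(1992, 1, 3): 215},
    "3155": {date(1992, 1, 1): 98, date(1992, 1, 3): 101},
}

header, rows = wide_table(stations, date(1992, 1, 1), date(1992, 1, 3))

# Write the table as CSV text; missing values show up as empty cells.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Every station column ends up the same length because the rows are generated from the shared date range rather than from any single station's file.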

Dalmatino16 · February 17, 2022, 9:51am

Hi Giuseppe. Do you mean something like this? I am trying to do it with just one file, one station… I am not sure how to proceed.

morpheusTM · February 17, 2022, 10:09am

Yes, you are on the right track, but before proceeding you need to convert the date column of the CSV file to a Date&Time column.
You can do that directly in the CSV Reader node or with a String to Date&Time node.
Also, don’t forget to set the joining columns of the two tables in the Joiner Settings tab.
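
The reason for the conversion is that a join on dates only matches when both sides hold real date values rather than text. A minimal Python sketch of that step is below; the `DATE_FORMAT` string, the sample rows, and the `parse_day` helper are assumptions, since the thread does not state the date format used in the files.

```python
from datetime import date, datetime

# Assumption: dates are stored as day.month.year text; adjust to the real files.
DATE_FORMAT = "%d.%m.%Y"


def parse_day(text, fmt=DATE_FORMAT):
    """Convert a date string into a datetime.date so it can be joined on."""
    return datetime.strptime(text.strip(), fmt).date()


# Hypothetical raw rows as they might come out of a CSV reader (all strings).
raw_rows = [("01.01.1992", "112"), ("02.01.1992", "115")]

# Key the measurements by real date objects instead of strings.
parsed = {parse_day(day_text): int(stage) for day_text, stage in raw_rows}
print(date(1992, 1, 2) in parsed)
```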

Dalmatino16 · February 17, 2022, 10:49am

The Joiner is still skipping the rows.
Here is the workflow

Here is the Node 21 output

Here is the node 24 output with skipped year 1992

And here is the joiner results which skip the year 1992 as well.

Any guess why?

Dalmatino16 · February 17, 2022, 10:53am

I see now that the unmatched rows (year 1992) have been placed at the end of the column.

Is it possible to have them where they belong chronologically?

morpheusTM · February 17, 2022, 11:04am

From the image you posted I see that the Joiner is configured incorrectly: you need to include only matching rows and left unmatched rows (your configuration is for a full join).
To get the rows in order, you can use a Sorter node after the Joiner (sort by date).
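
The effect of the final sort can be illustrated with a small Python sketch. It imitates a join that returns matched rows first and appends the left unmatched rows at the end, as seen in the thread, and then restores chronological order by sorting on the date. The `left_join_unsorted` helper and the sample values are hypothetical and are not a model of KNIME's internals.

```python
from datetime import date


def left_join_unsorted(all_days, measurements):
    """Matched rows first, then left unmatched rows appended at the end."""
    matched = [(d, measurements[d]) for d in all_days if d in measurements]
    unmatched = [(d, None) for d in all_days if d not in measurements]
    return matched + unmatched


all_days = [date(1992, 1, d) for d in range(1, 5)]
# Hypothetical readings: 2 January is missing.
measurements = {
    date(1992, 1, 1): 112,
    date(1992, 1, 3): 109,
    date(1992, 1, 4): 111,
}

joined = left_join_unsorted(all_days, measurements)
# Sort by the date only, so the None values never need to be compared.
ordered = sorted(joined, key=lambda row: row[0])

for day, stage in ordered:
    print(day.isoformat(), "" if stage is None else stage)
```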