Resampling Time series data

hi all, Greetings

i am quite new to KNIME,

I have time series data in minutely form

2014-03-10T00:00:00 56.0
2014-03-10T00:01:00 56.0
2014-03-10T00:02:00 56.0
2014-03-10T00:03:00 5.0
2014-03-10T00:04:00 154.0
2014-03-10T00:05:00 154.0
2014-03-10T00:06:00 154.0
2014-03-10T00:07:00 154.0
2014-03-10T00:08:00 154.0
2014-03-10T00:09:00 154.0
2014-03-10T00:10:00 56.0


2014-03-10T00:55:00 56.0
2014-03-10T00:56:00 5.0
2014-03-10T00:57:00 5.0
2014-03-10T00:58:00 5.0
2014-03-10T00:59:00 5.0
2014-03-10T01:00:00 5.0
2014-03-10T01:01:00 5.0
2014-03-10T01:02:00 5.0
2014-03-10T01:03:00 5.0
2014-03-10T01:04:00 154.0
2014-03-10T01:05:00 154.0
2014-03-10T01:06:00 154.0
2014-03-10T01:07:00 154.0
2014-03-10T01:08:00 154.0
2014-03-10T01:09:00 154.0
2014-03-10T01:10:00 154.0

the data goes on for 24 hours... is there a simpler way to resample minutely data into hourly data?... the other coulm being averaged or interpolated... similar to pandas python resample ?



you would need a string to date node to parse the format into the knime date format.

Than you use the Date to String node to extract an hourly format only (e.g. yyyy-MM-dd'T'HH)

Finally you can use a groupby node to get your sample. Groupby the last column generated by the time to string node and select an average in your value column.

Cheers, Iris

Hi Iris,

Thank you very much. That worked.

A follow up question on the same topic.

Given that i have an hourly temperature data of a city, how would i convert into 15minute (quarterly) frequency data... using linear or cubic interpolation or any other method ?

Thank you

You could use Constant Value Column to add a column containing the integer value 4, then One Row to Many to make four copies of every row, then a Moving Average node to do the interpolation of the temperature values - various kinds of interpolation are available.

If you need the interpolated 15-minute time points as well, use Time Series Generator set to Saw Tooth Wave, height 45, length 4 and mean 22.5; Double to Int to convert the generated time series to integer; then Date Time Shift using the integer time series column as the shift value in minutes and the original time column as the date reference.

Tom, I had a similar problem. I had minute time series measurements for a device which I needed to downsample to ten minute resolution. To do this I did the following -

  1. Sorted by device and time 
  2. Extracted the minute from the date and time
  3. Started a group loop based on the device
  4. Lagged the measurement column by ten, giving me ten columns, going back ten minutes
  5. Calculate the average for those ten columns (i.e.: a ten minute average)
  6. Calculate the modulus of the minute column (i.e.: 30 mod 10)
  7. Used a rule-based row filter to select only those rows where the modulus was zero, thus being the ten minute multiple
  8. A column filter to clean up
  9. Ended the loop

It seems to work, hope that is helpful.