How to group time series?

I measure the current consumption of some node to identify the actions performed by this node. As the start point  I use the measurements for this node without any actions and try to find the distance between the idle state and states with some actions. The data set I have so far:

?Time (s) Voltage_nV distance
2017-06-12T10:30:27.769 214685,93 25746,106
2017-06-12T10:30:30.701 5609723,62 5420783,796
2017-06-12T10:30:30.710 1325231,79 1136291,966
2017-06-12T10:30:30.721 518620,671 329680,847
2017-06-12T10:30:30.731 274424,737 85484,913
2017-06-12T10:30:30.741 265276,375 76336,551
2017-06-12T10:30:30.751 215287,108 26347,284
2017-06-12T10:30:30.762 228865,892 39926,068
2017-06-12T10:30:30.771 212372,701 23432,877
2017-06-12T10:30:30.802 201107,146 12167,322
     
2017-06-12T10:30:40.700 6843576,34 6654636,516
2017-06-12T10:30:40.710 1292846,59 1103906,766
2017-06-12T10:30:40.720 511380,396 322440,572
2017-06-12T10:30:40.730 344932,474 155992,65
2017-06-12T10:30:40.740 247776,864 58837,04
2017-06-12T10:30:40.753 256324,049 67384,225
2017-06-12T10:30:40.763 247071,134 58131,31
2017-06-12T10:30:40.774 219312,388 30372,564
2017-06-12T10:30:40.784 191854,231 2914,407
2017-06-12T10:30:40.804 208046,833 19107,009

As one can see some rows have very close timestamps . How can I group rows which are closer then 1 sec together? I will calculate the mean value of each group for classification and further processing.

 

Hi ellboy,

First, you need to convert your Timestamps to the KNIME Date and Time type using a String to Date/Time node.

Next, use a Date Field Extractor and a Time Field Extraxctor node to get one column for year, month, day, hour, minute, second.

Finally, you can use GroupBy with all these date/time columns as groups and aggregate the remaining columns for the mean.

Cheers,

Roland

Hi RolandBurger,

actions are not synchronized to the time. That means they cannot be grouped just by the timestamp, because some ot them could start and end in adjacent seconds. What I need is to find which samples are close to each other with some threshold.

Regards,

ellboy

Hi ellboy,

In order to derive a new column suiting your grouping requirements, I recommend to sort by time, lag the time column by one, calculate the time difference between both, leave only the desired values marking a new block and filling the now missing values by taking over from previous row.

See attached workflow for this solution.

Best regards

Arne

1 Like

14 posts were split to a new topic: Changing Timeseries from 10 minute to hourly