Calculate average value per hour for non-uniform time series data

Olivier_G · December 28, 2022, 7:11pm

Hello,
I have a time data series that is not uniform (values are taken at arbitrary intervals).
I want to calculate the average value per hour. How can I do that? A Group By node will calculate the mean value of all the measures in a given hour but it is different from what I need.

Thank you, and sorry for the dumb question (I’m new to KNIME!)

ArjenEX · December 28, 2022, 7:21pm

Hi @Olivier_G

Welcome to the KNIME Community!

You can get the hour of each record via the Extract Date&Time Fields node:

You can then use this as a base for the GroupBy to sum all the values for that hour.

Is this what you’re looking for?

Olivier_G · December 28, 2022, 8:10pm

Hi @ArjenEX. Thank you for spending time answering my question. Unfortunately a GroupBy node is not enough because I do not have constant time intervals between measures. I don’t need the average of every single measure; I have to take into account the time spent between measures. Imagine having a measure every 10 minutes and at some point 10 repeated measures at very short time intervals. You can’t average them all together.

ArjenEX · December 28, 2022, 8:48pm

Can you manually draft your expected output then in terms of data?

Based on the image in your original post it looks like you want the average of all records that have a certain base hour (13,14, etc.) but your second reply confuses me.

iCFO · December 28, 2022, 9:17pm

Are you trying to predict the value at a certain time during that hour or something?

I am also confused by your description, but I will often apply a calculated “weight” multiplier to granular data (though in a much different application). Just throwing out the general idea of scoring the values (with unchanged = 1) and then multiplying them by their score weights as a possible way to build a more complex approach… Just taking a shot in the dark.

Olivier_G · December 29, 2022, 9:10am

Sorry for the confusion. I will try to make it clear with an example.
Let’s say I have readings of a car’s speed over an hour, as follows:

The car runs at 100 for the whole hour except for five minutes at 200.

Now I want to calculate the average speed of the car during the hour. By hand the result is something around 108, but if I average all the readings I get 160, which is not what I’m looking for.

So the average value from the GroupBy node is not enough.
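The hand calculation above can be checked with a short sketch. It assumes the only inputs are the two speeds and the minutes spent at each (55 and 5, taken from the description); `time_weighted_mean` is a hypothetical helper name, not a KNIME function.

```python
def time_weighted_mean(values, durations):
    """Average of values, each weighted by how long it was held."""
    total = sum(durations)
    if total <= 0:
        raise ValueError("total duration must be positive")
    return sum(v * d for v, d in zip(values, durations)) / total


# Speeds and the minutes spent at each one, as described in the post.
speeds = [100, 200]
minutes = [55, 5]

avg = time_weighted_mean(speeds, minutes)
print(round(avg, 2))  # 108.33
```

A plain mean treats every reading as equally important, so a burst of closely spaced readings at 200 pulls the result up even though it covers only five minutes.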

morpheus · December 29, 2022, 12:20pm

Hi @Olivier_G,
the reason for your problem is that your data points are not evenly spaced in time, so a simple average doesn’t give you the correct result
(10/20 minute blocks for speed 100, 1 minute blocks for speed 200).
Descriptive statistics take the number of data points into account, not the delta (distance) between the data points.

You could calculate the distance driven in each time interval and sum them up; dividing that total by the elapsed time gives the average speed.

BR

aworker · December 29, 2022, 12:32pm

Hi @Olivier_G

@morpheus is right, you need to take into account the sampling intervals (or time frames) because the data is not equally sampled (hence not evenly distributed).

There are different ways this could be solved using KNIME.

Could you please upload a bit of your data here so that we can take the problem from there and provide you with at least a solution adapted to your data? Thanks in advance.

Best
Ael

iCFO · December 29, 2022, 1:01pm

The way I have handled things like this in the past is as I described above. You could use the lag node to drop down the prior value, then calculate the interval, then use that to create a weight multiplier for each recorded value to use in the average calc.
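Outside KNIME, the same lag-then-weight idea might look like the sketch below. The readings are hypothetical, and it assumes each value holds until the next reading arrives (so the last reading, which has no successor, gets no weight). `add_duration_weights` is an illustrative name only.

```python
from datetime import datetime


def add_duration_weights(rows):
    """rows: (timestamp, value) pairs sorted by timestamp.

    Returns (value, seconds_held) pairs. Each value is assumed to hold
    until the next reading; the last reading is dropped.
    """
    weighted = []
    # Pair every row with its successor, like joining on a lagged column.
    for (t0, v0), (t1, _) in zip(rows, rows[1:]):
        weighted.append((v0, (t1 - t0).total_seconds()))
    return weighted


# Hypothetical, unevenly spaced speed readings.
rows = [
    (datetime(2023, 1, 1, 13, 0), 100),
    (datetime(2023, 1, 1, 13, 20), 100),
    (datetime(2023, 1, 1, 13, 40), 200),
    (datetime(2023, 1, 1, 13, 45), 100),
    (datetime(2023, 1, 1, 14, 0), 100),
]

weighted = add_duration_weights(rows)
weighted_avg = sum(v * s for v, s in weighted) / sum(s for _, s in weighted)
plain_avg = sum(v for _, v in rows) / len(rows)

print(round(weighted_avg, 2))  # 108.33
print(plain_avg)               # 120.0
```

Whether an interval belongs to the reading before it or the one after it is a modelling choice; holding the earlier value is a natural fit for status signals that change only when a new record is written.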

Olivier_G · December 29, 2022, 1:03pm

Sure,
Here is the actual data I am working on. It is the status of 3 industrial machines (1 = machine running, 0 = machine stopped). I need to calculate the AVAILABILITY value of every machine for every hour, that is, the fraction of the time the machine was running (e.g. 90%).

TELEMETRY_2022-12-09.xlsx (974.2 KB)

Thank you for your time!

Olivier_G · December 29, 2022, 1:06pm

I like this approach. I could add a “duration” weight for every row and do a weighted average. Is there any way I can do a weighted average in the groupby node?

iCFO · December 29, 2022, 1:08pm

You would need to calculate the weighted values first, then you can do a straight average in the GroupBy node based on those.
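For the hourly availability case, one possible sketch is shown below. It is not the workflow shared later in this thread. It assumes hypothetical status records, assumes each status holds until the next record, and splits every interval at full hours so that each piece is counted in the right hour. Note that a plain mean of value × weight products only equals the weighted mean when the weights are normalized (for example, divided by their mean), so the sketch divides the summed products by the summed durations instead.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hourly_availability(rows, end):
    """rows: sorted (timestamp, status) pairs, status 1 = running, 0 = stopped.
    end: time at which the last status stops being counted.

    Returns {hour_start: fraction of observed time spent running}.
    """
    running = defaultdict(float)   # seconds running, per hour
    observed = defaultdict(float)  # seconds observed, per hour
    next_times = [t for t, _ in rows[1:]] + [end]
    for (start, status), stop in zip(rows, next_times):
        cursor = start
        while cursor < stop:
            hour = cursor.replace(minute=0, second=0, microsecond=0)
            # Cut at the next full hour so each piece lands in one bucket.
            piece_end = min(stop, hour + timedelta(hours=1))
            seconds = (piece_end - cursor).total_seconds()
            observed[hour] += seconds
            running[hour] += seconds * status
            cursor = piece_end
    return {h: running[h] / observed[h] for h in observed}


# Hypothetical status changes for one machine.
rows = [
    (datetime(2023, 1, 1, 13, 0), 1),
    (datetime(2023, 1, 1, 13, 50), 0),
    (datetime(2023, 1, 1, 14, 15), 1),
]
result = hourly_availability(rows, end=datetime(2023, 1, 1, 15, 0))

for hour, fraction in sorted(result.items()):
    print(hour, round(fraction, 3))
```

In this example the machine runs 50 of 60 minutes in the 13:00 hour and 45 of 60 minutes in the 14:00 hour.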

aworker · December 29, 2022, 1:55pm

Thanks @Olivier_G for sharing your data.

The following workflow offers a possible solution to your question based on your data:

The workflow is available from the hub here:

This is the result obtained from your data for the machine in the first column:

The solution is commented on each node of the workflow but please, feel free to reach out again if you have any questions about its implementation.

Hope it helps.

Welcome to the KNIME forum community!

Best
Ael

Olivier_G · December 29, 2022, 8:12pm

@iCFO @aworker
Thank you, this is exactly what I need.
Thank you Ael for building the workflow for me!

Olivier
