calculate average value per hour for non uniform time data series

Hello,
I have a time data series that is not uniform (values are taken at arbitrary intervals).
I want to calculate the average value per hour. How can I do that? A Group By node will calculate the mean value of all the measures in a given hour but it is different from what I need.

image

image

Thank you and sorry for the dumb question (im new to knime!)

Hi @Olivier_G

Welcome to the KNIME Community!

You can get the hour of each record via the Extract Date&Time Fields node:

You can then use this a base for the groupBy to sum all the values for set hour.

image

Is this what you’re looking for?

1 Like

Hi @ArjenEX . Thank you for spending time answering my question. Unfortunately a group by node is not enough because I do nothave constant time intervals between measures. I don’t need the average of every single measure, but I have to take in account the time spent between measures. Immagine having a measure every 10 minutes and at some point 10 repeated measure at a very short time intervals. You cant average all together.

Can you manually draft your expected output then in terms of data?

Based on the image in your original post it looks like you want the average of all records that have a certain base hour (13,14, etc.) but your second reply confuses me.

Are you trying to predict the value at a certain time during that hour or something?

I am also confused by your description, but I will often apply a calculated “weight” multiplier to granular data (but in a much different application). Just throwing out the general idea of scoring the values (with unchanged =1) and then multiplying them by their scores weights as a possible way to build a more complex approach… Just taking a shot in the dark.

1 Like

Sorry for the confusion. I will try to meke it clear by an example.
Let’s say I have readings of a car speed over an hour, as following:

image

The car run at 100 for the whole hour except five minutes at 200.

image

Now I want to calculate the average speed of the car during the hour. By hand the result is something around 108, but if i average all the readings I get 160 which is not what i’m looking for.

In that way the average value of the groupby node is not enough.

1 Like

Hi @Olivier_G,
the reason for your problem is that your data are not equally distributed (time frames). Therefore a simple averaging doesn’t give you the correct result.
(10/20 mintue blocks for speed 100, 1 minute blocks for spped 200)
Descriptive statistics takes the number of data points in account not the delta (distance) between the datapoints.

You can think about to calculate the absolute driven route for each time point and sum them up.

BR

1 Like

Hi @Olivier_G

@morpheus is right, you need take into account the sampling intervals (or time frames) because data is not equally sampled (hence distributed).

There are different ways this could be solved using KNIME.

Could you please upload here a bit of your data so that we can take the problem from there and provide you with at least a solution adapted to your data ? Thanks in advance.

Best
Ael

1 Like

The way I have handled things like this in the past is as I described above. You could use the lag node to drop down the prior value, then calculate the interval, then use that to create a weight multiplier for each recorded value to use in the average calc.

1 Like

Sure,
Here is the actual data I am working on. It is the status of 3 industrial machines (1= machine running, 0 = machine stopped). I need to calculate the AVAILABILITY value of every machine for every hour, that is the fraction of the time the machine was running (eg. 90%).

TELEMETRY_2022-12-09.xlsx (974.2 KB)

Thank you for your time!

1 Like

I like this approach. I could add a “duration” weight for every row and do a weighted average. Is there any way I can do a weighted average in the groupby node?

1 Like

You would need to calculate the weighted values first, then you can do a straight average in GroupBy node based on those.

Thanks @Olivier_G for sharing your data.

The following workflow brings a possible solution to your question based on your data:

The workflow is available from the hub here:

This is the result obtained from your data for the first column machine:

The solution is commented on each node of the workflow but please, feel free to reach out again if you have any questions about its implementation.

Hope it helps.

Welcome to the Knime forum community :grinning:

Best
Ael

4 Likes

@iCFO @aworker
Thank you, this is exactly what I need.
Thank you Ael for building the workflow for me!

Olivier

3 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.