I need to perform moving average on the data for each group.
So for example in this data:
x y z
1 38.855 38.855 a
2 38.664 38.664 a
3 40.386 40.386 a
4 40.386 40.386 a
5 40.195 40.195 a
6 40.386 40.386 a
7 40.386 40.386 b
8 40.195 40.195 b
9 40.386 40.386 b
10 38.855 38.855 b
11 38.664 38.664 b
12 40.386 40.386 b
I would like to perform moving average for the columns x and y but for each unique value in z column (in this case a and b) and not through all of the columns without any consideration.
Is it possible?
Thanks in advance
Sure it is possible! :-)
The way I would do it is to use a loop to chunk by group (z column), calculate the MA for each group and finally put back together all the results.
See the attached workflow. Not exactly your data, but very similar.
BTW, if your grouping column (z) is that simple, you can also use a Group Loop Start node which has the chunking by group built in. See alternative workflow at the bottom of the original one.
Hello Marco, Thanks for the quick answer! I am going to try it out...My data is quite huge (can be up to 1000000 rows), is this going to influence the productivity? I need quick way of performing the moving average (the idea is to connect KNIME to database, import data every 1hour, perform moving average and write data back to the system).
1M rows is not that much these days... :-)
It is difficult to say upfront whether it would work as expected, too many factors that can influence the performance (including the source/target DB, the network, the characteristics of the system running KNIME etc.)
Since it is a quite simple setup, I would suggest to just give it a try and see how it performs in real-life.
Thanks, well for my 604 unique groupd it is taking some time, but it still better then nothing!
*I was expecting something fast as in R function ave()
You can always resort to one of KNIME's integration nodes (e.g. R, python, Java) and use your favorite efficient implementation of your use case.
If your database runs on server hardware, you should try writing the task in SQL. You can still use KNIME's database nodes as remote control, if so desired.
Loops inevitably slow things down. You may want to consider pivoting your data first to get one column (or in this case: two columns) per group, and then to unleash the MA node on all target columns at once -- this should speed things up quite substantially.
Hey, Thanks to all for help! I have decided to use R-Snippet Node and write the moving average function there(it saved lots of time for me)