How to extract time windows from a dataset and loop over them


I have a dataset that consists of a timestamp column and several other columns. Each dataset contains rows from a single day, so the timestamps range from 00:00:00 to 23:59:59.

For example:

timestamp              other_column
'2015-01-01 00:01:00'  2
'2015-01-01 00:30:00'  3
'2015-01-01 01:20:01'  7

Now I want to split the data into chunks of exactly one hour each, so that I can loop over each chunk and do some aggregation and calculation. At the end of the loop I want to have far fewer rows, with aggregated data for one hour in each row.

The output for the above example could be:

'2015-01-01 00:01:00'  5
'2015-01-01 01:20:01'  7


My problem is getting the dataset split into hourly chunks. I've tried a lot, but nothing seems to work.

Any suggestions?




Use the Time Field Extractor to extract the hour. 

Afterwards, you can even directly group on this column using the GroupBy node, or use the group loop.
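Outside of KNIME, the idea behind this approach (extract the hour from the timestamp, then group on it) can be sketched in plain Java. This is only an illustration of what the extract-and-group step does, not the nodes' actual implementation; the timestamps and values are taken from the example in the question:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;

public class HourlyGroupBy {
    static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Extract the hour from each row's timestamp and sum the values per hour,
    // mimicking "extract hour field, then GroupBy on it".
    static Map<Integer, Integer> sumByHour(String[][] rows) {
        Map<Integer, Integer> sums = new LinkedHashMap<>();
        for (String[] row : rows) {
            int hour = LocalDateTime.parse(row[0], FMT).getHour();
            sums.merge(hour, Integer.parseInt(row[1]), Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"2015-01-01 00:01:00", "2"},
            {"2015-01-01 00:30:00", "3"},
            {"2015-01-01 01:20:01", "7"},
        };
        System.out.println(sumByHour(rows)); // {0=5, 1=7}
    }
}
```

The extracted hour plays the role of the grouping column: once every row carries it, any per-group aggregation only needs a single pass over the data.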

Cheers, Iris

Hi Iris,

thanks for your response.

GroupBy is not an option for me because I need more complex aggregation methods than those offered by the GroupBy node.

I found a different solution.

I have a Counting Loop Start node that makes 24 iterations. I connected the variable port of that node to a Java Snippet Row Filter node. The node mainly contains the code from the public example 011004_varsAndTimeSeries in 011_FlowvarsAndLoops.

I use currentIteration to create the timestamps I need to filter on (1 hour = 1 iteration).
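A minimal sketch of that per-iteration filter condition in plain Java (the method name `keepRow` and the timestamp format are my assumptions; in the actual Java Snippet Row Filter this would be the boolean expression evaluated per row, with `currentIteration` coming in as a flow variable):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class HourFilter {
    static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Hypothetical stand-in for the snippet's filter expression:
    // keep a row only if its timestamp falls into the hour given by the
    // current loop iteration (iteration 0 -> 00:00-00:59, 1 -> 01:00-01:59, ...).
    static boolean keepRow(String timestamp, int currentIteration) {
        LocalDateTime ts = LocalDateTime.parse(timestamp, FMT);
        return ts.getHour() == currentIteration;
    }

    public static void main(String[] args) {
        System.out.println(keepRow("2015-01-01 00:30:00", 0)); // true
        System.out.println(keepRow("2015-01-01 01:20:01", 0)); // false
        System.out.println(keepRow("2015-01-01 01:20:01", 1)); // true
    }
}
```

Note that every iteration still has to evaluate this condition against every row of the table, which is why this approach scales worse than grouping first.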

Seems to work pretty well for me.


Sorry Iris, I misunderstood your answer. I thought about it and built a workflow that follows your approach. Both approaches produce the same output, but yours is even faster (I'm working on more than 100 million rows) because it stores the information about which row belongs to which group in a column, and after that step the Group Loop node only has to look up the right value in that column to find all rows for each loop step. My own solution has to process all rows in every single iteration, which is of course... really bad performance :)

So thank you very much for that powerful solution.