I want to give a quick update on where I am at. Unfortunately, it's depressing news.
Today I compared the data that KNIME aggregated with the Window Loop against some old data (that was aggregated with Access etc.).
Comparing the two datasets showed literally thousands of differences in the output file.
I spent the day today researching this. What I found out is that the "Window Loop" node has several bugs. (Note: there might be more errors; I went through the first couple of them, not all.)
Could someone from support please look into this, like ipazin or someone else? This would need some development attention. I would be interested in a working solution too, since I can't use Window Loop (and I wouldn't recommend it to anybody else right now). Here is what I found out so far:
- The output CSV file contained duplicate entries like:
but there are many more. This can probably be fixed with some form of duplicate removal.
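Just to illustrate the duplicate-removal idea outside of KNIME (inside KNIME something like a Duplicate Row Filter node after the loop should achieve the same), here is a minimal Python sketch; the rows are invented for illustration and only stand in for the real OHLCV output:

```python
# Sketch: drop exact duplicate rows from the aggregated output, keeping the
# first occurrence. Rows are (timestamp, open, high, low, close, volume);
# the values below are made up, not from the real bitstampUSD data.
rows = [
    ("2012-02-27 13:00", 4.87, 4.90, 4.85, 4.88, 12000),
    ("2012-02-27 13:30", 4.88, 4.92, 4.86, 4.91, 25500),
    ("2012-02-27 13:30", 4.88, 4.92, 4.86, 4.91, 25500),  # duplicate emitted by the loop
]

seen = set()
deduped = []
for row in rows:
    if row not in seen:  # keep only the first copy of each full row
        seen.add(row)
        deduped.append(row)
```

This only catches rows that are duplicated exactly; if the loop ever emits two slightly different rows for the same candle, you would deduplicate on the timestamp column instead.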
More serious is that the process does not exclude the right border as it should.
For the candle of 27.02.2012 13:30, the Window Loop aggregated data showed a volume of 30,300; comparing this with the old data that was aggregated without KNIME showed 25,500, a difference of 4,800. I researched this in the input data (I changed the timestamp from Unix to a readable format):
It shows that Window Loop wrongly included the right border, since that tick sits exactly on the edge. When aggregating data to OHLCV, the standard is to go from the lower bound to the last tick (row in the database) before the next step begins.
The documentation describes it that way too:
The step size is the distance between the starting point of one iteration and the starting point of the next. It is defined in terms of number of rows covered (row based) or time.
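To show what I mean by excluding the right border, here is a minimal Python sketch (not KNIME) of OHLCV bucketing over half-open windows [start, start + step): a tick sitting exactly on the boundary belongs to the next candle. The prices are invented; the volumes just mirror the 25,500 vs. 4,800 example above:

```python
# Half-open 30-minute candles: a tick with ts == next boundary goes to the
# NEXT bucket, so the right border is excluded from the current candle.
STEP = 30 * 60  # candle length in seconds

def bucket_start(ts, step=STEP):
    # Floor the Unix timestamp to the start of its window.
    return ts - ts % step

def aggregate(ticks):
    """ticks: iterable of (unix_ts, price, volume) -> {start: [O, H, L, C, V]}"""
    candles = {}
    for ts, price, vol in sorted(ticks):
        start = bucket_start(ts)
        if start not in candles:
            candles[start] = [price, price, price, price, vol]
        else:
            c = candles[start]
            c[1] = max(c[1], price)  # high
            c[2] = min(c[2], price)  # low
            c[3] = price             # close = last tick in the window
            c[4] += vol              # volume
    return candles

ticks = [
    (1330349400, 4.88, 25400),  # 2012-02-27 13:30:00 UTC, opens the candle
    (1330351199, 4.90, 100),    # 13:59:59, last tick before the next step
    (1330351200, 4.95, 4800),   # 14:00:00, exactly ON the border -> next candle
]
candles = aggregate(ticks)
```

With this rule the 13:30 candle gets volume 25,500 and the border tick's 4,800 lands in the 14:00 candle; including the border tick in both windows is exactly how you end up with the inflated 30,300.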
Another problem: all in all, the process crashes most of the time or doesn't finish. ElSamuel had similar problems too. I ran the process 4 times, and 3 times it crashed or didn't finish. The data above is from the one time it did finish, and even then it was a pretty slow process too…
Input Data: http://api.bitcoincharts.com/v1/csv/bitstampUSD.csv.gz
@ElSamuel thanks for sharing your data; I ran your process again last night, but it crashed after just 2K lines. The good news is that the aggregated data was the same for both files, so except for the duplicate entries, the aggregation itself seems to be stable in what it does.
I will try to have a look at your new process tomorrow.
Again, to support: if there is a better way of doing this, I would be really interested in a solution or a better idea of how to do it.