Hi elsamuel,
I want to give a quick update on where I am at. Unfortunately, it's depressing news.
Today I compared the data that KNIME aggregated with the Window Loop node to some old data (that was aggregated with Access etc.).
Comparing the two datasets showed literally thousands of differences in the output file.
I spent the day researching this. What I found out is that the "Window Loop" node has several bugs. (Note: there might be more errors; I only went through the first few, not all of them.)
Could someone from support please look into this, like ipazin or someone else? This needs some development attention. I would also be interested in a working solution, since I can't use Window Loop right now (and I wouldn't recommend it to anybody else). Here is what I found out so far:
- The output CSV file contained duplicate entries, like:
21.11.2011 12:10:00,2.45,2.45,2.45,2.45,39000
21.11.2011 12:10:00,2.45,2.45,2.45,2.45,39000
Or
14.09.2011 07:00:00,5.58,5.58,5.58,5.58,12549
14.09.2011 07:00:00,5.58,5.58,5.58,5.58,12549
but there are many more. This can probably be worked around with some form of duplicate removal (see the sketch below).
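As a stopgap, exact duplicates could be filtered out after the loop, e.g. with a Duplicate Row Filter node in KNIME. Outside of KNIME, a minimal Python/pandas sketch (assuming the output CSV has no header and the column layout shown in the sample rows above) would be:

```python
import pandas as pd

# Assumed column layout, matching the sample rows above:
cols = ["timestamp", "open", "high", "low", "close", "volume"]
df = pd.read_csv("window_loop_output.csv", header=None, names=cols)

# Drop rows that are identical across all columns, keeping the first one.
deduped = df.drop_duplicates(keep="first")
deduped.to_csv("window_loop_output_deduped.csv", index=False)
```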
More serious is that the process does not exclude the right window border as it should.
The Window Loop aggregated data shows a volume of 30 300 for the candle at
27.2.2012 13:30. The old data, aggregated without KNIME, shows 25 500, a difference of 4800. Looking into this with the input data (I converted the timestamps from Unix to readable form):
27.2.2012 13:33:59;5,08;5300
27.2.2012 13:34:12;5,08;5200
27.2.2012 13:34:24;5,08;5100
27.2.2012 13:34:36;5,08;5000
27.2.2012 13:34:50;5,08;4900
27.2.2012 13:35:00;5,08;4800
This shows that Window Loop wrongly included the right border: the five rows from 13:33:59 to 13:34:50 sum to 25 500, and adding the 13:35:00 row (volume 4800), which sits exactly on the edge, gives exactly the 30 300 that Window Loop reported. When aggregating data to OHLCV, the standard is to go from the lower bound to the last tick (row in the database) before the next step begins, i.e. each window is half-open on the right.
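For reference, here is a minimal sketch of the half-open aggregation I mean, done outside of KNIME in Python/pandas with assumed file and column names; a tick at exactly 13:35:00 must land in the 13:35 candle:

```python
import pandas as pd

# Assumed tick layout, matching the sample rows above: timestamp;price;volume
ticks = pd.read_csv(
    "ticks.csv", sep=";", decimal=",", header=None,
    names=["timestamp", "price", "volume"],
)
ticks["timestamp"] = pd.to_datetime(ticks["timestamp"], format="%d.%m.%Y %H:%M:%S")
ticks = ticks.set_index("timestamp")

# closed="left" makes every 5-minute window half-open [start, start + 5min),
# so the tick at exactly 13:35:00 falls into the 13:35 candle, not the 13:30 one.
price = ticks["price"].resample("5min", closed="left", label="left").ohlc()
volume = ticks["volume"].resample("5min", closed="left", label="left").sum()
candles = price.join(volume)

# For the sample rows above, this prints volume 25500 for 13:30 (not 30300).
print(candles)
```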
The documentation describes it that way too:
Step size
The step size is the distance between the starting point of one iteration and the starting point of the next. It is defined in terms of number of rows covered (row based) or time.
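Just to make those documented semantics concrete, with a 5-minute window and a 5-minute step size the iterations should cover half-open intervals like this (a small sketch, not KNIME code):

```python
from datetime import datetime, timedelta

# Each iteration starts one step after the previous one
# and covers the half-open interval [start, start + window).
start = datetime(2012, 2, 27, 13, 30)
window = step = timedelta(minutes=5)
for i in range(3):
    lo = start + i * step
    print(f"[{lo:%H:%M:%S}, {lo + window:%H:%M:%S})")
# [13:30:00, 13:35:00)  <- the 13:35:00 tick does NOT belong here
# [13:35:00, 13:40:00)
# [13:40:00, 13:45:00)
```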
Another problem: all in all, the process crashes or doesn't finish most of the time. ElSamuel had similar problems too. I ran the process 4 times, and 3 times it crashed or didn't finish. The data above is from the one time it did finish. Even when it did finish, it was a pretty slow process…
To reproduce:
Input Data: http://api.bitcoincharts.com/v1/csv/bitstampUSD.csv.gz
totaldataloss/Public – Tickdata to OHLC_V2 – KNIME Community Hub
@ElSamuel thanks for sharing your data. I ran your process again last night, but it crashed after just 2K lines. The good news is that the aggregated data was the same for both files, so except for the duplicate entries, the aggregation itself seems to be stable in what it does.
I will try to have a look at your new process tomorrow.
Again, to support: if there is a better way of doing this, I would be really interested in a solution or a better idea of how to do it.
Greetings…