As I said, Knime can open files of any size; it’s your system that is limiting it. Several questions and suggestions were raised about your memory allocation. Did you try them? You mentioned that you increased your system’s memory, but I then asked whether you also increased the memory allocated to Knime. With 32GB, you should easily be able to open a 6.2GB file.
Is that a requirement, or is this your workaround because you cannot open a big file? Try assigning the proper amount of memory to Knime, and it should work.
Regardless, it’s also good to know how to partially read a file. The Excel Reader allows you to read from a start row to an end row, so you can partially read an Excel file:
The CSV Reader and the File Reader also both allow you to read a file partially by setting parameters in the Limit Rows tab:
And of course, as always in Knime, any setting you can set manually can be controlled by a variable dynamically, and so for all these settings, you can set the values dynamically in the Flow Variables tab:
I’m not sure what the “w” means in 100w or 200w (and by the way, if you do 100w rows, the second time it will be (100w+1) to 200w rows), but to keep it simple, let’s say we have a file containing 20 records and I want to read 5 lines at a time.
That means it should read:
1: 1-5 rows
2: 6-10 rows
3: 11-15 rows
4: 16-20 rows
Do you see a pattern here?
Rows to read: 5
Starting row of every run: ((run - 1) x 5) + 1
The good thing is that when you run this in a loop, Knime will give you the current iteration number as currentIteration, and since the iteration number starts at 0, in reality, this is what you will get:
Iteration | rows read
0 | 1-5
1 | 6-10
2 | 11-15
3 | 16-20
So, to get the starting row of each iteration, you can just use ($currentIteration$ * 5) + 1, meaning if you want to know how many rows to skip for each iteration, it’s just $currentIteration$ * 5, where of course 5 is the number of rows you want to read. So in reality, the proper formula would be:
Rows to skip: $currentIteration$ * $rows_per_batch$
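If it helps to check the arithmetic outside of Knime, here is a small Python sketch of the same formula. It is not Knime code; the function name batch_window and the variable total_rows are made up for illustration, and it assumes the 20-record file and 5 rows per batch from the example above.

```python
def batch_window(current_iteration, rows_per_batch, total_rows):
    """Return (rows_to_skip, first_row, last_row) for a 0-based iteration.

    Hypothetical helper, only mirrors the formula
    rows to skip = currentIteration * rows_per_batch.
    """
    rows_to_skip = current_iteration * rows_per_batch
    first_row = rows_to_skip + 1
    # The last batch may be shorter if total_rows is not a multiple of the batch size.
    last_row = min(rows_to_skip + rows_per_batch, total_rows)
    return rows_to_skip, first_row, last_row


total_rows = 20       # assumed file size from the example
rows_per_batch = 5    # rows to read per iteration

# Ceiling division: number of iterations needed to cover every row.
iterations = -(-total_rows // rows_per_batch)

windows = [batch_window(i, rows_per_batch, total_rows) for i in range(iterations)]

for i, (skip, first, last) in enumerate(windows):
    print(f"iteration {i}: skip {skip} rows, read rows {first}-{last}")
```

The skip value is what you would feed into the reader's "skip rows" setting through a flow variable, while the number of rows to read stays fixed at the batch size.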