I am trying to read an Excel file with around 100 columns and 400K rows (file size around 160 MB). However, the Excel Reader won't let me proceed and reports "Requested array size exceeds VM limit".
In my knime.ini, I have made the changes below:
-Xmx12g (revised to 12g, while my computer has 16 GB)
-Dknime.database.fetchsize=1000 (newly added)
If there is no direct way to deal with this, is there some workaround trick, please?
All I want is to read this file (plus 5 more files of similar size), filter by a string in a certain column, extract the matching rows, and append them to form a new file.
So you could read, say, 40k rows at a time, do your processing and write to your new file, then repeat the process and append to the file.
You can use a loop to do this.
You can check this thread I replied to earlier, which shows how to read a file partially and implement the loop:
EDIT: Just to add: you would dynamically calculate the row range within the loop and assign it as flow variables to these 2 settings in the Excel Reader:
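For illustration, here is the same chunk-and-filter idea as a standalone R sketch, since R comes up later in this thread. This is a minimal sketch assuming the readxl package; the file name, output name, and the `Status` column with the value `"KEEP"` are all placeholders:

```r
library(readxl)  # assumed package for reading .xlsx from R

path <- "big_file.xlsx"   # hypothetical input file
out  <- "filtered.csv"    # hypothetical output file
chunk_size <- 40000

# Read the header row once so every chunk gets the same column names
header <- names(read_excel(path, n_max = 0))

offset <- 1  # rows already consumed (the header row)
repeat {
  chunk <- read_excel(path, skip = offset, n_max = chunk_size,
                      col_names = header)
  if (nrow(chunk) == 0) break
  # Keep only rows where the (hypothetical) Status column matches the string
  hits <- chunk[chunk$Status == "KEEP", ]
  # Write the header only on the first chunk, append afterwards
  write.table(hits, out, sep = ",", row.names = FALSE,
              col.names = (offset == 1), append = (offset > 1))
  offset <- offset + chunk_size
}
```

The same arithmetic applies if you stay in KNIME: each loop iteration reads rows `offset` through `offset + chunk_size - 1`, and the output is appended rather than overwritten.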
Thanks, I am trying your suggested method, but now I am running into another problem.
When I lowered the row count to 5000, it still returned the same error message and wouldn't let me execute (and I tried 500 with the same issue, which isn't normal…).
I don't have the VM limit problem with other, smaller Excel files, though (e.g. around 30 MB, 50 columns and 10k rows).
Hi, I reduced the scan to only 5000 rows in the Advanced Settings tab, and also set the column and row limits as below… it still fails. lol, I am guessing we missed something here; it's probably not a VM issue anymore.
Thanks Mlauber71 again. I am trying it out with R, and I think I have it installed correctly, but how do I make the R Source (Table) node "read" the Excel file, please?
You would have to install the package. The best thing would be to install RStudio as well; then you can easily manage your R installation and you will have the packages available for KNIME. Typically this command should do it:
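The exact command from the original post isn't reproduced here; assuming the openxlsx package is the one being suggested, the install step plus a minimal R Source (Table) snippet would look something like this (the file path is a placeholder):

```r
# Assumed package choice for reading .xlsx files; run once in R or RStudio
install.packages("openxlsx")

library(openxlsx)
# In KNIME's R Source (Table) node the result must be assigned to knime.out
knime.out <- read.xlsx("C:/data/big_file.xlsx", sheet = 1)  # hypothetical path
```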
Not to beat a dead horse here, but strangely I have 4 flows reading 4 different Excel files (each over 100 MB), and just now the biggest one, around 165 MB, can't get through, with the error message below.
Basically I just feed the file through a Table Creator node, then Table Row to Variable, and then the R Source (Table) node. It works fine for the 3 other files, but not this time on the first file.
@Fovas I think you would have to identify at what point the script fails. Is it at the import of the data or when transferring it to KNIME? If you transfer it from the R script node to KNIME, you do have the option to use data.frame or data.table. Or you could export the data into a file from R directly. You might evaluate the code with the failing file in steps and see what is going on.
But it might be that the problem already occurs when the import happens.
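A minimal sketch of that step-by-step check, run outside KNIME first (assuming openxlsx for the import and data.table for the direct export; the paths are placeholders):

```r
library(openxlsx)    # assumed reader; readxl would work the same way
library(data.table)  # fwrite writes large tables to disk quickly

# Step 1: does the import itself succeed?
df <- read.xlsx("C:/data/problem_file.xlsx", sheet = 1)  # hypothetical path
print(dim(df))  # if this prints, the import is not the failing step

# Step 2: bypass the transfer to KNIME by writing straight to disk
fwrite(as.data.table(df), "C:/data/problem_file.csv")
```

If step 1 already fails, the problem is the import itself; if step 1 succeeds but the node still errors, it is the transfer back to KNIME that needs attention.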