[Excel reader node] Requested array size exceeds VM limit

Hi everyone,

I am trying to read an Excel file with around 100 columns and 400K rows (file size around 160 MB); however, the Excel Reader won't let me proceed and says "Requested array size exceeds VM limit".

In my knime.ini, I have made the changes below.
-Xmx12g (raised to 12g; my computer has 16g)
-Dknime.database.fetchsize=1000 (newly added)

If there is no direct way to deal with that, is there some workaround trick, please?

All I want is to read this file (plus 5 more files of similar size), filter by a string in a certain column, extract those rows, and append them to form a new file.

Thanks in advance

Hi @Fovas, you can try to read the file in batches. The Excel Reader allows you to read a range of rows:

So you could read, say, 40k rows at a time, do your processing and write to your new file, then repeat the process and append to the file.

You can use a Loop to do this.

You can check this thread that I replied to earlier, which can help you with how to read a file partially and implement the loop:

EDIT: Just to add: you would dynamically calculate the row range within the loop, and assign the range as flow variables to these 2 settings in the Excel Reader:
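To illustrate the idea, here is a minimal sketch (in R, the language used later in this thread; the variable names `chunk_size`, `total_rows`, and `ranges` are hypothetical) of how the per-iteration row range could be computed before being passed as flow variables:

```r
# Sketch: compute the first/last row for each loop iteration.
chunk_size <- 40000
total_rows <- 400000

ranges <- lapply(seq(0, ceiling(total_rows / chunk_size) - 1), function(iteration) {
  first_row <- iteration * chunk_size + 1
  last_row  <- min((iteration + 1) * chunk_size, total_rows)
  c(first = first_row, last = last_row)
})

ranges[[1]]   # rows 1 .. 40000
ranges[[10]]  # rows 360001 .. 400000
```

In KNIME itself the same arithmetic would typically live in a Math Formula or Java Snippet node inside the loop.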

2 Likes

Thanks, I am trying your suggested method, but now I am running into another problem.

When I lowered the row count to 5000, it still returns the same error message and doesn't let me execute. (I also tried 500; same issue, which isn't normal...)

I don't have the VM limit problem with other Excel files of smaller size though (i.e. around 30 MB, 50 columns and 10k rows).

Hi @Fovas , that was a good test. There’s definitely an issue… Not sure what is causing this…

Can you try to read just 1 column to see what’s the behaviour?

Maybe also reduce the number of rows to be scanned:

Hi, I reduced the scan to only 5000 rows in the Advanced Settings tab, plus limited the column and row range as below... still fails... lol, I am guessing we missed something here... probably not a VM issue anymore.

@Fovas you could try to use R to import the Excel file and see how that works out

3 Likes

Definitely something weird going on there @Fovas

Thanks mlauber71 again, I am trying it out with R. I think I have it installed correctly, but then how do I make the R Source (Table) node "read" the Excel file, please?

@Fovas you could check out the example and adapt it to your needs

2 Likes

Sorry for my stupid question here, but I got to the point where I have the below in the R Source (Table) node.


# to read excel files

library(readxl)

v_excel_file <- knime.flow.in[["PSR_Location"]]
v_excel_sheet <- knime.flow.in[["PSR_SheetName"]]

knime.out <- as.data.frame(read_excel(v_excel_file, sheet = v_excel_sheet, col_types = "text", col_names = TRUE, .name_repair = ~make.unique(.x, sep = "_")))


However, when I try to execute it, it returns the error message below.
And I can't connect a regular Excel Reader node to the R Source (Table) node...


ERROR R Source (Table) 4:84 Execute failed: Error in R code:
Error: there is no package called 'readxl'
Error: could not find function "read_excel"

1 Like

You would have to install the package. It would be best to install RStudio as well; then you can easily manage your R installation and the packages will be available for KNIME. Typically this command should do it:

install.packages("readxl")

2 Likes

Hooray, it worked! Always happy to see that little green light.

Thanks mlauber71 and bruno29a on the help here.

3 Likes

Not to beat a dead horse here, but strangely I have 4 flows reading 4 different Excel files (each over 100 MB), and just now the biggest one, around 165 MB, can't get through, with the error message below.

Basically I just supply the file path through a Table Creator, then run Table Row to Variable, and then the R Source (Table) node. It works fine for the 3 other files, but not this time on the first file.

Here is what I had in the R Source (Table) node:

# to read excel files

library(readxl)

v_excel_file <- knime.flow.in[["PSR_Location"]]
v_excel_sheet <- knime.flow.in[["PSR_SheetName"]]

knime.out <- as.data.frame(read_excel(v_excel_file, sheet = v_excel_sheet, col_types = "text", col_names = TRUE, .name_repair = ~make.unique(.x, sep = "_")))

And below is the error message...

ERROR R Source (Table) 3:84 Execute failed: Error in R code:
Error: std::bad_alloc


Any idea please?

@Fovas you should check how much memory you have allocated for KNIME in general and specifically for KNIME and R:

1 Like

Hi mlauber71, thanks for your swift reply,

Yes, I have increased the memory:

In my knime.ini, I have made the changes below.
-Xmx12g (raised to 12g; my computer has 16g)
-Dknime.database.fetchsize=1000 (newly added)

Also, I just followed your screenshot and checked; the buffer size limit for R is set to 0.

Is there any other area where I should make changes, please?

First question:
why Excel? It is probably one of the worst file types. Is it an option to try CSV instead?
br

2 Likes

@Fovas I think you would have to identify at what point the script fails: at the import of the data, or when transferring it to KNIME. If you transfer it from the R script node to KNIME, you have the option to use data.frame or data.table. Or you could export the data into a file from R directly. You might evaluate the code with the failing file in steps and see what is going on.
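As a sketch of how to separate the two steps (assuming readxl is installed; flow variable names as used earlier in this thread, the CSV file name is hypothetical), one could export directly from R and hand KNIME only a small sample:

```r
library(readxl)

v_excel_file <- knime.flow.in[["PSR_Location"]]
v_excel_sheet <- knime.flow.in[["PSR_SheetName"]]

# Step 1: import inside R. If this already fails, the problem is the import.
df <- as.data.frame(read_excel(v_excel_file, sheet = v_excel_sheet, col_types = "text"))

# Step 2: export to a file directly from R, bypassing the R-to-KNIME transfer.
write.csv(df, file = "psr_export.csv", row.names = FALSE)

# Hand only a small sample back to KNIME to keep the transfer cheap.
knime.out <- head(df, 100)
```

If step 1 succeeds but the node still fails without the `knime.out` transfer reduced, that would point at the hand-over to KNIME rather than the import.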

But it might be that the problem is already there when the import is happening.

@Fovas another thing to try could be to use R's "openxlsx" package:
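A sketch of what that could look like in the R Source (Table) node (assuming the package is installed via `install.packages("openxlsx")`; openxlsx reads .xlsx files with its own parser, so it may behave differently from readxl on a problematic file):

```r
library(openxlsx)

v_excel_file <- knime.flow.in[["PSR_Location"]]
v_excel_sheet <- knime.flow.in[["PSR_SheetName"]]

# read.xlsx accepts a sheet name or index
knime.out <- read.xlsx(v_excel_file, sheet = v_excel_sheet)
```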

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.