How to fix "ZIP entry size is too large or invalid" in Excel Reader

I try to read multiple files in folder by using Excel Reader node but getting error. It shows error “ZIP entry size is too large or invalid”. The size of the files in total is 1400+ MB. I changed the -Xmx to 10g in knime.ini file, it didn’t work.

Hello @chunhlim,

and welcome to kNIME Community!

I have couple of questions

  • If I got it right you are using Files in folder option? How many files are you reading into KNIME?
  • Are those XLS or XLSX files?
  • What KNIME version are you using?

Br,
Ivan

3 Likes

Hello @ipazin ,

  1. Yes, i’m using files in folder option. 3 files with size range 200 MB to 600+ MB.
  2. Those are XLSX files
  3. 4.2.2 is the version that i’m using

Regards,
Lim

@chunhlim you might want to upgrade to the latest version. And you could check these thread which deals with a similar problem:

I assume there is no chance to share the files in question?

2 Likes

I would also be curious about the files
What kind of data needs to be stored in 200+mb Excel files? Any chance to convert it to a flat file (csv,…) instead?
br

3 Likes

Hello @chunhlim,

I see. Along with other good suggestions (update, more memory, R/Python, conversion) you can try good old fashion loop approach :wink: Do you know how to do that and can you at least read a single Excel file into KNIME?

Br,
Ivan

2 Likes

These files were provided by my client. There’s no way i could convert it to csv file because the workflow i built will be used for them to read the Excel file. May i know, is there a size limit for reading the Excel file in Knime?

Hi @ipazin ,

I’ve tried to use “table row to variable loop”, but the single file is too large and failed during the loop process. Is it the loop approach you mentioned? Am i doing it right?
image

Regards,
Lim

@mlauber71 may i know the latest version for Knime? i tried to update my current Knime, but failed.image

@mlauber71, may i know how could i get the R node? It seems like the R node is not installed in Knime.

Hi @chunhlim , you’ll need to install R itself if you want to use R within Knime, same thing with Python.

I think R should do the trick, it should work.

If you want to stay with pure Knime nodes, could you try to read the Excel file in batches instead of reading the whole file?

Before even implementing this, can you first check if you can run the Excel Reader and read one of the files partially? If that works, then you can implement a loop to read a certain amount of lines at a time.

To read a certain amount of lines, you can use this option in the Excel Reader Settings:

The other settings to modify is to limit the amount of rows to preview. You can change that in the Advanced Settings of the Excel Reader:
image

If you are able to read 5000 lines, try to push it to 10000 lines. Try to see how high to can go. And then you can implement a loop to set rows from and to dynamically via flow variables:
image

Hi @bruno29a,

I’ve tried the approach above, but still getting error "ZIP entry size is too large or invalid
" & “Java heap space” . I’ve changed the amount of rows and limit data rows scanned from 10,000 to <= 5,000. It didn’t work even though i select to read single file.

Could you show me the way to install R?

Regards,
Lim

@chunhlim my quoted thread should contain all the necessary links. Maybe you can check them out. To install R you would need:

https://docs.knime.com/latest/r_installation_guide/index.html#_introduction

1 Like

@chunhlim you will have to check your proxy settings if you are allowed to access the update sites. Also you could just make a fresh installation under a new path and switch your workspace.

2 Likes

Hello @chunhlim,

Correct. Unfortunately same result.

How many memory you have available on your machine? (In general you don’t want to give all or close to max memory to KNIME as it can lead to memory issues).

Regarding updating KNIME there is no direct update path from 4.2.x to 4.4.2 (latest) KNIME version (meaning you can’t update from within KNIME Analytics Platform) so you’ll have to follow @mlauber71’s suggestion and do fresh installation.

Br,
Ivan

1 Like

Hi @chunhlim , I’m sorry to hear that the partial reading of the file did not help. I think at this point your best approach would be to use R.

I guess there is no way to upload a sample (randomized)? I guess all supporters here are curious to try for themselves
br

Thanks again. I’ll make a fresh installation and try the R node.

Hello @chunhlim,

before trying R you can give it a go with Excel Reader from 4.4 as it’s new and improved compared to one from 4.2.x

Br,
Ivan

2 Likes