Error message in KNIME ver. 5.3

Hello KNIMErs,

I use Excel file as input data for the workflow assessing data quality.
Data quality was assessed with no issues on 5th July 2024 and KNIME 5.2.x version.
I updated KNIME to version 5.3 on 10th July 2024 and faced some issues while assessing data quality.
Finally, I have just replicated actions from 5th July (same input file, same workflow) in new KNIME version and still getting the following error message:

Execute failed: Tried to read data but the maximum length for this record type is 100,000,000.
If the file is not corrupt and not large, please open an issue on bugzilla to request
increasing the maximum allowable size for this record type.
You can set a higher override value with IOUtils.setByteArrayMaxOverride()

This error is being generated for ‘Excel Writer’ node, however previous ‘Excel Writer’ node is being executed correctly. Output Excel file size is 12+MB.
in the past the same workflow used to work with bigger (more records, bigger file size) file as input and generated bigger Excel output file.

The error message inside Excel Writer node looks like this:

Any suggestions for how to identify root-cause? What ‘this record type’ does stand for? How to solve the issue? Is issue related to KNIME or any Excel limitations?

Unfortunatelly, due to confidentiality of data, I can’t share these (input, output) Excel files.

Thank you for any support,
Kaz

Hi @Kazimierz,

thanks for reporting!

Between Analytics Platform 5.2 and 5.3, we updated the underlying Excel library from Apache POI 4.1.2 to POI 5.2.5 (which was the newest release at the time).
The only thing I could find that changed in POI and that sounds relevant is that a DoS check was introduced that limits the number of records that can be allocated for XLS files in order to guard against maliciously crafted input files. XLSX files do not seem to be affected by the change.

I would appreciate it if you could try writing an XLSX file to see if the error message pops up there, too. This would narrow down where we would need to look to make your use-case work again.

In addition, could you provide me we a few more details about your output KNIME table and the Excel file? How many columns and rows are you writing? Are there any large cells, e.g. cells containing “documents”?

The internal ticket is AP-22911.

Best,
Manuel

3 Likes

Hi @hotzm ,

Thank you very much for your response and follow-up.

In short, the workflow does the following:

  • Various actions.
  • Writes XLSX file with the first worksheet and creates variable for the file path. This sheet contains less than 5000 records and 630 columns. The longest string (after converting numbers to strings) contains below 2000 characters (no links, no documents, no graphics - just alphanumerical characters). Excel Writer node works properly here in KNIME 5.3.
  • Various actions.
  • Writes the second worksheet, while the path is controlled by variable. This sheet contains 530 records and 18 columns. The longest string (after converting numbers to strings) contains 45 characters. Excel Writer node generates error message here in KNIME 5.3.
  • Various actions, including two next Excel Writer nodes adding two next sheets while the path is controlled by variable again. Of course, this part is not being executed.

Today, I have returned to KNIME 5.2.5 and the workflow is being executed properly there.

Please let me know if any more info is required.
Thank yuo again,
Kaz

Thanks for the detailed description!

The second write currently reads the earlier written contents, though I am not sure what exactly triggers the exception or if it is even needed to read the existing contents. So it seems more related to what was written in the first write.

I’ll have a closer look when working on the linked ticket.

2 Likes

Thank you very much @hotzm

Same error while trying to write to an existing Excel file which is containing 2 sheets( one is 47K rows * 31 columns and the other one is blank (1 row(headers) * 25 columns)

ERROR Excel Writer 3:891 Execute failed: Tried to read data but the maximum length for this record type is 100,000,000.

Looking forward to any update.

Hi @zhaojjxvi, @Kazimierz,

I was able to reproduce the issue. It is due to the way how the Excel Writer opens existing files – which is how the Excel Reader used to do it – and a newly imposed limit on in-memory data structures by Apache POI 5.

Until we have a fix and released it, you can try this somwhat hacky solution, to apply the workaround mentioned by Apache POI in the error message:

Java Edit Variable node (that does not actually edit a variable) with:

// Enter your code here:
org.apache.poi.util.IOUtils.setByteArrayMaxOverride(Integer.MAX_VALUE);

If you want to be “nice” in your workflow, you can add a second Java Edit Variable node after the Excel Writer that sets the value back to -1.

It could look something like that:
Screenshot 2024-07-26 at 11.42.39

(the failure is now due to my very low heap setting, which is too little to fully read my test file and not because of the maximum length limit above)

Will update this thread when we have a proper fix available.

Best regards,
Manuel

3 Likes

Hi @hotzm

Thank you for your update and suggested work around.

In fact, I am using alternative work around which is having two separate KNIME instances. This means, I use KNIME 5.2.5 version for workflows in which error message occure and KNIME 5.3 version for other workflow. This adds some inefficiency on one hand, but ‘solves’ the issue on the other. So far, so good:)
Awaiting next release.

Have a good day,
Kaz

1 Like

Thank you @hotzm !

It saves my day!

Same issue here, I opted to go back to 5.2.5 and works perfect… as learning, never work in production with versions ending in “0”, constantly have several bugs…

hi @hotzm ,
I have encountered the same issue. Before running the Excel Writer node, I have to manually delete most of the original data in the sheet, and then I can successfully run the Excel Writer to write data.
And I also tried using the “remove all data before writing” function in the “Write to Excel Template” node, but the same error still occurred.

Hi @hotzm and everyone.
I’m facing this issue as well.
I tried to export my joint data into a sheet in .xlsx without overwriting other sheets (not facing issue with .xls).
I am new to KNIME (less than a month) and I really love KNIME
it’s 2AM now that I’ve been searching for a solution and I luckily see this discussion.
However, I found myself struggling to understand this solution due to my limited foundation.
I would really really appreciate if you could explain it step by step of how to get to the location in the screenshot and edit the value to solve this problem.
Thank you

Great to hear you enjoy using the software!

You have to open the configuration dialog for the Java Edit Variable node, go to the “Additional Bundles” tab, include the two bundles as show in the screenshot, by searching for them in the lower part of the dialog and adding them with the button to the right.

Then, in the code part of the dialog (first tab), you paste the code I posted right below where it already says “Enter your code here” in your dialog.

Then you make sure that the Java Edit Variable node is executed at least once before the first appending Excel Writer node. It does not have to be executed right before each of them, once at the start of the workflow is enough (technically, once before the first appending Excel Writer node until AP is restarted, but it’s easier to do it once at the start of each workflow).

I have uploaded a demo workflow to Community Hub which you can download and from which you can copy&paste into your workflow: Forum-81076_ExcelWriterArrayMaxOverride – KNIME Community Hub

If you have any questions, just ask :slight_smile:

4 Likes

Thank you @hotzm for sharing the workaround.

I keep this topic open to:

  • Confirm the issue in ver. 5.3 still awaits for regular solution.
  • Enable other KNIMErs to inform KNIME TEAM about facing the same issue.
1 Like

It works now.
Thank you so much.

I hope I can give you a hug now ^^

The fix is now live with AP 5.3.2 released :slight_smile:

4 Likes

Thank you @hotzm for letting us know. I will test that as soon as back in ‘working’ mode :slight_smile: :beach_umbrella:

1 Like

I can confirm the issue has disappeared after updating KNIME to 5.3.2 version.
Thank you @hotzm for your support.

1 Like

Maybe I am a little bit late to the party but for me the cause was a rather ambiguous formula like so:

BAD: =FILTER('Sheet1'!A:B;'Sheet1'!A:A<>"")
OK: =FILTER('Sheet1'!A1:B1000;'Sheet1'!A1:A1000<>"")

For more details see:

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.