I use Excel file as input data for the workflow assessing data quality.
Data quality was assessed with no issues on 5th July 2024 and KNIME 5.2.x version.
I updated KNIME to version 5.3 on 10th July 2024 and faced some issues while assessing data quality.
Finally, I have just replicated actions from 5th July (same input file, same workflow) in new KNIME version and still getting the following error message:
Execute failed: Tried to read data but the maximum length for this record type is 100,000,000. If the file is not corrupt and not large, please open an issue on bugzilla to request increasing the maximum allowable size for this record type. You can set a higher override value with IOUtils.setByteArrayMaxOverride()
This error is being generated for âExcel Writerâ node, however previous âExcel Writerâ node is being executed correctly. Output Excel file size is 12+MB.
in the past the same workflow used to work with bigger (more records, bigger file size) file as input and generated bigger Excel output file.
The error message inside Excel Writer node looks like this:
Any suggestions for how to identify root-cause? What âthis record typeâ does stand for? How to solve the issue? Is issue related to KNIME or any Excel limitations?
Unfortunatelly, due to confidentiality of data, I canât share these (input, output) Excel files.
Between Analytics Platform 5.2 and 5.3, we updated the underlying Excel library from Apache POI 4.1.2 to POI 5.2.5 (which was the newest release at the time).
The only thing I could find that changed in POI and that sounds relevant is that a DoS check was introduced that limits the number of records that can be allocated for XLS files in order to guard against maliciously crafted input files. XLSX files do not seem to be affected by the change.
I would appreciate it if you could try writing an XLSX file to see if the error message pops up there, too. This would narrow down where we would need to look to make your use-case work again.
In addition, could you provide me we a few more details about your output KNIME table and the Excel file? How many columns and rows are you writing? Are there any large cells, e.g. cells containing âdocumentsâ?
Thank you very much for your response and follow-up.
In short, the workflow does the following:
Various actions.
Writes XLSX file with the first worksheet and creates variable for the file path. This sheet contains less than 5000 records and 630 columns. The longest string (after converting numbers to strings) contains below 2000 characters (no links, no documents, no graphics - just alphanumerical characters). Excel Writer node works properly here in KNIME 5.3.
Various actions.
Writes the second worksheet, while the path is controlled by variable. This sheet contains 530 records and 18 columns. The longest string (after converting numbers to strings) contains 45 characters. Excel Writer node generates error message here in KNIME 5.3.
Various actions, including two next Excel Writer nodes adding two next sheets while the path is controlled by variable again. Of course, this part is not being executed.
Today, I have returned to KNIME 5.2.5 and the workflow is being executed properly there.
Please let me know if any more info is required.
Thank yuo again,
Kaz
The second write currently reads the earlier written contents, though I am not sure what exactly triggers the exception or if it is even needed to read the existing contents. So it seems more related to what was written in the first write.
Iâll have a closer look when working on the linked ticket.
Same error while trying to write to an existing Excel file which is containing 2 sheets( one is 47K rows * 31 columns and the other one is blank (1 row(headers) * 25 columns)
ERROR Excel Writer 3:891 Execute failed: Tried to read data but the maximum length for this record type is 100,000,000.
I was able to reproduce the issue. It is due to the way how the Excel Writer opens existing files â which is how the Excel Reader used to do it â and a newly imposed limit on in-memory data structures by Apache POI 5.
Until we have a fix and released it, you can try this somwhat hacky solution, to apply the workaround mentioned by Apache POI in the error message:
Java Edit Variable node (that does not actually edit a variable) with:
Additional Bundles: org.apache.poi and org.apache.poi.ooxml in version 5.2.5 (should appear in the search)
Thank you for your update and suggested work around.
In fact, I am using alternative work around which is having two separate KNIME instances. This means, I use KNIME 5.2.5 version for workflows in which error message occure and KNIME 5.3 version for other workflow. This adds some inefficiency on one hand, but âsolvesâ the issue on the other. So far, so good:)
Awaiting next release.
Same issue here, I opted to go back to 5.2.5 and works perfect⌠as learning, never work in production with versions ending in â0â, constantly have several bugsâŚ
hi @hotzm ,
I have encountered the same issue. Before running the Excel Writer node, I have to manually delete most of the original data in the sheet, and then I can successfully run the Excel Writer to write data.
And I also tried using the âremove all data before writingâ function in the âWrite to Excel Templateâ node, but the same error still occurred.
Hi @hotzm and everyone.
Iâm facing this issue as well.
I tried to export my joint data into a sheet in .xlsx without overwriting other sheets (not facing issue with .xls).
I am new to KNIME (less than a month) and I really love KNIME
itâs 2AM now that Iâve been searching for a solution and I luckily see this discussion.
However, I found myself struggling to understand this solution due to my limited foundation.
I would really really appreciate if you could explain it step by step of how to get to the location in the screenshot and edit the value to solve this problem.
Thank you
You have to open the configuration dialog for the Java Edit Variable node, go to the âAdditional Bundlesâ tab, include the two bundles as show in the screenshot, by searching for them in the lower part of the dialog and adding them with the button to the right.
Then, in the code part of the dialog (first tab), you paste the code I posted right below where it already says âEnter your code hereâ in your dialog.
Then you make sure that the Java Edit Variable node is executed at least once before the first appending Excel Writer node. It does not have to be executed right before each of them, once at the start of the workflow is enough (technically, once before the first appending Excel Writer node until AP is restarted, but itâs easier to do it once at the start of each workflow).