I am experiencing problems reading an XLSX file created via Excel Writer Node via an interface (that likely tries to read the file in XML-schema).
I discovered that the problem gets resolved if I manually open and save the file in Excel prior to starting the import via interface.
At the same time I see that the file size increases significantly between the initially create file (Excel Writer Node) and the filesize after opening and saving via Excel.
I do not get an error message when executing the workflow (I am runnint version 4.7.). I also do not get any error or prompt when opening the created file in Excel.
I fixed the problem on my local machine by running a .bat file that would open, save and close the created file - however the .bat file cannot be executed on the target system.
Hence I need to get KNIME to writing the XLSX file in the “right” (i.E. machine-readable) format.
Any ideas how to fix this - a solution would be highly appreciated!
I compared the XLSX files generated by the Excel Writer and by Excel Writer → SaveAs in Excel (using Data Generator->Number to String as sample data). I used unzip to extract the XLSX to a folder and a diff viewer on the contents.
The only substantial difference I can see (apart from some XML attributes being reshuffled, different, etc, and the Excel default theme) is that the Excel Writer does not write a “shared strings table” (in order to optimize for low memory consumption). Maybe the missing shared strings table (more specifically it is empty, if you look at the XML file on disk) is the problem for your interface?
I would be glad if you could test that theory for me. You would hopefully get a suitable workaround and I can see what we would need to fix in an upcoming release.
Fortunately, it turns out that we can coerce the Excel Writer into writing this shared strings table for us .
All we need to do is to get the Excel Writer to “append” to the file and “Evaluate formulas”: then POI’s XSSFWorkbook is not used, which does not use shared-strings-table by default to conserve memory, and instead XSSFWorkbook is used, which uses shared-strings-table by default.
I got it working (and confirmed the presence of the populated shared strings table) with:
Excel Writer writes an empty table to the XLSX with the default settings. You can probably also create an empty XLSX in Excel and use that in step 2, but I wanted to test with a KNIME-only solution to make sure we don’t have Excel fixing up anything.