I have Excel 97-2003 files and I need to read them to process them through some ETL loop but I can’t find the way to read them.
I tried Excel reader but it is not working because it gives me this error:
“format file … is not supported. Please select a valid XLSX, XLSM, XLSB or XLS file”
With File reader I receive this error:
“Execute failed: Illegal char <:> at index 4: file:…”
Is there any other possibility to read it?
Thanks for the example. I opened it first using Excel and it appears to be in an unusual Excel format called “XML Spreadsheet 2003 (*.xml)” instead of a usual XLS format. This is how it looks like when I open it with NotePad++:
I guess this is why KNIME doesn’t know how to handle it and hence it is displaying error message:
At a first glance, there are a few solutions:
Convert beforehand the files from “XML Spreadsheet 2003 (*.xml)” to XLSX format using a recent version of XLS so that KNIME can read them. This manual work is fine if you only have a few files to convert
Download from the internet and run a C# / Python / Java utility to bulk convert all your files from “XML Spreadsheet 2003 (*.xml)” to XLSX format.
If the format of your files is always the same as in the example:
then you could read the files on KNIME as XML files and then build a workflow to analyze the XML content and extract the information. This should not be difficult if your files -always- have the same format as shown before.
Please reach out if the 3rd solution is ok with you and you would need extra help to implement it.
Let’s say that every month the files are in the same format but the structure changes because I have files for many different plants that have different structures.
But basically, the file for the PLANT A has the same structure and the same format every month, PLANT B has the same structure and the same format every month, and so on.
So maybe the possibility of Python would be the best right?
Can you help me also with this solution, please?
@pnotova from what I see this file is a quite old special Excel Format from the times of Office XP. It can be opened using LibreOffice without a problem or error messages, and sometimes with error messages with a current version of Excel. I have tried several packages in R and Python but was not able to read the file properly. Question indeed would be if it can be loaded in a different way.
My pleasure. I have implemented a simple “Excel 97-2003” to modern XLSX file format converter workflow in KNIME and I’m currently documenting it. For further testing the workflow example, I would need at least an extra couple of different -Excel 97-2003- files, just to check that everything works fine.
Would you mind to upload these extra files here please?
Once I have tested the workflow on them, I will upload it here with explanations.
How the workflow works is explained node by node. Essentially, the Excel file is read as a XML file and the XML is interpreted to extract the worksheets and their names. Every worksheet is analysed independently to extract columns and cell content and eventually the newly created tables are stored back again in XLSX Excel format in the most probable column types.
As explained in the workflow description, it runs as follows:
This workflow shows how to read several simple Excel files in Excel 97-2003 (XML) format and how to convert them in current XLSX file format.
The Excel 97-2003 (XML) files are read from the workflow data folder and should be named with the extension *.xls