Read Excel 97-2003 files

Hi,

I have Excel 97-2003 files and I need to read them to process them through some ETL loop but I can’t find the way to read them.
I tried Excel reader but it is not working because it gives me this error:
“format file … is not supported. Please select a valid XLSX, XLSM, XLSB or XLS file”
With File reader I receive this error:
“Execute failed: Illegal char <:> at index 4: file:…”
Is there any other possibility to read it?

Thanks in advance,

Petra

Hi @pnotova,

Could you please upload here one of your Excel 97-200 files if it is not confidential ? A file example not working for you would be perfect for us to help you.

Thanks & best regards,
Ael

2 Likes

Hi @aworker,

I have prepared an example of the file. Thanks for help.
EXAMPLE_FILE.xls (44.3 KB)

Hi @pnotova

Thanks for the example. I opened it first using Excel and it appears to be in an unusual Excel format called “XML Spreadsheet 2003 (*.xml)” instead of a usual XLS format. This is how it looks like when I open it with NotePad++:

I guess this is why KNIME doesn’t know how to handle it and hence it is displaying error message:

At a first glance, there are a few solutions:

  1. Convert beforehand the files from “XML Spreadsheet 2003 (*.xml)” to XLSX format using a recent version of XLS so that KNIME can read them. This manual work is fine if you only have a few files to convert :wink:
  2. Download from the internet and run a C# / Python / Java utility to bulk convert all your files from “XML Spreadsheet 2003 (*.xml)” to XLSX format.
  3. If the format of your files is always the same as in the example:

then you could read the files on KNIME as XML files and then build a workflow to analyze the XML content and extract the information. This should not be difficult if your files -always- have the same format as shown before.

Please reach out if the 3rd solution is ok with you and you would need extra help to implement it.

Best
Ael

3 Likes

HI Ael,

thanks for your really fast help!!!

Let’s say that every month the files are in the same format but the structure changes because I have files for many different plants that have different structures.
But basically, the file for the PLANT A has the same structure and the same format every month, PLANT B has the same structure and the same format every month, and so on.
So maybe the possibility of Python would be the best right?
Can you help me also with this solution, please?

1 Like

Hi @pnotova , moving forward, is there any reason why the monthly files can’t be saved as Excel > 2003?

2 Likes

@pnotova from what I see this file is a quite old special Excel Format from the times of Office XP. It can be opened using LibreOffice without a problem or error messages, and sometimes with error messages with a current version of Excel. I have tried several packages in R and Python but was not able to read the file properly. Question indeed would be if it can be loaded in a different way.

2 Likes

Hi Petra

My pleasure. I have implemented a simple “Excel 97-2003” to modern XLSX file format converter workflow in KNIME and I’m currently documenting it. For further testing the workflow example, I would need at least an extra couple of different -Excel 97-2003- files, just to check that everything works fine.

Would you mind to upload these extra files here please?

Once I have tested the workflow on them, I will upload it here with explanations.

Thanks & regards,
Ael

2 Likes

Hi Ael,

thank you for the help!
here are other files:
ex2.xls (214.5 KB)
ex3.xls (88.4 KB)
ex4.xls (90.8 KB)

thank you a lot!

Petra

Unfortunately, I have to deal with this format and for now, it is impossible to change…

Hi Petra,

Thanks for the last files. They were very useful to debug the workflow and make it much more robust.

The workflow looks as follows:

How the workflow works is explained node by node. Essentially, the Excel file is read as a XML file and the XML is interpreted to extract the worksheets and their names. Every worksheet is analysed independently to extract columns and cell content and eventually the newly created tables are stored back again in XLSX Excel format in the most probable column types.

As explained in the workflow description, it runs as follows:

This workflow shows how to read several simple Excel files in Excel 97-2003 (XML) format and how to convert them in current XLSX file format.

The Excel 97-2003 (XML) files are read from the workflow data folder and should be named with the extension *.xls

The converted XLSX Excel files are stored under the same workflow data folder and are renamed with extension *.xlsx

Please reach out if you need any explanations or extra help.

Hope it helps.

Best
Ael

4 Likes

Thank you really thanks!
I don’t know now if it is working I will check it out later but still THANK YOU this forum helps me every time and it is just amazing!

I let you know ASAP if it has worked!!

Thanks !!!

Petra

2 Likes

Hi @pnotova

Hope everything is going well. I’m just curious if the solution I posted worked for you. Any feedback would be more than welcome.

Thanks & regards,

Ael

1 Like