Excel Reader Multiple Files Fails

hello everyone.!
Hope you are safe and healthy.
I am trying to read multiple excel files with the Excel Reader Node and got this error.

The node setting is as follows

and an example of the multiple files is the next one.
Files.zip (108.4 KB)

Any help will be much appreciated.

Regards

Hi @hsrb

The Excel Reader requires that each worksheet has exactly the same structure.

The error you received says that your March 2020 Ofertas Registradas worksheet has a different structure to the January 2020 and February 2020 Ofertas Registradas worksheets.

If you investigate by importing all the relevant worksheets separately, you’ll see that the “Oferta ($)” column for January 2020 and February 2020 are being recognized as double floating point numbers, whereas in March 2020, it’s being recognized as an integer. This is the problem.

One solution is to use the following loop, where:

  1. Excel files in the relevant folder are listed
  2. File locations are converted to flow variables
  3. The relevant sheet from each Excel file is imported using the Location flow variable
  4. Sheets are concatenated as they are read
  5. The Loop End node is configured to allow variable column types

list loop

The Excel Reader node will complain that the “DataSpec generated by configure does not match spec after execution.” but it will still run.

7 Likes

Thank you very much @elsamuel for your prompt answer.!
One last question
Once the table is created, those columns that differ in type, need to be manipulated in order to set the correct type for each row, isnt it?
Regards

Hector

hello hsrb,
yes, it will depend on you depending on the type you want to have in each column. For example, for a date column which is of type string you could use the node string to dateTime to change the type of this column.

1 Like

Thank you @natte
Now I have run the solution given earlier and got another issue.
The time columns have no data in the consolidated excel table.
Take a look:

Regards

hi hsrb, it will suffice either in knime, choose as date type <> for the columns concerned and restart your workflow. oubien you will be able to directly modify the format of the cell in excel if and if the cell its format personalized date if not you will have to use always in excel in the toolbar, data, the tool <> to format your date .

You could use R to force all columns to first be imported as strings and convert them later after you have consolidated them.

You could also try and turn off the preview that would report the different types but you still might encounter problems later when bringing the data together.

3 Likes

Hi there all,

just to add info that there are tickets in system to:

  • make Excel Reader more flexible - have similar options as Loop End (Internal reference: AP-13883)
  • omit error “DataSpec generated…” while reading Excel files in a loop (Internal reference: AP-13882)

Have added +1 on them.

Br,
Ivan

2 Likes

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.

Hi all,

I just want to let you know that these things have been approached and fixed with the new Excel Reader in version 4.3. Thanks for the feedback!

Best,
Simon

3 Likes