Reading excel files with different sheet names in a folder

Hello, I just started learning KNIME today and I’ve run into an issue with the Excel Reader.

I have a folder with weekly data in excel files and they all have the same table structure. The only difference between all these files is the sheet name since the sheet name changes every time the data is exported from my source (which I have no control over).

Is there a way to get Excel Reader to read all the files without having to manually change the sheet name in every excel file so that they’re all the same? I know I wouldn’t have to worry about that if I had the files in another file format, but I’d like to avoid that since I have years worth of data. My goal is to have these files combined into a single file.

I’ve tried to look at other people’s questions to see if I can get some kind of answer to work with, but nothing seems to help.

Thank you,
Daniel Yi

1 Like

@daniel_yi welcome to the KNIME community.

You might want to check this example. Also on the hub there should be more that would suite your needs:

3 Likes

So I’ve tried using the Read Excel Sheet Names node and connecting it to the Table Row to Variable Loop Start node, but I’m still getting an issue in my Excel Reader node. The example you’re referring to is if the files have multiple sheets, but it seems like there is still an assumption that the sheet names are the same across multiple excel files. Here’s a picture of my workflow.

The error message I get is:
ERROR Excel Reader (XLS) 2:5 Execute failed: java.io.IOException: Workbook “2020 Week 2.xlsx” does not contain a sheet called “Sheet1”

It’s happening because the sheet names are different between the files.

Everything else in the workflow works fine (thanks to the hub which you pointed out for me).

1 Like

The workflow should be ok with that. Have you checked that the sheet names are set as a variable as well as the excel file itself. And would this error happen with an early file or in the middle (so to speak).

The you might check if your file actually does contain a sheet1

Do you know where I would set the sheet names and excel file as a variable? I don’t see an option for that in either of the excel nodes.

The error is happening early in the file (like, between the very first and and second files). The first file contains a Sheet1, but the rest of them don’t.

In the configurations for Excel Reader, I have “Files in folder” and “first sheet with data” selected.

Hi @daniel_yi

This workflow Reading excel files with different sheet names in a folder.knwf (30.1 KB) reads the filenames and the sheet name as a flow variable excel_different_sheet_names.zip (12.8 KB) . Then both are merged and used as an input for the Excel Reader. The Loop End nodes adds the information (maybe a Loop End Column Append node is a better solution for you?).


gr. Hans

5 Likes

Hey @HansS

I am still having issues with the Read Excel Sheet Names node. The output table shows that it is only reading the sheet name of the first excel file, but not the others. I noticed in your workflow that the file you have selected to read is “randomForest.xlsx”, but that’s not a file that I see in the files you’ve provided. Is that a file containing the sheet names?

1 Like

Hi @daniel_yi

The workflow in my previous post did not match the picture. I removed the RowFilter and the deprecated nodes. You can find the flow on KNIME Hub . But in the end this doesn’t alter the approach. And yes you see a link to some randomforest.xls, but that file will be override by a flow variable (the location of the file from the List Fiel node).

4 Likes

Thanks for being patient with me @HansS

I’ve checked through every configuration and made sure that they match yours, but Read Excel Sheet Names is still only reading the sheet name of the first file. I did notice that I get the following warning:

WARN Read Excel Sheet Names (XLS) 0:3 Specified file doesn’t exist (C:\Program Files\KNIME\missing)

But I have no clue why it’s showing up. Maybe that’s why the first excel file is the only one being read?

I changed then example to two nested loops and put it on the hub. Maybe you could check it out. And it also uses the KNIME protocol to reference the files so it should work on all platforms and computers.

Thanks @HansS for the inital effort.

4 Likes

I was able to figure out that the problem I had was actually with the Excel Reader node flow variable configurations. It’s working perfectly now!

Thank you @mlauber71 and @HansS for all your help! Glad to have a great community here to help me out!

4 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.