Reading multiply Excel Files with several sheets

DmitryIvanov76 · May 29, 2021, 11:41am

Dear friends, once again I am asking for help.

I have the following task:

There are several excel files in a folder.
Each file has several sheets.
Some sheets have similar structures, some are unique

I need to

Read files and sheets (done with Read Excel Sheet Names + Table Row to Variable Loop + Excel Reader)
BUT I can’t store multiple tables with different structures within a loop. Likely there’s a possibility to add all columns from every sheet of every table and then somehow separate them, but is it the only possible (and in fact not easy) solution…

Thank you very much for your help!

mlauber71 · May 29, 2021, 12:30pm

@DmitryIvanov76 you might want to explore these various examples of workflows to import Excel files in a lot of scenarios.

If this is not what you are looking for one way could be to create a sample workflow that would represent your challenges without giving away sensitive data.

takbb · May 29, 2021, 5:16pm

Hi @DmitryIvanov76, apart from reading a number of excel files, is there anything specific you are wanting to do with them?

I’m guessing they all have “something” in common rather than just being a random set of spreadsheets…

You are right that if they are all different you can’t just load them into one table format, but it wouldn’t make sense to do so either. On the other hand there are certainly examples of workflows that load all sheets from several xlsx files (each in a different format) using loops, for the purposes of compiling them all into a single new xlsx with multiple sheets…

So it might be that the thing they have in common, along with your purpose for loading them helps to determine the best “how”…

DmitryIvanov76 · May 29, 2021, 6:43pm

Dear takbb,

Thank you very much for your attention – all your solutions work perfect in my workflows and especially the one with HTML files you made by CSV writer. I have a little adjusted it to my needs and now I can make rather good-looking reports in a second

This time I’m dealing with a very common task more or less related to corporate reporting.

These xlsx(s) are something like tax files and every file has sheets with common columns, but it may appear that in one file the name of the column has a misprint, and the entire scheme fails.

What I have in mind and what I had luck to achieve?

First, using common approach I got the list of all columns in all sheets of all files. In total I got something about 400 columns. Sure, saved all sheets as new xlsx(s) files.

Then, I do transpose, duplicate row filter to have a table with unique names of columns matching columns in all files. 201 total.

When I got to this point, I was sure that the task is almost solved – I made a new loop reading all xlsx files with single sheets.

I believed that If I have an empty table with the entire collection of all possible columns, I may easily combine this “Test” table and tables Excel reader brings me in the loop. Then I may have luck working with this combined table because there are unique identifies (somewhere names, somewhere addressees, somewhere tax IDs – in general we had in mind that one day we may need to aggregate this data, but it was years ago and we thought about human based, not IT approach to this task).

By now I stuck with combining this test table and tables I read in a loop

takbb · May 29, 2021, 9:26pm

Hi @DmitryIvanov76, I’m pleased to hear that some past solutions are working for you!

I think I’m understanding your current problem, but I have a few questions.

if your sheets were currently perfect, (with no human error /misprints in the column names) , would your current solution work, or would there still be problems?

for example:

1a. Would the columns be in different order in different sheets?

1b. Do some sheets have additional / fewer columns?

If you were doing this manually, without the IT solution, what needs to happen to the different sheets to make this work?

2a. Is it simply that you need to make all the column headings the same?

2b. Is there anything else that needs to happen?

My initial thoughts are that you need to standardise your spreadsheets as the first goal. Maybe by using your 201 column names, manually create a lookup table (old name → standard name) for the purposes of making your column names the same on each sheet.

Each sheet would need to be run through this process and saved with new column names perhaps with column names sorted initially in alphabetical order, or some other defined sequence. If there are any columns that aren’t present on all sheets, you then either add them where they are missing, or remove them if they aren’t required. The aim is that at the end of this process, every sheet is saved in a standard format. After that you can then process them all and amalgamate them in a loop.

This is me just “thinking out loud” at the moment. I don’t know if that is along the right lines for what you need, and at this stage I haven’t yet thought through how you go about doing some of this, but I’m trying to test my understanding.

Daniel_Weikert · May 31, 2021, 4:40pm

A sample explains more than a thousand words

DmitryIvanov76 · June 1, 2021, 10:59am

Dear takbb,

Sorry for delayed reply.

I have successfully made a single table with all fields and now thinking about processing this huge table.

What gives hope that it appeared that every row can be attributed to the entity it belongs to (some cells contain unique identifiers, like IDs, names, etc. but unfortunately there are variations – these IDs may appear in different cells sometimes unpredictably). So, I am thinking about something like indexing &/or several Rule Engines to attribute each row to the entity and type of data (like accounts, reports, assets, etc).

I suppose that in this case I should first make a test table with all entities and their unique identifiers – thinking about the way to extract them from the table…

Not sure if I can succeed in this quest, but I must try hard first before asking for your help.

Have a great day and thank you for your questions – they help me a lot in planning next steps!

system · November 30, 2021, 10:59pm

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.