Excel consolidation (multiple excel files to one excel with multiple worksheets)

Hello,

Hope I’m not repeating existing topics.
The case is as following:

3 years of 12 monthly trial balances in excel (all files with the same format - .xlsx, in one folder)
objective: import all excel files from the folder into Knime (at once, by folder selection, not thorugh 36 excel readers) and create an Excel file with 3*12 (36) worksheets representing each of the imported Excel files.

Thank you very much.

George

Hi @GeorgeVP90,

if i understood you correctly you can do that with the load all files in folder mode from the Excel Reader node (at least for all files the same sheet)

To also get all sheets at once you could construct a loop where you first get all sheet names:

And afterwards loop/read the files :slight_smile:

Hi,
From what I see the Excel Reader combines all excel files from the folder into one data table.
What I need is basically to combine 36 excel files from a folder into one excel file with 36 worksheets.

Thanks

Hey there @GeorgeVP90 … interesting task.

I am not 100% sure if @AnotherFraudUser 's approach would work (no offense), because I think it reads only the first sheet but not all.

I have used a rather “old school” approach with nested loops and three main nodes:

List Files/Folders this reads all the file names in a specified location (in this example relative to my workflow but could be located anywhere). This is required to turn it into a flow variable (see below)

Read Excel Sheet Names : This extracts the sheet names for a specified file. This is also required to turn it into a flow variable (see next bullet point)

Table Row To Variable Loop Start This starts the first (outer) and second (inner) loop and turns the info into flow variables. These can no be assigned to an Excel Reader node, where the “path” (datatype: path) flow variable is the one that points to the file and the “sheet” (datatype: string) flow variable points to a sheet in that specific file.

Finally you need to include the Loop End node, to close both loops. Your data should be available at the output port of the second Loop End node.

My example workflow below points to a local folder but you can change this to your specific location in the List Files/Folders node.

Attention: I think you do not find the handy “V” button to assign the flow variables to the sheet names section in the Excel Reader node so you have to do this via the Flow Variables tab and then settings > sheet_name.

Let me know, if this helped :slight_smile:

1 Like

I would go with @kowisoft 's approach and use excel writer in the loop to append the file into your newly created excel file to get the 36 different tabs in one file
bR

2 Likes

hmmm… I didn’t read that requirement correctly, sorry @GeorgeVP90

When I do what @Daniel_Weikert recommends I get strange results. How do you connect and Loop End node to a node that has no output port like the Excel Writer?

I tried to run them on parallel, where on branch after the Excel reader goes to the Excel writer and another branch goes to the two Loop End nodes, but this gives me strange results.

It writes all table sheets correctly into an already existing spreadsheet file (using the append function in Excel writer) but the results are skewed… I only get the content of the very first sheet (sheet 1 from file1.xlsx) written…

Maybe someone else could jump in here?

Hi @GeorgeVP90 , I’ve taken a shot at it since the proposed solutions did not seem to work.

Some assumption: I was not sure if you had 1 excel file per month (so 1 file, 1 sheet), or 1 excel file per year (so 1 file, 12 sheets). I have prepared a workflow for the latter, that is the source is 3 files with 12 sheets each, as it is more complicated, and it would also work for the first case.

My workflow is successfully merging all the sheets from any excel files to multiple sheets in 1 excel file.

Some notes: In order to differentiate which sheets are from which files (in your example, all the files have the same sheet names), I have used this naming convention for the newly generated sheets: -, so hopefully your files are not named too long.

In my example, I have 3 files named 2018.xlsx, 2019.xlsx, and 2020.xlsx. Each have 12 tabs, named from 01, 02, …, 11, 12.

Here is how my workflow looks like:

Folder content:
image

File content (for 2018.xlsx, but same for all 3 files):

Results after merge (36 sheets, goes from 2018-01 to 2020-12). I am adding it as 2 screenshots (start and end), as it’s too wide to show all sheets:


Here are the workflow and sample files:
Excel consolidation (multiple excel files to one excel with multiple worksheets).knwf (27.7 KB) 2018.xlsx (14.0 KB)

Yes it seems to be working.
As per another recommendation on the Forum , there is a string manipulation additional node added to the workflow.

I’m somewhat of a noob with Knime, I hoped this business case would encounter something simpler than looping, but I will get more proficient as I work through.

Thank you very much,

Best regards,
George

Yes, the example you showed covers a more complex case.
Thank you very much.

Hello @GeorgeVP90,

glad to see you figured it out. However here is example workflow on KNIME Hub with new File Handling framework (no deprecated nodes):

(Note on your workflow: flow variables are available to downstream nodes regardless of connection type meaning you don’t need that much red lines :wink:)

And although this business case seems frequent I’m not sure it will be possible to ever do it without some kine of loop…

Br,
Ivan

2 Likes

Hi everyone. Please advise how it should looks like if I want to use consolidation of the same files which I update on a daily basis? How to overwrite file which already exist with the updated info from initial excel files?

In addition, in final file will be one sheet with calculation based on data from these consolidated sheets. This sheet should not be changed

Thanks!

Hello @meTan,

and welcome to KNIME Community!

Let’s see if I understand what are you trying to do. You have multiple Excel files which are daily updated and you want to process them where final output will be in another (final) Excel file? And in this final Excel file you want to add new information or overwrite file completely?

Probably some example would help :wink:

Br,
Ivan

1 Like

Hello @ipazin,

you understand correct, yes. Just my comment that info from multiple files should be consolidated as a sheets in a file which already exist and has additional calculation sheet (please see attached example).

Ideally calculation sheet shouldn’t be changed. Only info in the three report sheets plus two pivots should be overwritten.

Thank you and have a good day!

Excel reader allows you to append to an existing file but maybe in your case the way to go would be reading the file upfront only keep the calculation sheet, use a loop to get all the comments and then write the whole result in one excel output

Hello @meTan,

here are the (basic) steps your workflow should have:

  1. Read Excel reports in KNIME
  2. Process data from each report
  3. Use Excel Writer to write data into final Excel with option to overwrite sheets

If Excel file and sheet names are always same I would avoid loop and simply use one Excel Reader for each report and one Excel Writer for each sheet. Give it a try and if any questions feel free to ask :wink:

Good luck!

Br,
Ivan

2 Likes

Hello @ipazin,

First what I did - upgraded KNIME up to 4.3 (it was 4.1).

Excel Writer here much better and I did what I need.

Thank you! :slight_smile:

1 Like

Glad to hear that @meTan :wink:
Br,
Ivan

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.