I would like to loop through an excel file and only append tabs that contain the word “gas” or “electric”, case insensitive.
Here is the current workflow and the sample data file. In this case, only 1 of the tabs has data, but that may not always be the case. I would like to record the sheet each data record came from as well so I can use that information downstream. KNIME_sheet_loop.knwf (1.1 KB) wgl_data.xlsx (39.0 KB)
Hi @Shmelky , I’m not sure I understand “loop through an excel file and only append tabs that contain the word “gas” or “electric”, case insensitive”. Why would you want to append these tabs to that file if they are already in the file?
I could assume that you are talking about reading an excel file and writing to a new excel file here, but I don’t want to spend time on a solution based on an assumption that could be wrong.
I want to collect the data from multiple tabs and extract them into one knime table so I can apply manipulations/analytics on that singular data table. I will write to an excel file at the very end.
After filtering on “gas” or “electric” case insensitive, I end up with these sheets only:
Then I just loop through each line to read the sheet from Sheet from the file in Path.
As per my suggestion in the workflow, you can do all your operations and even write to a new Excel file to different sheets per iteration within the loop.
If you want to read everything and process only after the loop, then you have to rely on the iteration values to know which data came from which sheet.
Do you know why the data repeats though? I would expect it to loop through each tab once, so with the attached data, there should be only 3 records of data (4 with the header). Why does the loop repeat the same data 4 times?
Hi @Shmelky , it looks like you have to choose “Select sheet with name” option:
I thought that assigning a variable to it would automatically select it, but that is not the case. You have to choose the option, but the name is dynamic using the sheet variable.
However, there is an issue with your data in that the headers are not on the first line, and your first line is not actually an empty line:
Because you want to read everything at once, at then process outside of the loop, the loop will try to concatenate all the sheets, but because of the structure I just showed you, that means that the structure is different from sheet to sheet.
The only way to allow the loop to process is that you do not use the headers from line 2, and keep the default headers of A, B, C, D, etc…
Of course, this will give you unwanted results.
So, options are:
Get rid of that first line so that the headers currently on line 2 are on line 1
or
Do not use “Table contains column names in row number”:
or
If the Headers are always in Row 2, then you can use this:
or
Process everything within the loop and write to Excel, in which case you would end the loop with a Variable Loop End, and the different structures will not matter
And then skip the first line that contains the sums (the sums in the 2 columns exist in all the sheets, and is producing 0 as value in the empty sheets) with a Row filter, and the workflow looks like this: