Read Excel Tables

Hello!

how can I read tables from Excel files? I mean formatted tables (the usual CTRL+T).
I have a bunch of Excel files, from which I should read a single table, which has the same exact name in all files.
Is there a connector that allows that?

Thanks!

Hello @cmor,

welcome to KNIME Community!

To my knowledge this is not possible with KNIME native nodes. Probably Python can be used as a workaround. See this topic for example on how to write Table to Excel:

Br,
Ivan

1 Like

We had a recent discussion about this. Here is an example where you would identify tables by a keyword and then import them. You might have adapt that to your current format and see if there are regularities in the data that you can identify and use.

@ipazin is refering to the possibility of creating Excel-Tables with the help of Python.

1 Like

What about reading the whole sheet and remove missing rows and columns?

Thank you all for your replies!
@ipazin Thank you! I would hope for a quicker solution for such a simple task though.
@Daniel_Weikert and @mlauber71 I think this is the right direction! I cannot remove the empty rows below, because they’re not empty (meaning I have a table above and another table below).
However, the table I need occupies the same cell range in all files, always in the first sheet.

I can definitely exploit this! I guess I must just learn how to store the first sheet name as a variable to be able to read from the first sheet no matter the name (the first sheet’s name is not always the same in all files).

It wouldn’t be bad to be able to exploit the fact that tables are named ranges though…that would make everything way faster!

@cmor I am notexactly sure what you mean by ‘simple task’ - if you know where your table is and know the line of the head you could just tell the Excel Reader what to read

If you want to identify the formatted tables you might have to resort to Python


If you want to explore further options how to deal with Excel (import) I have a collection for that prepared. The identification of formatted tables would be a new feature for sure :slight_smile:

And then if you could provide us with examples that cover your whole challenge one might be able to further explore possible solutions.

Thank you very much for your quick and detailed reply!

I’m indeed using the Excel reader in the end :+1:t2:

I’ll try to provide more detail:

I need to read several excel files from a folder. (Excel reader is fine here)
I need to read the same (one-row + header) table from each file, irrespective in which sheet it is (here is Excel reader no longer enough, because it doesn’t allow me to identify a named range)

Since the table I need is always in the first sheet and always occupies the same range of cells, I can use the Excel reader, choosing the first sheet option.
(Note that I’m just lucky here, because technically the table could be in any other sheet after the first one)

All I need to do now is learn how to store the first sheet name as a variable, so the loop doesn’t fail as soon as it finds an Excel file that has a different name for the first sheet.

1 Like

Hello @cmor,

Don’t understand this cause Excel Reader will not fail if configured to always read first sheet.

Also seems to me you can use option Read files in folder in order to avoid loop. (It will work only if you have same structure in each file)

Br,
Ivan

Hello @ipazin!
I though so too and that was exactly my first try. However, it throws an error as soon as it gets to the second file, saying it cannot find a sheet named like the first sheet of the first file :frowning:

Note that the table I need occupies the same range of cells, but not always in a sheet with the same name.

That’s why I thought storing the sheet names may solve the issue…

You could list the sheets of all excel files and proceed only those that match criteria

In this example also all sheets are listed and you can adapt how the files should be processed

1 Like

the read files option also does not work if the datatype for a column is different. So I think the loop is still the better option

Hello @cmor,

ok. I see now. As @mlauber71 pointed out you can use Read Excel Sheet Names (XLS) node to get sheet names and then Row Filter to leave only first row which has the name of first sheet. Use Table Row to Variable node and control sheet name with newly created flow variable.

@Daniel_Weikert you are right, currently to use that option one must be sure that data structure is the same.

Br,
Ivan

1 Like