How to read multiple excel files in one folder?

I tried this earlier also but it didn’t work, I’ll again explain my task, I’ve some files in my pc(they can be of any format excel or csv) and I need to import them into knime environment at one go instead of using Excel Reader(XLS) node again and again. I’m attaching 3 files here, please help me with the workflow that I can use. For the reference I’m taking example of 3 files but there can be hundred files since I’m very new to KNIME and wants to explore this as much I can.Data1.xlsx (15.3 KB)
Data2.xlsx (16.1 KB)
Data3.xlsx (15.8 KB)

You could take a look at this workflow. Not sure if there is a way to load Excel files in one step without loop. Only technique I could think of is an external Hive table referring to a folder of CSV files (have not yet tried that with KNIME) but that seems not to work with KNIME.

And another note. There seem to be some customer informations and names in the sheets. Maybe you want to check if you are 100% comfortable with sharing them.

kn_example_multiple_excel_sheets4.knar (136.3 KB)

2 Likes

Hi, how can we apply fuzzy in knime as the same fuzzy that we apply in SQL, like if I want to check the similarity between name or address or telephone numbers.

For eg: We have address column in two different files(one is employee_master and second is vendor_master) and I want to check the similarity % between these columns(in any)

First I think it would be best if you have a new question to open a new topic, makes it easier for people to find relevant information.

Then as has been mentioned here there seems to be just the workflow you need to compare Adresses. Maybe you could adapt that:

https://www.knime.com/knime-applications/address-deduplication

Also if you just want to calculate the similarity of strings there is this thread:

1 Like

Hi, I,m trying to import multiple excel files in knime using one workflow but getting error in one of the node, ii would be great if anyone can help me about the thing that I’m missing here.
I’ll attach all the files that I want to import along-with the picture of the workflow with the error.

Hi there!

What error are you getting? Do you manage te read at least one Excel file or you get error in first iteration?

Br,
Ivan

Hi Swatityagi,

can you try uncheck the first row header option while configuring the node.

Regards,
Pavan.

Hi together,

Thanks for the super workflows, it helped me a lot. Can you help me once more please to solve following further problem: I have one folder with 2 Excel files each Excel file has 5 Tables. Now I would like to read these 10 Tables in 10 seperate Nodes (Excel Reader). With this wf I would like to skip the manual selection of each table…
Is that possible?

Best wishes, Bili

Look at
@mlauber71 example here

3 Likes

Hello izaychik,

Thanks for the wf, I tried it already, my tables are all different therefore it is not working (Execute failed: Input table’s structure differs from reference (first iteration) table: different column counts…), further i would like to have the results each in a separate node for further processing. Any ideas??

Regards
Bili

The issue with the different columns is no issue, but how to route or safe each table in a separate sheet I cannot find.

You could have a look at this post it uses R package to import CSV files with very different structures. You would still have to decide what to do with them later. I think this example would be adapted to your needs with readxl package.

And also it might help if you could provide an example most closely representing your problem (especially possible quirks that a solution must be able to overcome).

1 Like

Hello mlauber,

Thanks for the very quick response. My issue is, I have in one folder 2 excel files (a and b). These excel files contain several tables, 4 and 6 in total 10 tables. Now I want these uploaded automatically, instead of select them everytime time manually. The wf’s in this forum run through all excel files and all tables, but i cannot safe them somehow in seperate nodes. In my case I would like to use the wf but have at the end 10 Nodes as result (a1, a2, a3, a4, b1, b2, b3, b4, b5 ,b6) to process further.

OK so you want a workflow that creates import nodes that you then have within KNIME to further use them. Hmm spontaneously I think this is not possible in this way. I do not think KNIME has some meta script language (like SPSS Clementine had in those days). One could think about some freak experiment creating XML files representing nodes - but I would not go there.

If you have different structures of you data I would recommend to somehow loop thru them and store them in (separate) files. If the Excel reader can not do that one option would be to use R or Python scripts that might be more robust in that regard.

But still you would have to decide what to do with the data. Instead of having the data in various excel files you would then have them within KNIME.

Of course if you know what to do with them you could collect them all in one folder and then again loop thru them and do something with them.

If the structure is divers but would not change that much you might be better off just going thru the effort and creating the nodes by hand.

1 Like

Is there really a very complex way needed? The uploaded wf’s are able to run through the files and have them in one table. Here I could filter and would have a solution, but this one would be not flexible. Thats why I search for a way to use the wf, and then when one table is found safe it in a separate excel reader for example. I tried it with switchs, but did not work…

Maybe I still do not understand what you want to do. An example might help explain what your idea is. Of course, you can also save the table you ran thru in separate tables. You might use the name of the file or generate a new one from the number of the iteration.

It could be better change the mental model from Excel centric to database centric. Load your tables to database and read them from DB to Excel.

1 Like

Thank you very much for your help!! I try to explain.
I have one folder with 2 excel files (a and b). Each excel file has 3 Spreadsheets (1,2,3). I want to import them in KNIME with one click (instead of selecting and importing 6 times).
The result should be 6 separate imported tables (a1, a2, a3, b1, b2, b3).

A post was split to a new topic: Multiple Excel changed columns