Parallel-process multiple Excel files by sheet name in a single KNIME workflow

san_98 · October 28, 2024, 5:38am

Hi

I have multiple Excel files in a folder and want to process them in parallel in a single KNIME workflow. Each file should be routed by its sheet names, not its file name: if a file contains a specific sheet name, it should follow a particular workflow path. I want each file and its matching sheet to run concurrently. Which nodes should I start with?

mlauber71 · October 28, 2024, 7:04am

@san_98 from what I recall we had a similar thread about parallel processes

Maybe you can combine it with this information

san_98 · October 28, 2024, 10:48am

Hi @mlauber71 ,

Thanks for the quick response.

The issue is that running the “Read Excel Sheet Names” node on multiple large files takes over 15 minutes just to retrieve the sheet names. Is there a way to add a condition node that checks the sheet name and, if it matches a specified name, proceeds to read only that specific sheet?

mlauber71 · October 28, 2024, 12:24pm

@san_98 this sounds strange. Have you tried listing all Excel files and all sheet names like this:

and then you can apply Row Filters to the sheet names.

Another option would be to use a Python Script node:
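
As a rough sketch of what such a script could look like: an .xlsx file is a zip archive, and the sheet names sit in a small index part, so they can be read without loading any cell data. The function names `list_sheet_names` and `has_sheet` are made up for illustration. The sketch assumes standard .xlsx files (not .xls or .xlsb) with the workbook part at the usual `xl/workbook.xml` location, and it has not been tested inside KNIME.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the main SpreadsheetML part in a standard .xlsx package.
_NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def list_sheet_names(xlsx_path):
    """Return the sheet names of an .xlsx file without loading cell data."""
    with zipfile.ZipFile(xlsx_path) as zf:
        # xl/workbook.xml is a small index part that lists every sheet.
        with zf.open("xl/workbook.xml") as fh:
            root = ET.parse(fh).getroot()
    return [s.get("name") for s in root.findall("m:sheets/m:sheet", _NS)]


def has_sheet(xlsx_path, wanted):
    """True if the workbook contains a sheet called `wanted`."""
    return wanted in list_sheet_names(xlsx_path)
```

Only the files where `has_sheet` returns True would then need to be passed on to the actual reader, with the sheet name set explicitly.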

san_98 · October 28, 2024, 12:39pm

Hi @mlauber71 ,

I’ve tried the same approach of listing all files. However, with a large number of high-volume files, the process becomes time-consuming. We’re working on a bulk data upload and aim to complete the process quickly. I’ve attached a workflow image for reference. Could you assist with optimizing this?

mlauber71 · October 28, 2024, 12:50pm

@san_98 we had a similar discussion before. Not sure if this is a bot thing? My suggestion: try to plan out what it is you want to do. Which data do you want to transfer where? For the transfer, zipping might be an option, or maybe doing it in chunks.

san_98 · October 30, 2024, 4:17am

Hello @mlauber71 ,

I have multiple Excel files with large datasets, each containing a maximum of 2 sheets. I want to read all these files and then read the sheet names within each Excel file. The data from each sheet will be transferred to a designated SQL Server table, with a different table assigned to each sheet. For instance, if there are three Excel files in the folder, the data from the first Excel file will be loaded into Table1, the second file’s data into Table2, and the third file’s data into Table3. This process should run in parallel for efficiency.
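
To make the requirement concrete, here is a minimal sketch of the routing and parallel dispatch in plain Python. The sheet names, the table mapping in `SHEET_TO_TABLE`, and the helpers `plan_jobs` and `run_parallel` are all hypothetical. The actual work (reading one sheet and writing it to SQL Server) is left to a `load_fn` supplied by the caller, so nothing here talks to Excel or a database.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical routing table: sheet name -> target SQL Server table.
SHEET_TO_TABLE = {"Orders": "Table1", "Customers": "Table2"}


def plan_jobs(file_sheets, sheet_to_table):
    """Build (file, sheet, table) jobs for sheets that have a target table."""
    return [
        (path, sheet, sheet_to_table[sheet])
        for path, sheets in file_sheets.items()
        for sheet in sheets
        if sheet in sheet_to_table
    ]


def run_parallel(jobs, load_fn, max_workers=4):
    """Run load_fn(file, sheet, table) for every job on a thread pool."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(load_fn, *job): job for job in jobs}
        for fut in as_completed(futures):
            # fut.result() re-raises any exception from the worker.
            results[futures[fut]] = fut.result()
    return results
```

Here `file_sheets` would be a dict mapping each file path to its list of sheet names. Sheets without an entry in the mapping are skipped. Threads mainly help while workers wait on disk or the database; if parsing the Excel data turns out to be the bottleneck, `ProcessPoolExecutor` offers the same `submit` interface and may be worth trying instead.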
