create a workflow to process xls files without having to configure reader every time

Hi, I’m trying to figure out the best way to analyze data in xls form. About once/twice a week, I receive data in .xls file format. I have created a work flow to process all of the data to get what I need but I still have to manually configure the excel reader every time. Is there a way to get around this? The spreadsheet structure is always the same so I’m looking for a way to “plug” the xls file in with out having to configure it every time.

Additionally the spreadsheet has 2 tables (on the same sheet). The set up I have right now is to read the file twice (2 excel reader nodes of the same xls file but configured differently), one to process one table and another to process the other table. Is this the best way to do it?

Hello @callamari,

I think the best way will be to initialize the workflow with a variable.
If you define a folder containing the file or directly the path of the file in your workflow variable, then you can just configure once and the workflow will be ready.
You won’t have to configure it again until the format change.

For example, on the following example, the 3rd example, you’ll see how to go through all the files in a folder and read them (with no further configuration): https://kni.me/w/2N5wMVun9Idx8XXO

For your 2nd question, you’ll have an example here: https://kni.me/w/a2XQGQ7J9u0N6IZt
It will work only if you have the same format. Otherwise I’ll recommend 2 Excel Reader.

Best
Jerome

1 Like

Thanks Jerome. I currently have the 2nd workflow set up, where it loops through files in the folder. But I don’t know how to have it read the file when added to the folder. Also we are only interested in processing the new spreadsheet as it becomes available.

Hello,

An example for your question can be found in this workflow: https://hub.knime.com/trj/spaces/Public/latest/ExcelReader_ListFiles

This solution will move the files you’ve read to another place. With this solution, when you run the WF again, only new files are read by the process.

Hope it helps. Let me know.

Best
Jérôme

3 Likes

Thanks Jérôme. This solution works great. The only issue I’m having is the setting of the Excel reader (XLS) node. If I configure it to “files in folder” and “local file system”, it gives me an error saying the tables in the files are different. But if I change it to “file” or “Custom URL” it will loop through all the files with no issue. But when a new file is added to the folder it won’t be able to read the new file as the file path is still saved on the old file that had been moved.

So if you configure with file and Local File System and you define the variable (in yellow below), it will go to all the files provided before by your “List Files”, and inside your Loop.

The thing is that you have to reset and start again your “List Files” node before the Loop. This node have to examine again the folder in order to create the new list of files.

Does it make sense ?

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.