Separating Excel files (empty files from files with content)

Hello
I have a large number of Excel files in a folder that I want to use after ETL, but some of the Excel files only have one row. I want the system to automatically delete Excel files that have one row and send Excel files that have more than one row to the next stage to be ETLed.

before even loading, i would suggest to check if a file-size based solution would work.

given this, you could split files e.g. larger and smaller 1 MB.

afterwards, you can still load all files < 1MB to verify.

1 Like

It is not possible to accurately measure the file size because the size changes depending on the information inside the file, and this must be done by reading and examining the inside of the Excel file.

then you have 2 options:

if the structure within is identically, load all with 1 excel reader, include the path and filter relevant by e.g. a groupby with count

if the structure is different between each file, you will have to loop.

both ways are inefficient but Knime doesnt offer anything else built-in (you can always write custom python or java code).

in both cases, i would still consider pre-filtering the list of files using the size.

Try this. It uses a loop to identify worksheets with one row. These are filtered and the remaining files are read in a second loop and then processed. I have no idea what you want to do with the target worksheets so my workflow does a simple string manipulation as an example.

2 Likes

Thank you for your very good answer and I hope you succeed.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.