Ignore empty files when reading files in folder (csv node)

PLS_KN · November 17, 2021, 9:20am

Hi everyone,

Do you know if there is a way to ignore empty files when reading csv files from folder?

Basically, every month I receive a batch of data reports in a zip file, I use a “decompress file” node to unzip. I only need the csv files, so I set the csv reader node with configuration “files in folder” and filter for file extension .csv.

This works fine as long as all the csv files have the same structure and are not empty. But the csv’s are automated reports, and some of them might be completely empty (not even containing headers). When 1 or more csv’s are empty, the CSV Reader node fails, so I have to manually go to the folder and delete the files. Is there any way I can automate that? Perhaps there is a way to “ignore” empty files, or to delete empty files directly from KNIME?

ipazin · November 17, 2021, 10:02am

Hello @PLS_KN,

There is no “ignore empty files” option. However, if you uncheck the Has column header and Fail if specs differ options, you’ll be able to read the data into KNIME without the node failing. This way your column headers will be part of the data, but I guess it should be easy to filter those out and make them headers in the KNIME table, especially if they are always the same.
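For anyone who wants to see the logic outside KNIME, here is a minimal Python sketch of the same idea: read every file without treating the first row as a header, let empty files contribute nothing, and promote the header afterwards. This is not what the CSV Reader does internally. The function names are made up for illustration, and the sketch assumes every non-empty file starts with the same header row.

```python
import csv
from pathlib import Path


def read_folder_without_headers(folder):
    """Read all .csv files in `folder` as raw rows (no header handling).

    Empty files simply contribute zero rows instead of raising an error.
    """
    rows = []
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="") as handle:
            # Skip blank lines, which csv.reader returns as empty lists.
            rows.extend(row for row in csv.reader(handle) if row)
    return rows


def promote_header(rows):
    """Use the first row as the header and drop its repeats from the data.

    Assumption: every non-empty file starts with the same header row.
    """
    if not rows:
        return [], []
    header = rows[0]
    data = [row for row in rows if row != header]
    return header, data
```

Note that a data row identical to the header would also be dropped, which is the price of treating headers as data first.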

Br,
Ivan

PLS_KN · November 17, 2021, 10:45am

Hi Ivan

Thanks for the tip! I can read all the data with this method, but I don’t get headers anymore. One way would be to manually change each field name with the Column Rename node. But is there a way to do it with a Table Creator node? For example, the first column in my table could hold the names given by the CSV Reader node (column0 to column N) and the second column the column names I need, and then I would basically replace each initial name with the matching name from that table?

Thanks!

ipazin · November 17, 2021, 1:57pm

Hello @PLS_KN,

Of course. Once you have your mapping in a Table Creator node as you described, you can apply it with the Insert Column Header node.
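The mapping table can be pictured as a plain lookup from the generated names to the names you need. A rough Python sketch of that lookup follows; the names `Column0`, `id`, `name` and the function `rename_columns` are hypothetical, and keeping unmatched names unchanged is a choice made for this sketch rather than a statement about how the node behaves.

```python
def rename_columns(column_names, mapping):
    """Return new column names, replacing each name found in `mapping`.

    Names without an entry in the mapping are kept unchanged.
    """
    return [mapping.get(name, name) for name in column_names]


# Hypothetical mapping table: generated name -> desired name.
mapping = {"Column0": "id", "Column1": "name"}

generated = ["Column0", "Column1", "Column2"]
renamed = rename_columns(generated, mapping)
```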

Br,
Ivan

SimonS · November 17, 2021, 2:29pm

Instead of using a Table Creator and inserting the column names manually, you can also use a second CSV Reader that only reads the header of the first file.
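As an illustration of that idea outside KNIME, the sketch below reads only the first row of the first file that has one. Skipping empty files while looking for the header is an addition of this sketch, and `read_first_header` is a made-up name.

```python
import csv
from pathlib import Path


def read_first_header(folder):
    """Return the first row of the first non-empty .csv file in `folder`.

    Returns None when no file contains a header row.
    """
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="") as handle:
            # Only the first row is read; the rest of the file is ignored.
            first_row = next(csv.reader(handle), None)
        if first_row:
            return first_row
    return None
```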

Best, Simon

PLS_KN · November 17, 2021, 2:47pm

thanks a lot! will try that

PLS_KN · November 17, 2021, 2:48pm

thanks for the tip, will try that too

Mark_Ortmann · November 18, 2021, 7:26am

What about using the Files/Folders Meta Info node and removing/filtering those entries whose size indicates an empty file?

PLS_KN · November 18, 2021, 8:01am

Hi Mark,

I had a look at the Files/Folders Meta Info node. It outputs the size of each file, but it is still unclear to me how I can use this node’s output to then read files in folder and exclude the file(s) with size 0.

How would you do that?

Thanks!

Mark_Ortmann · November 18, 2021, 8:31am

Doesn’t the node list all the files you want to read (sorry, I cannot check myself what the model’s output looks like right now) with their respective size? If it does, add a filter to remove the size 0 files and then use a loop to read every single file.
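In Python terms, the list-filter-loop approach would look roughly like the sketch below. It only mirrors the logic, not the KNIME nodes themselves; the function names are hypothetical, and it assumes every non-empty file has a header row.

```python
import csv
from pathlib import Path


def list_files_with_size(folder):
    """Return (path, size in bytes) for every .csv file in `folder`."""
    return [(p, p.stat().st_size) for p in sorted(Path(folder).glob("*.csv"))]


def read_non_empty_files(folder):
    """Skip zero-byte files, then read the remaining files one at a time."""
    tables = {}
    for path, size in list_files_with_size(folder):
        if size == 0:
            continue  # the "filter" step: drop empty files before reading
        with path.open(newline="") as handle:
            # Assumption: the first row of each non-empty file is the header.
            tables[path.name] = list(csv.DictReader(handle))
    return tables
```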

Best
Mark

PLS_KN · November 18, 2021, 11:09am

Hi Mark,
Yes, the node lists all the files and file metadata like size. I could do it like that, but I’m trying to avoid loops and simply read files in folder. But thanks anyway.

bruno29a · November 18, 2021, 10:23pm

Hi @PLS_KN , somehow I missed that post.

I went over everything that’s been said, and it looks like everyone is focusing on “ignore empty files when reading files”, which is fair since that’s what the title says.

But if I read your use case, your process seems to have 2 parts:

1. Decompress the csv files from a zip file into a folder
2. Read all the files via the CSV Reader, preferably without a loop, meaning you would read using the “Files in folder” option.

As pointed out, there does not seem to be an option to ignore empty files when reading, meaning you might need to use a loop, which you want to avoid.

In that case, how about doing the “clean up” in the first process? After doing the decompress, check for empty files and either delete them, or move them away, or rename them to something that you can filter out when reading the files.

There will be no need for Loops to do so. Just get the meta info of all the decompressed files using the Files/Folders Meta Info, and take action based on the fact that file size is 0.

EDIT: Something that would look like this:

PLS_KN · November 19, 2021, 9:04am

Yes actually it’s like you say. I need to 1) decompress files 2) ignore the empty ones. I tried your method but get the following error at the copy/move node

I tried a slightly different way, by creating a variable and then deleting the file that matched that name, but I am not sure it will work in cases where I have multiple empty files.

Thanks!

Mark_Ortmann · November 19, 2021, 10:22am

Use the Transfer files node instead.

bruno29a · November 19, 2021, 2:36pm

Hi @PLS_KN , you just need to convert the Path to URI, that’s all.

@Mark_Ortmann , the reason I opted to use the Copy/Move Files node instead of the Transfer Files node is that it can do multiple copy/move operations in one execution of the node, while Transfer Files cannot; we’d need a loop with Transfer Files.

Mark_Ortmann · November 19, 2021, 2:53pm

Can’t you just add the source input port and then select the column specifying the files to copy?

Best
Mark

bruno29a · November 19, 2021, 2:55pm

Hi @Mark_Ortmann , I’m not sure you can specify a column in the Transfer Files node, at least I did not find a way, that is why I went with the Copy/Move Files node.

gonhaddock · November 19, 2021, 3:04pm

Hello @PLS_KN
Can’t it be handled with an ‘Empty Table Switch’? Then just ignore that these files are there.

I mean, if you read them in a loop, combined with @SimonS’s suggestion…

you can also use a second CSV Reader that only reads the header of the first file

BR

bruno29a · November 19, 2021, 7:32pm

Hi @PLS_KN , sorry was busy earlier.

I put something together quickly for you. A side note to @Mark_Ortmann , there’s actually a Transfer Files (Table) version that can read the source from the table, just like the Copy/Move Files node did.

The workflow looks like this:

Details as follows:
For the demo, I created 6 csv files with the naming of file_1.csv up to file_6.csv, with file_2.csv, file_3.csv and file_5.csv as empty files and zipped them into a file called files.zip:

And the file_1.csv has:

id,name
1,test1

file_4.csv has

id,name
4,test4

file_6.csv has

id,name
id,test6
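If you want to recreate this test archive without building the files by hand, a short Python sketch like the following should do it. The file names and contents are the ones listed above; the trailing newline in each non-empty file and the helper name `build_demo_zip` are assumptions of the sketch.

```python
import zipfile
from pathlib import Path

# File contents as listed above; the three empty files are empty strings.
# Assumption: non-empty files end with a trailing newline.
DEMO_FILES = {
    "file_1.csv": "id,name\n1,test1\n",
    "file_2.csv": "",
    "file_3.csv": "",
    "file_4.csv": "id,name\n4,test4\n",
    "file_5.csv": "",
    "file_6.csv": "id,name\nid,test6\n",
}


def build_demo_zip(target_dir):
    """Write files.zip containing the six demo csv files into `target_dir`."""
    zip_path = Path(target_dir) / "files.zip"
    with zipfile.ZipFile(zip_path, "w") as archive:
        for name, content in DEMO_FILES.items():
            archive.writestr(name, content)
    return zip_path
```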

In my example, my zip file is in my data folder (“Relative to” option), and I want to decompress the files into a sub folder in my data folder called “Decompressed Files”:

You can also check additional options, such as creating missing folders and overwriting files that already exist.

After running the Decompress node, I get these files in the Decompressed Files subfolder:

The Files/Folders Meta Info produces this:

I filter only on those whose size is 0:

And I get the expected 3 files:

I configured the Transfer Files (Table) node as follows:

So, basically I want to move the 3 empty files to a subfolder in my data folder called “Empty Files”, and I want to make sure that I check the “Delete source files / folders” option, that way it will do a “move” (copy and delete)
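For comparison, the same “move every zero-byte file out of the way” step can be sketched in plain Python. This only illustrates the logic and is not how the node is implemented; `move_empty_files` is a hypothetical helper, and the folder arguments stand in for the “Decompressed Files” and “Empty Files” subfolders.

```python
import shutil
from pathlib import Path


def move_empty_files(source_dir, target_dir):
    """Move every zero-byte .csv file from `source_dir` to `target_dir`.

    Returns the names of the moved files.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    moved = []
    for path in sorted(Path(source_dir).glob("*.csv")):
        if path.stat().st_size == 0:
            shutil.move(str(path), str(target / path.name))
            moved.append(path.name)
    return moved
```

All empty files are handled in a single call, so it also covers the case of several empty files in one batch.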

After running the node, my Decompressed Files subfolder no longer has the empty files:

And the empty files are now in my Empty Files subfolder:

So, at this point, the CSV Reader will only find non-empty files in the Decompressed Files subfolder:

We can see from the configuration that it’s seeing 3 csv files, and we can already see from the Preview that it’s reading 3 csv files.

And this is what I get from the CSV Reader:

All was achieved without any Loop.

Here’s the workflow (zipped file included): Ignore empty files when reading from csv reader.knwf (15.0 KB)
