I am reading list of excel files in knime using excel reader how ever there is one corrupted excel file in the path which can’t be read is there way to ignore that file using knime
Also I want to know the corrupted file name
I have almost 567893 excel files I can’t remove each corrupted file manualy
Hi @Ashok121 , that’s certainly a lot of Excel files to be processing!
You could perform this by wrapping the Excel Reader with try-catch nodes and then handling any errors by then transferring files that wouldn’t open to a separate folder.
Here’s a demo workflow
This may take a while to run with 500k files
This workflow was written using KNIME 5.1, but should work equally with KNIME 4.7.
The NoOp node is available from NodePit, and does nothing except provide a means of doing what the branch needs to do!
If you can’t/don’t want to install that, replace it with an Add Empty Rows node, with the config “Additional” set to 0, or you can use my “pass through component” which I wrote before discovering “NoOp”
Awesome Thank you for the soultion @takbb
is there a way so that all excel files can be consolidate in single excel reader in the same workflow
Assuming that all of your Excel files to be consolidated are of the same format…
If you want to include the error handling as in the demo workflow, then you can consolidate the files by replacing the variable loop end node with a Loop End node:
The output from that would be the consolidation (concatenation of rows) from all the excel files that have been read.
Alternatively, if you are able to first remove all “problem” files using the demo workflow, then you can simply read all files from a folder (and potentially subfolders) , without all the additional nodes/loops in the demo workflow by configuring a single Excel Reader, and the output would be concatenation of all of them.
Again, it assumes that all files contain the same table format.
is it possible to do similataneous remove corrputed file & Consolidating all Excel files ?
so that it will reduce the time & cause again it needs to read all files from first ?
Yes, by using the loop with the Loop End node, as long as the excel files are the same format this should work.
Can you share the workflow?
Hi @Ashok121 , I’m not at my pc now, but it was the workflow that I shared on the hub before; only you should just need to change the last node from this
The output from that loop end node would then contain all the rows read from the excel files in the loop, provided they are all the same table format.
i Tried the Workflow i mean the second one with LOOP END
getting Error as Execute failed: Input table’s structure differs from reference (first iteration) table: different column counts 5 vs. 16
i tested this on small folder in which there is 10 corrupted files however it moved all corrupted files but loop end i am getting error
is it because of Corrupted File?
i checked all Excel files the format is same for all including column names.
Hi @Ashok121 , it does sound like at least one of your Excel files has a different structure to the others. It could be as simple as a column being in a different case, or additional spaces included in a column name. It isn’t necessarily “corrupt” in the sense that it cannot be opened, simply that it differs from the others. This can often happen if the files have been created manually.
You can try configuring the Loop End node and tick “Allow changing table specifications” which will then cause it to adapt the columns to include all different table formats received from the various excel files.
In the “advanced” tab of the Excel Reader, you could also tick the “Append file path column” so that the file path is included. This will give you clues about which file has a different table specification to the others.
Actulay it’s because of corrupted file not remaining excel files below is screen shot for your reference
when i use variable loop end i didn’t face this issue
i think loop end gathering data even from corrupt file
Thank you so much I have too many doubts (trying to learn Knime in deep)
I Appreciate for your patience.
Ah ok, thanks for the additional information. I think I can see what the issue would be, that I had overlooked for the Loop End node. When it finds a corrupt file, it doesn’t read it but will use the output from the Empty Table Creator node. This will probably be the cause of the problem, because that Empty Table is not the correct format!
The simplest thing to fix that (I hope!) would be to create an Excel file which is the correct format, in terms of column names but which has no data rows, then replace the Empty Table Creator with an Excel Reader node that reads that file and passes the empty table to the lower port of the Catch Errors (Data Ports) node.
As an addendum to this, which I thought I should correct here for future reference, I updated the original workflow on the hub to remove a further mistake. The output from Files should not have been passed back to the loop. This was the additional cause of the changing table specification problem when a corrupted files was moved.
This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.