I have about 600 zip files.
Each zip file contains another zip file.
I would like to extract one file from the second level zip.
I solved it with the Unzip node and a loop, but it is very slow (about 20 seconds per file) because it unpacks the whole zip, which holds about 1000 files.
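The solution eventually adopted in this thread is written in R, but the core idea is the same in any language: read only the one member you need from the nested zip instead of unpacking everything. Below is a minimal sketch using Python's standard `zipfile` module. It assumes the outer zip contains a single inner `.zip` entry (picked automatically unless `inner_zip` is passed) and that `member` is the exact path of the wanted file inside the inner zip. The function name is hypothetical.

```python
import io
import zipfile
from pathlib import Path


def extract_nested_member(outer_zip, member, dest_dir, inner_zip=None):
    """Extract a single file from the zip nested inside `outer_zip`.

    Only the inner zip's bytes are loaded into memory; the other members of
    the inner zip are never decompressed or written to disk.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(outer_zip) as outer:
        if inner_zip is None:
            # Assumption: the first .zip entry is the nested archive.
            inner_zip = next(
                (n for n in outer.namelist() if n.lower().endswith(".zip")), None
            )
            if inner_zip is None:
                raise ValueError(f"no nested zip found in {outer_zip}")
        inner_bytes = outer.read(inner_zip)
    # ZipFile needs a seekable source, so wrap the inner archive in BytesIO.
    with zipfile.ZipFile(io.BytesIO(inner_bytes)) as inner:
        # extract() writes just this one member and returns its path.
        return Path(inner.extract(member, dest_dir))
```

To process all ~600 archives, you could loop over something like `Path("zips").glob("*.zip")` and call the function once per archive with a separate output folder each (e.g. `Path("out") / outer.stem`), so identically named files from different archives do not overwrite each other. Memory use per call is roughly the size of one inner zip.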
Alternatively, you could do the unzip outside of Knime. You can check this thread:
Although that topic was about 7-Zip, a similar approach works here: unzip only the specific files you need via the command line. It will be much faster; it should take less than 20 seconds to do ALL of them.
Once you have extracted your 123.txt files, you can go back to Knime and process the txt files.
Thank you very much for the answers.
The “R Script” node turned out to be the right solution for selecting files and reading two-level zips.
I take the list of files to unpack from a table and iterate over it with a loop.
This works very well.
However, there is a minor problem: my zip files do not always contain all the files I specify in the input. In that case, all the files in the zip get extracted. I think the solution would be a “file exists” test. I tried to build one with the “Try (Variable Ports)” node, but I don’t know whether this is a good approach.
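One way to implement the “file exists” test is to compare the requested names against the archive's table of contents before extracting, so a missing file is reported instead of triggering a fallback to full extraction. The thread's R code is not shown, so this is a Python sketch of that idea, not the poster's solution. It assumes the requested names match the archive paths exactly (same case, same folder prefix); `extract_requested` is a hypothetical name.

```python
import zipfile
from pathlib import Path


def extract_requested(zip_source, wanted, dest_dir):
    """Extract only the requested members that actually exist in the zip.

    `zip_source` may be a path or a seekable file-like object (e.g. an
    in-memory inner zip). Returns (extracted, missing) so the caller can
    log the gaps.
    """
    dest_dir = Path(dest_dir)
    with zipfile.ZipFile(zip_source) as zf:
        present = set(zf.namelist())  # the archive's table of contents
        extracted = [n for n in wanted if n in present]
        missing = [n for n in wanted if n not in present]
        # Guard: never call extractall without an explicit selection, because
        # members=None would extract every file in the archive.
        if extracted:
            zf.extractall(dest_dir, members=extracted)
    return extracted, missing
```

Because the check happens inside the function, the loop over the input table does not need a separate try/catch branch per file; it can simply collect the `missing` lists for reporting.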
Thanks for the reply, I managed to try it. The problem I ran into was that the loop ran very slowly. The data is read from a network drive, which probably caused the slowdown. The solution worked anyway.
Thanks for sharing a snapshot of the solution. It clearly shows the algorithm to follow.
Would it be possible to share the workflow too, just with the portion you have snapshotted? I believe other people in the forum would appreciate having access to the R code solution too. Thanks in advance.
A condensed version of the R code (how to extract only certain files) can be found here. Maybe @palatisa can upload a version of his solution with dummy data as well.
Along with maybe other useful R and ZIP solutions, like keeping the original timestamps intact.
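On keeping the original timestamps: many extractors set the extracted file's modification time to the moment of extraction. The linked R solution is not reproduced here; as an illustration of the general technique, this Python sketch reads the timestamp stored in the zip entry and re-applies it with `os.utime`. Note that zip entries store a local date and time with no timezone and only 2-second resolution, so the restored value is interpreted in the machine's local timezone. The function name is hypothetical.

```python
import os
import time
import zipfile
from pathlib import Path


def extract_with_mtime(zip_path, member, dest_dir):
    """Extract one member and restore its modification time from the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        info = zf.getinfo(member)
        target = zf.extract(info, dest_dir)
    # ZipInfo.date_time is (year, month, day, hour, minute, second) in local
    # time; pad it to the 9-tuple mktime expects (isdst=-1 lets it decide DST).
    mtime = time.mktime(info.date_time + (0, 0, -1))
    os.utime(target, (mtime, mtime))  # set both access and modification time
    return Path(target)
```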
@palatisa I will have a look at it. The structure does look good. You could do two more things: give it a title besides the technical one with the underscores, which would look better when someone finds it, and maybe insert a link to this thread so people can see the context. You can edit the description of the workflow to do this.