extract specific file from zip

I have about 600 zip files.
Each zip file contains another zip file.
I would like to extract one file from the second-level zip.

I solved it with the unzip node and a loop, but it is very slow (about 20 sec per file) because it unpacks the whole zip, which holds about 1000 files.

For example:

1.zip->a.zip->123.txt
2.zip->a.zip->123.txt

result folder and file:

…\1\123.txt
…\2\123.txt

Thanks for helping.

Hi @palatisa,

you could try it with a Java snippet

e.g.:

I have an example folder with the following structure:
[Image: screenshot of the example folder structure]

Then the following Java code would work:

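 // requires: import java.nio.file.*;
 Path zipFile = Paths.get("/pathToZip/Neuer Ordner.zip");
 String fileName = "/testfolder2/fileToGet.xlsx"; // path of the entry inside the zip
 Path outputFile = Paths.get("outputFolder\\NewFolder\\fileToGet.xlsx");

 // Open the zip as a file system so that only the requested entry is read.
 // The cast avoids an ambiguous overload on newer Java versions.
 try (FileSystem fileSystem = FileSystems.newFileSystem(zipFile, (ClassLoader) null)) {
     Path fileToExtract = fileSystem.getPath(fileName);
     // Files.copy fails if the target folder does not exist yet
     Files.createDirectories(outputFile.getParent());
     Files.copy(fileToExtract, outputFile);
 } catch (Exception e) {
     // Report the failure instead of silently ignoring it
     throw new RuntimeException("Could not extract " + fileName + " from " + zipFile, e);
 }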

*java code copied from:

In your case this most likely takes two steps: extract the inner zip from the outer zip, then apply the same function to the inner zip to get the file.
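A minimal sketch of that two-step idea in plain Java, in case it helps. Instead of opening the zip as a file system, it swaps in `ZipFile` plus `ZipInputStream` and streams the inner zip directly out of the outer one, so nothing except the wanted file is written to disk. The class name `NestedZipExtract` and the method `extractNested` are made up for illustration, and the code assumes the entry names are known exactly (for example `a.zip` and `123.txt`).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;

public class NestedZipExtract {

    /**
     * Copies one file out of a zip that is itself stored inside another zip.
     * Returns true if the file was found and written, false otherwise.
     * (Hypothetical helper, not part of KNIME or the JDK.)
     */
    public static boolean extractNested(Path outerZip, String innerZipName,
                                        String fileName, Path target) throws IOException {
        try (ZipFile outer = new ZipFile(outerZip.toFile())) {
            ZipEntry innerEntry = outer.getEntry(innerZipName);
            if (innerEntry == null) {
                return false; // the outer zip has no such inner zip
            }
            // Read the inner zip as a stream: entries before the match are
            // skipped over and never written to disk.
            try (InputStream raw = outer.getInputStream(innerEntry);
                 ZipInputStream inner = new ZipInputStream(raw)) {
                ZipEntry entry;
                while ((entry = inner.getNextEntry()) != null) {
                    if (!entry.isDirectory() && entry.getName().equals(fileName)) {
                        if (target.getParent() != null) {
                            Files.createDirectories(target.getParent());
                        }
                        // Copies only the bytes of the current entry
                        Files.copy(inner, target, StandardCopyOption.REPLACE_EXISTING);
                        return true;
                    }
                }
            }
        }
        return false;
    }
}
```

Called as `NestedZipExtract.extractNested(Paths.get("1.zip"), "a.zip", "123.txt", Paths.get("1", "123.txt"))`, this would produce the `1\123.txt` layout from the question. The inner zip still has to be read sequentially up to the wanted entry, so the position of the file inside the inner zip affects the speed.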

Hi @palatisa and welcome to the KNIME Community.

Alternatively, you could do the unzip outside of KNIME. You can check this thread:

Although that topic is about 7-Zip, a similar approach can be used here. You can extract specific files via the command line, which will be much faster: it should take less than 20 sec to do ALL of them.

Once you have extracted your 123.txt files, you can go back to KNIME and process them.

@palatisa you could try and adapt this approach: listing the files in a ZIP file and then extracting only specific ones (such as CSV files):
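The linked workflow is not reproduced here, so as a rough illustration of the same list-then-extract idea, here is a sketch in plain Java (not the R code from that workflow). `ZipSelectiveExtract`, `listEntries` and `extractBySuffix` are hypothetical names, and matching by file suffix is just one assumed way of choosing the files.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ZipSelectiveExtract {

    /** Returns the names of all file entries in the zip without unpacking anything. */
    public static List<String> listEntries(Path zip) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zf = new ZipFile(zip.toFile())) {
            Enumeration<? extends ZipEntry> entries = zf.entries();
            while (entries.hasMoreElements()) {
                ZipEntry e = entries.nextElement();
                if (!e.isDirectory()) {
                    names.add(e.getName());
                }
            }
        }
        return names;
    }

    /** Extracts only the entries whose name ends with the suffix (case-insensitive). */
    public static List<String> extractBySuffix(Path zip, String suffix, Path outDir) throws IOException {
        List<String> extracted = new ArrayList<>();
        Path root = outDir.toAbsolutePath().normalize();
        String wanted = suffix.toLowerCase(Locale.ROOT);
        try (ZipFile zf = new ZipFile(zip.toFile())) {
            for (String name : listEntries(zip)) {
                if (!name.toLowerCase(Locale.ROOT).endsWith(wanted)) {
                    continue; // not one of the files we are interested in
                }
                Path target = root.resolve(name).normalize();
                if (!target.startsWith(root)) {
                    continue; // ignore entries that would land outside outDir
                }
                Files.createDirectories(target.getParent());
                try (InputStream in = zf.getInputStream(zf.getEntry(name))) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
                extracted.add(name);
            }
        }
        return extracted;
    }
}
```

Listing is cheap because `ZipFile` only reads the zip's directory, so filtering the list first avoids unpacking the roughly 1000 other files.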

Thank you very much for the answers.
The “R Script” turned out to be the right solution for selecting files and reading two-level zips.
I take the list of files to be unpacked from a table with a loop.
This works very well.

However, there is a minor problem: my zip files do not always contain all the files I list in the input. In that case, all the files in the zip are extracted. I think the solution would be a “file exists” test. I tried to find one with the “TRY (Variable ports)” node, but I don’t know if this is a good approach?

@palatisa you could indeed try and skip missing files:
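To illustrate the "file exists" test in code form, here is a small Java sketch (again not the R code from the linked example). It asks the zip for each requested name first and simply skips names that are not there, so a missing file never triggers a full extraction. `ZipSkipMissing` and `extractIfPresent` are hypothetical names, and the wanted names are assumed to come from your own trusted list.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ZipSkipMissing {

    /**
     * Extracts every wanted file that exists in the zip and returns the
     * names that were NOT found, so they can be logged or reviewed later.
     */
    public static List<String> extractIfPresent(Path zip, List<String> wanted, Path outDir) throws IOException {
        List<String> missing = new ArrayList<>();
        try (ZipFile zf = new ZipFile(zip.toFile())) {
            for (String name : wanted) {
                // getEntry returns null when the zip has no entry with this name
                ZipEntry entry = zf.getEntry(name);
                if (entry == null || entry.isDirectory()) {
                    missing.add(name);
                    continue; // skip instead of failing or extracting everything
                }
                Path target = outDir.resolve(name);
                Files.createDirectories(target.getParent());
                try (InputStream in = zf.getInputStream(entry)) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
        return missing;
    }
}
```

Returning the missing names rather than throwing keeps the loop over all 600 zips running, which seems closer to what the question asks for than a try/catch around each file.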

Thanks again for the answers, the solution I developed may not be elegant, but it works.

  1. The first “R source” node unpacks the english.zip file from all 600 zip files.
  2. The second “R source” extracts the names of the files in english.zip.
  3. I joined the extracted list with the list of “must” files. The table compiled in this way contains the files to be extracted.
  4. The third “R source” performs the unpacking with a loop.

Thanks for the reply, I managed to try it. The problem I ran into in this case was that the loop ran very slowly. The data is read from a network drive, which probably caused the slowdown. The solution worked anyway.

Hi @palatisa

Thanks for sharing a snapshot of the solution. It clearly shows the algorithm to follow.

Would it be possible to share the workflow too, just the portion shown in your snapshot? I believe other people in the forum would appreciate having access to the R code solution too. Thanks in advance.

Best

Ael

A condensed version of the R code (how to extract only certain files) can be found here. Maybe @palatisa can upload a version of his solution with dummy data as well.

Along with maybe other useful R and ZIP solutions, like keeping the original timestamps intact :slight_smile:

This is my first workflow. Some parts of it can probably be simplified. If so, please point them out and I’ll fix them.

unzip_specific_files_based_on_list_from_zip_in_zip.knwf (74.5 KB)

unzip_specific_files_based_on_list_from_zip_in_zip – KNIME Hub

@palatisa I will have a look at it. The structure does look good. You could do two more things: give it a title in addition to the technical one with the underscores, which would look better if someone finds it. And maybe insert a link to this thread so people can see the context. You can edit the description of the workflow.
