Hey all,
I want to list files and folders of a remote zip archive without decompressing it.
I did this for local zip archives with the Python Source node and the following code:
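import zipfile
import pandas as pd

# Collect the archive entries that are measurement folders (names ending in ".d/").
measurements = []
with zipfile.ZipFile(flow_variables['location'], 'r') as f:
    # namelist() only reads the archive's central directory; nothing is extracted.
    for entry in f.namelist():
        if entry.endswith(".d/"):
            measurements.append(entry)

# The Python Source node returns this DataFrame as its output table.
output_table = pd.DataFrame(measurements, columns=["measurements"])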
Do you have any idea how to do this for a remote archive? I don’t want to decompress the archives just to get the list of files and folders, and I can’t anyway: one file is around 600 GB.
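For illustration, here is a minimal sketch of one way to do this in plain Python. It assumes the remote location supports byte-range reads (for example an HTTP server that honours the Range header), which the thread does not confirm. `zipfile.ZipFile` accepts any seekable file-like object and only needs the central directory at the end of the archive to list entries, so a small wrapper that fetches byte ranges on demand is enough. The names `RangeReader`, `http_range_fetcher` and `list_measurements` are hypothetical helpers invented for this example and are not part of KNIME or the standard library.

```python
import io
import urllib.request
import zipfile


class RangeReader(io.RawIOBase):
    """Read-only, seekable file object that pulls byte ranges on demand.

    `fetch(start, end)` is a hypothetical callable returning bytes [start, end).
    """

    def __init__(self, fetch, size):
        self._fetch = fetch
        self._size = size
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            pos = offset
        elif whence == io.SEEK_CUR:
            pos = self._pos + offset
        elif whence == io.SEEK_END:
            pos = self._size + offset
        else:
            raise ValueError("invalid whence")
        self._pos = max(0, pos)
        return self._pos

    def readinto(self, buffer):
        start = self._pos
        end = min(start + len(buffer), self._size)
        if start >= end:
            return 0  # end of file
        data = self._fetch(start, end)
        buffer[:len(data)] = data
        self._pos += len(data)
        return len(data)


def http_range_fetcher(url):
    """Assumption: the server behind `url` supports HTTP Range requests."""
    def fetch(start, end):
        # The Range header uses an inclusive end offset.
        headers = {"Range": f"bytes={start}-{end - 1}"}
        request = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(request) as response:
            return response.read()
    return fetch


def list_measurements(fileobj, suffix=".d/"):
    """Return the archive entries whose names end with `suffix`."""
    with zipfile.ZipFile(fileobj, "r") as archive:
        return [name for name in archive.namelist() if name.endswith(suffix)]
```

With a URL and the archive size in bytes (for instance taken from the Content-Length of a HEAD request, if the server reports it), the call would look like `list_measurements(RangeReader(http_range_fetcher(url), size))`. Only the end-of-archive records and the central directory are transferred, not the 600 GB of file data. For other protocols, only the `fetch` function would need to change.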
I have tried the List Files/Folders node with a local zip, and the files are listed as expected. This node also has a File System Connection port, so hopefully it also works on a remote zip. Have you tried it?
Hello @ipazin,
thanks for your answer. I’m aware of the List Files/Folders node and I’m already using it to show all the files in a given folder on a remote location. I also looked for an option to list files from a zip archive, but I didn’t find one. After your suggestion, I tried it with a local zip file, but it didn’t work either. Maybe I have overlooked something?
Hello everyone,
as Ivan already mentioned, we are looking into implementing a File Archive Connector node. It’s still in the planning phase, so any feedback from you is highly appreciated.
Here is what is planned so far:
The node would have an optional file system connection, and the user would select a file archive (zip, jar, tar, …) in the node dialog. The output of the node would be a file system that represents this archive, so you could use the file utility nodes to work with it. For example, you could use the List Files/Folders node to list the entries of the archive. If the archive format supports it, you would also be able to read and write single entries in the archive.
What do you think?
Bye
Tobias
Hello @tobias.koetter,
that sounds exactly like what I had imagined! Listing the files and subdirectories of the archive would be enough for my particular use case, but your plan so far is definitely more sustainable than that!
I will think about this feature some more. Maybe I’ll have something to add.
Thanks for your responses and answers,
Best,
Johann