Download files from Google Cloud Storage with a Python script

Hi,

I have been trying to download files from a client’s Google Cloud Storage bucket for a while, but it keeps giving me a permission error. Using the same credentials I can download files with a Python script. Downloading one file at a time is not viable when there are over 200 files and there will be 200 more every week.

Is it possible to use the same Python script (copied directly from the Google documentation and included below), or a similar one, in KNIME to loop through the list of files and download them to a local destination folder?

import os
from google.cloud import storage

# Point the client library at the service account key file
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = r"D:\Users\<google_service_account.json>"

def download_blob(bucket_name, source_blob_name, destination_file_name):
    """Downloads a blob from the bucket."""
    # The ID of your GCS bucket
    # bucket_name = "your-bucket-name"

    # The ID of your GCS object
    # source_blob_name = "storage-object-name"

    # The path to which the file should be downloaded
    # destination_file_name = "local/path/to/file"

    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)

    # Construct a client side representation of a blob.
    # Note `Bucket.blob` differs from `Bucket.get_blob` as it doesn't retrieve
    # any content from Google Cloud Storage. As we don't need additional data,
    # using `Bucket.blob` is preferred here.
    blob = bucket.blob(source_blob_name)
    blob.download_to_filename(destination_file_name)

# Example call: download a single log file
download_blob("project136", "sensorlog/log20220301", "d:/sensorlog/log20220301.csv")

I have a CSV file with columns that list the bucket, source blob, and destination file name:

bucket_name,source_blob_name,destination_file_name
project136,sensorlog/log20220301,d:/sensorlog/log20220301.csv
project136,sensorlog/log20220302,d:/sensorlog/log20220302.csv
project136,sensorlog/log20220303,d:/sensorlog/log20220303.csv
project136,sensorlog/log20220304,d:/sensorlog/log20220304.csv
project136,sensorlog/log20220305,d:/sensorlog/log20220305.csv
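
In plain Python, I imagine the loop would look something like the sketch below (untested, just based on the function above). The path "d:/sensorlog/files.csv" is only a placeholder for wherever that CSV lives, but it shows the pattern I want to reproduce in KNIME:

import csv

# Rough sketch: read the CSV above and call download_blob() once per row.
# "d:/sensorlog/files.csv" is a placeholder path for the list of files.
with open("d:/sensorlog/files.csv", newline="") as f:
    for row in csv.DictReader(f):
        download_blob(
            row["bucket_name"],
            row["source_blob_name"],
            row["destination_file_name"],
        )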

I would appreciate any suggestions.

Thanks,

tC/.

Hi @TigerCole -

I am probably missing some nuance to your task here, but what about using the dedicated Google nodes? I just set up a little example to move five CSVs from a shared test drive in Google to my local machine:

(screenshot of the example workflow in KNIME Analytics Platform)

No Python scripting required… but I realize it’s likely I’m oversimplifying. Have you tried these nodes before?


Hi @ScottF

Actually, it is just about that simple … ideally the flow should look something like this (the loop to fetch all the files might not be required?)

(screenshot of the intended workflow, "gcp_fetch")

But, it isn’t. For some reason, the Google Cloud Storage Connector always fails with error:

“…does not have storage.buckets.list access to the Google Cloud project”

I am not sure why, because I can connect and list the files in a Jupyter Notebook. I don’t write code, but I managed to list and fetch files (one at a time) with copy-and-paste code from the Google documentation. However, I want to do it in KNIME, which is my primary workspace.

I thought it might be something I missed in the Google Authentication node, but I am sure that I have all the necessary scopes included:

(screenshot of the Google Authentication node configuration)

My options are to sort out why the storage connector is consistently erroring, or to loop through the files with a Python script.
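
One thing I noticed (and this is only a guess): the copied Python code never lists the buckets in the project, it goes straight to the one named bucket, whereas the error mentions storage.buckets.list on the project. A minimal sketch of what the notebook does, assuming the same project136 bucket as above:

from google.cloud import storage

# Sketch only: list the objects inside one named bucket directly.
# This needs storage.objects.list on the bucket, but not
# storage.buckets.list on the whole project.
storage_client = storage.Client()
for blob in storage_client.list_blobs("project136"):
    print(blob.name)

So perhaps the connector is requesting a project-level permission that the service account simply does not have?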

tC/.

Not sure if I am being too naive, but maybe you need these access rights?

Hi @mlauber71

I can connect to the storage, list the folders and files, and download them (even if it is one at a time) in a Jupyter notebook, so I assume that I have the necessary rights.

I am sure that it is something quite simple, but I am also confident that I have the rights.

tC/.

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.