Local big data environment - Spark to Parquet (to S3), AWS Credentials Chain

#1

Hi Folks,
I have a local big data environment running a Spark job, and I want to use the Spark to Parquet node connected to S3. My S3 connection is set up and working via the default credential provider chain, but I need some way of supplying that chain to the local Spark environment.

Is there a way to do this with the local big data environment, e.g. by accessing core-site.xml? And which options need to be set to pick up the credentials chain?

#2

Hi @DXRX,
Unfortunately, it is currently not possible to use the S3 connection with the Spark data nodes in the local big data environment. We already have an open ticket for this.

The workaround is to write the data to your local file system with the “Spark to Parquet” node and the local HDFS connection of the “Local Big Data Environment” node. Afterwards you can upload it to S3 with the “Upload” node.
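
Outside KNIME, the same two-step idea (write the Parquet output locally, then upload it) could be sketched in Python roughly as follows. This is only an illustration, not how the “Upload” node works: the helper names, bucket, and prefix are hypothetical, and it assumes the third-party boto3 package, whose `boto3.client("s3")` resolves credentials through its own default lookup (environment variables, shared credentials file, and so on).

```python
from pathlib import Path


def iter_parquet_uploads(local_dir, prefix):
    """Yield (local_path, s3_key) pairs for every file under local_dir.

    Spark writes a Parquet "file" as a directory of part files, so the
    whole directory tree is walked.
    """
    root = Path(local_dir)
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Skip Hadoop checksum side files such as .part-00000.parquet.crc
        if path.name.endswith(".crc"):
            continue
        relative = path.relative_to(root).as_posix()
        key = f"{prefix.rstrip('/')}/{relative}" if prefix else relative
        yield str(path), key


def upload_directory(client, local_dir, bucket, prefix):
    """Upload every file under local_dir with the given S3 client.

    `client` only needs an upload_file(filename, bucket, key) method,
    which a boto3 S3 client provides.
    """
    uploaded = []
    for local_path, key in iter_parquet_uploads(local_dir, prefix):
        client.upload_file(local_path, bucket, key)
        uploaded.append(key)
    return uploaded


def upload_with_default_credentials(local_dir, bucket, prefix):
    """Hypothetical convenience wrapper; assumes boto3 is installed."""
    import boto3  # third-party, imported lazily

    return upload_directory(boto3.client("s3"), local_dir, bucket, prefix)
```

A call such as `upload_with_default_credentials("/tmp/out.parquet", "my-bucket", "data/out.parquet")` (all three values are placeholders) would then copy the locally written part files to S3.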

Best regards,
Mareike

#3

OK, thanks for the information. I will make the necessary changes.
