CSV to Spark with Amazon S3 Connection

Hi,
I'm having trouble with the Amazon S3 Connection and CSV to Spark nodes:
Error during fetching data from Spark, reason: org.knime.bigdata.spark.core.exception.KNIMESparkException: Failed to read input path with name ‘my-bucket-name/set.csv’. Reason: AWS Access Key ID and Secret Access Key must be specified by setting the fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey properties (respectively).

My problem is that I don't know where to specify those properties. I have tried:

  1. Specifying the Access Key ID and Secret Access Key in the Amazon S3 Connection node (“Access Key ID and Secret Key” option)
  2. Specifying the Access Key ID and Secret Access Key in the “credentials” file in my local .aws folder and selecting the “Default Credential Provider Chain” option in the Amazon S3 Connection node
  3. Selecting “Default Credential Provider Chain” without any credentials (no credentials file in my .aws folder)
  4. Setting “fs.s3.awsAccessKeyId” and “fs.s3.awsSecretAccessKey” in the “Create Spark Context” node

without any success. It looks like a core-site.xml file is missing (or something like that). I’m able to upload, download, list, etc. with the “Download” and “Upload” nodes connected to the Amazon S3 Connection node.

Thanks for your help.


Hi @Elderion
The credentials need to be put into the core-site.xml that is used inside your Spark cluster. If you are on Cloudera/Hortonworks, your cluster’s core-site.xml is managed via Cloudera Manager/Ambari.
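For reference, a minimal sketch of what such entries could look like in core-site.xml, using the property names from the error message above (the values are placeholders, not real keys):

```xml
<!-- Sketch only: core-site.xml entries for the s3:// filesystem,
     using the property names from the error message.
     Replace the placeholder values with your own AWS credentials. -->
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
```

On a managed cluster you would typically add these through Cloudera Manager/Ambari rather than editing the file by hand, then redeploy the client configuration and restart the affected services.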

Björn

Hi @bjoern.lohrmann
That works, thank you for your help!
