Connecting to Amazon EMR

This workflow demonstrates how to create a Spark context via Apache Livy and execute a simple Spark job on an Amazon EMR cluster. The example uses the NYC taxi dataset from the AWS Registry of Open Data to build a simple prediction model with Random Forest. The workflow also shows how to configure Amazon Athena to query a dataset stored in an Amazon S3 bucket.
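
In the workflow itself the Livy connection is configured through KNIME nodes, so no code is required. For readers curious about what a Livy session looks like at the REST level, here is a minimal Python sketch. It only builds the two HTTP requests (create a session, submit a statement) and does not send them. The host name, the helper function names, and the sample statement are assumptions made for illustration and are not taken from the workflow; 8998 is Livy's default port, but an EMR cluster may be configured differently.

```python
import json
from urllib import request

# Hypothetical endpoint: replace with the master node of your own EMR cluster.
LIVY_URL = "http://emr-master.example.com:8998"


def build_session_request(livy_url, kind="pyspark"):
    """Build (but do not send) a POST /sessions request that asks Livy
    to start a new interactive Spark session of the given kind."""
    body = json.dumps({"kind": kind}).encode("utf-8")
    return request.Request(
        url=f"{livy_url}/sessions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_statement_request(livy_url, session_id, code):
    """Build (but do not send) a POST /sessions/{id}/statements request
    that runs a snippet of code inside an existing Livy session."""
    body = json.dumps({"code": code}).encode("utf-8")
    return request.Request(
        url=f"{livy_url}/sessions/{session_id}/statements",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example: prepare a session request and a trivial Spark statement.
session_req = build_session_request(LIVY_URL)
statement_req = build_statement_request(LIVY_URL, 0, "spark.range(10).count()")
```

To actually submit a request, pass it to `request.urlopen(...)` from a machine that can reach the Livy endpoint. The session id used in the statement request (0 above) would normally come from the JSON response to the session request.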


This is a companion discussion topic for the original entry at https://kni.me/w/KnWOA4pZPuWmTo40