In this use case, we use the NYC taxi dataset and a Random Forest to train a simple time series forecasting model that predicts taxi demand for the next hour from the demand observed in previous hours. For better scalability, we train and test the model on a Spark cluster.
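
Predicting the next hour from past hours amounts to recasting the series as a supervised learning problem: each sample holds the demand of several consecutive hours, and its label is the demand of the hour that follows. The block below is a minimal sketch of that windowing step in plain Python, not the pipeline used in this use case. The helper `make_windows`, the `lookback` parameter, and the sample values are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], lookback: int
) -> Tuple[List[List[float]], List[float]]:
    """Turn an hourly series into (features, target) pairs.

    Each feature row holds `lookback` consecutive hourly values, and the
    target is the value of the hour that immediately follows them.
    """
    if lookback < 1:
        raise ValueError("lookback must be at least 1")
    features, targets = [], []
    for end in range(lookback, len(series)):
        features.append(list(series[end - lookback:end]))  # past hours
        targets.append(series[end])                        # next hour
    return features, targets


# Made-up hourly counts, for illustration only (not taken from the NYC taxi data).
hourly_demand = [10, 12, 15, 11, 9, 14]
X, y = make_windows(hourly_demand, lookback=3)
print(X)  # [[10, 12, 15], [12, 15, 11], [15, 11, 9]]
print(y)  # [11, 9, 14]
```

Rows of `X` paired with the values in `y` are the kind of tabular input a regressor such as a random forest can be trained on. The window length is a tuning choice, and a value of 3 is used here only to keep the output readable.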

