This workflow uses a portion of the Irish Energy Meter dataset, and presents a simple analysis based on the whitepaper "Big Data, Smart Energy, and Predictive Analytics". It is intended to highlight KNIME's Big Data and Spark functionality. The workflow creates a Local Big Data Environment, loads the meter dataset to Hive, and then transfers it into Spark. It uses a series of Spark SQL nodes to create datetime fields, and then uses Spark nodes to aggregate energy usage over these datetime fields. In the wrapped metanode, it performs PCA and k-means using Spark nodes, and does some simple visualizations of the clustered data. Finally, it writes the clustered data out to both Hive and Parquet formats.
This is a companion discussion topic for the original entry at https://kni.me/w/zgLqAhI5845NedFe