How to connect to the server in the Energy_Prepare_Data (Big Data) workflow

I am trying to understand how to execute the Energy_Prepare_Data (Big Data) workflow, but I cannot successfully run the three big data connector nodes shown in the image below:

I have already added the Cloudera-provided JDBC driver for Impala to the driver directory under Preferences > Database, but I encounter this error:

ERROR Impala Connector 0:120 Execute failed: Could not create connection to database: connect timed out

The same behaviour occurs if I try to connect with the Hive Connector node or to the ParStream platform.

In the current example, the connector nodes are configured with ec2-54-172-142-19.compute-1.amazonaws.com as the hostname and default as the database name.
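For context, the connector node ultimately just opens a JDBC connection, so the timeout can be reproduced outside KNIME. The sketch below is only an illustration, assuming the Cloudera Impala JDBC 4.1 driver (class name com.cloudera.impala.jdbc41.Driver) and the default Impala JDBC port 21050; whether the example host answers at all is exactly what such a check would show.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class ImpalaConnectionCheck {
    public static void main(String[] args) throws Exception {
        // Assumption: the Cloudera Impala JDBC 4.1 driver jar is on the classpath.
        Class.forName("com.cloudera.impala.jdbc41.Driver");

        // Example hostname/database from the workflow; replace with your own cluster.
        String url = "jdbc:impala://ec2-54-172-142-19.compute-1.amazonaws.com:21050/default";

        // Fail fast instead of hanging when the host is unreachable.
        DriverManager.setLoginTimeout(10); // seconds

        try (Connection con = DriverManager.getConnection(url)) {
            System.out.println("Connected to: " + con.getMetaData().getDatabaseProductName());
        } catch (SQLException e) {
            // A timeout here corresponds to the KNIME error above: the host/port
            // is simply not reachable from this machine.
            System.err.println("Connection failed: " + e.getMessage());
        }
    }
}
```

If a standalone check like this also times out, the driver registration in KNIME is not the problem; the target host is just not reachable from the network.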

Can someone explain how I should proceed to connect, retrieve the data, and test this workflow?

Thanks in advance.
~G

Hi,
these are only example connection settings and will not work. If you want to run this workflow, you will need to connect to your own cluster and upload the data to it.
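If it helps, the upload can also be done from outside KNIME. Below is a minimal sketch using the Hadoop FileSystem Java API; the namenode URI and both paths are placeholders, and the hadoop-client dependency is assumed to be on the classpath.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadToHdfs {
    public static void main(String[] args) throws Exception {
        // Placeholder namenode address; replace with your own cluster's HDFS URI.
        URI hdfsUri = URI.create("hdfs://your-namenode:8020");

        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(hdfsUri, conf)) {
            // Copy the local input file into HDFS so the Hive/Impala tables can be built on top of it.
            fs.copyFromLocalFile(
                new Path("/local/path/energy_data.csv"),          // placeholder local file
                new Path("/user/knime/energy/energy_data.csv"));  // placeholder HDFS target
        }
        System.out.println("Upload finished.");
    }
}
```

Within KNIME itself you would typically achieve the same with the remote file handling/upload nodes instead.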
Sorry for the inconvenience.
Tobias

Hi Tobias,
I had suspected that it probably wouldn't be possible to work with the data directly within the workflow without my own cluster, but I wasn't sure.

Thanks anyway for the quick reply.

Hi,
with the next KNIME release we will have an extension that automatically sets up a local big data environment with Spark, Hive, and HDFS. With this you will be able to execute the workflow locally as well, at least on a subset of the data. We demonstrated this feature last week at the KNIME Summit, so if you want to know more about it, have a look at the What's new and cooking talk from Wednesday once the slides are published.
Bye
Tobias

