Access AWS EMR Hive using the Hive Connector node in KNIME

Is it possible to access Hive on AWS EMR from KNIME? If yes, what configuration is needed (port, KNIME node, etc.)?

We are able to access Hive through PuTTY (SSH) and through Hue from my local machine.

Hi,

you can use the Hive Connector node to access Hive from KNIME.

However, you need access to port TCP/10000 on the AWS instance where HiveServer2 is running (that's where the KNIME driver connects to). This can be achieved either by allowing direct access in the security group of the AWS instance, or by SSH tunneling via PuTTY.
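To see whether the port is reachable from your machine at all, a quick check along these lines may help. This is only a minimal sketch in Python: the function name `port_is_open` is made up for illustration, and `localhost` assumes a tunnel that forwards local port 10000.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures
        return False


if __name__ == "__main__":
    # With an SSH tunnel the local end is localhost (assumption); with direct
    # access, pass the public DNS name of the HiveServer2 instance instead.
    print(port_is_open("localhost", 10000))
```

Note that a `True` result only shows that something accepts connections on that port. With a tunnel, it does not prove that HiveServer2 is reachable at the far end.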

A third option is to create a Windows instance in AWS and put it into the same network as your EMR cluster. You can then access the Windows instance via remote desktop from your local machine, install KNIME with the big data extensions there, and then use the Hive Connector.

Best,

Björn

Hi,

I am able to connect from an AWS EC2 instance, but not from my local system.

What I tried: I allowed all traffic in the security group and created a tunnel via PuTTY for port 10000.

The "Database Connector" node runs to green, but the "Database Table Selector" node is not working. Clicking Fetch Metadata gives "Could not get table types from database, reason: Connection reset", and executing the node gives "Execute failed: Error while validating SQL query: Connection reset by peer: socket write error".

Could you please help me figure out what I am missing?

Thanks

Hi,

have you registered the Amazon-provided Hive driver? If not, please download the driver from https://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/HiveJDBCDriver.html
Extract the zip file. Open the KNIME preferences (File → Preferences), select KNIME → Databases on the left and click Add directory. Then select the extracted driver directory. The driver should be detected by KNIME and preferred over the built-in driver. You can verify this by opening a Hive Connector node dialog, which should display the name of the Amazon driver in the driver section. Once the driver is registered, follow the steps on the Hive JDBC Driver page, e.g. setting up the SSH tunnel.
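For orientation, here is a rough sketch in Python of what the tunnel and the resulting connection string typically look like. The helper names are made up for illustration, and the `hadoop` user, the `default` database and the key file and host names are assumptions, so please check them against the AWS page above. The command is the OpenSSH counterpart of a PuTTY tunnel with source port 10000 and destination localhost:10000.

```python
from typing import List


def build_tunnel_command(key_file: str, master_dns: str,
                         port: int = 10000, user: str = "hadoop") -> List[str]:
    """Build an OpenSSH command that forwards a local port to HiveServer2."""
    # -N: run no remote command, only forward
    # -L: local port -> localhost:port as seen from the remote instance
    return ["ssh", "-i", key_file, "-N",
            "-L", f"{port}:localhost:{port}",
            f"{user}@{master_dns}"]


def build_hive_jdbc_url(host: str = "localhost", port: int = 10000,
                        database: str = "default") -> str:
    """Build a HiveServer2 JDBC URL that points at the local tunnel end."""
    return f"jdbc:hive2://{host}:{port}/{database}"


if __name__ == "__main__":
    # Placeholder key file and host name, replace with your own values
    print(" ".join(build_tunnel_command("mykey.pem", "master-public-dns-name")))
    print(build_hive_jdbc_url())
```

With the tunnel open, the host and port to enter in the node dialog would be localhost and 10000 rather than the address of the cluster.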

Bye

Tobias

 

Thanks for the suggestion. It works :)