I’m trying to configure the JDBC driver preferences (using Cloudera) and the Hive Connector node to connect to one or more Hive servers that are managed through ZooKeeper. I’ve tried a number of variants of the syntax but haven’t quite gotten it right.
Can you confirm that a connection can be made to a ZooKeeper-managed Hive server?
Do I need to include the serviceDiscoveryMode=zookeeper and zooKeeperNamespace=HiveServer2 parameters in the driver settings URL, add them to the parameters tab in the connector node, or reference them in both places?
If they belong in the URL, what is the proper syntax for adding the ZooKeeper references?
For multiple servers managed by ZooKeeper, what is the URL and node connection string syntax for listing them? For example, is each one a full URL separated by a comma, or how do you string them together?
That depends. Are you using the proprietary JDBC driver from Cloudera (which needs to be downloaded from their website), or the embedded open-source driver that KNIME ships with?
If you use the embedded driver, you indeed need to add the parameters serviceDiscoveryMode=zooKeeper and zooKeeperNamespace=XYZ to the JDBC parameters in the connector node dialog. The concrete zooKeeperNamespace depends on your cluster setup. You also need to enter the host and port of a ZooKeeper node in your cluster as the host and port in the Hive Connector node dialog.
The proprietary JDBC driver from Cloudera needs different settings. You can infer those from the JDBC URL format described in the driver documentation provided by Cloudera.
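For comparison, when the open-source Apache Hive JDBC driver is used directly, its documented URL format puts the whole ZooKeeper quorum into a single URL: the host:port pairs are comma-separated after `jdbc:hive2://` (not repeated as full URLs), and the discovery parameters follow `/;`. The sketch below only builds that string. The hostnames, the `hiveserver2` namespace, and the helper names `HiveZkUrl`/`build` are hypothetical placeholders, and this format does not apply to Cloudera's proprietary driver.

```java
import java.util.List;

public class HiveZkUrl {
    // Builds an Apache Hive JDBC URL that asks the driver to discover a
    // HiveServer2 instance through ZooKeeper instead of a fixed host.
    static String build(List<String> zkQuorum, String namespace) {
        // All ZooKeeper host:port pairs go into ONE URL, joined by commas.
        String hosts = String.join(",", zkQuorum);
        // "/;" ends the (empty, i.e. default) database part; parameters follow.
        return "jdbc:hive2://" + hosts
                + "/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=" + namespace;
    }

    public static void main(String[] args) {
        // Hypothetical quorum; replace with your cluster's ZooKeeper hosts.
        List<String> quorum = List.of(
                "zk1.example.com:2181",
                "zk2.example.com:2181",
                "zk3.example.com:2181");
        // Assumed namespace; the real value depends on your cluster setup.
        System.out.println(build(quorum, "hiveserver2"));
    }
}
```

Running it prints `jdbc:hive2://zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2`. The driver then asks ZooKeeper for a registered HiveServer2 instance, so the Hive servers themselves are never listed.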
Thanks for the response, Björn. I’ve been doing some more research and testing, as I am using the Cloudera driver. I’m finding that if I add serviceDiscoveryMode=zooKeeper, I get a Java NullPointerException. It does not seem to like that parameter. If I remove it, the error states that I am missing the required port, even though I have specified it. Any suggestions? In the meantime, I’m going to try switching to the embedded driver.
Hello benpope,
Please follow the instructions here to register the proprietary Cloudera Hive driver. Once the driver is registered, use the Hive Connector node and specify the additional connection parameters via the JDBC Parameters tab, as explained here.