Hive/DB Connector - Hive HA connection with multiple zookeepers

I want to connect to an Apache Hive HA configuration using either a Hive Connector node or a DB Connector node.

How can I make this connection? Any help would be appreciated.

Here’s how I set it up. (DB Connector)


I created a Database URL using the DB Connector node and added the serviceDiscoveryMode and zookeeperNamespace options to the JDBC Parameters.
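Concretely, the URL I built follows the usual Apache Hive ZooKeeper discovery pattern, along these lines (hostnames, database, and namespace are placeholders from my test setup):

```
jdbc:hive2://server01:2181,server02:2181,server03:2181/default;serviceDiscoveryMode=zookeeper;zookeeperNamespace=hiveserver2
```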

The error message displayed in the KNIME console is shown below.

ERROR DB Connector 3:84 Execute failed: ‘com.google.common.util.concurrent.ListeningExecutorService com.google.common.util.concurrent.MoreExecutors.sameThreadExecutor()’

I additionally downloaded the Cloudera Hive driver JARs (ClouderaHiveJDBC41-2.6.21.1025 and ClouderaHiveJDBC42-2.6.21.1025), registered them in Preferences, and tested again.



The error message displayed in the KNIME console is shown below.

ERROR DB Connector 3:86 Execute failed: Selected JDBC driver does not support given JDBC url.

Thanks.

Hi @mjkim,

Welcome to the KNIME community!

Is there more in the KNIME console? The error message is very short. Not sure if it is important, but maybe double-check the uppercase K in zooKeeper and zooKeeperNamespace.

The open-source Hive driver and the Cloudera Hive driver are different drivers and use different parameter names and JDBC URL formats. It looks like you have to use something like jdbc:hive2://zk=... (see the Cloudera documentation here).
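As a rough sketch (hostnames are placeholders, and the exact syntax should be checked against each driver's documentation), the two URL styles differ like this:

```
# Apache Hive driver (org.apache.hive.jdbc.HiveDriver):
jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2

# Cloudera Hive driver (com.cloudera.hive.jdbc41.HS2Driver):
jdbc:hive2://zk=zk1:2181,zk2:2181,zk3:2181/hiveserver2
```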

Cheers,
Sascha

Hi Sascha,

Thank you for your reply.

I tested this by adding the Cloudera Hive JDBC driver under KNIME Preferences - KNIME - Databases.

<KNIME Preference - KNIME - Databases>

<DB Connector - configuration>

The error in the KNIME console is shown below.

DEBUG DB Connector 3:86 DB Connector 3:86 Start execute
DEBUG DB Connector 3:86 New database session: DefaultDBSessionInformation(id=2ca46bfc-54d3-49fa-9e3b-ef6608a7d78f, dbType=DBType(id=hive, name=Hive, description=Hive), driverDefinition=DBDriverDefinition(id=cloudera_hive, name=cloudera_hive, version=2.6.0, driverClass=com.cloudera.hive.jdbc41.HS2Driver, dbType=DBType(id=hive, name=Hive, description=Hive), description=, origin=USER), connectionController=org.knime.database.connection.UserDBConnectionController=(url=jdbc:hive2://zk=server01:2181,server02:2181,server02:2181/hiveserver2, authenticationType=USER, user=hdfs), dialectId=hive, attributeValues={})
INFO DB Connector 3:86 Could not create connection to database using URL: jdbc:hive2://zk=server01:2181,server02:2181,server02:2181/hiveserver2 and parameters: [password, user]. Exception: [Cloudera]HiveJDBCDriver Not able to connect to any Thrift servers using ZOOKEEPER service.
DEBUG DB Connector 3:86 reset
ERROR DB Connector 3:86 Execute failed: [Cloudera]HiveJDBCDriver Not able to connect to any Thrift servers using ZOOKEEPER service.
DEBUG DB Connector 3:86 DB Connector 3:86 doBeforePostExecution
DEBUG DB Connector 3:86 DB Connector 3:86 has new state: POSTEXECUTE
DEBUG DB Connector 3:86 DB Connector 3:86 doAfterExecute - failure
DEBUG DB Connector 3:86 reset
DEBUG DB Connector 3:86 clean output ports.
DEBUG DB Connector 3:86 Removing handler 880e46d3-97e2-4595-a808-87d0984d162c (DB Connector 3:86: ) - 0 remaining
DEBUG DB Connector 3:86 DB Connector 3:86 has new state: IDLE
DEBUG DB Connector 3:86 Configure succeeded. (DB Connector)
DEBUG DB Connector 3:86 DB Connector 3:86 has new state: CONFIGURED
DEBUG DB Connector 3:86 Using_Spark_File_to_HIVE_Kyuubi 3 has new state: IDLE
DEBUG NodeTimer$GlobalNodeStats Successfully wrote node usage stats to file: /Users/minjee/Desktop/knime-workspace/.metadata/knime/nodeusage_3.0.json

When I test it with Beeline on the Hive cluster, it runs fine.
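For reference, the Beeline test was roughly the following (same hosts and namespace as in my KNIME configuration):

```
beeline -u "jdbc:hive2://server01:2181,server02:2181,server03:2181/default;serviceDiscoveryMode=zookeeper;zookeeperNamespace=hiveserver2" -n hive
```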

Thanks.

Hi @mjkim,

Can you double-check that the lookup of server01 works on your workstation? It sounds more like a cluster-internal name. Also double-check that ZooKeeper and Hive are reachable from outside the cluster.

Most of the time there is a service at the edge of the cluster that acts as a bridge (e.g. Apache KNOX), and you should connect to that instead of the cluster-internal ZooKeeper instances, which might not be reachable from outside the cluster.
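Not verified against your setup, but a Hive-over-Knox JDBC URL usually looks something like this (hostname, port, and httpPath depend entirely on your gateway configuration):

```
jdbc:hive2://knox.example.com:8443/;ssl=true;transportMode=http;httpPath=gateway/default/hive
```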

Cheers,
Sascha

Hi Sascha,

The server01, server02, and server03 I'm using in the JDBC URL are the hostnames of my test servers, which are Linux machines.

It works fine when I test it with DBeaver, so I am able to connect from outside the cluster.

When I test in KNIME with the same driver (added via KNIME AP Preferences - KNIME - Databases), I get the following messages in the KNIME console.

<KNIME Preference - KNIME - Database>

<DB Connector - configuration>

DEBUG ExecuteAction Creating execution job for 1 node(s)…
DEBUG NodeContainer DB Connector 3:84 has new state: CONFIGURED_MARKEDFOREXEC
DEBUG NodeContainer DB Connector 3:84 has new state: CONFIGURED_QUEUED
DEBUG NodeContainer Using_Spark_File_to_HIVE_Kyuubi 3 has new state: EXECUTING
DEBUG DB Connector 3:84 DB Connector 3:84 doBeforePreExecution
DEBUG DB Connector 3:84 DB Connector 3:84 has new state: PREEXECUTE
DEBUG DB Connector 3:84 Adding handler 580ce38e-0fca-455e-bad8-df4e53a513b1 (DB Connector 3:84: ) - 1 in total
DEBUG DB Connector 3:84 DB Connector 3:84 doBeforeExecution
DEBUG DB Connector 3:84 DB Connector 3:84 has new state: EXECUTING
DEBUG DB Connector 3:84 DB Connector 3:84 Start execute
DEBUG DB Connector 3:84 New database session: DefaultDBSessionInformation(id=aa026643-f75b-4ed9-ab16-a7af0d2cc06d, dbType=DBType(id=hive, name=Hive, description=Hive), driverDefinition=DBDriverDefinition(id=hivejdbc265, name=hivejdbc265, version=-1.-1.0, driverClass=org.apache.hive.jdbc.HiveDriver, dbType=DBType(id=hive, name=Hive, description=Hive), description=, origin=USER), connectionController=org.knime.database.connection.UserDBConnectionController=(url=jdbc:hive2://server01:2181,server02:2181,server03:2181/default;serviceDiscoveryMode=zookeeper;zookeeperNamespace=hiveserver2, authenticationType=USER_PWD, user=hive, password=true), dialectId=hive, attributeValues={})
DEBUG DB Connector 3:84 Acquiring connection.
DEBUG DB Connector 3:84 An error of unexpected type occurred while checking if an object is a wrapper for org.knime.database.connection.impl.managing.transaction.BroadTransactionManagingWrapper.
DEBUG DB Connector 3:84 An error of unexpected type occurred while checking if an object is a wrapper for org.knime.database.connection.impl.managing.transaction.TransactionManagingWrapper.
WARN DB Connector 3:84 Couldn’t fetch SQL keywords from database.
DEBUG DB Connector 3:84 The connection has been relinquished.
DEBUG DB Connector 3:84 The managed connection has been closed.
DEBUG DB Connector 3:84 The transaction managing connection has been closed.
DEBUG DB Connector 3:84 Database session information:DefaultDBSessionInformation(id=aa026643-f75b-4ed9-ab16-a7af0d2cc06d, dbType=DBType(id=hive, name=Hive, description=Hive), driverDefinition=DBDriverDefinition(id=hivejdbc265, name=hivejdbc265, version=-1.-1.0, driverClass=org.apache.hive.jdbc.HiveDriver, dbType=DBType(id=hive, name=Hive, description=Hive), description=, origin=USER), connectionController=org.knime.database.connection.UserDBConnectionController=(url=jdbc:hive2://server01:2181,server02:2181,server03:2181/default;serviceDiscoveryMode=zookeeper;zookeeperNamespace=hiveserver2, authenticationType=USER_PWD, user=hive, password=true), dialectId=hive, attributeValues={})
DEBUG DB Connector 3:84 Attribute details of database session aa026643-f75b-4ed9-ab16-a7af0d2cc06d:
AttributeValueRepository(attributes={knime.db.connection.jdbc.fetch_size=knime.db.connection.jdbc.fetch_size:10000, knime.db.connection.parameter.append.separator=knime.db.connection.parameter.append.separator:&, knime.db.capability.multi.dbs=knime.db.capability.multi.dbs:false, knime.db.dialect.sql.delimiter.identifier.opening=knime.db.dialect.sql.delimiter.identifier.opening:, knime.db.dialect.sql.table_reference.keyword=knime.db.dialect.sql.table_reference.keyword:, knime.db.dialect.sql.capability.operation.minus=knime.db.dialect.sql.capability.operation.minus:false, knime.db.dialect.sql.insert_into_table_with_select=knime.db.dialect.sql.insert_into_table_with_select:true, knime.db.agent.writer.autocommit_if_failing_on_error=knime.db.agent.writer.autocommit_if_failing_on_error:false, knime.db.dialect.sql.table_reference.derived_table=knime.db.dialect.sql.table_reference.derived_table:true, knime.db.connection.parameter.append=knime.db.connection.parameter.append:false, knime.db.connection.parameter.append.last.suffix=knime.db.connection.parameter.append.last.suffix:, knime.db.connection.jdbc.properties=knime.db.connection.jdbc.properties:#Mon Jul 17 10:55:01 KST 2023 , knime.db.connection.statement.cancellation.polling_period=knime.db.connection.statement.cancellation.polling_period:1000, knime.db.dialect.sql.capability.random=knime.db.dialect.sql.capability.random:true, knime.db.connection.metadata.table.types=knime.db.connection.metadata.table.types:TABLE, VIEW, knime.db.agent.writer.batch.enabled=knime.db.agent.writer.batch.enabled:true, knime.db.connection.statement.cancellation.enabled=knime.db.connection.statement.cancellation.enabled:true, knime.db.connection.metadata.configure.timeout=knime.db.connection.metadata.configure.timeout:3, knime.db.connection.metadata.configure.enabled=knime.db.connection.metadata.configure.enabled:true, knime.db.connection.metadata.query.flatten=knime.db.connection.metadata.query.flatten:false, 
knime.db.dialect.sql.create_table.define_constraint_name=knime.db.dialect.sql.create_table.define_constraint_name:false, knime.db.dialect.sql.identifier.delimiting.onlyspaces=knime.db.dialect.sql.identifier.delimiting.onlyspaces:false, knime.db.connection.time_zone=knime.db.connection.time_zone:UTC, knime.db.dialect.sql.capability.random_seed=knime.db.dialect.sql.capability.random_seed:true, knime.db.connection.reconnect.timeout=knime.db.connection.reconnect.timeout:0, knime.db.dialect.sql.identifier.replace.character.non_word.enabled=knime.db.dialect.sql.identifier.replace.character.non_word.enabled:false, knime.db.dialect.sql.create.table.temporary=knime.db.dialect.sql.create.table.temporary:TEMPORARY, knime.db.dialect.sql.minus_operator.keyword=knime.db.dialect.sql.minus_operator.keyword:, knime.db.dialect.sql.delimiter.identifier.closing=knime.db.dialect.sql.delimiter.identifier.closing:, knime.db.connection.logger.log.errors=knime.db.connection.logger.log.errors:false, knime.db.dialect.sql.capability.expression.case=knime.db.dialect.sql.capability.expression.case:true, knime.db.dialect.sql.identifier.replace.character.non_word.replacement=knime.db.dialect.sql.identifier.replace.character.non_word.replacement:, knime.db.connection.kerberos_delegation.service=knime.db.connection.kerberos_delegation.service:, knime.db.connection.reconnect=knime.db.connection.reconnect:false, knime.db.connection.transaction.enabled=knime.db.connection.transaction.enabled:false, knime.db.connection.parameter.append.initial.separator=knime.db.connection.parameter.append.initial.separator:?, knime.db.dialect.sql.create_table.if_not_exists=knime.db.dialect.sql.create_table.if_not_exists:IF NOT EXISTS, knime.db.dialect.sql.drop_table=knime.db.dialect.sql.drop_table:true, knime.db.connection.validation_query=knime.db.connection.validation_query:, knime.db.connection.kerberos_delegation.host_regex=knime.db.connection.kerberos_delegation.host_regex:.(?:@|//)([^:;,/\]).*, 
knime.db.connection.parameter.append.user_and_password=knime.db.connection.parameter.append.user_and_password:false, knime.db.connection.logger.enabled=knime.db.connection.logger.enabled:false, knime.db.connection.restore=knime.db.connection.restore:false, knime.db.agent.writer.fail_on_missing_in_where_clause=knime.db.agent.writer.fail_on_missing_in_where_clause:true, knime.db.connection.init_statement=knime.db.connection.init_statement:},
none default values={})
DEBUG DB Connector 3:84 Acquiring connection.
DEBUG DB Connector 3:84 The connection has been relinquished.
DEBUG DB Connector 3:84 The managed connection has been closed.
DEBUG DB Connector 3:84 The transaction managing connection has been closed.
DEBUG DB Connector 3:84 reset
DEBUG DB Connector 3:84 Closing the connection manager of the session: DefaultDBSessionInformation(id=aa026643-f75b-4ed9-ab16-a7af0d2cc06d, dbType=DBType(id=hive, name=Hive, description=Hive), driverDefinition=DBDriverDefinition(id=hivejdbc265, name=hivejdbc265, version=-1.-1.0, driverClass=org.apache.hive.jdbc.HiveDriver, dbType=DBType(id=hive, name=Hive, description=Hive), description=, origin=USER), connectionController=org.knime.database.connection.UserDBConnectionController=(url=jdbc:hive2://server01:2181,server02:2181,server03:2181/default;serviceDiscoveryMode=zookeeper;zookeeperNamespace=hiveserver2, authenticationType=USER_PWD, user=hive, password=true), dialectId=hive, attributeValues={})
ERROR DB Connector 3:84 Execute failed: DB Session aa026643-f75b-4ed9-ab16-a7af0d2cc06d is invalid. Method not supported
DEBUG DBConnectionManager Closing the database connection: URL=“jdbc:hive2://server01:2181,server02:2181,server03:2181/default;serviceDiscoveryMode=zookeeper;zookeeperNamespace=hiveserver2”, user=“hive”
DEBUG DB Connector 3:84 DB Connector 3:84 doBeforePostExecution
DEBUG DB Connector 3:84 DB Connector 3:84 has new state: POSTEXECUTE
DEBUG DB Connector 3:84 DB Connector 3:84 doAfterExecute - failure
DEBUG DB Connector 3:84 reset
DEBUG DB Connector 3:84 clean output ports.
DEBUG DB Connector 3:84 Removing handler 580ce38e-0fca-455e-bad8-df4e53a513b1 (DB Connector 3:84: ) - 0 remaining
DEBUG DB Connector 3:84 DB Connector 3:84 has new state: IDLE
DEBUG DB Connector 3:84 Configure succeeded. (DB Connector)
DEBUG DB Connector 3:84 DB Connector 3:84 has new state: CONFIGURED
DEBUG DB Connector 3:84 Using_Spark_File_to_HIVE_Kyuubi 3 has new state: IDLE
DEBUG DBConnectionManager The database connection has been closed successfully: URL=“jdbc:hive2://server01:2181,server02:2181,server03:2181/default;serviceDiscoveryMode=zookeeper;zookeeperNamespace=hiveserver2”, user=“hive”
DEBUG NodeContainerEditPart DB Connector 3:84 (CONFIGURED)

Are there any success stories of Hive HA connections with multiple ZooKeeper nodes?
Please let me know.

Thanks.

Hi @mjkim,

Can you try the Hive Connector instead of the generic DB connector? It contains some Hive specific settings that might help.

Cheers,
Sascha

Hi @sascha.wolke ,

The Hive Connector has specific fields for Hostname, Port, and Database name, so I'm not sure whether this is the right way to test it.

I tested it by setting it up like below.
<Hive Connector - configuration>

DEBUG ExecuteAction Creating execution job for 1 node(s)…
DEBUG NodeContainer Hive Connector 3:85 has new state: CONFIGURED_MARKEDFOREXEC
DEBUG NodeContainer Hive Connector 3:85 has new state: CONFIGURED_QUEUED
DEBUG NodeContainer Using_Spark_File_to_HIVE_Kyuubi 3 has new state: EXECUTING
DEBUG Hive Connector 3:85 Hive Connector 3:85 doBeforePreExecution
DEBUG Hive Connector 3:85 Hive Connector 3:85 has new state: PREEXECUTE
DEBUG Hive Connector 3:85 Adding handler bccd45f3-e889-4a3e-b526-2beb3b2b71fd (Hive Connector 3:85: ) - 3 in total
DEBUG Hive Connector 3:85 Hive Connector 3:85 doBeforeExecution
DEBUG Hive Connector 3:85 Hive Connector 3:85 has new state: EXECUTING
DEBUG Hive Connector 3:85 Hive Connector 3:85 Start execute
DEBUG Hive Connector 3:85 New database session: DefaultDBSessionInformation(id=d9a0cc88-9f44-4713-b3d9-01a3268bbc8e, dbType=DBType(id=hive, name=Hive, description=Hive), driverDefinition=DBDriverDefinition(id=cloudera_hive, name=cloudera_hive, version=2.6.0, driverClass=com.cloudera.hive.jdbc41.HS2Driver, dbType=DBType(id=hive, name=Hive, description=Hive), description=, origin=USER), connectionController=org.knime.database.connection.UserDBConnectionController=(url=jdbc:hive2://zk=server01:2181,server02:2181,server03:2181/hiveserver2:2181/default, authenticationType=USER_PWD, user=hive, password=true), dialectId=hive, attributeValues={})
INFO Hive Connector 3:85 Could not create connection to database using URL: jdbc:hive2://zk=server01:2181,server02:2181,server03:2181/hiveserver2:2181/default and parameters: [password, user]. Exception: [Cloudera]HiveJDBCDriver Not able to connect to any Thrift servers using ZOOKEEPER service.
DEBUG Hive Connector 3:85 reset
ERROR Hive Connector 3:85 Execute failed: [Cloudera]HiveJDBCDriver Not able to connect to any Thrift servers using ZOOKEEPER service.
DEBUG Hive Connector 3:85 Hive Connector 3:85 doBeforePostExecution
DEBUG Hive Connector 3:85 Hive Connector 3:85 has new state: POSTEXECUTE
DEBUG Hive Connector 3:85 Hive Connector 3:85 doAfterExecute - failure
DEBUG Hive Connector 3:85 reset
DEBUG Hive Connector 3:85 clean output ports.
DEBUG Hive Connector 3:85 Removing handler bccd45f3-e889-4a3e-b526-2beb3b2b71fd (Hive Connector 3:85: ) - 2 remaining
DEBUG Hive Connector 3:85 Hive Connector 3:85 has new state: IDLE
DEBUG Hive Connector 3:85 Configure succeeded. (Hive Connector)
DEBUG Hive Connector 3:85 Hive Connector 3:85 has new state: CONFIGURED
DEBUG Hive Connector 3:85 Using_Spark_File_to_HIVE_Kyuubi 3 has new state: IDLE

DEBUG ExecuteAction Creating execution job for 1 node(s)…
DEBUG NodeContainer Hive Connector 3:92 has new state: CONFIGURED_MARKEDFOREXEC
DEBUG NodeContainer Hive Connector 3:92 has new state: CONFIGURED_QUEUED
DEBUG NodeContainer Using_Spark_File_to_HIVE_Kyuubi 3 has new state: EXECUTING
DEBUG Hive Connector 3:92 Hive Connector 3:92 doBeforePreExecution
DEBUG Hive Connector 3:92 Hive Connector 3:92 has new state: PREEXECUTE
DEBUG Hive Connector 3:92 Adding handler 9a6f00c7-abc5-4732-8f43-883d83d428f1 (Hive Connector 3:92: ) - 1 in total
DEBUG Hive Connector 3:92 Hive Connector 3:92 doBeforeExecution
DEBUG Hive Connector 3:92 Hive Connector 3:92 has new state: EXECUTING
DEBUG Hive Connector 3:92 Hive Connector 3:92 Start execute
DEBUG Hive Connector 3:92 New database session: DefaultDBSessionInformation(id=516726fa-0b16-4992-8397-5348808478a0, dbType=DBType(id=hive, name=Hive, description=Hive), driverDefinition=DBDriverDefinition(id=hivejdbc312, name=hive-jdbc-3.1.2, version=3.1.0, driverClass=org.apache.hive.jdbc.HiveDriver, dbType=DBType(id=hive, name=Hive, description=Hive), description=, origin=USER), connectionController=org.knime.database.connection.UserDBConnectionController=(url=jdbc:hive2://server01:2181,server02:2181,server03:2181:2181/default, authenticationType=USER_PWD, user=hive, password=true), dialectId=hive, attributeValues={knime.db.connection.jdbc.properties=DerivableProperties(map={serviceDiscoveryMode=“zookeeper” (LITERAL), zookeeperNamespace=“hiveserver2” (LITERAL)}, defaults=null)})
INFO Hive Connector 3:92 Could not create connection to database using URL: jdbc:hive2://server01:2181,server02:2181,server03:2181:2181/default and parameters: [password, serviceDiscoveryMode, user, zookeeperNamespace]. Exception: org/apache/hive/service/rpc/thrift/TCLIService$Iface
DEBUG Hive Connector 3:92 reset
ERROR Hive Connector 3:92 Execute failed: org/apache/hive/service/rpc/thrift/TCLIService$Iface
DEBUG Hive Connector 3:92 Hive Connector 3:92 doBeforePostExecution
DEBUG Hive Connector 3:92 Hive Connector 3:92 has new state: POSTEXECUTE
DEBUG Hive Connector 3:92 Hive Connector 3:92 doAfterExecute - failure
DEBUG Hive Connector 3:92 reset
DEBUG Hive Connector 3:92 clean output ports.
DEBUG Hive Connector 3:92 Removing handler 9a6f00c7-abc5-4732-8f43-883d83d428f1 (Hive Connector 3:92: ) - 0 remaining
DEBUG Hive Connector 3:92 Hive Connector 3:92 has new state: IDLE
DEBUG Hive Connector 3:92 Configure succeeded. (Hive Connector)
DEBUG Hive Connector 3:92 Hive Connector 3:92 has new state: CONFIGURED
DEBUG Hive Connector 3:92 Using_Spark_File_to_HIVE_Kyuubi 3 has new state: IDLE

Thanks.

Hi @mjkim,

I have never tried this with KNIME; the internals of a cluster are usually not reachable from the outside.

If this is a test system, maybe add Apache KNOX to simulate a cluster?

Not tested, but this might be another workaround: you can edit the URL template in the preferences of the Cloudera driver and use a URL with the ZooKeeper endpoints instead of the template.
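As a sketch (the exact template depends on how the driver is registered in KNIME), that would mean replacing the token-based template with a fixed ZooKeeper URL, e.g.:

```
# default URL template, filled in from the node dialog:
jdbc:hive2://<host>:<port>/[database]

# replaced by a fixed URL with the ZooKeeper endpoints (hostnames are examples):
jdbc:hive2://zk=server01:2181,server02:2181,server03:2181/hiveserver2
```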

Cheers,
Sascha

Hi @sascha.wolke,

I tried modifying the URL template as you suggested, and the result was the same. :frowning:

Thanks.

Looking at this error, it might be worth checking whether you have all the necessary driver JARs. Based on my own searching, it may be that the Java libraries shipped with KNIME don't support ZooKeeper connections. See a similar Stack Overflow thread here:

And here:


Hi @nfriesen,

I successfully connected using the Hive JDBC driver (hive-jdbc-3.1.3.jar) together with its compile dependencies from the Maven repository.
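In case it helps others: the earlier "org/apache/hive/service/rpc/thrift/TCLIService$Iface" error is the typical sign of a missing compile-scope dependency of hive-jdbc (TCLIService lives in hive-service-rpc). The JARs registered alongside the driver were along these lines (a hypothetical example, not an authoritative list; exact artifacts and versions may differ per environment):

```
# register hive-jdbc plus its compile-scope dependencies, e.g.:
org.apache.hive:hive-jdbc:3.1.3
org.apache.hive:hive-service-rpc:3.1.3   # provides org.apache.hive.service.rpc.thrift.TCLIService
org.apache.thrift:libthrift
org.apache.hadoop:hadoop-common
```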

Best Regards,
MJ


This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.