Consider the following simple workflow:
1. Hive Connector
2. Database Table Selector
3. Database Connection Table Reader
It needs to run twice, each time with a different Hive query.
The Hive query is lengthy, so I duplicated the workflow for executing the queries in parallel.
However, this is not doable due to the following error:
Table Reader : 2:423:376:415:37 : Execute failed: org.apache.thrift.TApplicationException: GetOperationStatus failed: out of sequence response.
(The call stack from knime.log is attached)
Am I missing anything?
Or is this a known limitation?
My apologies for the late reply. This is a known issue with the Hive JDBC driver, which cannot handle parallel queries over the same connection.
As a workaround, you could create two Hive Connector nodes with similar but not identical settings. The reason is that we cache JDBC connections by hostname, JDBC parameters, and user. Hive Connector nodes that differ in any of these three settings will use separate connections. You could, for example, add a JDBC parameter in one of the Hive Connectors and set it to its default value.
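The caching behavior described above can be sketched as follows. This is an illustrative sketch only, not KNIME's actual implementation: connections are looked up by a key built from hostname, JDBC parameters, and user, so two nodes with identical settings share one connection, while any difference in those three fields yields a new one.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of connection caching keyed on
// (hostname, JDBC parameters, user) -- not KNIME's real code.
public class ConnectionCacheSketch {
    private static final Map<String, String> cache = new HashMap<>();
    private static int counter = 0;

    static String getConnection(String host, String jdbcParams, String user) {
        String key = host + "|" + jdbcParams + "|" + user;
        // Same key -> same cached connection; any difference -> a new one.
        return cache.computeIfAbsent(key, k -> "connection#" + (counter++));
    }

    public static void main(String[] args) {
        String a = getConnection("hive-host", "", "alice");
        String b = getConnection("hive-host", "", "alice");
        String c = getConnection("hive-host", "transportMode=cliservice", "alice");
        System.out.println(a.equals(b)); // identical settings share a connection
        System.out.println(a.equals(c)); // extra JDBC parameter forces a new one
    }
}
```

This is why adding a redundant JDBC parameter (set to its default value) is enough to force a second, independent connection.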
The Hive JDBC parameters are described here:
For example, you could try setting transportMode=cliservice in one of the Hive Connectors.
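For reference, Hive JDBC parameters are appended to the connection URL as semicolon-separated key=value pairs; a URL with such a parameter might look like this (hostname, port, and database are placeholders):

```
jdbc:hive2://myserver:10000/default;transportMode=cliservice
```

In KNIME you would enter the parameter in the Hive Connector's JDBC parameters settings rather than editing the URL directly.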