Remote Spark Driver

Hi,
how can I avoid the following error when trying to fetch 1000 rows from a query on a Hive Connector node (Apache Hive JDBC Driver [ID: hive]) that is configured (on the Cloudera cluster) to use Spark?
Thank you

Error during fetching data from the database: java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 30041 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create Spark client for Spark session 4927efe1-9fa9-42a2-b0eb-7f3d26868dc6_1: java.util.concurrent.TimeoutException: Client ‘4927efe1-9fa9-42a2-b0eb-7f3d26868dc6_1’ timed out waiting for connection from the Remote Spark Driver

PB

Hi @paobar61,
this could simply be caused by a timeout that is set too low in your cluster, or it might be due to resource allocation issues in your cluster.

The first thing you can try is to increase these timeouts:
hive.spark.client.connect.timeout
hive.spark.client.server.connect.timeout
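
A minimal sketch of how these two properties might be raised, assuming they are set cluster-side in hive-site.xml (on a Cloudera cluster this would typically go through the Hive configuration in Cloudera Manager rather than a hand-edited file). The values below are illustrative placeholders, not recommendations, and whether the properties can also be overridden per session depends on your Hive version and its restricted-property list:

```xml
<!-- Assumption: cluster-side hive-site.xml fragment; values are examples only. -->
<property>
  <!-- How long the Remote Spark Driver waits when connecting back to the Hive client -->
  <name>hive.spark.client.connect.timeout</name>
  <value>30000ms</value>
</property>
<property>
  <!-- How long the Hive client waits for the Remote Spark Driver to connect -->
  <name>hive.spark.client.server.connect.timeout</name>
  <value>300000ms</value>
</property>
```

After changing the values, Hive usually needs to be restarted (or the client configuration redeployed) before new sessions pick them up.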

You should also look into the logs of your cluster; maybe they can give more information on why the connection takes so long.

best regards Mareike


Thank you very much @mareike.hoeger, the problem is that our cluster is too small :wink:
Your suggestions are welcome anyway, thanks a lot!

PB
