Getting 401 error while trying to access Databricks from KNIME

Hi,

I am trying to connect KNIME with Databricks.

After fetching secrets from the key vault, a request is sent with those secrets to fetch a token to access Databricks.
I used the Create Databricks Environment node, and after passing the required data to the node, I am getting a 401 error.

KNIME logs:

 ERROR : KNIME-Worker-5-Create Databricks Environment 3:1 :  : Node : Create Databricks Environment : 3:1 : Execute failed: Could not open client transport with JDBC Uri: jdbc:hive2://db-url.azuredatabricks.net:443/default;ssl=true;httpPath=sql/protocolv1/o/0/clusterId;transportMode=http: Could not create http connection to jdbc:hive2://db-url.azuredatabricks.net:443/default;ssl=true;httpPath=sql/protocolv1/o/0/clusterId;transportMode=http. HTTP Response code: 401
java.sql.SQLException: Could not open client transport with JDBC Uri: jdbc:hive2://adb-db-url.azuredatabricks.net:443/default;ssl=true;httpPath=sql/protocolv1/o/0/clusterId;transportMode=http: Could not create http connection to jdbc:hive2://adb-db-url.azuredatabricks.net:443/default;ssl=true;httpPath=sql/protocolv1/o/0/clusterId;transportMode=http. HTTP Response code: 401
	at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:231)
	at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:168)
	at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
	at org.knime.database.connection.UrlDBConnectionController$ControlledDriver.connect(UrlDBConnectionController.java:95)
	at org.knime.database.connection.UrlDBConnectionController.createConnection(UrlDBConnectionController.java:308)
	at org.knime.database.connection.UserDBConnectionController.createConnection(UserDBConnectionController.java:249)
	at org.knime.bigdata.database.databricks.DatabricksDBConnectionController.createConnection(DatabricksDBConnectionController.java:118)
	at org.knime.database.connection.AbstractConnectionProvider.createConnection(AbstractConnectionProvider.java:89)
	at org.knime.database.connection.impl.DBConnectionManager.lambda$2(DBConnectionManager.java:501)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.thrift.transport.TTransportException: Could not create http connection to jdbc:hive2://adb-db-url.azuredatabricks.net:443/default;ssl=true;httpPath=sql/protocolv1/o/0/clusterId;transportMode=http. HTTP Response code: 401
	at org.apache.hive.jdbc.HiveConnection.createHttpTransport(HiveConnection.java:275)
	at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:201)
	... 12 more
Caused by: org.apache.thrift.transport.TTransportException: HTTP Response code: 401
	at org.apache.thrift.transport.THttpClient.flushUsingHttpClient(THttpClient.java:262)
	at org.apache.thrift.transport.THttpClient.flush(THttpClient.java:313)
	at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:73)
	at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:62)
	at org.apache.hive.service.cli.thrift.TCLIService$Client.send_OpenSession(TCLIService.java:150)
	at org.apache.hive.service.cli.thrift.TCLIService$Client.OpenSession(TCLIService.java:142)
	at org.apache.hive.jdbc.HiveConnection.createHttpTransport(HiveConnection.java:267)
	... 13 more

Can someone please help me with this issue?

Hi @sascha.wolke,

You provided a solution earlier (Setting up token value through FlowVariable in DBFS node - #6 by sascha.wolke) on setting the token in the DBFS node.

Could you please take a look at this issue and let us know if there is a solution?

Regards,
Mathi.

UPDATE on this issue:

I need to use AAD authentication between the Databricks JDBC driver in KNIME and the Databricks cluster.
As per the documentation, to use an AAD token (Workspace cluster), the URL template should look like:
jdbc:spark://databricks.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/xxxxxxxx;AuthMech=11;Auth_Flow=0;Auth_AccessToken=AAD_TOKEN

When I check the KNIME logs, the template/URL looks like this:
jdbc:spark://databricks.azuredatabricks.net:443/default and parameters: [UID, ssl, user, password, httpPath, transportMode, PWD, AuthMech]

We never know which value of AuthMech is used, or how the credentials flow variable value is mapped to user, password, or PWD.

It looks like KNIME needs to change the driver's URL template to access a Workspace cluster using an AAD token. Currently it supports only a personal access token or username & password, which is not the recommended way to authenticate in production.

Is there any workaround? Please suggest one; this is a blocker.
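For anyone following along: before wiring the AAD token into KNIME, it can help to confirm the token itself is obtainable. Below is a rough sketch of building the OAuth2 client-credentials request against the AAD token endpoint. The tenant/client values are placeholders, and `2ff814a6-3304-4ab8-85cb-cd0e6f879c1d` is, to my knowledge, the well-known Azure Databricks application ID used as the scope; please verify against the Azure docs for your setup.

```python
# Sketch: building the AAD (Microsoft identity platform v2.0) token request
# for Azure Databricks via the client-credentials flow.
# Tenant, client, and secret values below are placeholders.
# 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d is the well-known Azure Databricks
# application ID used as the OAuth2 scope (assumption - verify in Azure docs).

DATABRICKS_RESOURCE_ID = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"

def build_token_request(tenant_id: str, client_id: str, client_secret: str):
    """Return the AAD v2.0 token endpoint URL and the form payload to POST."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    payload = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": f"{DATABRICKS_RESOURCE_ID}/.default",
    }
    return url, payload

# To actually fetch the token (requires network access and real credentials):
# import requests
# url, payload = build_token_request(tenant, client, secret)
# aad_token = requests.post(url, data=payload).json()["access_token"]
```

If this returns a valid `access_token`, the remaining question is purely how to hand it to the driver, which narrows the 401 down to the KNIME/JDBC side.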

Hi @mathi,

the Create Databricks Environment node supports only personal access tokens, not Azure AD tokens. I do not have an Azure AD setup right now, but you can try the following:

Download and register the Databricks JDBC driver in KNIME with the following URL template:

jdbc:spark://<host>:<port>/default;AuthMech=11;Auth_Flow=0

More information can be found in the Database Extension Guide.

In the configuration dialog of the Create Databricks Environment node, go to DB Port and then the JDBC Parameters tab, and add a new parameter Auth_AccessToken with your token, or use a flow variable to set the token.
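The steps above amount to filling in the registered template and appending the extra parameter. A minimal sketch of the URL the driver should end up with (hypothetical host and a placeholder token):

```python
# Sketch: composing the final JDBC URL from the registered URL template
# plus the Auth_AccessToken parameter added in the JDBC Parameters tab.
# Host, port, and token values are placeholders.

def compose_jdbc_url(template: str, host: str, port: int, params: dict) -> str:
    """Fill in the <host>/<port> placeholders and append extra parameters."""
    url = template.replace("<host>", host).replace("<port>", str(port))
    extras = ";".join(f"{k}={v}" for k, v in params.items())
    return f"{url};{extras}" if extras else url

TEMPLATE = "jdbc:spark://<host>:<port>/default;AuthMech=11;Auth_Flow=0"
url = compose_jdbc_url(TEMPLATE, "adb-example.azuredatabricks.net", 443,
                       {"Auth_AccessToken": "<AAD_TOKEN>"})
# url == "jdbc:spark://adb-example.azuredatabricks.net:443/default;AuthMech=11;Auth_Flow=0;Auth_AccessToken=<AAD_TOKEN>"
```

Whether the node composes the URL exactly like this is an assumption on my side; the node may inject additional parameters of its own, as the logs will show.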

As already mentioned, I do not have a setup to test this right now, and it depends on the driver whether it uses the AuthMech from the URL or from the parameters.

That’s true, username and password is not recommended, but personal access tokens should do the job in production.

Hi @sascha.wolke,

Thanks for your reply.
The suggested solution makes sense. I would like to know how to restrict a few of the parameters that are passed in the JDBC URL.

As per your inputs, I changed the URL template to

jdbc:spark://<host>:<port>/default;AuthMech=11;Auth_Flow=0

[screenshot of the node configuration]

After execution, I see in the logs that the URL used to connect to Databricks is:

jdbc:spark://databricks.azuredatabricks.net:443/default;AuthMech=11;Auth_Flow=0;UID=token;ssl=1;httpPath=sql/protocolv1/o/0/clusterId;transportMode=http;PWD=AAD_TOKEN;AuthMech=3;Auth_AccessToken=AAD_TOKEN and parameters: [user, password]. Exception:

Error:
[Simba]SparkJDBCDriver Communication link failure. Failed to connect to server. Reason: HTTP Response code: 401, Error message: Unknown.

My question is: how do I avoid the extra parameters that are injected into the URL by default?
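To make the conflict concrete, the logged URL can be parsed for duplicate keys. A small sketch using the URL from the log above (token value already redacted in the log):

```python
# Sketch: spotting duplicate/conflicting parameters in the logged JDBC URL.
# LOGGED is the URL from the KNIME log above, reproduced verbatim.

from collections import Counter

def jdbc_params(url: str) -> list:
    """Split the semicolon-separated JDBC properties into (key, value) pairs."""
    parts = url.split(";")[1:]  # drop the jdbc:spark://host:port/db prefix
    return [tuple(p.split("=", 1)) for p in parts if "=" in p]

LOGGED = ("jdbc:spark://databricks.azuredatabricks.net:443/default;"
          "AuthMech=11;Auth_Flow=0;UID=token;ssl=1;"
          "httpPath=sql/protocolv1/o/0/clusterId;transportMode=http;"
          "PWD=AAD_TOKEN;AuthMech=3;Auth_AccessToken=AAD_TOKEN")

pairs = jdbc_params(LOGGED)
dupes = [k for k, n in Counter(k for k, _ in pairs).items() if n > 1]
# dupes == ["AuthMech"]: the node appends AuthMech=3 (username/password)
# after the template's AuthMech=11, so which one wins is driver-dependent.
```

This at least confirms the 401 is plausibly caused by the later AuthMech=3 overriding the template's AuthMech=11, rather than by an invalid token.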

As per the documentation, I need the URL to be as follows:

jdbc:spark://databricks.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/databricks/clusterId;AuthMech=11;Auth_Flow=0;Auth_AccessToken=AD token

Any suggestions?