Getting 401 error while trying to access Databricks from KNIME

Hi,

I am trying to connect KNIME with Databricks.

After fetching secrets from the key vault, a request is sent with those secrets to fetch a token to access Databricks.
I used the Create Databricks Environment node, and after passing the required data to the node, I am getting a 401 error.

KNIME logs:

 ERROR : KNIME-Worker-5-Create Databricks Environment 3:1 :  : Node : Create Databricks Environment : 3:1 : Execute failed: Could not open client transport with JDBC Uri: jdbc:hive2://db-url.azuredatabricks.net:443/default;ssl=true;httpPath=sql/protocolv1/o/0/clusterId;transportMode=http: Could not create http connection to jdbc:hive2://db-url.azuredatabricks.net:443/default;ssl=true;httpPath=sql/protocolv1/o/0/clusterId;transportMode=http. HTTP Response code: 401
java.sql.SQLException: Could not open client transport with JDBC Uri: jdbc:hive2://adb-db-url.azuredatabricks.net:443/default;ssl=true;httpPath=sql/protocolv1/o/0/clusterId;transportMode=http: Could not create http connection to jdbc:hive2://adb-db-url.azuredatabricks.net:443/default;ssl=true;httpPath=sql/protocolv1/o/0/clusterId;transportMode=http. HTTP Response code: 401
	at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:231)
	at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:168)
	at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
	at org.knime.database.connection.UrlDBConnectionController$ControlledDriver.connect(UrlDBConnectionController.java:95)
	at org.knime.database.connection.UrlDBConnectionController.createConnection(UrlDBConnectionController.java:308)
	at org.knime.database.connection.UserDBConnectionController.createConnection(UserDBConnectionController.java:249)
	at org.knime.bigdata.database.databricks.DatabricksDBConnectionController.createConnection(DatabricksDBConnectionController.java:118)
	at org.knime.database.connection.AbstractConnectionProvider.createConnection(AbstractConnectionProvider.java:89)
	at org.knime.database.connection.impl.DBConnectionManager.lambda$2(DBConnectionManager.java:501)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.thrift.transport.TTransportException: Could not create http connection to jdbc:hive2://adb-db-url.azuredatabricks.net:443/default;ssl=true;httpPath=sql/protocolv1/o/0/clusterId;transportMode=http. HTTP Response code: 401
	at org.apache.hive.jdbc.HiveConnection.createHttpTransport(HiveConnection.java:275)
	at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:201)
	... 12 more
Caused by: org.apache.thrift.transport.TTransportException: HTTP Response code: 401
	at org.apache.thrift.transport.THttpClient.flushUsingHttpClient(THttpClient.java:262)
	at org.apache.thrift.transport.THttpClient.flush(THttpClient.java:313)
	at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:73)
	at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:62)
	at org.apache.hive.service.cli.thrift.TCLIService$Client.send_OpenSession(TCLIService.java:150)
	at org.apache.hive.service.cli.thrift.TCLIService$Client.OpenSession(TCLIService.java:142)
	at org.apache.hive.jdbc.HiveConnection.createHttpTransport(HiveConnection.java:267)
	... 13 more

Can someone please help me with this issue?

Hi @sascha.wolke,

You provided a solution earlier (Setting up token value through FlowVariable in DBFS node - #6 by sascha.wolke) on setting the token in the DBFS node.

Could you please take a look at this issue and let us know if there is a solution?

Regards,
Mathi.

UPDATE on this issue:

I need to use AAD authentication between the Databricks JDBC driver in KNIME and the Databricks cluster.
As per the documentation, to use an AAD token (Workspace cluster), the URL template should look like:
jdbc:spark://databricks.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/xxxxxxxx;AuthMech=11;Auth_Flow=0;Auth_AccessToken=AAD_TOKEN

When I check the KNIME logs, the template/URL looks like this:
jdbc:spark://databricks.azuredatabricks.net:443/default and parameters: [UID, ssl, user, password, httpPath, transportMode, PWD, AuthMech]

We never know which value of AuthMech is used, or how the credentials flow variable value is mapped to user, password, or PWD.

It looks like KNIME needs to change the driver's URL template to access a Workspace cluster using an AAD token. Currently it supports only a personal access token or username & password, which is not the recommended way to authenticate in production.

Is there any workaround? Please suggest one; this is a blocker.
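For anyone following along: before wiring the AAD token into KNIME, it can help to confirm the token itself is obtainable. Below is a rough sketch of building the OAuth2 client-credentials request against the AAD token endpoint. The tenant/client values are placeholders, and `2ff814a6-3304-4ab8-85cb-cd0e6f879c1d` is, to my knowledge, the well-known Azure Databricks application ID used as the scope; please verify against the Azure docs for your setup.

```python
# Sketch: building the AAD (Microsoft identity platform v2.0) token request
# for Azure Databricks via the client-credentials flow.
# Tenant, client, and secret values below are placeholders.
# 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d is the well-known Azure Databricks
# application ID used as the OAuth2 scope (assumption - verify in Azure docs).

DATABRICKS_RESOURCE_ID = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"

def build_token_request(tenant_id: str, client_id: str, client_secret: str):
    """Return the AAD v2.0 token endpoint URL and the form payload to POST."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    payload = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": f"{DATABRICKS_RESOURCE_ID}/.default",
    }
    return url, payload

# To actually fetch the token (requires network access and real credentials):
# import requests
# url, payload = build_token_request(tenant, client, secret)
# aad_token = requests.post(url, data=payload).json()["access_token"]
```

If this returns a valid `access_token`, the remaining question is purely how to hand it to the driver, which narrows the 401 down to the KNIME/JDBC side.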

Hi @mathi,

the Create Databricks Environment node supports only personal access tokens, not Azure AD tokens. I do not have an Azure AD setup right now, but you can try the following:

Download and register the Databricks JDBC driver in KNIME with the following URL template:

jdbc:spark://<host>:<port>/default;AuthMech=11;Auth_Flow=0

More information can be found in the Database Extension Guide.

In the configuration dialog of the Create Databricks Environment node, go to DB Port and then the JDBC Parameters tab, and add a new parameter Auth_AccessToken with your token, or use a flow variable to set the token.
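The steps above amount to filling in the registered template and appending the extra parameter. A minimal sketch of the URL the driver should end up with (hypothetical host and a placeholder token):

```python
# Sketch: composing the final JDBC URL from the registered URL template
# plus the Auth_AccessToken parameter added in the JDBC Parameters tab.
# Host, port, and token values are placeholders.

def compose_jdbc_url(template: str, host: str, port: int, params: dict) -> str:
    """Fill in the <host>/<port> placeholders and append extra parameters."""
    url = template.replace("<host>", host).replace("<port>", str(port))
    extras = ";".join(f"{k}={v}" for k, v in params.items())
    return f"{url};{extras}" if extras else url

TEMPLATE = "jdbc:spark://<host>:<port>/default;AuthMech=11;Auth_Flow=0"
url = compose_jdbc_url(TEMPLATE, "adb-example.azuredatabricks.net", 443,
                       {"Auth_AccessToken": "<AAD_TOKEN>"})
# url == "jdbc:spark://adb-example.azuredatabricks.net:443/default;AuthMech=11;Auth_Flow=0;Auth_AccessToken=<AAD_TOKEN>"
```

Whether the node composes the URL exactly like this is an assumption on my side; the node may inject additional parameters of its own, as the logs will show.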

As already mentioned, I do not have a setup to test this right now, and it depends on the driver whether it uses the AuthMech from the URL or from the parameters.

That’s true, username and password is not recommended, but personal access tokens should do the job in production.

Hi @sascha.wolke,

Thanks for your reply.
The suggested solution makes sense. I would like to know how to restrict a few of the parameters that are passed in the JDBC URL.

As per your inputs, I changed the URL template to

jdbc:spark://<host>:<port>/default;AuthMech=11;Auth_Flow=0

[screenshot of the node configuration]

After execution, I see in the logs that the URL used to connect to Databricks is:

jdbc:spark://databricks.azuredatabricks.net:443/default;AuthMech=11;Auth_Flow=0;UID=token;ssl=1;httpPath=sql/protocolv1/o/0/clusterId;transportMode=http;PWD=AAD_TOKEN;AuthMech=3;Auth_AccessToken=AAD_TOKEN and parameters: [user, password]. Exception:

Error:
[Simba]SparkJDBCDriver Communication link failure. Failed to connect to server. Reason: HTTP Response code: 401, Error message: Unknown.

My question is: how do I avoid the extra parameters that are injected into the URL by default?
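To make the conflict concrete, the logged URL can be parsed for duplicate keys. A small sketch using the URL from the log above (token value already redacted in the log):

```python
# Sketch: spotting duplicate/conflicting parameters in the logged JDBC URL.
# LOGGED is the URL from the KNIME log above, reproduced verbatim.

from collections import Counter

def jdbc_params(url: str) -> list:
    """Split the semicolon-separated JDBC properties into (key, value) pairs."""
    parts = url.split(";")[1:]  # drop the jdbc:spark://host:port/db prefix
    return [tuple(p.split("=", 1)) for p in parts if "=" in p]

LOGGED = ("jdbc:spark://databricks.azuredatabricks.net:443/default;"
          "AuthMech=11;Auth_Flow=0;UID=token;ssl=1;"
          "httpPath=sql/protocolv1/o/0/clusterId;transportMode=http;"
          "PWD=AAD_TOKEN;AuthMech=3;Auth_AccessToken=AAD_TOKEN")

pairs = jdbc_params(LOGGED)
dupes = [k for k, n in Counter(k for k, _ in pairs).items() if n > 1]
# dupes == ["AuthMech"]: the node appends AuthMech=3 (username/password)
# after the template's AuthMech=11, so which one wins is driver-dependent.
```

This at least confirms the 401 is plausibly caused by the later AuthMech=3 overriding the template's AuthMech=11, rather than by an invalid token.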

As per the documentation, I need the URL to be as follows:

jdbc:spark://databricks.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/databricks/clusterId;AuthMech=11;Auth_Flow=0;Auth_AccessToken=AD token

Any suggestions?