KNIME HDFS - Livy - Spark - Kerberos

Hi,

we are currently struggling with connecting to a Kerberos enabled Livy endpoint using KNIME 4.0.2.
What we basically have on the server side is the following:

  • a KDC containing all Principals (HDFS, Livy and Test Principal for KNIME)
  • a Kerberos enabled HDFS setup (Namenode, Datanode…)
  • a Livy endpoint which is also Kerberos enabled

The connection to HDFS can be established using the Test Principal’s Keytab.
When trying to submit jobs to Livy, the only error message shown is:

ERROR Create Spark Context (Livy) 0:1 Execute failed: Authentication required:

Error 401

HTTP ERROR: 401

Problem accessing /sessions/. Reason:

    Authentication required


Powered by Jetty://

When using curl (with --negotiate) after calling kinit with the exact same keytab as used for KNIME, it is possible to create a session in Livy.
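
For reference, this is roughly the check that works. A sketch; host, port, keytab path and principal are placeholders for our actual values:

    # obtain a TGT from the same keytab that KNIME uses
    kinit -kt /path/to/test.keytab testuser@EXAMPLE.COM
    # SPNEGO-authenticated request to create a Livy session (default port 8998)
    curl --negotiate -u : \
         -H 'Content-Type: application/json' \
         -d '{"kind": "spark"}' \
         http://livy-host.example.com:8998/sessions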

We’ve been basically trying to follow all guides available (which are probably outdated…):

Did anybody ever accomplish doing something like this and is able to help?
Has this even been tested before?

Fun fact for everybody trying to use Livy 0.6.0: stop trying and use Livy 0.5.0, as 0.6.0 is not supported by KNIME at all.

Thanks.

Hi @de123,
welcome to the KNIME community!
This setup should work fine; we are using it at KNIME all the time.
The first thing to check is whether the Kerberos authentication is working correctly in KNIME. On the Kerberos preference page (File > Preferences > KNIME > Kerberos), where you added the keytab to the configuration, are you able to log in? There is a login button that will attempt the login.
You should also see your current login status at the Kerberos icon in the bottom right corner of KNIME.
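
Independent of KNIME, you can also sanity-check the keytab itself on the command line. A rough sketch, assuming the MIT Kerberos client tools are installed; keytab path and principal are placeholders:

    klist -kt /path/to/test.keytab            # list the principals stored in the keytab
    kinit -kt /path/to/test.keytab testuser@EXAMPLE.COM
    klist                                     # should now show a valid TGT for that principal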

If that works, could you please check the settings in the Livy node: did you choose Kerberos as the authentication method?

PS: Regarding the fun fact: that is true up to KNIME 4.0. However, we have since integrated Livy 0.6 support; you can test it with the current nightly build.

Best regards,
Mareike

Hi @mareike.hoeger,

thanks for the quick response.

As stated before, I was able to authenticate against Kerberized HDFS.
AFAIK KNIME first checks that the connection to the remote filesystem can be established before trying to connect to Livy.
In fact, logging in via the Kerberos preferences page was also working fine (it also worked after activating the Kerberos status entry at the bottom right and using that one to log in).

The Livy node is also configured to use Kerberos.

Regarding the 0.6.0 support: Does that mean that this will probably be supported with KNIME 4.1.0?

Best,
Daniel

Hi @de123,

there is a logging option on the Kerberos preferences page. Enable it with logging level debug and restart KNIME. Then have a look at the console or the KNIME log. Do they show anything unexpected?
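
To filter the log for the relevant entries, you could do something like the following. A sketch; the path is a placeholder, knime.log usually sits in the workspace under .metadata/knime/:

    # grep the KNIME log for Kerberos/SPNEGO related lines
    grep -iE 'kerberos|spnego|gss' /path/to/workspace/.metadata/knime/knime.log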

Where did you find this stylish error page in KNIME?

Regarding Livy 0.6: Yes, the upcoming KNIME 4.1.0 release supports Livy 0.6.

Best,
Sascha

Hi @sascha.wolke,

I already had that logging option enabled and set to debug.
I could not find anything helpful or unexpected in the log when executing it again.
See: knime_hdfs_livy_log.txt (11.5 KB)

That stylish error page is in fact just the HTML output Livy returns when an unauthorized user is trying to access the /sessions endpoint. Somehow this forum allows the usage of HTML tags.

Best,
Daniel

Hi @de123,

do you use a DNS name or an IP address to connect to Livy? The log shows an HDFS URL using an IP; this might work with HDFS, but not with Livy/SPNEGO.

Hi @sascha.wolke,

the Livy node is also configured to use an IP address.
Do I need to access it using its hostname, or would it also work if I change the principal to contain the host’s IP address instead of the Kubernetes cluster-internal hostname?

Hi @de123,

I guess the principals must match. In HDP/CDH setups, the principals use hostnames by default; I’m not sure whether this works with IP addresses. The principals might be the problem.
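
As a rough way to check which service principals actually exist, you could list them on the KDC. A sketch, assuming you can run kadmin.local on the KDC host; the realm is a placeholder:

    # list the HTTP/ and livy/ service principals known to the KDC
    kadmin.local -q 'listprincs' | grep -E '^(HTTP|livy)/'
    # SPNEGO requests a ticket for HTTP/<hostname-as-used-in-the-URL>@EXAMPLE.COM,
    # so exactly that principal has to exist and be present in Livy's keytab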

@sascha.wolke,

thank you for your help.
I have now changed the HTTP/ and livy/ principals to contain the host’s IP address.
That means when accessing the Livy endpoint via the host’s IP, the FQDN part of the principal basically needs to be the IP address.

After doing this, KNIME is able to start a session. Now another error occurs:
requirement failed: Local path /root/.livy-sessions/a6b56ecd-417b-4dd9-99fd-77556a28118f/sparkClasses3209110133570466285.jar cannot be added to user sessions.

Hi @de123,

great that Kerberos now works. You need to set HDFS as the default FS in Spark; I guess this is not the case? Does the path “/root” exist in HDFS? You might set another staging path in the Livy node’s Advanced tab, e.g. something in /tmp.
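
If the default FS is the issue, the relevant setting would look something like this. A sketch; the namenode address is a placeholder, and the property can equally be set as fs.defaultFS in core-site.xml:

    # spark-defaults.conf on the Livy host
    spark.hadoop.fs.defaultFS    hdfs://namenode.example.com:8020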

Hi @sascha.wolke,

I also moved the Livy staging dir to HDFS.
That helped with starting the Spark executors.
They now stay up as expected.
Since KNIME starts the Livy session with kind: shared, I had to modify the Spark image a bit, but that’s a different story.
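
For reference, the running sessions and their kind can be inspected via the Livy REST API. A sketch; host and port are placeholders:

    # list current Livy sessions, including each session's kind and state
    curl --negotiate -u : http://livy-host.example.com:8998/sessions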

Thank you for the quick help!

Hi @de123,

KNIME does not set the kind; shared might be the default. Great to hear that your Kubernetes Kerberos setup now works. Feel free to mark an answer as the solution.
