Spark Context Livy Node source code

#1

Hi,

I am trying to use the IBM Analytics Engine by creating a Spark Context from KNIME, but it doesn’t work. I tried all the HTTP(S), httpFS, and HDFS* nodes and I get this error: “Node : Create Spark Context (Livy) : 0:2634 : Execute failed: Unsupported remote file system: https (IllegalArgumentException)”.

Now I am trying to edit one of the nodes above, but I am not able to compile them from Eclipse. I tried the KNIME 3.7 and KNIME 4.0 SDK setups from [1]. I can see the source code for the nodes, but I am not able to import them as projects and then replace the existing nodes with them.

Thanks,
Mihai

[1] https://docs.knime.com/2019-06/analytics_platform_new_node_quickstart_guide/index.html#_introduction


#2

Hi @mihais1

the Create Spark Context (Livy) node needs a remote file system to ship temporary files between Spark (inside the cluster) and KNIME (outside the cluster).

The reason for the error you are getting is that the “Create Spark Context (Livy)” node does not support HTTP(S) as a remote file system, for good reasons: HTTP lacks some file system capabilities (creating directories, listing directories) that the Livy node requires.

I usually suggest using the “HttpFS Connection” node, since you only need the HttpFS service on the cluster side, but it seems that IBM Analytics Engine does not include that. What you could do is

  • either write a KNOX Connection node,
  • or write an IBM Cloud Object Storage node (which I guess is the preferred way to store data in IBM Analytics Engine).

Support for the remote file system protocol of your connector node then only needs to be added to the Livy node, which should be pretty straightforward (allow the protocol, provide logic for choosing a staging area).
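Just to illustrate what “allow the protocol, provide logic for choosing a staging area” might look like: the sketch below is hypothetical, none of the class or method names come from the actual KNIME source, and the protocol list and paths are made up for the example. It only shows the shape of the change, i.e. a whitelist check plus a per-protocol staging directory choice:

```java
import java.util.Set;

// Hypothetical sketch only: how a Livy-style node could accept a new
// remote file system protocol and pick a staging area for it.
// These names do NOT exist in the KNIME codebase.
public class StagingAreaResolver {

    /** Protocols the (hypothetical) node accepts for shipping staging files. */
    private static final Set<String> SUPPORTED =
        Set.of("hdfs", "webhdfs", "httpfs", "s3", "cos");

    public static String resolveStagingDir(String protocol, String userHome) {
        if (!SUPPORTED.contains(protocol)) {
            // Mirrors the kind of error the original poster saw.
            throw new IllegalArgumentException(
                "Unsupported remote file system: " + protocol);
        }
        switch (protocol) {
            case "s3":
            case "cos":
                // Object stores: stage into a dedicated bucket-style path.
                return "/knime-staging";
            default:
                // File systems: stage into a directory under the user's home.
                return userHome + "/.knime-spark-staging";
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveStagingDir("hdfs", "/user/mihai"));
        System.out.println(resolveStagingDir("cos", "/user/mihai"));
    }
}
```

Adding support for a new protocol in this sketch would then be two small edits: extend the whitelist and add a branch that chooses a sensible staging location for that file system.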

If you choose to write a connector for IBM Cloud Object Storage, the knime-cloud repo provides some starting points, as it implements the S3 connector, which should be similar in some respects.

Currently the source code for the Spark integration can only be obtained by installing the source bundle in KNIME. You could use this as a starting point for your own version.

Best,
Björn
