I have followed all the steps in the installation document and installed the Spark Job Server on CDH, as well as the client-side extension on the Analytics Platform. On the CDH side I have configured Spark to run on YARN, and on the KNIME side I have configured the preferences accordingly (screenshot attached). I also changed this line in the environment.conf file: master = "yarn-client"
When I run an example Spark workflow, it gives me this error: "Execute failed: Permission denied: user=root, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x".
I know this error occurs because root does not have write permission on the /user directory, which is owned by hdfs. So:
1) How can I connect to Spark from KNIME with a username other than "root"? (I do not have Kerberos enabled on CDH.) I do not want to change the owner of the /user directory.
2) Is there anything else that I am missing in the configuration?
Any immediate help is appreciated. Thanks!
It seems you have started the Spark Job Server as root, which is why it is trying to access the HDFS directory /user/root (which does not exist).
Please stop the jobserver and delete the following directories. If you have followed the steps on page 4 of the manual, you should have:
- an OS user "spark-job-server"
- a directory /opt/spark-job-server_0.6.2.1-KNIME_cdh-5.8 that belongs to the OS user spark-job-server
- a symbolic link /opt/spark-job-server -> /opt/spark-job-server_0.6.2.1-KNIME_cdh-5.8
- a symbolic link /etc/init.d/spark-job-server -> /opt/spark-job-server/spark-job-server-init.d
- an HDFS home directory /user/spark-job-server which belongs to the user spark-job-server
If you haven't, please do so, because these steps are essential. Then start the jobserver again with the provided init.d script (do not use server_start.sh!).
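For reference, the setup steps above can be sketched roughly as the following shell commands (a sketch only, run as root; the version number and paths are the ones mentioned in this thread, so adjust them to your installation, and check the manual for the authoritative steps):

```shell
# Create the OS user that will own and run the jobserver
useradd -r -m spark-job-server

# Hand ownership of the unpacked jobserver directory to that user
chown -R spark-job-server:spark-job-server /opt/spark-job-server_0.6.2.1-KNIME_cdh-5.8

# Symbolic link for the versioned directory
ln -s /opt/spark-job-server_0.6.2.1-KNIME_cdh-5.8 /opt/spark-job-server

# Symbolic link so the init.d script is found under /etc/init.d
ln -s /opt/spark-job-server/spark-job-server-init.d /etc/init.d/spark-job-server

# Create the HDFS home directory as the hdfs superuser and hand it over
sudo -u hdfs hdfs dfs -mkdir -p /user/spark-job-server
sudo -u hdfs hdfs dfs -chown spark-job-server /user/spark-job-server
```

These commands require root on the cluster node and a running HDFS, so run them on the CDH host rather than locally.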
Hope this helps,
Thanks for the suggestions. I was using server_start.sh; I am not sure why that would cause a problem. But:
I reinstalled the job server on Hadoop and I'm trying to start the service through the init.d script. It gives me the following error:
root@cluster-01:/sbin# /opt/spark-job-server/spark-job-server-init.d start
/opt/spark-job-server/spark-job-server-init.d: line 8: /etc/init.d/functions: No such file or directory
Starting Spark Job-Server: /opt/spark-job-server/spark-job-server-init.d: line 31: checkpid: command not found
/opt/spark-job-server/spark-job-server-init.d: line 39: /sbin/runuser: No such file or directory
/opt/spark-job-server/spark-job-server-init.d: line 51: failure: command not found
I do not want to edit the init.d script. How do I go about resolving these errors?
Also, this might be a dumb question, but how do I start the job server as the user "spark-job-server"? It still connects as "root" even though I start the server with my own username instead of root.
Just for completeness, and to publicly document how this issue was resolved:
Starting the Spark Jobserver must be done through the boot scripts that we provide. If you follow the installation guide PDF (see ), you should have it installed under /etc/init.d/spark-job-server
We currently provide boot scripts for RedHat 6.x/7.x, CentOS 6.x/7.x and Ubuntu 14 (SysV boot).
Starting and stopping the jobserver can then be done via the /etc/init.d/spark-job-server script (more details are also in ).
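Assuming the script is installed under /etc/init.d as described, day-to-day use looks roughly like this (the start/stop subcommands follow the usual SysV init convention; the ps check is just an illustrative way to confirm the effective user, since the init script, not you, switches to the spark-job-server account):

```shell
# Start and stop the jobserver through the SysV init script;
# the script takes care of switching to the spark-job-server user
/etc/init.d/spark-job-server start
/etc/init.d/spark-job-server stop

# Optional sanity check: list Java processes with their owning user;
# the jobserver JVM should run as spark-job-server, not root
ps -C java -o user=,pid=,args=
```

If the process still shows up as root, it was most likely started once by hand (e.g. via server_start.sh); stop it and use the init script instead.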