Can we run PySpark code through KNIME?

Hi Team,

We have a few PySpark scripts currently running on a Hadoop cluster. We would like to know whether KNIME can run PySpark scripts (which use the SQL features of PySpark). KNIME supports Spark connectors, but can we run PySpark scripts?

If yes, could you please forward use cases or any other documentation for workflows implemented using Spark and PySpark?

Thanks in advance.

Hello,

Currently it is not possible to execute PySpark scripts from within KNIME. However, KNIME uses the open source Spark Job Server for job management on the cluster side, and there is already a discussion about supporting Python jobs as well. Once this is supported, you could use the REST nodes in KNIME to manage the execution of your PySpark jobs from within KNIME via the REST API of the Spark Job Server.
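
To give a rough idea of what managing jobs over that REST API involves, here is a minimal sketch in Python. It only illustrates the shape of the requests. It assumes the job server listens on http://localhost:8090/ (adjust to your setup), and the app name, class path and context name are hypothetical placeholders. The `/jobs` endpoint with the `appName`, `classPath`, `context` and `sync` parameters follows the Spark Job Server README; whether Python jobs are accepted depends on the job server version you run.

```python
import json
import urllib.request
from urllib.parse import urlencode, urljoin

# Assumption: address of the Spark Job Server; adjust to your cluster.
JOBSERVER_URL = "http://localhost:8090/"


def job_submit_url(base_url, app_name, class_path, context=None, sync=False):
    """Build the URL for POST /jobs, which starts a job on the job server."""
    params = {"appName": app_name, "classPath": class_path}
    if context:
        # Run the job in an existing, named Spark context.
        params["context"] = context
    if sync:
        # Ask the server to block until the job has finished.
        params["sync"] = "true"
    return urljoin(base_url, "jobs") + "?" + urlencode(params)


def job_status_url(base_url, job_id):
    """Build the URL for GET /jobs/<jobId>, which returns status and result."""
    return urljoin(base_url, "jobs/" + job_id)


def submit_job(base_url, app_name, class_path, config_text="", context=None):
    """POST the job configuration and return the parsed JSON response."""
    url = job_submit_url(base_url, app_name, class_path, context=context)
    request = urllib.request.Request(
        url, data=config_text.encode("utf-8"), method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def get_job_status(base_url, job_id):
    """GET the current status (and result, if finished) of a job."""
    with urllib.request.urlopen(job_status_url(base_url, job_id)) as response:
        return json.load(response)
```

In a KNIME workflow the same URLs would be configured in the REST nodes instead of being called from Python: one POST request to submit the job, followed by GET requests to poll its status.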

Bye

Tobias

Hi Tobias.koetter,

Thanks for your prompt response.

My objective is to trigger Spark jobs (PySpark jobs) on the Spark Job Server. The discussion you mentioned does not confirm that support for Python is enabled. Could you please confirm whether there is any alternative way to execute Spark jobs (PySpark jobs) on the Spark Job Server?

Thanks in advance.

Hi,

The master branch of the job server repository has some example Python jobs you might want to have a look at. However, the job server KNIME uses is based on the 0.6.2 branch, which does not support Python jobs. If you need more information about running Python jobs with the job server, you could ask in the mentioned Python thread in the GitHub repository.

Bye

Tobias