PySpark Script Node Error

Hi KNIME Team,

While working on a custom PySpark Script (1 to 1) node, I got an error that I am not able to sort out. Please check the attached knime-log.txt file for reference. One thing to note: my Python script was working perfectly, and I validated it in the Script section (where I clicked the "validate job on cluster" button).

Image for Reference:

Knime Log Reference:
knime-log.txt (30.9 KB)

@Dhruv101 The error message indicates a Java class version mismatch. The error happens inside your cluster, not in KNIME AP. It only shows up in the KNIME AP log because we transfer cluster-side errors into the AP and log them there (so that the problem is easier to diagnose).

It is a cluster-side problem where somehow two different versions of the class “org.apache.spark.sql.execution.FileSourceScanExec” are being used by the same Spark context in your cluster. This is what the error message says:

java.io.InvalidClassException: org.apache.spark.sql.execution.FileSourceScanExec; local class incompatible: stream classdesc serialVersionUID = 1920947604238219635, local class serialVersionUID = -3589590085483687218
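If it helps with narrowing this down, here is a rough sketch of how one could look for differing copies of that class on a cluster node. It uses only the Python standard library and is not part of KNIME or Spark. The function names and the directories you pass in are assumptions made for illustration; you would need to point it at wherever your distribution keeps its Spark jars, and run it on the driver host and on a worker host.

```python
import zipfile
from pathlib import Path


def find_jars_containing(class_name, search_dirs):
    """Return (jar_path, crc32) pairs for every jar that bundles class_name.

    class_name is a fully qualified Java class name. search_dirs is a list
    of directories to scan recursively (hypothetical, cluster-specific).
    """
    # Inside a jar, a class is stored as a path ending in ".class".
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for directory in search_dirs:
        root = Path(directory)
        if not root.is_dir():
            continue  # skip directories that do not exist on this node
        for jar in sorted(root.rglob("*.jar")):
            try:
                with zipfile.ZipFile(jar) as archive:
                    info = archive.getinfo(entry)  # KeyError if class is absent
            except (KeyError, zipfile.BadZipFile, OSError):
                continue  # class not in this jar, or the jar is unreadable
            # The CRC of the .class entry differs if the bytecode differs.
            hits.append((str(jar), info.CRC))
    return hits


def has_conflicting_copies(hits):
    """True if the jars found contain more than one distinct build of the class."""
    return len({crc for _, crc in hits}) > 1
```

Calling `find_jars_containing("org.apache.spark.sql.execution.FileSourceScanExec", [...])` with your Spark jar directories and comparing the results between nodes should show whether more than one build of the class is present. A differing CRC only means the class files differ; it does not by itself tell you which one is the correct version for your cluster.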

I am not sure how this can happen, but it is not a problem with the KNIME software; it needs to be fixed in your cluster setup. What type of cluster are you using (Cloudera CDH, HDP, CDP, Amazon EMR, …)? If it is from one of the big vendors, it might be worth contacting their support or asking in their support forums.

Best,
Björn
