ERROR PySpark Script (1 to 1) 0:9 Execute failed:

Greetings,
I recently tried the PySpark Script node in KNIME against a Spark 2.3 Hortonworks cluster.

Previously I had problems with numpy, but the node now runs fine if I use it without modifications.
However, when I do something as simple as this in the custom code:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))
resultDataFrame1 = df

I get the following errors…

ERROR PySpark Script (1 to 1) 0:9 Execute failed: /data/hadoop/yarn/local/usercache/livy/appcache/application_1551675118833_0042/container_e22_1551675118833_0042_01_000001/pyspark.zip/pyspark/context.py:243: RuntimeWarning: Failed to add file [file:/usr/hdp/current/spark2-client/python/lib/pyspark.zip] speficied in 'spark.submit.pyFiles' to Python path:
/data/hadoop/yarn/local/usercache/livy/appcache/application_1551675118833_0042/container_e22_1551675118833_0042_01_000001/tmp
/hadoop/yarn/local/usercache/livy/appcache/application_1551675118833_0042/spark-64194fce-63dc-4bb0-b682-2e9d49f3d4eb/userFiles-3c6999ba-d6b4-4271-a221-8ccc0f81f690
/data/hadoop/yarn/local/filecache/12/spark2-hdp-yarn-archive.tar.gz/spark-core_2.11-2.3.2.3.1.0.0-78.jar
/data/hadoop/yarn/local/usercache/livy/appcache/application_1551675118833_0042/container_e22_1551675118833_0042_01_000001
/usr/lib/python2.7
/data/hadoop/yarn/local/usercache/livy/appcache/application_1551675118833_0042/container_e22_1551675118833_0042_01_000001/pyspark.zip
/data/hadoop/yarn/local/usercache/livy/appcache/application_1551675118833_0042/container_e22_1551675118833_0042_01_000001/py4j-0.10.7-src.zip
/usr/lib64/python27.zip
/usr/lib64/python2.7
/usr/lib64/python2.7/plat-linux2
/usr/lib64/python2.7/lib-tk
/usr/lib64/python2.7/lib-old
/usr/lib64/python2.7/lib-dynload
/usr/lib64/python2.7/site-packages
/usr/lib64/python2.7/site-packages/gtk-2.0
/usr/lib/python2.7/site-packages
RuntimeWarning)
/data/hadoop/yarn/local/usercache/livy/appcache/application_1551675118833_0042/container_e22_1551675118833_0042_01_000001/pyspark.zip/pyspark/context.py:243: RuntimeWarning: Failed to add file [file:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip] speficied in 'spark.submit.pyFiles' to Python path:
/data/hadoop/yarn/local/usercache/livy/appcache/application_1551675118833_0042/container_e22_1551675118833_0042_01_000001/tmp
/hadoop/yarn/local/usercache/livy/appcache/application_1551675118833_0042/spark-64194fce-63dc-4bb0-b682-2e9d49f3d4eb/userFiles-3c6999ba-d6b4-4271-a221-8ccc0f81f690
/data/hadoop/yarn/local/filecache/12/spark2-hdp-yarn-archive.tar.gz/spark-core_2.11-2.3.2.3.1.0.0-78.jar
/data/hadoop/yarn/local/usercache/livy/appcache/application_1551675118833_0042/container_e22_1551675118833_0042_01_000001
/usr/lib/python2.7
/data/hadoop/yarn/local/usercache/livy/appcache/application_1551675118833_0042/container_e22_1551675118833_0042_01_000001/pyspark.zip
/data/hadoop/yarn/local/usercache/livy/appcache/application_1551675118833_0042/container_e22_1551675118833_0042_01_000001/py4j-0.10.7-src.zip
/usr/lib64/python27.zip
/usr/lib64/python2.7
/usr/lib64/python2.7/plat-linux2
/usr/lib64/python2.7/lib-tk
/usr/lib64/python2.7/lib-old
/usr/lib64/python2.7/lib-dynload
/usr/lib64/python2.7/site-packages
/usr/lib64/python2.7/site-packages/gtk-2.0
/usr/lib/python2.7/site-packages
RuntimeWarning)
Traceback (most recent call last):
File "/data/hadoop/yarn/local/usercache/livy/appcache/application_1551675118833_0042/container_e22_1551675118833_0042_01_000001/tmp/pythonScript_9ac6c966_72f5_4b2a_9567_55f60ed7b3944900562323152859992.py", line 68, in <module>
_exchanger.addDataFrame("76e6bd2f-22b5-46e3-aa1c-344996d15a32_resultDataFrame1",resultDataFrame1)
File "/data/hadoop/yarn/local/usercache/livy/appcache/application_1551675118833_0042/container_e22_1551675118833_0042_01_000001/tmp/pythonScript_9ac6c966_72f5_4b2a_9567_55f60ed7b3944900562323152859992.py", line 31, in addDataFrame
jdf = _py2java(self._spark, df)
File "/data/hadoop/yarn/local/usercache/livy/appcache/application_1551675118833_0042/container_e22_1551675118833_0042_01_000001/pyspark.zip/pyspark/mllib/common.py", line 88, in _py2java
File "/data/hadoop/yarn/local/usercache/livy/appcache/application_1551675118833_0042/container_e22_1551675118833_0042_01_000001/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/data/hadoop/yarn/local/usercache/livy/appcache/application_1551675118833_0042/container_e22_1551675118833_0042_01_000001/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/data/hadoop/yarn/local/usercache/livy/appcache/application_1551675118833_0042/container_e22_1551675118833_0042_01_000001/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.mllib.api.python.SerDe.loads.
: net.razorvine.pickle.PickleException: expected zero arguments for construction of ClassDict (for numpy.core.multiarray._reconstruct)
at net.razorvine.pickle.objects.ClassDictConstructor.construct(ClassDictConstructor.java:23)
at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:707)
at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:175)
at net.razorvine.pickle.Unpickler.load(Unpickler.java:99)
at net.razorvine.pickle.Unpickler.loads(Unpickler.java:112)
at org.apache.spark.mllib.api.python.SerDeBase.loads(PythonMLLibAPI.scala:1321)
at org.apache.spark.mllib.api.python.SerDe.loads(PythonMLLibAPI.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:745)

Is there anything I need to do to resolve this?

Thank you in advance, best regards,
Mizu

Hey @Mizunashi92
that looks like a configuration issue in the cluster. Is the path to pyspark.zip correct? Did you configure the path on the cluster or in the Livy Context node?
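(If the path is set in the Create Spark Context (Livy) node, it would go under the custom Spark settings. As an illustration only, with the value taken from the warning above:

spark.submit.pyFiles=local:/usr/hdp/current/spark2-client/python/lib/pyspark.zip,local:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip

On YARN, a local: URI means the file already exists at that path on every node, while a file: URI points at a file that Spark still has to distribute, so the right scheme depends on how your cluster is set up.)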
I would recommend setting the PYTHONPATH on the cluster in the YARN configuration.
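For example (a sketch only; the paths are taken from the warnings above, and where exactly to set this differs between HDP/Ambari versions), in spark-defaults.conf:

spark.yarn.appMasterEnv.PYTHONPATH=/usr/hdp/current/spark2-client/python:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip
spark.executorEnv.PYTHONPATH=/usr/hdp/current/spark2-client/python:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip

Independent of the path warnings, the traceback itself fails while the node hands resultDataFrame1 back to Spark: addDataFrame calls _py2java, which passes a Spark DataFrame through as its Java handle but pickles most other Python objects, and the Java side cannot unpickle numpy objects (hence the numpy.core.multiarray._reconstruct error). A pandas DataFrame assigned directly will therefore fail even once the paths are fixed. Here is an untested sketch of the conversion; I am assuming your script has a SparkSession available (called spark below), so substitute whatever session object the node actually provides:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))
# Hand back a Spark DataFrame so the exchanger passes the Java handle
# instead of trying to pickle numpy values ('spark' is an assumption)
resultDataFrame1 = spark.createDataFrame(df)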

Best regards, Mareike