ERROR PySpark Script (1 to 1) 0:9 Execute failed:

Greetings,
I recently tried the PySpark node in KNIME on a Hortonworks Spark 2.3 cluster.

Previously I had problems with numpy, but now the node runs fine if I use it without modifications.
However, when I do something as simple as this in the custom code:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))
resultDataFrame1 = df

I get the following errors…

ERROR PySpark Script (1 to 1) 0:9 Execute failed: /data/hadoop/yarn/local/usercache/livy/appcache/application_1551675118833_0042/container_e22_1551675118833_0042_01_000001/pyspark.zip/pyspark/context.py:243: RuntimeWarning: Failed to add file [file:/usr/hdp/current/spark2-client/python/lib/pyspark.zip] speficied in 'spark.submit.pyFiles' to Python path:
/data/hadoop/yarn/local/usercache/livy/appcache/application_1551675118833_0042/container_e22_1551675118833_0042_01_000001/tmp
/hadoop/yarn/local/usercache/livy/appcache/application_1551675118833_0042/spark-64194fce-63dc-4bb0-b682-2e9d49f3d4eb/userFiles-3c6999ba-d6b4-4271-a221-8ccc0f81f690
/data/hadoop/yarn/local/filecache/12/spark2-hdp-yarn-archive.tar.gz/spark-core_2.11-2.3.2.3.1.0.0-78.jar
/data/hadoop/yarn/local/usercache/livy/appcache/application_1551675118833_0042/container_e22_1551675118833_0042_01_000001
/usr/lib/python2.7
/data/hadoop/yarn/local/usercache/livy/appcache/application_1551675118833_0042/container_e22_1551675118833_0042_01_000001/pyspark.zip
/data/hadoop/yarn/local/usercache/livy/appcache/application_1551675118833_0042/container_e22_1551675118833_0042_01_000001/py4j-0.10.7-src.zip
/usr/lib64/python27.zip
/usr/lib64/python2.7
/usr/lib64/python2.7/plat-linux2
/usr/lib64/python2.7/lib-tk
/usr/lib64/python2.7/lib-old
/usr/lib64/python2.7/lib-dynload
/usr/lib64/python2.7/site-packages
/usr/lib64/python2.7/site-packages/gtk-2.0
/usr/lib/python2.7/site-packages
RuntimeWarning)
/data/hadoop/yarn/local/usercache/livy/appcache/application_1551675118833_0042/container_e22_1551675118833_0042_01_000001/pyspark.zip/pyspark/context.py:243: RuntimeWarning: Failed to add file [file:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip] speficied in 'spark.submit.pyFiles' to Python path:
/data/hadoop/yarn/local/usercache/livy/appcache/application_1551675118833_0042/container_e22_1551675118833_0042_01_000001/tmp
/hadoop/yarn/local/usercache/livy/appcache/application_1551675118833_0042/spark-64194fce-63dc-4bb0-b682-2e9d49f3d4eb/userFiles-3c6999ba-d6b4-4271-a221-8ccc0f81f690
/data/hadoop/yarn/local/filecache/12/spark2-hdp-yarn-archive.tar.gz/spark-core_2.11-2.3.2.3.1.0.0-78.jar
/data/hadoop/yarn/local/usercache/livy/appcache/application_1551675118833_0042/container_e22_1551675118833_0042_01_000001
/usr/lib/python2.7
/data/hadoop/yarn/local/usercache/livy/appcache/application_1551675118833_0042/container_e22_1551675118833_0042_01_000001/pyspark.zip
/data/hadoop/yarn/local/usercache/livy/appcache/application_1551675118833_0042/container_e22_1551675118833_0042_01_000001/py4j-0.10.7-src.zip
/usr/lib64/python27.zip
/usr/lib64/python2.7
/usr/lib64/python2.7/plat-linux2
/usr/lib64/python2.7/lib-tk
/usr/lib64/python2.7/lib-old
/usr/lib64/python2.7/lib-dynload
/usr/lib64/python2.7/site-packages
/usr/lib64/python2.7/site-packages/gtk-2.0
/usr/lib/python2.7/site-packages
RuntimeWarning)
Traceback (most recent call last):
File "/data/hadoop/yarn/local/usercache/livy/appcache/application_1551675118833_0042/container_e22_1551675118833_0042_01_000001/tmp/pythonScript_9ac6c966_72f5_4b2a_9567_55f60ed7b3944900562323152859992.py", line 68, in <module>
_exchanger.addDataFrame("76e6bd2f-22b5-46e3-aa1c-344996d15a32_resultDataFrame1",resultDataFrame1)
File "/data/hadoop/yarn/local/usercache/livy/appcache/application_1551675118833_0042/container_e22_1551675118833_0042_01_000001/tmp/pythonScript_9ac6c966_72f5_4b2a_9567_55f60ed7b3944900562323152859992.py", line 31, in addDataFrame
jdf = _py2java(self._spark, df)
File "/data/hadoop/yarn/local/usercache/livy/appcache/application_1551675118833_0042/container_e22_1551675118833_0042_01_000001/pyspark.zip/pyspark/mllib/common.py", line 88, in _py2java
File "/data/hadoop/yarn/local/usercache/livy/appcache/application_1551675118833_0042/container_e22_1551675118833_0042_01_000001/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/data/hadoop/yarn/local/usercache/livy/appcache/application_1551675118833_0042/container_e22_1551675118833_0042_01_000001/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/data/hadoop/yarn/local/usercache/livy/appcache/application_1551675118833_0042/container_e22_1551675118833_0042_01_000001/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.mllib.api.python.SerDe.loads.
: net.razorvine.pickle.PickleException: expected zero arguments for construction of ClassDict (for numpy.core.multiarray._reconstruct)
at net.razorvine.pickle.objects.ClassDictConstructor.construct(ClassDictConstructor.java:23)
at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:707)
at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:175)
at net.razorvine.pickle.Unpickler.load(Unpickler.java:99)
at net.razorvine.pickle.Unpickler.loads(Unpickler.java:112)
at org.apache.spark.mllib.api.python.SerDeBase.loads(PythonMLLibAPI.scala:1321)
at org.apache.spark.mllib.api.python.SerDe.loads(PythonMLLibAPI.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:745)

Is there anything I need to do to resolve this?

Thank you in advance, best regards,
Mizu

Hey @Mizunashi92,
that looks like a configuration issue on the cluster. Is the path to pyspark.zip correct? Did you configure the path on the cluster or in the Livy Context node?
I would recommend setting the PYTHONPATH on the cluster in the YARN configuration.
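For example (a sketch only; the client paths below are assumptions taken from the paths in your log and may differ on your cluster), the PYTHONPATH could be exported cluster-wide in spark-env.sh, or in the equivalent field in Ambari:

# Point the Python workers at the Spark client's pyspark and py4j libraries
export PYTHONPATH=/usr/hdp/current/spark2-client/python:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH

Alternatively, the same paths can be passed per application through the spark.yarn.appMasterEnv.PYTHONPATH and spark.executorEnv.PYTHONPATH Spark properties.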

Best regards,
Mareike
