Hi,
I’m trying to use PySpark Script nodes, but they fail because they require numpy on the Hadoop cluster, and it doesn’t seem to be installed there.
Is the missing numpy the issue?
Are there any other required modules?
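For reference, this is the kind of quick check I was thinking of running from the PySpark Script node to see whether numpy can be imported on the executors. It’s only a rough sketch: it assumes the node exposes the SparkContext as `sc`, and the partition count is arbitrary.

```python
def has_numpy(_):
    # Runs on each executor; reports whether numpy is importable there.
    try:
        import numpy  # noqa: F401
        yield True
    except ImportError:
        yield False

# Probe a few partitions so the check touches the worker nodes,
# then collect the results back on the driver.
results = sc.parallelize(range(4), 4).mapPartitions(has_numpy).collect()
print(results)  # all False would mean numpy is missing on the workers
```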
Attached is the error.
Thanks a lot )))
Giorgio
Traceback (most recent call last):
  File "/hadoop/disk3/yarn/nm/usercache/sa_mkcho-svil/appcache/application_1707041036019_0365/container_e175_1707041036019_0365_01_000002/tmp/pythonScript_e6459943_5133_4314_a863_3e17b2ac1ca27354869988235983952.py", line 3, in <module>
    from pyspark.mllib.common import _py2java, _java2py
  File "/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p2000.37147774/lib/spark/python/lib/pyspark.zip/pyspark/mllib/__init__.py", line 28, in <module>
ImportError: No module named numpy