I am trying to connect Livy and MinIO so I can use the MinIO API from PySpark to export a file to a specified MinIO path.
The file storage path I specified is 's3a://result_bucket/Check_reduce.log', but when I run the PySpark script, I get the following error.
FileNotFoundError: [Errno 2] No such file or directory: '/opt/spark-3.0.1-bin-hadoop-3.2.0-cloud-scala-2.12/work-dir/s3a:/result_bucket/Check_reduce.log'
The file can't be found because the path "/opt/spark-3.0.1-bin-hadoop-3.2.0-cloud-scala-2.12/work-dir" is automatically prepended, even though I never specified it. Do you have any idea why that path is added? And what is the solution?
I’m really in a hurry. A quick answer would be appreciated.
Here is the problem code from the PySpark script I am running through Livy.
file_path = 's3a://minio_result_bucket/make_file_log.txt'
with open(file_path, "w") as my_file:
    my_file.write("Hello world \n")
    my_file.write("I hope you're doing well today \n")
    my_file.write("This is a text file \n")
    my_file.write("Have a nice time \n")
This is example code; I want to create a file in MinIO and write log-like lines into it.
I've tested a number of things: exporting a finished file works, but the code above doesn't seem to create a file at the MinIO path automatically.
When I run the code, it says it can't find the path in file_path. Of course, I also applied the MinIO connection settings, as shown below.
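For illustration, the connection settings look roughly like this (the endpoint, access key, and secret key below are placeholders, not my real values):

# Placeholder S3A settings for MinIO; endpoint and credentials are not the real values.
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3a.endpoint", "http://minio-host:9000")
hadoop_conf.set("fs.s3a.access.key", "MINIO_ACCESS_KEY")
hadoop_conf.set("fs.s3a.secret.key", "MINIO_SECRET_KEY")
hadoop_conf.set("fs.s3a.path.style.access", "true")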
So in the end, the only options are to export via the output port of the PySpark or Python Script node, or to export a finished file via code like df.write.parquet("s3a://minio_result_bucket/some-path.parquet"), right?
Your PySpark code is usually executed on some executor inside your cluster. Python's built-in open() only knows that machine's local filesystem and doesn't understand the s3a:// scheme, so it treats the whole string as a relative local path and resolves it against Spark's working directory. That is why "/opt/spark-3.0.1-bin-hadoop-3.2.0-cloud-scala-2.12/work-dir" gets prepended.
To write files, you can use the output port of the PySpark node and write the files using KNIME nodes. To improve performance, don't use the Spark to Table node; use a write node such as Spark to Parquet to write your files.
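If you do want to create the log-style file from within the script itself, a minimal sketch (assuming spark is your existing SparkSession and the S3A connection to MinIO is already configured) would route the lines through a DataFrame so that Spark performs the write:

# Minimal sketch: push the log lines through Spark so the write lands in MinIO.
lines = [
    "Hello world",
    "I hope you're doing well today",
    "This is a text file",
    "Have a nice time",
]
df = spark.createDataFrame([(l,) for l in lines], ["value"])
# Spark writers produce a directory of part files, not a single named file;
# coalesce(1) at least keeps it to one part file inside that directory.
df.coalesce(1).write.mode("overwrite").text("s3a://minio_result_bucket/make_file_log.txt")

Keep in mind that s3a://minio_result_bucket/make_file_log.txt will then be a directory containing a part-* file, because that is how Spark's writers work; a plain Python open() can never reach MinIO, since it only sees the local filesystem.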