Problem with non-existent pickle port in PySpark node

Hello, KNIME Support Team!

I am currently working with a client on a KNIME collaboration and have an issue with PySpark.

In the previous KNIME collaboration, we ran the ML analysis in a Python Script node. The trained model was passed through the node’s “Pickled object” output port to a “Model Writer” node, which stored it as a Pickle file in the customer’s MinIO (object storage).

In the current collaboration, the same requirement is implemented in PySpark, but the PySpark script node can only output a table on its output port. We need to save the Pickle file derived from PySpark to our client’s MinIO. Is there any way to do this?

I’ve been thinking about it for a long time but haven’t found a comparable case yet, so I’m asking for your help.

We are in a hurry, so a quick response would be very much appreciated.

Hi @JaeHwanChoi,

You might like to use the KNIME Spark Nodes to create your model.

If you still like to use PySpark, you can find the documentation here: PySpark Overview — PySpark 3.4.1 documentation

We need to save the Pickle file derived from Pyspark

You have to adapt your code to PySpark. PySpark models can be stored on S3/MinIO using the model’s usual write method, so you don’t need any pickled output.
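As a rough illustration of that write method, here is a minimal sketch that has not been tested inside KNIME. The bucket, object key, endpoint, and credentials are placeholders, the helper names (`minio_s3a_conf`, `model_uri`, `save_model`) are made up for this example, and it assumes the Hadoop S3A connector is available on the Spark cluster.

```python
# Sketch only: bucket, key, endpoint and credentials are placeholders,
# and the helper names are hypothetical (not part of KNIME or PySpark).

def minio_s3a_conf(endpoint, access_key, secret_key):
    """Hadoop S3A settings commonly needed to reach a MinIO endpoint."""
    return {
        "fs.s3a.endpoint": endpoint,
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
        # MinIO is usually addressed path-style (endpoint/bucket/key).
        "fs.s3a.path.style.access": "true",
    }


def model_uri(bucket, key):
    """Build the s3a:// location the model directory is written to."""
    return "s3a://{}/{}".format(bucket, key.strip("/"))


def save_model(model, uri):
    """Persist a fitted PySpark ML model (any MLWritable) to the given URI."""
    model.write().overwrite().save(uri)


# Inside the PySpark script, after fitting (variable names are assumptions):
#   model = pipeline.fit(dataFrame1)
#   save_model(model, model_uri("models", "my_model/v1"))
# To read it back later:
#   from pyspark.ml import PipelineModel
#   model = PipelineModel.load(model_uri("models", "my_model/v1"))
```

The S3A settings are typically supplied when the Spark context is created, as `spark.hadoop.`-prefixed properties, rather than inside the script. Note that the result is a directory of model files in Spark’s own format, not a single Pickle file.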

Cheers,
Sascha
