I have a workflow that uses the local big data environment. The workflow was executed and saved.
But when I open the saved workflow in order to re-execute it, it complains about the (lost?) Spark context:
ERROR Spark to Table 0:15 Execute failed: Spark context ‘sparkLocal://knimeSparkContext’ does not exist in the cluster. Please create a context first.
You can fix this by resetting and re-executing the “Create Local Big Data Environment” node, but of course this costs a lot of time because all downstream nodes have to be re-executed as well.
The Create Local Big Data Environment node has an option called “Action to perform on dispose”. This option controls whether the Spark context is destroyed when the workflow is closed, but it does not help if you restart KNIME. The Spark session cannot be persisted to disk; only the data transferred down to KNIME can be persisted. That is why you need to launch a new Spark session after restarting KNIME, and all nodes that depend on the Spark session must be re-executed too.
This is due to the way Spark works: it operates mainly in the memory of the system, which makes it fast but also temporary.
It could make sense to familiarise yourself with some key concepts such as lazy evaluation, since they heavily influence how Spark works and what you will encounter once you start using it with KNIME, not least persist and unpersist.
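To give a rough feel for those concepts, here is a plain-Python analogy (this is not the Spark API, just a sketch using generators): transformations are lazy and describe work without doing it, an action triggers the actual computation, and "persisting" a materialized result avoids recomputing it.

```python
# Plain-Python analogy for Spark's lazy evaluation and persist/unpersist.
# NOT Spark code; it only illustrates the idea with a generator.

call_count = 0  # counts how often a row is actually processed

def expensive_transform(rows):
    """A lazy 'transformation': nothing runs until the result is consumed."""
    global call_count
    for r in rows:
        call_count += 1
        yield r * 2

source = [1, 2, 3]

pipeline = expensive_transform(source)  # lazy: call_count is still 0 here
first_run = list(pipeline)              # 'action': evaluation happens now

# Without persisting, re-running the pipeline recomputes everything:
second_run = list(expensive_transform(source))

# Keeping the materialized result is analogous to persist():
cached = first_run
third_run = cached      # reused, no extra processing
cached = None           # analogous to unpersist(): release the memory
```

In the same way, a KNIME Spark node that was executed in a previous session has no materialized Spark data to fall back on after a restart, so the whole chain must run again from the node that creates the context.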