Spark context: How to cache intermediate data, without writing it back out, while doing a series of transformations on the data

One possibility is to persist intermediate results, either in memory or on disk. If you reuse a result in a loop, or fork the computation into several branches, Spark then does not have to recompute the whole lineage from scratch for each reuse.

But there is no free lunch: caching costs RAM (or disk space) plus the time to materialize the data.

Once you are done with the cached data, call unpersist() on it to release the memory or disk it occupies.
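A minimal sketch of this pattern in Scala, assuming local mode and a toy dataset (the names and the `% 2` / `% 3` / `% 5` filters are illustrative, not from the question):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object PersistExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("persist-example")
      .master("local[*]") // assumption: local mode, for illustration only
      .getOrCreate()

    // Stand-in for an expensive chain of transformations.
    val cleaned = spark.range(0, 1000000).filter("id % 2 = 0")

    // Materialize once; MEMORY_AND_DISK spills to disk if RAM runs out.
    cleaned.persist(StorageLevel.MEMORY_AND_DISK)

    // Both forked branches reuse the cached result instead of
    // recomputing the lineage of `cleaned` from the source.
    val branchA = cleaned.filter("id % 3 = 0").count()
    val branchB = cleaned.filter("id % 5 = 0").count()
    println(s"A=$branchA B=$branchB")

    // Free the memory/disk once both branches are done.
    cleaned.unpersist()
    spark.stop()
  }
}
```

Note that `persist` is lazy: nothing is cached until the first action (here, the first `count()`) runs; the second branch is the one that actually benefits.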