I have a problem with disk space during execution of my workflow. My dataset is 85 GB (very large for me) and I have only 350 GB of free space on my device, so execution stops not far from the end because it runs out of disk space. Is it possible not to store the results of some nodes in memory (in temporary files)? For example, I have several nodes that preprocess my dataset, and I don't need to keep the results of all of them; I only need the result of the last node in the group. Is there a way to deal with this?
You might also want to consider other options. Tweaking how the data is stored might help: the Parquet and ORC file formats offer a good combination of compression and preservation of column types. They also let you handle larger files in chunks while still treating them as one (big) table if necessary.