Some Linux distributions (e.g. Fedora) keep the /tmp directory in RAM, which clashes with KNIME’s own disk caching: cache files that KNIME creates in /tmp end up in RAM again. The obvious mitigation is to set KNIME’s temporary directory to /var/tmp, which is on disk. However, it seems that at least the FAISS Vector Store Creator (if not all Python-based nodes) ignores this setting and still writes its output to /tmp. In my case, a bunch of pyscript_console_output178747746567088984.arrow files end up in /tmp.
Is there any way to steer the caching behavior of Python-based nodes?
Edit: Further investigation, also known as checking the file sizes, leaves me a bit confused. Each file is just 1 KB, so these files are unlikely to be the reason I’m running out of RAM.
Do Python-based nodes cache to disk? As far as I remember, the Python Script node does.
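For anyone who wants to repeat the file-size check, here is a minimal sketch that counts the matching files in a directory and adds up their sizes. The helper name `summarize_temp_files` and the default glob pattern are made up for illustration (the pattern is simply derived from the file name quoted above); nothing here is a KNIME API.

```python
from pathlib import Path


def summarize_temp_files(directory, pattern="pyscript_console_output*.arrow"):
    """Return (file_count, total_bytes) for files in `directory` matching `pattern`.

    Hypothetical helper: only looks at the top level of `directory`,
    it does not descend into subdirectories.
    """
    files = [p for p in Path(directory).glob(pattern) if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)
    return len(files), total_bytes


# Example usage (assumes the files live directly in /tmp):
#   count, total = summarize_temp_files("/tmp")
#   print(f"{count} files, {total / 1024:.1f} KiB in total")
```

If the total stays in the low kilobytes, these console output files can probably be ruled out as the source of the memory pressure.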
Hi @Ellison,
Thank you for the report. You are correct, neither the “Python Script” node nor Python-based nodes take the KNIME preference for the temporary folder into account for internal temporary files. I created tickets to fix that. Maybe the “FAISS Vector Store Creator” internally creates large temporary files. I checked back with the team and will let you know.
Input and output tables are cached to disk and they use the correct configured temporary folder.
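As background, Python's standard `tempfile` module picks its directory from the `TMPDIR`, `TEMP`, and `TMP` environment variables before falling back to platform defaults such as /tmp. The sketch below shows how to inspect that resolution. Whether the internal temporary files of KNIME's Python nodes go through `tempfile` or honor `TMPDIR` is an assumption that this thread does not confirm, and `resolve_python_tempdir` is a hypothetical helper name.

```python
import os
import tempfile


def resolve_python_tempdir(tmpdir=None):
    """Return the directory Python's tempfile module would use.

    If `tmpdir` is given, TMPDIR is overridden for this process first.
    Note: this changes os.environ and resets tempfile's cached value.
    """
    if tmpdir is not None:
        os.environ["TMPDIR"] = tmpdir
    # tempfile caches its choice; clear it so the environment is re-read.
    tempfile.tempdir = None
    return tempfile.gettempdir()


# Example usage:
#   print(resolve_python_tempdir())            # current default, often /tmp
#   print(resolve_python_tempdir("/var/tmp"))  # after overriding TMPDIR
```

Running the first call inside a Python Script node shows which directory plain Python code in that process would use. Starting KNIME with `TMPDIR` pointing to /var/tmp might work as a stopgap until the fix ships, but that is untested here.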
Internal ticket ID: AP-23443
Summary: N-IX: Tempdir not set to KNIME configured temp for Python nodes and Python Script
Fix version(s): 5.8.0
Other related open ticket(s):
AP-23442: Python Script console output not saved in configured temporary folder