Python-based nodes ignore temporary directory settings?

Hi everyone,

Some Linux distributions (e.g. Fedora) keep the /tmp directory in RAM, which clashes with KNIME’s own disk caching: cache files that KNIME writes to /tmp end up in RAM again. The obvious mitigation is to set KNIME’s temporary directory to /var/tmp, which is on disk. However, it seems that at least the FAISS Vector Store Creator (if not all Python-based nodes) ignores this setting and still writes its output to /tmp. In my case, a bunch of pyscript_console_output178747746567088984.arrow files end up in /tmp.

Is there any way to steer the caching behavior of Python-based nodes?

Edit: Further investigation, also known as checking the file sizes, leaves me a bit confused. Each file is just 1 kB, so these files are likely not the reason I’m running out of RAM.
Do Python-based nodes cache to disk? As far as I remember, the Python Script node does so.
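
For anyone who wants to repeat the check, here is a minimal sketch that totals the size of the matching files and reports whether /tmp is a tmpfs mount. The helper names are made up for illustration, and the tmpfs check assumes a Linux system that exposes /proc/mounts.

```python
import glob
import os


def summarize_files(directory, pattern):
    """Return (count, total_bytes) for regular files in `directory` matching `pattern`."""
    paths = glob.glob(os.path.join(directory, pattern))
    sizes = [os.path.getsize(p) for p in paths if os.path.isfile(p)]
    return len(sizes), sum(sizes)


def is_tmpfs(mount_point, mounts_file="/proc/mounts"):
    """Return True if `mount_point` is listed with type tmpfs in a /proc/mounts-style file."""
    with open(mounts_file) as fh:
        for line in fh:
            fields = line.split()
            # Fields are: device, mount point, filesystem type, options, ...
            if len(fields) >= 3 and fields[1] == mount_point and fields[2] == "tmpfs":
                return True
    return False


if __name__ == "__main__":
    count, total = summarize_files("/tmp", "pyscript_console_output*.arrow")
    print(f"{count} files, {total} bytes in total")
    if os.path.exists("/proc/mounts"):
        print("/tmp is tmpfs:", is_tmpfs("/tmp"))
```

If the total stays in the kilobyte range, these console output files are not what fills up the RAM-backed /tmp.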

Hi @Ellison,
Thank you for the report. You are correct: neither the “Python Script” node nor the other Python-based nodes take the KNIME preference for the temporary folder into account for their internal temporary files. I created tickets to fix that. It may be that the “FAISS Vector Store Creator” internally creates large temporary files. I have checked with the team and will let you know.

Input and output tables are cached to disk, and they use the correctly configured temporary folder.
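
For background on the internal temporary files: Python's standard `tempfile` module picks its directory from the `TMPDIR`, `TEMP`, and `TMP` environment variables before falling back to platform defaults such as /tmp. The sketch below only demonstrates that lookup. Whether the Python processes started by KNIME use `tempfile` for these files, or inherit the environment KNIME was launched with, is an assumption that has not been confirmed here, and `resolve_temp_dir` is a made-up helper name.

```python
import os
import tempfile


def resolve_temp_dir(env_value=None):
    """Return the directory tempfile would use, optionally after setting TMPDIR."""
    if env_value is not None:
        os.environ["TMPDIR"] = env_value
    # tempfile caches its choice; clear it so the environment is read again.
    tempfile.tempdir = None
    return tempfile.gettempdir()


if __name__ == "__main__":
    print("default:", resolve_temp_dir())
    if os.path.isdir("/var/tmp"):
        print("with TMPDIR=/var/tmp:", resolve_temp_dir("/var/tmp"))
```

If that assumption holds, exporting `TMPDIR=/var/tmp` before starting KNIME might redirect these files, but this is untested.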

