Hello,
I’ve started using the Python Script (Labs) node because the stable-release Python Script node was causing memory issues when used with AsyncIO. I was hoping the new batching and the conversion of input tables to pandas DataFrames would solve these, but the Python Script (Labs) node still fails with out-of-memory errors when I use a chunk size of more than 1,000 rows.
It seems to run out of memory when adding data to the output table (knio.batch_write_table()). This is probably due to the large response retrieved via the aiohttp library, along with joining the input table with the response data.
I understand the node is still in development, but I have a few questions:
- Could it be incompatible with asynchronous tasks?
- Why is it still running out of memory when accessing the inputs in batches?
- The docs say “previously the size of the input data was limited by the amount of RAM available on the machine, the Python Script (Labs) node can process arbitrarily large amounts of data by accessing it in batches via the .batches() method of the input table” - how does it handle output data?
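For reference, the batch-wise pattern I am asking about can be sketched roughly as below. This is a stdlib-only illustration, not the script from the workflow: `fetch`, `batches`, `Sink`, `process_batch` and `run` are hypothetical stand-ins for the aiohttp request, the input table's batch iterator and the batch-wise output table, and no KNIME or aiohttp calls are used.

```python
import asyncio
from typing import Dict, Iterable, Iterator, List


async def fetch(row_id: int) -> Dict[str, object]:
    """Hypothetical stand-in for one aiohttp request."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return {"id": row_id, "payload": "x" * 10}


def batches(rows: Iterable[int], size: int) -> Iterator[List[int]]:
    """Hypothetical stand-in for the input table's batch iterator."""
    batch: List[int] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter batch
        yield batch


class Sink:
    """Hypothetical stand-in for a batch-wise output table.

    It only counts what it receives, so nothing accumulates in memory.
    """

    def __init__(self) -> None:
        self.rows_written = 0
        self.batches_written = 0

    def append(self, rows: List[Dict[str, object]]) -> None:
        self.rows_written += len(rows)
        self.batches_written += 1


async def process_batch(rows: List[int], max_in_flight: int) -> List[Dict[str, object]]:
    # The semaphore caps how many requests run concurrently; the results
    # of the whole batch are still held in memory until gather() returns.
    sem = asyncio.Semaphore(max_in_flight)

    async def bounded(row_id: int) -> Dict[str, object]:
        async with sem:
            return await fetch(row_id)

    return list(await asyncio.gather(*(bounded(r) for r in rows)))


async def run_async(n_rows: int, batch_size: int, max_in_flight: int) -> Sink:
    sink = Sink()
    for batch in batches(range(n_rows), batch_size):
        results = await process_batch(batch, max_in_flight)
        sink.append(results)
        del results  # drop this batch's responses before fetching the next
    return sink


def run(n_rows: int, batch_size: int, max_in_flight: int = 10) -> Sink:
    return asyncio.run(run_async(n_rows, batch_size, max_in_flight))
```

With this shape I would expect peak memory to scale with one batch's responses rather than with the whole table, assuming nothing else keeps a reference to earlier results.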
The error appears as:
Execute failed: Executing the Python script failed: Error while sending a command.
However, the logs read:
kernel: [49007.279872] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/cron.service,task=python,pid=42220,uid=1000
kernel: [49007.279971] Out of memory: Killed process 42220 (python) total-vm:33175100kB, anon-rss:24284592kB, file-rss:1112kB, shmem-rss:0kB, UID:1000 pgtables:56592kB oom_score_adj:0
kernel: [49007.864132] oom_reaper: reaped process 42220 (python), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
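To put the numbers in that log into more familiar units, here is a small sketch that converts the memory fields of the "Out of memory" line to GiB. The helper name `oom_memory_gib` is made up for illustration, and it assumes the kernel's kB means 1024 bytes.

```python
import re

# The "Out of memory" kernel line from the log, copied verbatim.
OOM_LINE = (
    "kernel: [49007.279971] Out of memory: Killed process 42220 (python) "
    "total-vm:33175100kB, anon-rss:24284592kB, file-rss:1112kB, "
    "shmem-rss:0kB, UID:1000 pgtables:56592kB oom_score_adj:0"
)


def oom_memory_gib(line: str) -> dict:
    """Return the total-vm and *-rss fields of an OOM-killer line in GiB."""
    fields = re.findall(r"(total-vm|[a-z]+-rss):(\d+)kB", line)
    # Assumption: the kernel reports these values in units of 1024 bytes.
    return {name: int(kib) / 1024**2 for name, kib in fields}


usage = oom_memory_gib(OOM_LINE)
for name, gib in usage.items():
    print(f"{name}: {gib:.2f} GiB")
```

For the line above this works out to roughly 23.2 GiB resident (anon-rss) and about 31.6 GiB of virtual memory for the single python process.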
I have made a workflow that is as minimal as possible (though still not very minimal), but it does not replicate the issue when executed. This is because the website blocks the IP, so the very large response is never received and the node won’t run out of memory. I mostly included it to check that I’m not using the batches incorrectly, and that they don’t behave differently with AsyncIO.
Thanks for your help.
Minimal Batch Memory Error Workflow.knwf (163.8 KB)