Hi there,
I use Python nodes quite a lot in my workflows, sometimes even inside loop constructions. I’m wondering why executing a Python node takes so long even when it does only a little number crunching. To me it seems that initialising the Python environment behind it takes up most of the execution time.
For example:
# KNIME settings
import knime.scripting.io as knio
df = knio.input_tables[0].to_pandas()
# END of KNIME settings
# The usual suspects
import pandas as pd
import numpy as np
# core script
# sum() takes no column name; its first positional argument is numeric_only
df = df.groupby('a').sum(numeric_only=True)
df['d'] = np.sqrt(df['c']) + 23
# Postprocessing
df1 = df.copy()
# KNIME Postprocessing Settings
knio.output_tables[0] = knio.Table.from_pandas(df1)
# END of KNIME Postprocessing Settings
This takes about 4 seconds when it’s fed a table of 35 rows and 3 columns (one string column and two number columns).
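One way to check whether the script itself or something around it eats the time is to measure each phase inside the node. This is only a minimal sketch using the standard library: the `timed` helper and the phase labels are made up for illustration, and the two stand-in phases would have to be replaced by the real imports, the `knio` calls and the core script.

```python
import math
import time
from contextlib import contextmanager

# Wall-clock duration per phase, in milliseconds.
timings = {}


@contextmanager
def timed(label):
    """Record how long the enclosed block takes under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = (time.perf_counter() - start) * 1000.0


# Stand-in phases (assumptions): in a real node these would wrap the
# imports, the knio.input_tables read, the core script and the output write.
with timed("imports"):
    import statistics  # placeholder for `import pandas as pd` etc.

with timed("core"):
    values = [math.sqrt(c) + 23 for c in range(35)]

for label, ms in timings.items():
    print(f"{label}: {ms:.3f} ms")
```

Whatever is missing between the sum of these phases and the node's total runtime is then spent outside the script, for example in starting the Python process.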
I did some testing this morning with different Python settings in KNIME:
I ran the Python script shown above 100 times and measured the timing, once with the “conda” option and once with the “bundled” option in the KNIME settings.
This is the result. It seems that the runtime increases with the number of calls when using the “conda” option.
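To put a number on "increases with the number of calls", one could fit a straight line through the per-call runtimes. A rough sketch, assuming the runtimes are available as a plain list in milliseconds; `time_calls` and `slope_per_call` are names I made up, and `time_calls` only times a function inside one Python process, which is not the same as executing a KNIME node repeatedly.

```python
import time


def time_calls(func, repeats=100):
    """Call `func` repeatedly and return each duration in milliseconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


def slope_per_call(runtimes_ms):
    """Least-squares slope of runtime against call index (ms per call)."""
    n = len(runtimes_ms)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean_x = (n - 1) / 2
    mean_y = sum(runtimes_ms) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(runtimes_ms))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


# Placeholder workload; replace with the measured node runtimes instead.
runtimes = time_calls(lambda: sum(range(1000)), repeats=100)
print(f"trend: {slope_per_call(runtimes):+.6f} ms per call")
```

A slope close to zero would mean there is no systematic growth, only scatter.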
@ActionAndi With the conda option, KNIME uses a Python environment from a separate installation, which contains whatever packages are installed there. The bundled Python version is integrated into KNIME, and the data exchange is via Arrow tables.
What version of KNIME are you using and how much RAM have you allocated to KNIME?
In my company we use version 5.2.3.
My settings are:
I did some more experiments on this:
Freshly started KNIME, all other workflows closed. The PC runs Win10 without any other users, and I was logged off. It has 32 GB RAM, with 24 GB heap space for KNIME.
So it seems that there are three groups of runtimes:
- regular: 800 ms
- longer: ~2000 ms
- even longer: ~3000 ms
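To see how often each group occurs and whether the slow runs come at regular intervals, the runtimes could be bucketed like this. It is just a sketch: the two cut-offs are my own assumption, placed between the three observed clusters, and the sample values are made-up placeholders, not my measurements.

```python
from collections import Counter

# Assumed cut-offs in ms, chosen to fall between the three clusters.
REGULAR_MAX_MS = 1400
LONGER_MAX_MS = 2500


def classify(runtime_ms):
    """Assign a runtime to one of the three groups."""
    if runtime_ms < REGULAR_MAX_MS:
        return "regular"
    if runtime_ms < LONGER_MAX_MS:
        return "longer"
    return "even longer"


def summarize(runtimes_ms):
    """Count how many runs fall into each group."""
    return Counter(classify(ms) for ms in runtimes_ms)


def slow_positions(runtimes_ms):
    """Indices of all runs that are not in the 'regular' group."""
    return [i for i, ms in enumerate(runtimes_ms) if classify(ms) != "regular"]


# Made-up placeholder values for illustration only.
sample = [800, 820, 2000, 790, 3000, 810]
print(summarize(sample))
print(slow_positions(sample))
```

If the slow positions turned out to be evenly spaced, that would point to something periodic rather than random.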
Statistically, there’s no difference whether I use the external Conda environment or the bundled one.
But what is the root cause of these long runs? A reload of the Python environment?