Slow Python Node execution

ActionAndi · November 18, 2024, 1:08pm

Hi there,

I use Python nodes quite a lot in my workflows, sometimes even inside loop constructions. I’m wondering why executing a Python node takes so long even when it does only a little number crunching. It seems to me that initialising the Python environment behind it takes up most of the execution time.

For example:

# KNIME settings 
import knime.scripting.io as knio 
df = knio.input_tables[0].to_pandas() 

# END of KNIME settings 
# The usual suspects
import pandas as pd
import numpy as np


# core script
# The first positional argument of sum() is numeric_only, not a column name
df = df.groupby('a').sum(numeric_only=True)
df['d'] = np.sqrt(df['c']) + 23


# Postprocessing
df1 = df.copy()

# KNIME Postprocessing Settings
knio.output_tables[0] = knio.Table.from_pandas(df1)
# KNIME Postprocessing Settings

This takes about 4 seconds when it’s fed a table of 35 rows and 3 columns (one string column and two number columns).
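One way to test that suspicion is to time the number crunching on its own, outside KNIME. The sketch below is a stand-in under stated assumptions, not the node's actual code path: it builds a hypothetical 35-row table (the column name `b` and the random values are invented; only `a` and `c` appear in the script above) and times the same groupby and square-root steps with `time.perf_counter`. If this finishes in a few milliseconds, the remaining time is presumably spent elsewhere, for example in starting the Python process or transferring the table.

```python
import time

import numpy as np
import pandas as pd


def make_table(n_rows: int = 35, seed: int = 0) -> pd.DataFrame:
    """Build a hypothetical stand-in for the KNIME input table."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame(
        {
            "a": rng.choice(["x", "y", "z"], size=n_rows),  # string column
            "b": rng.random(n_rows),                        # assumed column name
            "c": rng.random(n_rows) * 100,                  # non-negative, so sqrt is defined
        }
    )


def core_script(df: pd.DataFrame) -> pd.DataFrame:
    """Same steps as the node's core script: sum per group, then derive 'd'."""
    out = df.groupby("a").sum(numeric_only=True)
    out["d"] = np.sqrt(out["c"]) + 23
    return out


table = make_table()
start = time.perf_counter()
result = core_script(table)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"core script took {elapsed_ms:.2f} ms for {len(table)} rows")
```

The same `perf_counter` pattern could be placed around the `to_pandas()` and `from_pandas()` calls inside the node to see how much of the total they account for.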

ActionAndi · November 19, 2024, 7:52am

I did some testing this morning with different Python settings in KNIME:

I ran the Python script shown above 100 times and measured the timing, once with the “conda” option and once with the “bundled” option in the KNIME settings.

This is the result. It seems that the runtime increases with the number of calls when using the “conda” option.
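Whether the runtime really grows with the number of calls can be checked with a straight-line fit over the measured values. A minimal sketch, assuming the 100 timings are available as a list of milliseconds; the numbers below are synthetic placeholders, not the measurements from this thread, and `runtime_trend` is a hypothetical helper.

```python
import numpy as np


def runtime_trend(runtimes_ms):
    """Return the least-squares slope (ms per call) and intercept (ms)."""
    calls = np.arange(len(runtimes_ms))
    slope, intercept = np.polyfit(calls, np.asarray(runtimes_ms, dtype=float), deg=1)
    return float(slope), float(intercept)


# Synthetic placeholder data: a flat series and one that drifts by 5 ms per call.
flat = [800.0] * 100
drifting = [800.0 + 5.0 * i for i in range(100)]

flat_slope, _ = runtime_trend(flat)
drift_slope, drift_intercept = runtime_trend(drifting)
print(f"flat: {flat_slope:.3f} ms/call, drifting: {drift_slope:.3f} ms/call")
```

A slope close to zero would suggest the per-call runtime is stable; a clearly positive slope would support the impression that later calls are slower.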

mlauber71 · November 19, 2024, 8:11am

@ActionAndi conda will use a Python environment that runs in a separate installation and will contain the packages that are there. The bundled Python version is integrated into KNIME and the data exchange is via Arrow Tables.

What version of KNIME are you using and how much RAM have you allocated to KNIME?

ActionAndi · November 19, 2024, 8:44am

In my company we use version 5.2.3.

My settings are:

ActionAndi · November 19, 2024, 10:21am

I did some more experiments on this:

Freshly started KNIME, all other workflows closed, PC runs on Win10 w/o any other users, I was logged off. PC has 32GB RAM, heap space for KNIME 24GB

So it seems that there are three groups of runtimes:

regular: 800 ms
longer: ~2000 ms
even longer: ~3000 ms

Statistically, there’s no difference whether I use the external Conda environment or the bundled one.
But what is the root cause of these long runs? A reload of the Python environment?
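The three groups can be counted explicitly by binning the measured runtimes. A minimal sketch: the bin edges (1400 ms and 2500 ms) are assumptions chosen to fall between the ~800, ~2000 and ~3000 ms levels mentioned above, the helper names are hypothetical, and the sample values are made up for illustration.

```python
from collections import Counter

# Assumed boundaries between the three observed runtime levels.
EDGES_MS = (1400.0, 2500.0)
LABELS = ("regular", "longer", "even longer")


def classify(runtime_ms: float) -> str:
    """Map one runtime to the label of the bin it falls into."""
    for edge, label in zip(EDGES_MS, LABELS):
        if runtime_ms < edge:
            return label
    return LABELS[-1]


def count_groups(runtimes_ms) -> Counter:
    """Count how many runs fall into each runtime group."""
    return Counter(classify(r) for r in runtimes_ms)


# Made-up sample values, not measurements from this thread.
sample = [790, 810, 805, 1990, 2050, 3010, 798, 2980]
groups = count_groups(sample)
for label in LABELS:
    print(f"{label}: {groups[label]} runs")
```

Logging the loop iteration next to each label would show whether the slow runs occur at regular intervals or at random.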
