There is a plethora (too much?) of information about the Python integration, yet no simple explanation of the data structures going in and out of the Python Script node. Although my goal is about as trivial as can be imagined, I found nothing in the documentation.
I’m simply looking to apply Python’s lgamma() function to column ‘x’ of the input table. I’ve seen dozens of examples, but none that explain the error I’m getting.
Code:
import knime.scripting.io as knio
import math
df = knio.input_tables[0].to_pandas()
x = df['x']
df['ln_gamma(x)'] = math.lgamma(x)
knio.output_tables[0] = knio.Table.from_pandas(df)
Error:
TypeError: cannot convert the series to <class 'float'>
I’m assuming that the output of df['x'] is a “series” (some Python data structure) but then how are those handled inside a Python script node? Should there be a loop somewhere?
The bizarre thing is, if I replace lgamma with some other function, e.g. scipy.stats.norm, then it works perfectly.
The error message you are getting should include a line number, which will show that this has nothing to do with the KNIME Python framework. Series are handled in exactly the same way as outside a Python Script node.
If I open python* in a terminal, I can reproduce the error with the following:
import math
import pandas as pd
df = pd.DataFrame()
df['x'] = [1,2,3] # This will be converted in the df to a pandas series, which is some sort of list, but special to pandas
df['ln_gamma(x)'] = math.lgamma(df['x']) # This line will raise the error: TypeError: cannot convert the series to <class 'float'>
It is not at all bizarre that replacing lgamma leads to a different result: math.lgamma takes only single values, whereas SciPy functions are generally written to operate on whole arrays or series. As you already suspected, you would need a loop to build a list of the lgamma results and then assign that list to the df as a new column.
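A minimal sketch of such a loop, written here as a list comprehension. It assumes the same three-value column as in the terminal example; the variable name `results` is only for illustration.

```python
import math
import pandas as pd

# Same toy data as in the terminal example (an assumption, not the real input table)
df = pd.DataFrame({"x": [1, 2, 3]})

# math.lgamma accepts one number at a time, so call it once per element
results = [math.lgamma(value) for value in df["x"]]

# Assigning a list with one entry per row creates the new column
df["ln_gamma(x)"] = results
print(df)
```

Inside a Python Script node the same two lines would sit between the to_pandas() and from_pandas() calls from the original post.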
Does that help?
Best regards
Steffen
*This needs a working Python environment with pandas installed.
Hi @harrytuttles , your problem in this case doesn’t appear to be with your understanding of the KNIME data structures, as your code sample has dealt with those perfectly well.
The problem you are facing is with using pandas dataframes, so you might want to follow up with finding documentation for pandas dataframes online.
[edit: I have just seen that @steffen_KNIME has replied as I wrote this, with much the same thoughts]
I threw the essence of your question at ChatGPT, and I believe it came up with an alternative solution without any further prompting:
Me: What does this do?
df['ln_gamma(x)'] = math.lgamma(x)
ChatGPT: It looks like you’re trying to calculate the natural logarithm of the gamma function for each element ‘x’ in the DataFrame ‘df’ and store the results in a new column named ‘ln_gamma(x)’. The ‘math.lgamma()’ function computes the natural logarithm of the absolute value of the gamma function.
Me: When this executes it gives an error
x = df['x']
df['ln_gamma(x)'] = math.lgamma(x)
ChatGPT: The issue might be that math.lgamma() expects a single value, but you’re providing a Pandas Series, which contains multiple values. To apply the function element-wise to the entire column ‘x’, you can use the apply() function:
df['ln_gamma(x)'] = df['x'].apply(math.lgamma)
This will apply the math.lgamma() function to each element in the ‘x’ column and store the results in the ‘ln_gamma(x)’ column.
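To check the apply() suggestion outside KNIME, here is a small standalone sketch. The sample values are made up for illustration and the `expected` list is only there to compare against calling math.lgamma one value at a time.

```python
import math
import pandas as pd

# Made-up sample values standing in for the real input column 'x'
df = pd.DataFrame({"x": [0.5, 1.0, 4.0]})

# Series.apply calls math.lgamma once per element and returns a new Series
df["ln_gamma(x)"] = df["x"].apply(math.lgamma)

# Compare against calling math.lgamma directly on each value
expected = [math.lgamma(v) for v in [0.5, 1.0, 4.0]]
print(df["ln_gamma(x)"].tolist() == expected)
```

If SciPy is available in the environment, scipy.special.gammaln accepts a whole array or series in one call, which may be a better fit for large tables; that is an alternative to what was suggested above, not part of it.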