Python Script Result difference

I am using the Python Script node to perform linear regression using OLS.
When I run the model inside the node's configuration dialog, I get different results than when I execute the node and look at the "Standard Output". This has really stumped me! The data is 34k rows. A clue: when I run this on a smaller data set of 180 rows, the numbers match.
Coefficients when I run the script while configuring the node:
coefficients
const 9.387454
Variable A -0.017212
Variable B -0.003608
Variable C 0.002542
Variable D -0.001975
Variable E -0.292101
dtype: float64

Coefficients when I execute the node and view the "Standard Output":
coefficients

const 9.275137
Variable A -0.017524
Variable B -0.001701
Variable C 0.000072
Variable D 0.000005
Variable E -0.133936
dtype: float64
ts_r_csat_s

You can see that all the numbers are different.
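To put a number on "all the numbers are different", here is a small sketch that compares the two printed coefficient sets from this post. The values are copied from the output above; the dictionary names and the helper `coefficient_differences` are hypothetical, introduced only for illustration.

```python
# Coefficients copied from the two outputs above.
dialog_run = {  # run inside the configuration dialog
    "const": 9.387454,
    "Variable A": -0.017212,
    "Variable B": -0.003608,
    "Variable C": 0.002542,
    "Variable D": -0.001975,
    "Variable E": -0.292101,
}
full_run = {  # executed node, "Standard Output"
    "const": 9.275137,
    "Variable A": -0.017524,
    "Variable B": -0.001701,
    "Variable C": 0.000072,
    "Variable D": 0.000005,
    "Variable E": -0.133936,
}


def coefficient_differences(a, b):
    """Return {name: a[name] - b[name]} for every coefficient present in both runs."""
    return {name: a[name] - b[name] for name in a if name in b}


for name, diff in coefficient_differences(dialog_run, full_run).items():
    print(f"{name:<12} {diff:+.6f}")
```

Variable D even changes sign between the two runs, so this is far more than rounding noise: the two fits are not seeing the same data.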
The Python code is:

# Add a constant to the independent variables for the intercept

X = sm.add_constant(X)

# Fit the regression model

model = sm.OLS(y, X).fit()

# Get the coefficients for each independent variable

coefficients = model.params
print("coefficients\n", coefficients)

Can anyone explain this and offer a fix?

Thanks in advance; I really need to solve this!

@jimo42, is there any chance you could provide a sample workflow that demonstrates the effect, or post the code in a readable form?


Yes, here is the code:

import pandas as pd
import statsmodels.api as sm
#import numpy as np

# Load your data into a pandas DataFrame

df = input_table_1

# Define the independent variables and dependent variable

X = df[['Variable A',
        'Variable B', 'Variable C',
        'Variable D', 'Variable E']]
y = df['Variable F']

# Add a constant to the independent variables for the intercept

X = sm.add_constant(X)

# Fit the regression model

model = sm.OLS(y, X).fit()

# Get the coefficients for each independent variable

coefficients = model.params
print("coefficients\n", coefficients)
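A quick way to tell which data the script actually saw is to print the row count next to the coefficients; inside the node, adding `print(len(df))` before fitting would do it. The sketch below wraps that idea in a hypothetical helper, `fit_ols_with_count`, and uses NumPy's least-squares solver as a stand-in for `sm.add_constant` plus `sm.OLS(...).fit()` so it runs without statsmodels.

```python
import numpy as np


def fit_ols_with_count(X, y):
    """Fit y = b0 + X @ b by ordinary least squares.

    Returns (number of rows used, coefficient array [b0, b1, ..., bk]).
    A column of ones is prepended for the intercept, similar to sm.add_constant.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    design = np.column_stack([np.ones(len(X)), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return len(X), coefs


# Usage: report the row count so a sampled run is easy to spot.
x = np.arange(10.0)
n_rows, coefs = fit_ols_with_count(x, 2.0 + 3.0 * x)
print(f"rows used: {n_rows}")
print("coefficients:", coefs)
```

If the dialog run prints a row count far below 34k while the executed node prints the full count, the difference in coefficients is explained by the different inputs rather than by the regression code.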

@jimo42, this is still not an example that one could follow and work with. But you said that the numbers match on a smaller data set of 180 rows:

If you run the script while the Python node's configuration dialog is open, it uses only a small sample of the data for testing, not the full dataset. The correct result therefore comes from executing the node itself. To compare the results, you could run the same code in a Jupyter notebook and check whether they match.
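To see why a sample changes the estimates, here is a self-contained simulation. It assumes, as a simplification, that the dialog run sees only the first few hundred rows; the actual sample size and selection are controlled by KNIME and may differ. The data are synthetic (34,000 rows to match the question, with made-up true coefficients), and NumPy's least-squares solver stands in for statsmodels OLS.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic data: 34,000 rows, five predictors, made-up true coefficients.
n_rows, n_features = 34_000, 5
true_coefs = np.array([9.3, -0.02, -0.002, 0.001, 0.0, -0.15])  # [const, A..E]
X = rng.normal(size=(n_rows, n_features))
design = np.column_stack([np.ones(n_rows), X])  # intercept column first
y = design @ true_coefs + rng.normal(scale=1.0, size=n_rows)


def ols(design, y):
    """Ordinary least squares coefficients via NumPy's lstsq."""
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs


sample_size = 200  # hypothetical preview size, not KNIME's actual setting
full_coefs = ols(design, y)
sample_coefs = ols(design[:sample_size], y[:sample_size])

print("full:  ", np.round(full_coefs, 4))
print("sample:", np.round(sample_coefs, 4))
```

With the full data the estimates land close to the true values, while the 200-row fit scatters much more widely. If the table is sorted (for example by date or by group), the first rows are also not representative of the whole, which can shift the sample estimates even further.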


@mlauber, you are the man! Yes, I ran it in a Jupyter notebook and can confirm that the results match those from executing the node outside of the configuration dialog. Who knew about this sample data run? Although the clue was that the small data set gave the same result inside and out.

I thank you for imparting your extensive knowledge to me!


Well, this is how a lot of these nodes work. When you configure them, you want a quick idea of whether the code works, without waiting for the whole thing to finish every time you change a few lines of code.

Glad we figured this out 🙂

