I am using the Python Script node to perform linear regression using OLS.
When I run the model inside the node's configuration dialog, I get different results than when I run the node and look at the “Standard Output”. This has really stumped me! The data is 34k rows. One clue: when I run this on a smaller set of data (180 rows), the numbers match.
Coefficients when I run the script while configuring the node:
coefficients
const 9.387454
Variable A -0.017212
Variable B -0.003608
Variable C 0.002542
Variable D -0.001975
Variable E -0.292101
dtype: float64
Coefficients when I view the “Standard Output”:
coefficients
const 9.275137
Variable A -0.017524
Variable B -0.001701
Variable C 0.000072
Variable D 0.000005
Variable E -0.133936
dtype: float64
You can see that all the numbers are different.
Python Code is:
import statsmodels.api as sm
# Add a constant to the independent variables for the intercept
X = sm.add_constant(X)
# Fit the regression model
model = sm.OLS(y, X).fit()
# Get the coefficients for each independent variable
coefficients = model.params
print(coefficients)
@jimo42 this is still not an example that one would be able to follow and work with. But you said:
If you run the script while the Python node's dialog is open, it only uses a small sample of the data for testing, not the full dataset. So the correct result comes from running the full node. If you want to compare the results, you could run the same code in a Jupyter notebook and see whether the results match.
@mlauber you are the man! Yes, I ran it in a Jupyter Notebook and can confirm that the results match those from running the node outside of the configuration dialog. Who knew about this sample data run? Although the clue was that the small dataset gave the same result inside and out.
I thank you for imparting your extensive knowledge to me!
Well, this is how a lot of these nodes work: when you configure them, you want a quick idea of whether the code works without waiting for the whole thing to finish every time you change a few lines of code.