I was checking the probabilities produced by the Logistic Regression Predictor, and they do not match my own manual calculation of the same probabilities.
The logistic regression model was created in KNIME and saved to PMML. That PMML file was then loaded back into KNIME and connected to the Logistic Regression Predictor. The data (inputs only) was fed into that same predictor.
The mean model score (prob) output from the Predictor is 0.1634935742703903.
My mean calculated prob is 0.20210305482423618.
The mean absolute difference is 0.038609480553845395.
This is really puzzling. Has anyone encountered a problem like this? The target variable has 3 levels, but I'm only computing the probability for one of them. As you can see, the numbers are close, but not close enough for the gap to be mere roundoff error (unless the 16 digits of precision in the PMML file aren't enough).
Details of my calculation (note: for the numbers above, I only used records where VAR2 through VAR10 are 0, so only the constant and VAR1 contribute to the sum):
sum =
    -1.74578494870595
    + 3.09082475949971E-05 * VAR1
    + 0.548481855888782 * VAR2
    - 0.759391199922522 * VAR3
    - 1.9802495148082 * VAR4
    - 1.31116296357029 * VAR5
    - 0.621782784131766 * VAR6
    + 1.2800395070313 * VAR7
    - 0.499764624175758 * VAR8
    + 0.107928096379516 * VAR9
    + 0.0969355450835457 * VAR10
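For reference, the sum above can be sketched in Python as follows. The coefficients are copied from the post; the function names and the example record are hypothetical. The sketch also shows two ways of turning the sum into a probability, because the post does not say which one the manual calculation used. The binary form 1/(1+exp(-sum)) is only an assumption about that calculation. With a 3-level target, a multinomial model normalizes over all classes, so the probability of one class also depends on the linear predictor of the other non-reference class, which is not shown in the post. Whether this accounts for the observed gap is not confirmed here.

```python
import math

# Coefficients copied from the post (one target level only).
INTERCEPT = -1.74578494870595
COEFFS = {
    "VAR1": 3.09082475949971e-05,
    "VAR2": 0.548481855888782,
    "VAR3": -0.759391199922522,
    "VAR4": -1.9802495148082,
    "VAR5": -1.31116296357029,
    "VAR6": -0.621782784131766,
    "VAR7": 1.2800395070313,
    "VAR8": -0.499764624175758,
    "VAR9": 0.107928096379516,
    "VAR10": 0.0969355450835457,
}


def linear_predictor(record):
    """Intercept plus coefficient * value; missing variables count as 0."""
    return INTERCEPT + sum(c * record.get(name, 0.0) for name, c in COEFFS.items())


def binary_probability(eta):
    """Two-class logistic transform (assumed, not confirmed by the post)."""
    return 1.0 / (1.0 + math.exp(-eta))


def multinomial_probabilities(etas):
    """Multinomial logit with a reference class whose linear predictor is 0.

    `etas` holds the linear predictors of the non-reference classes.
    Returns their probabilities followed by the reference-class probability.
    """
    denom = 1.0 + sum(math.exp(e) for e in etas)
    return [math.exp(e) / denom for e in etas] + [1.0 / denom]


if __name__ == "__main__":
    # Hypothetical record: only VAR1 is non-zero, as in the post's subset.
    eta = linear_predictor({"VAR1": 10000.0})
    print("sum:", eta)
    print("binary form:", binary_probability(eta))
    # The second linear predictor (0.5) is a made-up placeholder value.
    print("multinomial form:", multinomial_probabilities([eta, 0.5])[0])
```

With any finite value for the other class's linear predictor, the multinomial probability comes out lower than the binary one for the same sum, so the two forms should not be expected to agree on a 3-level target.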