Extracting the "OutputFields" from a RegressionModel PMML file

awp233 · December 12, 2019, 1:44pm

I work for a software company that makes scorecards for credit risk using logistic regression models.

We have a new feature that allows a scorecard to be exported in PMML format, but I am struggling to use the PMML file to ‘score up’ a dataset. Can anyone help me?

Below is an example of the PMML file that our software produces (the example is for a logistic regression model with one characteristic, called “BANK_TERM”):

<?xml version="1.0"?><PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Header copyright="Paragon Business Solutions 2016" description="Paragon Modeller Scorecard">
	<Application name="Paragon Modeller" version="1.7.0.1"/>
</Header>
<DataDictionary numberOfFields="2">
	<DataField name="GOOD" dataType="double" optype="continuous"/>
	<DataField name="BANK_TERM" dataType="double" optype="continuous"/>
</DataDictionary>
<TransformationDictionary>
	<DerivedField name="BANK_TERM_Grouped" dataType="string" optype="categorical">
		<Discretize field="BANK_TERM" mapMissingTo="UnGrouped">
			<DiscretizeBin binValue="UnGrouped">
				<Interval closure="openOpen" rightMargin="0"/>
			</DiscretizeBin>
			<DiscretizeBin binValue="0">
				<Interval closure="closedOpen" leftMargin="0" rightMargin="1"/>
			</DiscretizeBin>
			<DiscretizeBin binValue="1-100">
				<Interval closure="closedOpen" leftMargin="1" rightMargin="101"/>
			</DiscretizeBin>
			<DiscretizeBin binValue="101-200">
				<Interval closure="closedOpen" leftMargin="101" rightMargin="201"/>
			</DiscretizeBin>
			<DiscretizeBin binValue="201-311">
				<Interval closure="closedOpen" leftMargin="201" rightMargin="400"/>
			</DiscretizeBin>
			<DiscretizeBin binValue="400-500">
				<Interval closure="closedOpen" leftMargin="400" rightMargin="501"/>
			</DiscretizeBin>
			<DiscretizeBin binValue="501-4500">
				<Interval closure="closedOpen" leftMargin="501"/>
			</DiscretizeBin>
		</Discretize>
	</DerivedField>
</TransformationDictionary>
<RegressionModel modelName="Logistic WoE Model 12 (12/12/2019 10:00:18)" functionName="regression" algorithmName="stepwise least squares" targetFieldName="GOOD" normalizationMethod="logit">
	<MiningSchema>
		<MiningField name="GOOD" usageType="target" outliers="asIs" missingValueTreatment="asIs" invalidValueTreatment="asIs"/>
		<MiningField name="BANK_TERM" outliers="asIs" missingValueTreatment="asIs" invalidValueTreatment="asIs"/>
	</MiningSchema>
	<Output>
		<OutputField name="RawScore" optype="continuous" dataType="double" feature="predictedValue" targetField="GOOD"/>
		<OutputField name="Score" optype="continuous" dataType="double" feature="predictedDisplayValue" targetField="GOOD">
			<NormContinuous field="RawScore">
				<LinearNorm orig="0" norm="100"/>
				<LinearNorm orig="1" norm="157.7078016355585"/>
			</NormContinuous>
		</OutputField>
	</Output>
	<RegressionTable intercept="0">
		<CategoricalPredictor name="BANK_TERM_Grouped" value="UnGrouped" coefficient="0"/>
		<CategoricalPredictor name="BANK_TERM_Grouped" value="0" coefficient="-0.455779235502154"/>
		<CategoricalPredictor name="BANK_TERM_Grouped" value="1-100" coefficient="-0.000006210746542"/>
		<CategoricalPredictor name="BANK_TERM_Grouped" value="101-200" coefficient="0.099084236747815"/>
		<CategoricalPredictor name="BANK_TERM_Grouped" value="201-311" coefficient="0.775581076537226"/>
		<CategoricalPredictor name="BANK_TERM_Grouped" value="400-500" coefficient="0.72822894463888"/>
		<CategoricalPredictor name="BANK_TERM_Grouped" value="501-4500" coefficient="1.731641380474746"/>
	</RegressionTable>
</RegressionModel>

When I set this up with a “JPMML Regression Predictor” in KNIME, I am able to extract the predicted probability (Probability (good)) using an Interactive Table, but I have no idea how to extract the two OutputFields from the file (RawScore and Score).

Can anyone help with this?

AlexanderFillbrunn · December 13, 2019, 8:14am

Hi @awp233,
I am afraid those fields are currently not returned by any of our predictor nodes. The only option I see is to extract the OutputField definition from the PMML using an XPath node and do the transformation on RawScore yourself. If you need help with that, let me know!
Kind regards
Alexander
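
For readers working outside KNIME, the extraction step described above can be sketched in Python. This is only an illustration using the standard library's ElementTree rather than the KNIME XPath node; the function name `linear_norm_points` and the trimmed `PMML_SNIPPET` string are assumptions made for the example, with the values copied from the PMML file earlier in the thread.

```python
import xml.etree.ElementTree as ET

# Namespace copied from the xmlns attribute of the PMML root element.
NS = {"pmml": "http://www.dmg.org/PMML-4_2"}

# Trimmed copy of the <Output> section of the file, used as sample input.
PMML_SNIPPET = """<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
<RegressionModel functionName="regression" normalizationMethod="logit">
  <Output>
    <OutputField name="RawScore" optype="continuous" dataType="double" feature="predictedValue" targetField="GOOD"/>
    <OutputField name="Score" optype="continuous" dataType="double" feature="predictedDisplayValue" targetField="GOOD">
      <NormContinuous field="RawScore">
        <LinearNorm orig="0" norm="100"/>
        <LinearNorm orig="1" norm="157.7078016355585"/>
      </NormContinuous>
    </OutputField>
  </Output>
</RegressionModel>
</PMML>"""


def linear_norm_points(pmml_text, output_field="Score"):
    """Return the (orig, norm) pairs defined for the given OutputField."""
    root = ET.fromstring(pmml_text)
    xpath = (
        f".//pmml:OutputField[@name='{output_field}']"
        "/pmml:NormContinuous/pmml:LinearNorm"
    )
    return [
        (float(node.get("orig")), float(node.get("norm")))
        for node in root.findall(xpath, NS)
    ]


if __name__ == "__main__":
    for orig, norm in linear_norm_points(PMML_SNIPPET):
        print(orig, norm)
```

The namespace mapping matters here: because the file declares a default namespace, an unqualified path such as `.//OutputField` would match nothing.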

AlexanderFillbrunn · December 13, 2019, 9:08am

Hi,
I built a workflow that extracts the norm and orig fields and applies the transformation to the JPMML Regression Predictor’s output. Please find it attached and let me know if you have any questions.
Kind regards
Alexander

OutputFields.knwf (23.6 KB)

awp233 · December 13, 2019, 10:08am

Hi Alex, thanks so much for your help.

I have checked your KNIME workflow and the transformed score does not seem to be correct. I’ll give a bit more information about what the “score” and “rawscore” fields actually are. This is how they SHOULD be calculated - our PMML output may not be coded correctly so please let me know if you have any suggestions.

Basically, the “rawscore” field is calculated by summing the “coefficient” values for each field in the scorecard (in our case there is only one field, BANK_TERM).
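
To make that description concrete, here is a minimal Python sketch of the raw score for this one-characteristic scorecard. The bin boundaries and coefficients are copied from the Discretize and RegressionTable elements of the PMML above; the names `BINS`, `INTERCEPT` and `raw_score` are made up for the example, and treating a missing value as the “UnGrouped” bin follows the mapMissingTo attribute in the file.

```python
import math

# (lower bound inclusive, upper bound exclusive, coefficient), copied from the
# DiscretizeBin intervals and the matching CategoricalPredictor rows.
BINS = [
    (-math.inf, 0, 0.0),                # "UnGrouped": values below 0
    (0, 1, -0.455779235502154),         # "0"
    (1, 101, -0.000006210746542),       # "1-100"
    (101, 201, 0.099084236747815),      # "101-200"
    (201, 400, 0.775581076537226),      # "201-311" (interval as written in the file)
    (400, 501, 0.72822894463888),       # "400-500"
    (501, math.inf, 1.731641380474746), # "501-4500": no right margin in the file
]

INTERCEPT = 0.0  # RegressionTable intercept="0"


def raw_score(bank_term):
    """Sum of the intercept and the coefficient of the bin BANK_TERM falls into."""
    # Missing values map to "UnGrouped", whose coefficient is 0.
    if bank_term is None or math.isnan(bank_term):
        return INTERCEPT
    for lower, upper, coefficient in BINS:
        if lower <= bank_term < upper:
            return INTERCEPT + coefficient
    raise ValueError(f"BANK_TERM value {bank_term!r} does not fall into any bin")


if __name__ == "__main__":
    for value in (None, 0, 50, 250, 450, 600):
        print(value, raw_score(value))
```

With more characteristics in the scorecard, the same lookup would be repeated per field and the coefficients added together.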

Next, the scaled score is calculated using the following equation:

Score = LinearNorm0 + (RawScore*(LinearNorm1-LinearNorm0))

So, using my scorecard as an example the score would be calculated by:

Score = 100 + (RawScore*(157.7078-100))
Score = 100 + (RawScore*57.7078)
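
The same scaling can be written as a small Python function. This is just a sketch of the equation above; the function name `scaled_score` and its parameter names are illustrative, and the default values are the two LinearNorm "norm" attributes from the PMML file (for orig 0 and orig 1).

```python
def scaled_score(raw_score, norm0=100.0, norm1=157.7078016355585):
    """Score = LinearNorm0 + RawScore * (LinearNorm1 - LinearNorm0).

    norm0 and norm1 are the "norm" values at orig=0 and orig=1. Because the
    two orig points are exactly 0 and 1, the general linear interpolation
    reduces to this form.
    """
    return norm0 + raw_score * (norm1 - norm0)


if __name__ == "__main__":
    # A raw score of 0 maps to the base score; a raw score of 1 maps to norm1.
    print(scaled_score(0.0))
    print(scaled_score(1.0))
    # Coefficient of the "501-4500" bin from the RegressionTable.
    print(scaled_score(1.731641380474746))
```

Raw scores outside the 0 to 1 range are extrapolated along the same line, which is what the formula above implies.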

Are you able to set up a workflow that does that?

Thanks again for all your help

AlexanderFillbrunn · December 13, 2019, 10:17am

Hi @awp233,
I used the formula given in the PMML documentation for the calculation. RawScore is what comes out of the JPMML Predictor, right? So for the score you just need to adjust the formula in the Math Formula node in the workflow to do the calculation you need.
Kind regards
Alexander
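
One point worth checking when adjusting the formula: the model declares normalizationMethod="logit", and in PMML a regression model with that setting reports 1/(1+exp(-y)) as its predicted value, where y is the sum of the intercept and coefficients. If the predictor column is that probability (as the "Probability (good)" column mentioned earlier suggests, though this is an assumption about the node's output), the coefficient sum can be recovered by inverting the logit before scaling. A rough Python sketch, with made-up function names:

```python
import math


def probability_from_raw_score(y):
    """Logit normalization as defined for PMML regression: 1 / (1 + exp(-y))."""
    return 1.0 / (1.0 + math.exp(-y))


def raw_score_from_probability(p):
    """Invert the normalization: y = ln(p / (1 - p))."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))


if __name__ == "__main__":
    # Round trip using the "201-311" coefficient from the RegressionTable.
    y = 0.775581076537226
    p = probability_from_raw_score(y)
    print(p, raw_score_from_probability(p))
```

In the workflow, the equivalent would be to apply ln(p/(1-p)) to the probability column in the Math Formula node and then apply the scaling to the result.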
