Hello,
I have been exploring what PMML can do in KNIME. I came across a very useful tag that makes the training data and the model in a PMML file easier to understand.
It is the <ModelStats> tag. More information about this tag is found here.
Scenario:
This tag can hold various statistics about the training data, such as:
- Mean, Std, Median, IQR
- Min, Max
- Frequency, Missing Freq, Invalid Freq
- Count per category, per categorical column, etc.
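To make the first list concrete, here is a minimal sketch of how such descriptive stats could be laid out under <ModelStats>, using only the Python standard library. The element and attribute names (UnivariateStats, Counts, NumericInfo) follow my reading of the PMML Statistics schema; the column name "age" and its values are made up for illustration, and I am assuming there are no invalid values.

```python
import statistics
import xml.etree.ElementTree as ET


def univariate_stats(field, values):
    """Build a <UnivariateStats> element for one continuous column.

    `values` may contain None to mark missing entries.
    """
    present = [v for v in values if v is not None]
    # Quartiles via the statistics module's default ("exclusive") method.
    q1, _, q3 = statistics.quantiles(present, n=4)

    stats = ET.Element("UnivariateStats", field=field)
    ET.SubElement(
        stats, "Counts",
        totalFreq=str(len(values)),
        missingFreq=str(len(values) - len(present)),
        invalidFreq="0",  # assumption: nothing in the column is invalid
    )
    ET.SubElement(
        stats, "NumericInfo",
        minimum=str(min(present)),
        maximum=str(max(present)),
        mean=str(statistics.mean(present)),
        standardDeviation=str(statistics.stdev(present)),
        median=str(statistics.median(present)),
        interQuartileRange=str(q3 - q1),
    )
    return stats


# Hypothetical column, for illustration only.
model_stats = ET.Element("ModelStats")
model_stats.append(univariate_stats("age", [23, 35, None, 41, 29, 52]))
print(ET.tostring(model_stats, encoding="unicode"))
```

This only builds the XML fragment; it does not attach it to a model element or check that a given PMML consumer reads it.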
Other model-related stats like:
- ANOVA
- Confidence interval
- t-test value
- p-value
- degrees of freedom
- Standard error
etc…
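As far as I understand the PMML Statistics schema, this second group maps to <MultivariateStats> (one <MultivariateStat> per predictor, with attributes such as stdError, tValue, dF, pValueFinal and the confidence bounds) and to <Anova> (a list of <AnovaRow> elements). The sketch below shows that layout with the Python standard library. All numbers are placeholders, not results from a real model, and the helper add_row is hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder values for illustration only; in practice they would come
# from the fitted model's summary.
coefficients = [
    {"name": "age", "stdError": 0.12, "tValue": 3.4, "dF": 97,
     "pValueFinal": 0.001, "confidenceLevel": 0.95,
     "confidenceLowerBound": 0.17, "confidenceUpperBound": 0.65},
]
anova_rows = [
    {"type": "Model", "sumOfSquares": 120.0, "degreesOfFreedom": 2,
     "meanOfSquares": 60.0, "fValue": 20.0},
    {"type": "Error", "sumOfSquares": 291.0, "degreesOfFreedom": 97,
     "meanOfSquares": 3.0},
    {"type": "Total", "sumOfSquares": 411.0, "degreesOfFreedom": 99},
]


def add_row(parent, tag, row):
    """Append a child element whose attributes are the stringified items of `row`."""
    return ET.SubElement(parent, tag, {key: str(value) for key, value in row.items()})


model_stats = ET.Element("ModelStats")

multivariate = ET.SubElement(model_stats, "MultivariateStats")
for coefficient in coefficients:
    add_row(multivariate, "MultivariateStat", coefficient)

anova = ET.SubElement(model_stats, "Anova")
for row in anova_rows:
    add_row(anova, "AnovaRow", row)

print(ET.tostring(model_stats, encoding="unicode"))
```

Whether KNIME keeps or displays these elements when it reads or writes the PMML is exactly the part I cannot confirm.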
Now, I cannot find a way to include these in the PMML file. With Python, I can generate the descriptive stats from the first list (mean, std, etc.) using ContinuousDomain and CategoricalDomain, but I still cannot get the stats from the second list.
What I am asking is:
Is there a way to include these stats (from both the first and the second list) in the PMML file with KNIME? If so, how?
Please guide. Thank you.