Random Forest classification model to PMML Error

Hi,
I’m using the Random Forest model node for a classification problem.
I want to create a PMML file for deployment.
So I added two more nodes and passed the model through
"Tree Ensemble Model Extraction" -> "Table to PMML Ensemble" -> To PMML Node

I have two problems:

  1. I can’t find the probability for “YES”. The output contains only the predicted class (‘YES’ or ‘NO’), so I can’t build a lift chart or ROC curve, or apply a cutoff on the probability.
  2. The performance is very low compared to the Random Forest Predictor node.

Can you help?
Kind regards,
Einav

Hello @Einavtam,

regarding your questions:

  1. This seems to be a bug in the PMML Ensemble Predictor since the PMML standard states that the probabilities in case of the multiple model method “Majority” should be calculated according to the distribution of votes. Note that this means that the probabilities should be available if you export the model to a different platform.
  2. I guess by performance you mean runtime, right? This is to be expected: a PMML document is just an XML file, and it becomes huge for a random forest because the forest consists of a collection of large decision trees. Our internal format stores these trees much more efficiently and can therefore also predict much faster.
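To illustrate what "probabilities according to the distribution of votes" means in point 1, here is a minimal Python sketch. It is not KNIME or PMML code; the function names `vote_probabilities` and `classify_with_cutoff` and the example votes are made up for illustration. It assumes you have the class predicted by each individual tree for one row.

```python
from collections import Counter


def vote_probabilities(tree_votes):
    """Return each class's share of the votes cast by the individual trees."""
    counts = Counter(tree_votes)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def classify_with_cutoff(probabilities, positive="YES", negative="NO", cutoff=0.5):
    """Predict the positive class only if its vote share reaches the cutoff."""
    return positive if probabilities.get(positive, 0.0) >= cutoff else negative


# Hypothetical votes of a five-tree forest for a single row.
votes = ["YES", "YES", "NO", "YES", "NO"]
probs = vote_probabilities(votes)

# The majority class is the one with the largest vote share.
majority = max(probs, key=probs.get)

# A stricter cutoff than 0.5 can flip the decision for the same row.
strict = classify_with_cutoff(probs, cutoff=0.7)
```

With vote shares like these per row, a lift chart, ROC curve or custom cutoff becomes possible again, since they only need a score for the “YES” class.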

In any case, only convert to PMML if you really have to (i.e. you want to export the model to another platform), since our internal models provide more functionality and are usually faster (unless we also store them as PMML internally, e.g. regression and decision tree models).
I am not a fan of random forests in combination with PMML for the reason outlined above. If there is no specific need for a random forest, I would recommend that you also try Gradient Boosted Trees: they produce much smaller trees and thus result in more manageable PMML documents. They also provide the same functionality (i.e. probabilities) in our internal format and in PMML.
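If you want to check how large an exported ensemble actually is, a rough sketch like the following counts the trees and tree nodes in a PMML document. It uses only the Python standard library. The helper name `count_pmml_elements` is hypothetical, and the tiny document is a hand-written stand-in, not real KNIME output; for a real file you would read its text from disk first.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_pmml_elements(pmml_text):
    """Count elements by local tag name, ignoring the PMML XML namespace."""
    root = ET.fromstring(pmml_text)
    # Namespaced tags look like "{http://www.dmg.org/PMML-4_2}Node".
    return Counter(el.tag.rsplit("}", 1)[-1] for el in root.iter())


# Hand-written stand-in for an exported ensemble (assumption, not KNIME output).
sample = """<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
  <MiningModel functionName="classification">
    <Segmentation multipleModelMethod="majorityVote">
      <Segment id="1"><True/>
        <TreeModel functionName="classification">
          <Node id="0" score="YES"><True/>
            <Node id="1" score="YES">
              <SimplePredicate field="x" operator="lessThan" value="1"/>
            </Node>
            <Node id="2" score="NO">
              <SimplePredicate field="x" operator="greaterOrEqual" value="1"/>
            </Node>
          </Node>
        </TreeModel>
      </Segment>
    </Segmentation>
  </MiningModel>
</PMML>"""

counts = count_pmml_elements(sample)
trees, nodes = counts["TreeModel"], counts["Node"]
```

Comparing `trees` and `nodes` for a random forest export and a Gradient Boosted Trees export gives a quick feel for the difference in document size.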

Kind regards,

Adrian
