Hi,
This ticket requests Laplace Smoothing as an additional option in the Naive Bayes Learner node, with the pseudocount alpha as a parameter. Concerning the implementation, I have no preference between adjusting the counts table calculated by the NB Learner node and handling the actual adjustment in the NB Predictor node.
The Naive Bayes Learner currently offers the default probability threshold parameter to handle zero probability situations: instead of 0, the learner, or rather the predictor, applies the default probability threshold. The zero probability situation is thus dealt with after the probability calculation, as sketched below.
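As I understand it, the current mechanism amounts to something like the following (a Python sketch of my reading, not the node's actual code; the names are mine, and I am assuming the default replaces each zero conditional probability):

```python
def probability_with_default(count_xv_c, count_c, default_prob):
    # Raw relative frequency, replaced by the default only when it would be 0.
    p = count_xv_c / count_c
    return p if p > 0 else default_prob
```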
For categorical data, Laplace Smoothing instead handles the zero probability situation during the probability calculation, i.e. by adding the pseudocount alpha to both the numerator and the denominator (in the latter, the pseudocount is multiplied by the number of distinct values the attribute can take). This way, a probability can never become zero.
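To make the calculation concrete, here is a minimal sketch; the function name, variable names, and the alpha default of 1 are my own illustration, not anything prescribed for the node:

```python
def smoothed_probability(count_xv_c, count_c, n_values, alpha=1.0):
    """Laplace-smoothed estimate of P(attribute = v | class = c).

    count_xv_c : instances of class c where the attribute takes value v
    count_c    : total number of instances of class c
    n_values   : number of distinct values the attribute can take
    alpha      : pseudocount (alpha = 1 is classic Laplace smoothing)
    """
    return (count_xv_c + alpha) / (count_c + alpha * n_values)

# Even a value never observed for class c gets a small non-zero probability:
# smoothed_probability(0, 50, 4) == 1 / 54 instead of 0.
```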
When should Laplace Smoothing be used over the default probability when the data are discrete?
- Laplace Smoothing adjusts both overly optimistic (probability of 100%) and overly pessimistic (probability of 0%) estimates, while the default probability only addresses the zero probability situation.
- The impact of the correction applied by Laplace Smoothing vanishes asymptotically as the number of instances increases (see the sketch after this list). This property may be a slight intuitive advantage in text mining, where it can be relatively difficult to define a sufficiently low default probability when faced with very low counts. Provided the internal precision of the probability calculation is high enough, the asymptotic property still preserves the benefit of dealing with the non-0 and non-1 probability situations, even with a very large number of instances.
- When the overall number of instances is low, the underlying probability estimates tend to be biased, and Laplace Smoothing addresses this situation as well. Obviously, having more data (a higher number of instances) in the first place, rather than adjusting the probability calculation afterwards, is statistically sounder, but that is another discussion.
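To illustrate the asymptotic point, and the adjustment of both the 0% and 100% estimates, here is a short sketch (the counts, the number of attribute values, and the default probability value are all made up for illustration):

```python
def smoothed_probability(count_xv_c, count_c, n_values, alpha=1.0):
    return (count_xv_c + alpha) / (count_c + alpha * n_values)

n_values = 5          # distinct attribute values (assumed)
default_prob = 1e-4   # a fixed default probability, for comparison

for count_c in (10, 100, 10_000, 1_000_000):
    never = smoothed_probability(0, count_c, n_values)         # raw estimate: 0
    always = smoothed_probability(count_c, count_c, n_values)  # raw estimate: 1
    print(f"n={count_c:>9}  P(never seen)={never:.1e}  "
          f"P(always seen)={always:.6f}  default={default_prob:.1e}")

# The smoothed probability of a never-seen value shrinks with the class count
# (about 6.7e-02 at n=10, about 1.0e-06 at n=1,000,000), and the probability of
# an always-seen value is pulled below 1, while the fixed default probability
# stays at 1e-4 regardless of how much evidence there is.
```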