Obtaining the class probabilities through the Naive Bayes Predictor node is easy. However, I am having difficulty reproducing those probabilities by hand. I wonder whether the discrepancy is due to numerical precision or to the formula I am using.
Suppose a nominal class prediction problem with the following document to predict:
word1 word2 word2
Both words are columns with a value of 1 for that instance in the document vector, and the class can take either of two values, A and B.
Please note that in KNIME, I have applied the Number To String node to the term columns in the document vector, so that Naive Bayes Learner basically sees "0.0" for each term. If I don’t do this, it appears that NB Learner will not calculate the counts correctly.
Given the above, I’d assume that Naive Bayes Predictor would score the instance as follows:
score(class = A) = p(class = A) x p(term = word1 | class = A) x p(term = word2 | class = A)^2
score(class = B) = p(class = B) x p(term = word1 | class = B) x p(term = word2 | class = B)^2
Would that be the correct formula for calculating the score?
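To make the question concrete, here is a minimal Python sketch of the scoring formula as I understand it. The prior and conditional probabilities are made-up placeholder values, not output from the Naive Bayes Learner, and the names `priors`, `cond` and `score` are only for illustration. Whether KNIME really squares the word2 term is exactly what I am unsure about.

```python
from math import prod

# Hypothetical probabilities for illustration only (not from a real KNIME model).
priors = {"A": 0.6, "B": 0.4}
cond = {
    "A": {"word1": 0.2, "word2": 0.5},
    "B": {"word1": 0.3, "word2": 0.1},
}

# The document to predict; word2 occurs twice, hence the squared term.
doc = ["word1", "word2", "word2"]

def score(cls):
    """Prior times the product of p(term | class) over every token in the document."""
    return priors[cls] * prod(cond[cls][term] for term in doc)

scores = {cls: score(cls) for cls in priors}
print(scores)
```

With these placeholder values, the score for A is 0.6 x 0.2 x 0.5^2 and the score for B is 0.4 x 0.3 x 0.1^2.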
Furthermore, would the predicted class probabilities be calculated using the following formula?
probability(predicted class = A) = score(class = A) / sum(score(class = A), score(class = B))
probability(predicted class = B) = score(class = B) / sum(score(class = A), score(class = B))
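As a sanity check on the precision side of my question, this is a small Python sketch of the normalization, done once directly and once in log space. The two scores are hypothetical placeholder values, and `normalize` and `normalize_from_logs` are names I made up. I do not know whether the Naive Bayes Predictor works in log space internally; this only shows that both routes should agree up to floating-point error when the scores are not extremely small.

```python
import math

def normalize(scores):
    """Divide each score by the sum of all scores."""
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()}

def normalize_from_logs(log_scores):
    """Same normalization, starting from log scores.

    Subtracting the maximum before exponentiating avoids underflow
    when the raw scores are products of many small probabilities.
    """
    m = max(log_scores.values())
    shifted = {cls: math.exp(v - m) for cls, v in log_scores.items()}
    total = sum(shifted.values())
    return {cls: v / total for cls, v in shifted.items()}

# Hypothetical scores for illustration only.
scores = {"A": 0.03, "B": 0.0012}

probs = normalize(scores)
probs_log = normalize_from_logs({cls: math.log(s) for cls, s in scores.items()})
print(probs)
print(probs_log)
```

If my manual results differ from the node output by much more than the gap between these two variants, I would suspect the formula rather than precision.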