Preprocessing for Naive Bayes


I am new to KNIME & Data Mining and would like to use the Naive Bayes Learner & Predictor for a classification problem.

So far, the only preprocessing I have done is handling missing values (there are only a few in the data) and applying the correlation filter with a linear correlation model.

What other steps have to be done to reach a higher accuracy of the trained model?


Hi @cynthi -

There are a couple of things you could try in addition to fixing missing values and removing correlated features, as you described. Whether they measurably improve your model's accuracy depends on your data.

  1. Transforming continuous variables into discrete bins (in KNIME, use the Auto-Binner or Numeric Binner nodes)
  2. Normalizing continuous variables, both to better approximate a Gaussian distribution and to put these variables on the same scale relative to one another (in KNIME, use the Normalizer node)
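
If it helps to see what those two steps do to a column outside of KNIME, here is a minimal sketch in plain Python. It assumes equal-width binning and z-score scaling, which are only two of several possible strategies and are not necessarily what the KNIME nodes do with your settings. The function names and the `ages` column are made up for illustration.

```python
from statistics import mean, pstdev

def equal_width_bins(values, n_bins):
    """Map each value to a bin index 0..n_bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; put everything in bin 0.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value inside the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def z_score(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical continuous column.
ages = [22, 25, 31, 38, 47, 52, 60]

binned = equal_width_bins(ages, 3)   # option 1: discretize
scaled = z_score(ages)               # option 2: normalize

print(binned)
print([round(s, 2) for s in scaled])
```

You would normally pick one of the two per column: feed the bin labels to the learner as a nominal attribute, or feed the scaled values as a numeric attribute.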

Parameter optimization is unlikely to help much with Naive Bayes, since the model has few parameters to tune.

In any case, remember that although your classification results can be useful, you probably shouldn’t put much faith in probability estimates generated using Naive Bayes.
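
One reason the probabilities are unreliable is that Naive Bayes treats every feature as independent evidence, so correlated features get counted more than once. Here is a small sketch of that effect in plain Python, under made-up assumptions: two classes A and B with equal priors, and a single observation that is twice as likely under A as under B. The function name and the numbers are hypothetical.

```python
def posterior_a(p_x_given_a, p_x_given_b, copies, prior_a=0.5):
    """Naive Bayes posterior for class A when the same feature appears `copies` times."""
    # Independence assumption: the likelihoods of all copies are multiplied together.
    score_a = prior_a * p_x_given_a ** copies
    score_b = (1 - prior_a) * p_x_given_b ** copies
    return score_a / (score_a + score_b)

once = posterior_a(0.6, 0.3, copies=1)   # the feature counted once
five = posterior_a(0.6, 0.3, copies=5)   # the same feature duplicated five times

print(round(once, 3), round(five, 3))
```

The predicted class is A in both cases, but the posterior moves from about 0.67 to about 0.97 even though no new information was added. That is why the ranking of classes can still be useful while the probability values themselves are overconfident.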

