Filtering. Terms that carry little or no content, such as stop words, numbers, and punctuation marks, are removed from the documents. Stemming. Word affixes are stripped so that only the word stem is retained; the stem need not be a valid dictionary word. Lemmatization. Only inflectional endings are removed, returning the base or dictionary form of the word, which is known as the lemma.
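
The three steps can be illustrated with a minimal Python sketch. It is not a production pipeline: the stop-word list, the suffix list, and the lemma dictionary are tiny hypothetical stand-ins chosen for this example, and the stemmer is a naive suffix stripper rather than an established algorithm such as Porter's.

```python
import re

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "an", "are", "is", "of", "and", "in", "on", "to"}

# Suffixes tried by the toy stemmer, in this order (illustrative only).
SUFFIXES = ("ing", "ies", "ed", "es", "s")

# Toy lemma dictionary (hypothetical); real lemmatizers rely on a full lexicon.
LEMMAS = {
    "studies": "study",
    "studying": "study",
    "mice": "mouse",
}


def filter_tokens(text):
    """Lowercase the text, keep alphabetic tokens only, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def lemmatize(token):
    """Look the token up in the toy dictionary; unknown tokens are unchanged."""
    return LEMMAS.get(token, token)


text = "The 3 mice are studying, and the cat studies mice."
tokens = filter_tokens(text)
stems = [stem(t) for t in tokens]
lemmas = [lemmatize(t) for t in tokens]

print(tokens)  # ['mice', 'studying', 'cat', 'studies', 'mice']
print(stems)   # ['mice', 'study', 'cat', 'stud', 'mice']
print(lemmas)  # ['mouse', 'study', 'cat', 'study', 'mouse']
```

In this toy run the stemmer maps "studies" to the non-word "stud" and leaves "mice" unchanged, whereas the dictionary lookup returns the lemmas "study" and "mouse". This reflects the distinction drawn above between stripping affixes and returning a dictionary form.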

