I am looking for the reason why the n-gram creator node produces n-grams in which the same term occurs twice (shown in the figures below). Sometimes my Decision Tree also reflects such terms, for example ‘human human’ when using 2-grams. Is there any specific reason for this?
There is nothing preventing an n-gram from containing the same term twice. If the word human appears twice in a row within a document, the 2-gram ‘human human’ will be created.
Trivial but meaningful examples would be:
blah blah
knock knock
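To illustrate the point, here is a minimal sketch of word n-gram creation, assuming n-grams are built as a sliding window over the token sequence (the function name word_ngrams is hypothetical and is not the node's actual implementation). Any word repeated next to itself simply ends up in one window:

```python
def word_ngrams(tokens, n=2):
    """Return all contiguous word n-grams as space-joined strings."""
    # Slide a window of width n over the tokens; nothing stops two
    # positions in the same window from holding the same word.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

print(word_ngrams("knock knock who is there".split()))
# ['knock knock', 'knock who', 'who is', 'is there']
```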
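You also need to consider preprocessing steps such as punctuation erasure, removal of articles (stop words), etc. These steps can make two occurrences of a word adjacent even when they are not adjacent in the raw text.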
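A rough sketch of how this can happen, assuming a simplified pipeline of lowercasing, punctuation erasure and stop-word removal before the n-grams are built. The tiny STOP_WORDS set and the preprocess helper are hypothetical, chosen only for illustration; real text-processing tools use much larger stop-word lists. In the raw sentence "human" never follows itself, yet ‘human human’ appears after preprocessing:

```python
import string

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "the", "and", "of"}

def preprocess(text):
    """Lowercase, erase punctuation and drop stop words."""
    # Erase punctuation by deleting every character in string.punctuation.
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]

def word_ngrams(tokens, n=2):
    """Return all contiguous word n-grams as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = preprocess("A human, the human and a machine.")
print(tokens)               # ['human', 'human', 'machine']
print(word_ngrams(tokens))  # ['human human', 'human machine']
```

Here removing the comma and the article "the" is what places the two occurrences of "human" side by side.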