N-gram Creator node produces n-grams with the same term repeated

Hi,

I am trying to understand why the N-gram Creator node produces n-grams in which the same term occurs twice (shown in the figures below). Sometimes my Decision Tree also shows such repeated terms, for example ‘human human’, when using 2-grams. Is there a specific reason for this?

Any help is very much appreciated.

Thanks in advance :slight_smile:

Hi,

Nothing prevents an n-gram from containing the same term twice. If the word "human" appears twice in succession within a document, a 2-gram such as "human human" can result.

Meaningful and trivial examples would be:

  • blah blah
  • knock knock

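You also need to consider preprocessing such as punctuation erasure and the removal of articles and other stop words. Once those tokens are gone, two occurrences of a term that were originally separated can end up next to each other.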
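To illustrate the effect, here is a minimal sketch in plain Python. It is not how the KNIME node is implemented; the stop word list, the function names, and the sample sentence are all made up for the example. It only shows how erasing punctuation and removing stop words can place two occurrences of "human" side by side before the 2-grams are built.

```python
import re

# Hypothetical, tiny stop word list for illustration only.
STOP_WORDS = {"a", "an", "the", "and", "is", "to", "of"}

def preprocess(text):
    """Lowercase, erase punctuation, and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

def word_ngrams(tokens, n=2):
    """Return every run of n consecutive tokens, joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Made-up sample: the two "human" tokens are separated only by a full stop.
text = "To err is human. Human error is common."
bigrams = word_ngrams(preprocess(text), n=2)
print(bigrams)
```

In this sample the full stop between "human" and "Human" is erased and both are lowercased, so "human human" shows up among the 2-grams even though the original sentences never repeat the word back to back within a sentence.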

Best,
Temesgen
