Bag of Words includes punctuation and excluded words

#1

Hi all,
I’m new to Text Processing nodes and running into a few issues.

I’m trying to preprocess my documents before creating the bag of words and then applying the TF node.

  1. I placed the Dictionary Filter node before the “Bag Of Words Creator” node. However, the words I filtered out still appear in the final term list.

  2. Could you also share a way to filter special characters like “(”, “?”, “+”, etc.? I added them to the dictionary filter, but it doesn’t seem to work. Thanks so much!

  • Jinny -

#2

Hey @JinnyLe,

Did you select the correct document column while configuring the Bag Of Words Creator node? It’s possible that one of the preprocessing nodes created a new column containing the preprocessed documents and the Bag Of Words Creator is still using the original document column.
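
To make the column mix-up concrete, here is a rough sketch in plain Python. It is not KNIME code and does not use any KNIME API; the column names `Document` and `Preprocessed Document`, the word list, and the helper functions are all made up for illustration.

```python
from collections import Counter

# Hypothetical dictionary of words to filter out (illustration only).
FILTER_WORDS = {"the", "a", "on"}


def filter_terms(text):
    """Drop every whitespace-separated token found in FILTER_WORDS."""
    return " ".join(w for w in text.split() if w.lower() not in FILTER_WORDS)


def bag_of_words(rows, column):
    """Count the terms found in the given column of each row."""
    counts = Counter()
    for row in rows:
        counts.update(row[column].split())
    return counts


rows = [{"Document": "the cat sat on a mat"}]

# The preprocessing step writes its result to a NEW column
# and leaves the original column untouched.
for row in rows:
    row["Preprocessed Document"] = filter_terms(row["Document"])

# Reading the original column brings the filtered words back.
bow_original = bag_of_words(rows, "Document")
# Reading the preprocessed column gives the expected term list.
bow_preprocessed = bag_of_words(rows, "Preprocessed Document")
```

In this sketch `bow_original` still contains "the", "a" and "on", while `bow_preprocessed` does not, which matches the symptom described above.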

Special characters can be removed by using the Punctuation Erasure node. However, using a dictionary should work as well.
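
For intuition only, this is roughly what stripping punctuation before building the term list looks like in plain Python. It is a sketch under my own assumptions, not how the Punctuation Erasure node is implemented; `erase_punctuation` and `tokenize` are hypothetical helper names, and `string.punctuation` only covers ASCII punctuation.

```python
import string

# Translation table that deletes every ASCII punctuation character,
# including "(", ")", "?" and "+".
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def erase_punctuation(text):
    """Return the text with all ASCII punctuation characters removed."""
    return text.translate(PUNCT_TABLE)


def tokenize(text):
    """Erase punctuation, lowercase, and split on whitespace."""
    return erase_punctuation(text).lower().split()


terms = tokenize("Is 1 + 1 = 2 (really)?")
```

Here `terms` contains only word and number tokens, with no "(", "?" or "+" left over.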

Cheers,

Julian
