Problem with Punctuation Erasure when trying to check document similarity

Hi dear team,

This is my first experience with KNIME. I have two groups of document titles, A and B, and I'd like to find the documents in group B that are most similar to the documents in group A.

I built the following workflow:

Excel Reader -> Strings To Document -> BoW Creator -> Punctuation Erasure -> Stop Word Filter -> N Chars Filter -> Case Converter -> Snowball Stemmer -> TF -> IDF -> Math Formula (tf*idf) -> Rank -> Row Filter (first 1153 docs selected as seed documents) -> Similarity Search -> Column Filter -> Excel Writer.
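The Math Formula step multiplies each term's TF by its IDF. As a rough illustration of what that product looks like, here is a plain-Python sketch that assumes relative term frequency (count divided by document length) and idf = log10(N / df). KNIME's TF and IDF nodes have their own settings, so their exact values may differ. The toy corpus and the `tf_idf` helper are made up for illustration.

```python
import math

# Hypothetical toy corpus: each document is a list of already-preprocessed terms.
docs = {
    "d1": ["knime", "text", "mining"],
    "d2": ["text", "similarity", "search"],
    "d3": ["knime", "workflow"],
}

def tf_idf(docs):
    """Return {doc_id: {term: tf*idf}}.

    Assumes relative tf = count / document length and idf = log10(N / df);
    this is one common variant, not necessarily what the KNIME nodes compute.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = {}
    for terms in docs.values():
        for term in set(terms):
            df[term] = df.get(term, 0) + 1
    weights = {}
    for doc_id, terms in docs.items():
        weights[doc_id] = {
            term: (terms.count(term) / len(terms)) * math.log10(n_docs / df[term])
            for term in set(terms)
        }
    return weights

w = tf_idf(docs)
```

In `d1`, "mining" occurs in only one document and therefore gets a higher weight than "knime", which occurs in two.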

However, when I execute the Punctuation Erasure node, I get the error:

execute failed: cell at index 0 is null

I get the same error for the other nodes, too. The BoW Creator seems to be the last node that works properly.

Since I have 52,150 rows, I tried a small subset of the data and got the same error. I also tried executing the nodes after removing the long strings and saw no change.

Could you please help me fix this problem?

Best,

Hajar Sotudeh

Hi Hajar,

You need to apply all preprocessing nodes before creating the bag of words. Your workflow should look like this:
… -> Strings To Document -> Preprocessing (e.g. Stop Word Filter, Stemmer, …) -> Bag of Words -> TF -> other frequency nodes -> Document Vector -> Distance Matrix (cosine) -> Similarity Search
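To see the same ordering outside KNIME, here is a minimal plain-Python sketch (an illustration only, not how KNIME implements these nodes): the raw title text is cleaned first, the bag of words is built from the cleaned terms, and titles are compared by cosine similarity of their term-frequency vectors. The stop word list, the `min_chars` threshold and all function names are hypothetical; stemming and IDF weighting are left out to keep it short.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stop word list for illustration only.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}

def preprocess(title, min_chars=3):
    """Clean the raw text BEFORE building the bag of words."""
    # Case conversion + punctuation erasure (punctuation replaced by spaces).
    text = re.sub(r"[^\w\s]", " ", title.lower())
    # Stop word filter + minimum-length filter (similar in spirit to N Chars Filter).
    return [t for t in text.split() if t not in STOP_WORDS and len(t) >= min_chars]

def bag_of_words(terms):
    """Only after preprocessing: map each term to its frequency."""
    return Counter(terms)

def cosine(u, v):
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def most_similar(group_a, group_b):
    """For each title in group A, return (best title in group B, cosine score).

    Assumes group_b is non-empty.
    """
    vecs_b = [(b, bag_of_words(preprocess(b))) for b in group_b]
    result = {}
    for a in group_a:
        va = bag_of_words(preprocess(a))
        best_title, best_vec = max(vecs_b, key=lambda item: cosine(va, item[1]))
        result[a] = (best_title, cosine(va, best_vec))
    return result

group_a = ["Text mining with KNIME!"]
group_b = ["A KNIME workflow for text mining", "Deep learning on images"]
res = most_similar(group_a, group_b)
```

Here the title from group A is matched to "A KNIME workflow for text mining" with a score of about 0.87, because the two share three of their cleaned terms, while the second title in group B shares none.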

I hope this helps.

Cheers, Kilian