Similarity Search: Execute failed: Argument contains duplicates

swbine · April 19, 2019, 4:50pm

Hi all,

I’m trying to do fuzzy matching with names in two different datasets.

I already tried String Matcher, but I would also like to see the results from Similarity Search, as in the example workflow Fuzzy String Matching (https://hub.knime.com/knime/workflows/Examples/08_Other_Analytics_Types/01_Text_Processing/09_Fuzzy_String_Matching*vZLbH1jBCR6FXmhR).

Here’s the problem: I’m getting the error “Execute failed: Argument contains duplicates [col_name, col_name]”.
I tried removing duplicates with GroupBy and then removing the names that appear in both datasets with an inner join, but I still get the same error.

Other ideas?
Thanks!

Sabine

mauuuuu5 · April 22, 2019, 3:03am

Hi Sabine, see the following workflow, which finds duplicate names within the same column. I think you can adapt it.

Hope it helps

Mau

Name Deduplication Forum.knwf (9.2 KB)

ipazin · April 23, 2019, 8:56am

Hi there!

The error message is a bit confusing. It is not about duplicates in the data set. You actually need to duplicate the reference column data. In the example you linked, there is a String Manipulation node that does exactly that.

I have put together an example workflow, so check it out. If you have any questions, feel free to ask.

2019_04_23_Similarity_Search_Example.knwf (17.0 KB)

Br,
Ivan
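
For readers who want to sanity-check fuzzy matches outside KNIME, here is a minimal Python sketch of the general idea: for each name in one list, find the most similar name in a reference list. It is not what the Similarity Search node does internally. The sample names, the `best_match` helper, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def best_match(name, references):
    """Return (reference, score) for the reference most similar to name.

    Hypothetical helper: the score is SequenceMatcher's ratio in [0, 1],
    computed on lower-cased strings.
    """
    scored = [
        (ref, SequenceMatcher(None, name.lower(), ref.lower()).ratio())
        for ref in references
    ]
    # Pick the pair with the highest similarity score.
    return max(scored, key=lambda pair: pair[1])


# Made-up sample data standing in for the two datasets.
queries = ["Jon Smith", "Maria Garcia"]
references = ["John Smith", "Mariah Garcia", "Peter Jones"]

# Map each query name to its closest reference name and score.
matches = {q: best_match(q, references) for q in queries}

for query, (ref, score) in matches.items():
    print(f"{query} -> {ref} ({score:.2f})")
```

Note that the two name lists are kept as separate variables here; this mirrors having a query column and a distinct reference column rather than reusing one column for both.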