Finding similar names

Dear all,

I have a large table with more than 10,000 entries in one column. The same person often appears under slightly different names or spellings, for example:

 

Mickey Mouse

M. Mouse

Mouse Mickey

M. Disney Mouse

M. Dinsey Muose

 

I would like to find groups of similar names so that, with minimal manual effort, I can build a dictionary replacer that replaces all of the above names with, for instance, Mickey Disney Mouse.

Any idea how to do this?

Thanks Jerry


Hi Jerry,

you could use the Indexing and Searching plugin. Have a look at the Fuzzy Address Matching example, which uses fuzzy matching to identify typos and name variants in addresses.

Bye,

Tobias
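For anyone who wants to see the general idea of fuzzy grouping outside KNIME, here is a minimal Python sketch. It is not the Fuzzy Address Matching workflow and does not use the Indexing and Searching plugin; it swaps in the standard library's difflib instead. The function names, the greedy grouping strategy, the 0.5 threshold, and the extra "Donald Duck" row are all assumptions made for illustration, and the threshold in particular would need tuning on real data.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, replace punctuation with spaces, and sort the words."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(sorted(cleaned.split()))


def similarity(a, b):
    """Similarity ratio between 0.0 and 1.0 on the normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def group_names(names, threshold=0.5):
    """Greedy grouping (an assumption, not the KNIME method): a name joins
    the first group whose first member is at least `threshold` similar,
    otherwise it starts a new group."""
    groups = []
    for name in names:
        for group in groups:
            if similarity(name, group[0]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


def build_replacements(groups):
    """Map every spelling to a proposed canonical name (here simply the
    longest spelling in its group). Review this by hand before using it."""
    replacements = {}
    for group in groups:
        canonical = max(group, key=len)
        for name in group:
            replacements[name] = canonical
    return replacements


# Names from the question, plus one unrelated name added for illustration.
names = ["Mickey Mouse", "M. Mouse", "Mouse Mickey",
         "M. Disney Mouse", "M. Dinsey Muose", "Donald Duck"]
groups = group_names(names)
replacements = build_replacements(groups)
```

Sorting the words before comparing makes "Mouse Mickey" and "Mickey Mouse" identical. A target such as Mickey Disney Mouse never appears in the data, so the canonical name still has to be entered by hand in the resulting dictionary.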

Hello Tobias,

that was the right way to go. It took me a while to figure out the workflow. The key was how to construct the query (I am not so into Java) and how to choose the right sensitivity for the fuzzy search.

Thanks a lot

Jerry
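To make the query-construction step a bit more concrete, here is a rough Python sketch that builds a fuzzy query string in Apache Lucene's classic query syntax, where `term~N` asks for matches within N edits of the term. That the KNIME search node accepts exactly this syntax is an assumption here, as are the helper name `fuzzy_query`, the choice to drop single-letter initials, and the use of `AND` between terms. Older Lucene versions took a similarity between 0 and 1 after the tilde (for example `~0.7`) instead of an edit count, so the right form of the sensitivity depends on the version in use.

```python
import re


def fuzzy_query(name, max_edits=2):
    """Build a Lucene-style fuzzy query from a name (hypothetical helper).

    Punctuation is dropped, and single-letter initials are skipped because
    a fuzzy match on one character would be far too permissive.
    """
    tokens = re.findall(r"\w+", name)
    terms = [t for t in tokens if len(t) > 1]
    return " AND ".join(f"{t}~{max_edits}" for t in terms)


query = fuzzy_query("M. Dinsey Muose")
```

A lower `max_edits` makes the search stricter; a higher one catches more typos but also more false matches.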

Hi @tobias.koetter,

I wanted to check the links you posted, but unfortunately I cannot access them; it seems the page is private. Could you share the solution, please? I have tried many different things over the last few weeks but haven’t had a good result yet.


Hello @B074534,
I have updated the links in my original post, so they should work again now. Sorry about that.
Bye,
Tobias


Thank you, I am going through the workflow to understand it…


Jerry, I am also not into Java and am new to KNIME. Do you still have the workflow that you used for your ‘name consolidation’ case? I am struggling to produce something quickly :S

Look also here


@B074534 in addition to the workflow that @izaychik63 suggested, I have compiled a small collection on address deduplication and string similarity:


Sample Workflow.knwf (31.2 KB)

Here is a short example workflow. You may have to adjust the sensitivity value inside the metanode. Hope this helps.

Regards, Jerry
