Name Comparison and Matching

andregol · February 5, 2020, 2:25pm

Hi everyone,

I’ve been searching, but haven’t found a solution to what I need (I apologize if there is an answer already).
I’ve read about fuzzy matching, string similarity, the Levenshtein algorithm, but I don’t think they apply to my problem.

I have two tables with names, and I need to compare and find the matches between them.
The problem is that the names are not exactly the same between the databases: from one I can get the full name and surname, while the other has only the first letter, or maybe a middle name included, or vice-versa.

I join the tables from a common column (“Room” in the example workflow attached), and I need to filter the rows to give me only the results with the single name match, and the “Fare” columns from both databases, so I can compare the differences.
Also, in my task, both tables have millions of rows, so it would need to be a rather efficient method to do this.

Is there a way to do this in Knime?

I appreciate any help and I thank you in advance!

Name_Match.knwf (18.2 KB)

AlexanderFillbrunn · February 5, 2020, 3:10pm

Hi,
I don’t understand what exactly the surname and what the first name is in your example. Is that also mixed? If not, I think you need to bring the names to the “lowest common denominator”, for example by always taking only the first letter of the first name and remove any middle names. Then change everything to upper or lower case and you can compare. Does that help?
Kind regards
Alexander

andregol · February 10, 2020, 11:37am

Hi Alexander,

Thanks for the quick response and sorry for taking this long to reply.

The name pattern is like this: “Surname / First Name”.
If there is a middle name, it sometimes goes with the surname, sometimes with the first name. There is no rule for this.

I have tried joining by matching the first letter of the names. But the problem is that, with many names, there will be people with both names starting with the same letters, generating duplicates.

I have removed any special characters and left all the names in upper case.

I guess I would need a solution that breaks down the names and compares if the string from one table is contained in the other, but not necessarily with an exact match (since there might be middle names or only part of the name in the mix).

izaychik63 · February 10, 2020, 1:31pm

@andregol, after you did recommended join and got so called duplicates, use of

may help to filter only very closely spelled names.

andregol · February 17, 2020, 2:32pm

@izaychik63, thanks for the suggestion, it really helped me!

I broke the strings down and compared separately the distances for the last names and then for the first names only (removing any middle names there might be).
This helped me find the best matches.

Thanks for the help!
Andre

system · August 18, 2020, 2:32am

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.