Matching or Joining Names - International (Different Character Set) Support

I need to match customer names from different sources. Using standard techniques, I can match on exact names, fuzzy name matching, distance algorithms like Levenshtein distance, phonetic techniques like Soundex and Metaphone, etc. These techniques work great when you’re working with a common character set and a single language. But the problem becomes much harder when you’re trying to match records written in different languages.
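For reference, here is a minimal Python sketch of the kind of distance-based matching I mean. The function names `levenshtein` and `similarity` are just illustrative, and the scoring (edit distance scaled by the longer string) is one common choice rather than the only one.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale the distance to a 0..1 score; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


print(levenshtein("Jon Smith", "John Smith"))  # 1
print(similarity("Jon Smith", "John Smith"))
```

This works well for spelling variants within one language, because the two strings share most of their characters.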

Take as an example a business record for “Prey Medical Services”. The matching Turkish business record would read “Prey Sağlık Hizmetleri”. A simple translation will not work, because proper names should not be translated; otherwise “Prey” might become “AV”. So, depending on the translation service you use, “Prey Sağlık Hizmetleri” becomes “Prey Health Services” or “AV Health Services”, etc., and neither is an exact match for “Prey Medical Services”. Differences in terminology between countries also come into play. For example, a hospital in the US might be called a polyclinic in Europe.
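To make the problem concrete, here is a small Python sketch, using only the standard library, that folds the Turkish characters to their closest ASCII form and then compares the two records token by token. The helper names `fold` and `shared_tokens` and the extra mapping for the dotless “ı” are my own assumptions for illustration (Unicode NFKD normalization does not decompose “ı”, so it needs an explicit rule).

```python
import unicodedata

# Assumption: characters that NFKD cannot decompose get an explicit mapping.
EXTRA_FOLDS = str.maketrans({"ı": "i", "İ": "I"})


def fold(text: str) -> str:
    """Strip diacritics and case so 'Sağlık' and 'saglik' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text.translate(EXTRA_FOLDS))
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def shared_tokens(a: str, b: str) -> set:
    """Whitespace tokens that the two names have in common after folding."""
    return set(fold(a).split()) & set(fold(b).split())


english = "Prey Medical Services"
turkish = "Prey Sağlık Hizmetleri"

print(fold(turkish))                    # prey saglik hizmetleri
print(shared_tokens(english, turkish))  # {'prey'}
```

Folding the character set fixes the encoding side of the problem, but only the proper name “Prey” survives as a common token. The descriptive words (“Medical Services” vs. “Sağlık Hizmetleri”) share no characters in any useful way, so no distance or phonetic measure will bring them together without some language-aware step.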

Can someone point me towards research materials on how to handle identity management and name matching when the names are in different languages? KNIME is an international community, so I can’t believe I’m the first person who has wanted to do this.

Hi ScottMcLeodPSLGroup,

Have a look at the scientific literature on cross-language retrieval, for example:

https://www.researchgate.net/publication/221579560_Turkish_-_English_cross_language_information_retrieval_using_LSI

Best,
Anna