Distance calculation

Hi all,

How can I handle spelling mistakes in a column?
I have a column that contains country names. Because the names are typed in manually, the column contains errors.
For my task I have to check each country individually to see whether it is a high-risk country, so I need to make sure that all country names are spelled correctly.
For example, North Korea appears in the column and, in the next step, should be checked against a list of high-risk countries. However, if North Korea is misspelled, e.g. north koree instead of north korea, it cannot be identified as a high-risk country. Is there a kind of distance calculation that assigns e.g. North koree, nortt korea, northh korea etc. to north korea?

I need the correct assignment not only for the high-risk check but also for a further task: finding out which countries our customer ships its goods to, i.e. a kind of network analysis. If, for example, we have 10 records where goods are sent to Germany, then the result should also show 10 records for Germany. But with spelling mistakes the result would be distorted, because Germany was written correctly 8 times and incorrectly twice, e.g. Germnay and Gerrany.

Kind regards,
Canan

Take a look at the KNIME address deduplication example:
https://www.knime.com/knime-applications/address-deduplication
and
https://www.knime.com/blog/address-deduplication
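
To illustrate the kind of distance calculation asked about, here is a minimal Python sketch based on the Levenshtein (edit) distance. It is not taken from the KNIME example linked above. The reference list `COUNTRIES`, the helper `normalize_country`, and the threshold `max_distance=2` are all assumptions made up for illustration; a real reference list would contain every valid country name, and the threshold would need tuning.

```python
from collections import Counter

# Hypothetical reference list of correctly spelled names (assumption).
COUNTRIES = ["north korea", "south korea", "germany", "france"]


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def normalize_country(raw, candidates=COUNTRIES, max_distance=2):
    """Return the closest reference name, or None if nothing is
    within max_distance edits (threshold is an assumption)."""
    cleaned = " ".join(raw.lower().split())  # lowercase, collapse spaces
    best = min(candidates, key=lambda c: levenshtein(cleaned, c))
    return best if levenshtein(cleaned, best) <= max_distance else None


# The misspellings from the question map back to the correct names.
for raw in ["North koree", "nortt korea", "northh korea"]:
    print(raw, "->", normalize_country(raw))

# Counting after normalization: 8 correct + 2 misspelled records.
records = ["Germany"] * 8 + ["Germnay", "Gerrany"]
counts = Counter(normalize_country(r) for r in records)
print(counts["germany"])
```

With the names corrected first, both the high-risk lookup and the per-country counts work on clean values. Short names that differ by a single letter can end up equally close to two reference entries, so values that are ambiguous or return None are probably best reviewed by hand.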
