How to create a model to identify the closest string values in KNIME when comparing against a set of real values?

Hi,
I have a list of real country names as dictionary values, and another sheet with country names that contain spelling mistakes or jumbled characters. I am looking for a model that finds, for each misspelled or jumbled entry, the closest country name in the dictionary and creates a new column suggesting that name. This would help me identify the incorrect values and replace them with the real ones. Please suggest.

Thanks in advance.

@gokulpnair maybe you can adapt this example

Thanks @mlauber71
I tried the similarity search and it worked well above the 85% similarity level. Since there are still unmapped items, would any other trained model be useful in this scenario? Thanks
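
For readers who want to see the thresholded matching idea outside of KNIME nodes, here is a minimal Python sketch. It is an assumption-laden illustration, not the workflow from the linked example: the `COUNTRIES` list, the `suggest_country` name, and the 0.85 cutoff are made up for the demo, and it uses `difflib` from the standard library rather than KNIME's own distance measures, so the scores will not be identical to those of the similarity search.

```python
from difflib import get_close_matches

# Hypothetical dictionary of real country names (illustrative subset only).
COUNTRIES = ["Germany", "France", "Brazil", "India", "Australia"]


def suggest_country(raw, dictionary=COUNTRIES, cutoff=0.85):
    """Return the closest dictionary name, or None if nothing reaches the cutoff."""
    # Compare case-insensitively, but return the original spelling.
    lookup = {name.lower(): name for name in dictionary}
    hits = get_close_matches(raw.strip().lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None


print(suggest_country("Germny"))               # close enough at 0.85
print(suggest_country("Brasil"))               # stays unmapped at 0.85
print(suggest_country("Brasil", cutoff=0.8))   # mapped once the cutoff is lowered
```

The second call shows the kind of item that stays unmapped at a strict threshold even though a human would match it immediately.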

@gokulpnair could you just assign the rest manually, since there is a finite number of countries? Maybe also store the results and build a workflow that reports unknown spellings.
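
A rough sketch of that idea in Python, under some assumptions: `map_countries` and the `manual_overrides` dictionary are hypothetical names, the override table stands in for whatever stored results you keep, and `difflib` replaces the KNIME similarity search. Manual assignments win first, fuzzy matching handles the rest, and anything still unmatched is collected for reporting.

```python
from difflib import get_close_matches


def map_countries(raw_names, dictionary, manual_overrides, cutoff=0.85):
    """Return (mapped, unknown): raw -> real name, plus the raw values still unmatched."""
    lookup = {name.lower(): name for name in dictionary}
    overrides = {raw.lower(): real for raw, real in manual_overrides.items()}
    mapped, unknown = {}, []
    for raw in raw_names:
        key = raw.strip().lower()
        if key in overrides:
            # A stored manual assignment takes priority over fuzzy matching.
            mapped[raw] = overrides[key]
            continue
        hits = get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
        if hits:
            mapped[raw] = lookup[hits[0]]
        else:
            # Report this spelling so it can be assigned manually later.
            unknown.append(raw)
    return mapped, unknown


mapped, unknown = map_countries(
    ["Germny", "Brasil", "Deutschland", "Xyzland"],
    ["Germany", "Brazil", "India"],
    {"Deutschland": "Germany"},
)
print(mapped)
print(unknown)
```

Each time the unknown list is reviewed, the new assignments can be added to the override table, so the list should shrink over time.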

Also you could experiment with several settings of the similarity search.
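
One way to make such experiments systematic is to check how many entries get mapped at each threshold. The sketch below is only an illustration with assumed names (`best_match`, `coverage_by_cutoff`) and uses `difflib.SequenceMatcher` as a stand-in score, so the numbers will differ from the distance functions available in KNIME.

```python
from difflib import SequenceMatcher


def best_match(raw, dictionary):
    """Return (score, name) for the dictionary entry most similar to raw."""
    key = raw.strip().lower()
    scored = [(SequenceMatcher(None, key, name.lower()).ratio(), name)
              for name in dictionary]
    return max(scored)


def coverage_by_cutoff(raw_names, dictionary, cutoffs):
    """Count how many raw names have a best score at or above each cutoff."""
    best = [best_match(raw, dictionary) for raw in raw_names]
    return {c: sum(score >= c for score, _ in best) for c in cutoffs}


print(coverage_by_cutoff(["Germny", "Brasil"],
                         ["Germany", "Brazil", "India"],
                         [0.8, 0.85, 0.95]))
```

A lower cutoff maps more entries but also risks wrong suggestions, so it is worth spot-checking the matches that only appear at the lower settings.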

You could also try to employ an LLM and a vector store. Here you would have to experiment with prompts and with how best to extract the values.
