Hi,
I have a list of real country names as dictionary values, and another sheet with country names that contain spelling mistakes or jumbled characters. I am looking for a model that, for each misspelled or jumbled entry, identifies the closest country name from the list of real values (the dictionary) and writes it to a new column as a suggestion. This would help me identify the incorrect spellings and replace them with the real values. Please suggest.
Thanks @mlauber71
I tried a similarity search along those lines and it worked well above an 85% similarity level. Since some items are still unmapped, could any other trained model be useful in this scenario? Thanks
@gokulpnair could you just assign the rest manually, since there is a finite number of countries? Maybe also store the results and build a workflow that reports unknown spellings.
Also you could experiment with several settings of the similarity search.
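To make the last two suggestions concrete, here is a minimal Python sketch (for example for a Python Script node or a standalone test). It is only an illustration, not the workflow from this thread: it uses `difflib.SequenceMatcher` from the standard library as the similarity measure, and the names `REAL_NAMES`, `MANUAL_OVERRIDES`, `suggest` and `report` as well as the sample values are made up. It tries several cutoffs and lists whatever stays unmapped so those entries can be assigned by hand.

```python
from difflib import SequenceMatcher

# Hypothetical reference list and manual assignments; replace with your own data.
REAL_NAMES = ["Germany", "France", "India", "United States", "United Kingdom", "Brazil"]
MANUAL_OVERRIDES = {"uk": "United Kingdom", "usa": "United States"}


def similarity(a: str, b: str) -> float:
    """Return a score between 0 and 1, ignoring case and surrounding spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def suggest(raw: str, cutoff: float = 0.85):
    """Return (suggested name, score); the name is None if the best score is below cutoff."""
    key = raw.strip().lower()
    if key in MANUAL_OVERRIDES:  # manually assigned spellings always win
        return MANUAL_OVERRIDES[key], 1.0
    best = max(REAL_NAMES, key=lambda name: similarity(raw, name))
    score = similarity(raw, best)
    return (best, score) if score >= cutoff else (None, score)


def report(values, cutoff):
    """Split values into a mapping of suggestions and a list of unknown spellings."""
    mapped, unknown = {}, []
    for value in values:
        name, _ = suggest(value, cutoff)
        if name is None:
            unknown.append(value)
        else:
            mapped[value] = name
    return mapped, unknown


# Made-up sample input, used only to compare cutoff settings.
messy = ["Germny", "Frnace", "Indai", "usa", "Brasil", "Untied Kingdm"]
for cutoff in (0.85, 0.75, 0.6):
    mapped, unknown = report(messy, cutoff)
    print(f"cutoff={cutoff}: {len(mapped)} mapped, unknown={unknown}")
```

Lowering the cutoff maps more entries but also raises the risk of wrong suggestions, so the unknown list at a strict cutoff is a reasonable starting point for the manual assignments.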
You could also try to employ an LLM and a vector store. Here you would have to experiment with prompts and with how best to extract the values.
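As a rough idea of the retrieval half of that approach, the sketch below does a nearest-neighbour lookup over vectors in plain Python. Note the substitutions: there is no LLM and no real vector store here. The hypothetical `embed` function simply counts character bigrams as a stand-in for an embedding model, and `STORE` is an in-memory list. In a real setup you would swap in an actual embedding model and vector store and then pass the top candidates to the LLM prompt.

```python
import math
from collections import Counter

# Hypothetical reference list; replace with your dictionary of real names.
REAL_NAMES = ["Germany", "France", "India", "United States", "United Kingdom", "Brazil"]


def embed(text: str, n: int = 2) -> Counter:
    """Stand-in for an embedding model: counts of character n-grams (padded with spaces)."""
    s = f" {text.strip().lower()} "
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


# In-memory stand-in for a vector store: one vector per real country name.
STORE = [(name, embed(name)) for name in REAL_NAMES]


def nearest(raw: str, k: int = 3):
    """Return the k most similar real names as (name, score) pairs, best first."""
    query = embed(raw)
    scored = sorted(((cosine(query, vec), name) for name, vec in STORE), reverse=True)
    return [(name, round(score, 3)) for score, name in scored[:k]]


print(nearest("Untied Kingdm"))
```

Returning several candidates instead of a single best match is deliberate: the short list can be shown to a person or given to the LLM as the only allowed answers, which limits the values you have to extract from its reply.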