How to create a model to identify the closest string values in KNIME when comparing against a set of real values?

gokulpnair · July 1, 2024, 3:34pm

Hi,
I have a list of real country names as dictionary values, and another sheet with country names that contain spelling mistakes or jumbled characters. I am looking for a model that, for each misspelled or jumbled name, identifies the closest country name from the list of real values (the dictionary) and writes it to a new column as a suggestion. This will help me identify the incorrect spellings and replace them with the real values. Please suggest an approach.

Thanks in advance.

mlauber71 · July 1, 2024, 5:57pm

@gokulpnair maybe you can adapt this example

gokulpnair · July 2, 2024, 3:44am

Thanks @mlauber71
I tried the similarity search and it worked well above an 85% similarity level. Since some items are still unmapped, could any other trained model be useful in this scenario? Thanks

mlauber71 · July 2, 2024, 5:47am

@gokulpnair could you just assign the rest manually, since there is a finite number of countries? Maybe also store the results and build a workflow that reports spelling variants it has not seen before.
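
A minimal sketch of that idea in Python (for example inside a KNIME Python Script node), assuming a small hand-maintained override table on top of a fuzzy match. The sample names, `MANUAL_OVERRIDES`, `suggest` and `report_unknown` are all hypothetical, and `difflib` from the standard library is only a stand-in for whatever similarity measure the workflow actually uses:

```python
import difflib

# Hypothetical sample data; in a real workflow these come from the input tables.
DICTIONARY = ["Germany", "France", "Brazil", "Argentina", "United States"]

# Hand-maintained corrections for spellings the fuzzy match cannot resolve.
MANUAL_OVERRIDES = {"USA": "United States", "Deutschland": "Germany"}


def suggest(name, cutoff=0.85):
    """Return (suggestion, source); suggestion is None if nothing matches."""
    cleaned = name.strip()
    # Stored manual assignments win over any fuzzy result.
    if cleaned in MANUAL_OVERRIDES:
        return MANUAL_OVERRIDES[cleaned], "manual"
    # Compare case-insensitively, but return the dictionary's own spelling.
    lookup = {c.lower(): c for c in DICTIONARY}
    hits = difflib.get_close_matches(cleaned.lower(), list(lookup), n=1, cutoff=cutoff)
    if hits:
        return lookup[hits[0]], "fuzzy"
    return None, "unknown"


def report_unknown(names, cutoff=0.85):
    """List the distinct spellings that still need a manual decision."""
    return sorted({n for n in names if suggest(n, cutoff)[0] is None})


incoming = ["Germny", "USA", "Xyzzy", "Xyzzy"]
for raw in incoming:
    print(raw, "->", suggest(raw))
print("needs review:", report_unknown(incoming))
```

Whatever ends up in the review list can be added to the override table once, so the same misspelling does not have to be handled by hand again.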

Also you could experiment with several settings of the similarity search.
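
To get a feeling for how the threshold changes the result, something like the following could be tried outside the workflow first. This is only a sketch: the sample names are made up, `best_match` and `mapped_at` are hypothetical helpers, and the `difflib` ratio is an assumption that will not give the same numbers as the distance measures of the KNIME similarity search.

```python
import difflib

# Hypothetical sample data for illustration only.
DICTIONARY = ["Germany", "France", "Brazil", "Argentina", "Portugal"]
MISSPELLED = ["Germny", "Frnace", "Brasil", "Argentnia", "Atlantis"]


def best_match(name, candidates):
    """Return (closest candidate, similarity between 0 and 1)."""
    scored = [
        (difflib.SequenceMatcher(None, name.lower(), c.lower()).ratio(), c)
        for c in candidates
    ]
    score, match = max(scored)
    return match, score


def mapped_at(cutoff):
    """Map every misspelled name whose best score reaches the cutoff."""
    result = {}
    for name in MISSPELLED:
        match, score = best_match(name, DICTIONARY)
        if score >= cutoff:
            result[name] = match
    return result


# Sweep a few cutoffs and see which names drop out at each level.
for cutoff in (0.95, 0.85, 0.75):
    mapped = mapped_at(cutoff)
    unmapped = [n for n in MISSPELLED if n not in mapped]
    print(f"cutoff={cutoff:.2f} mapped={len(mapped)} unmapped={unmapped}")
```

With this kind of measure, swapped letters tend to cost more than a single missing letter, so lowering the cutoff a little may catch jumbled names while a value with no real counterpart still stays unmapped.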

You could also try to employ an LLM and a vector store. Here you would have to experiment with prompts and with how best to extract the values.
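
The vector store part is essentially a nearest-neighbour lookup over vectors. As a rough sketch of the mechanics only, the example below swaps the LLM embeddings for plain character-trigram count vectors and cosine similarity, so it runs without any model or service. `TinyVectorStore`, `trigram_vector` and the sample names are hypothetical, and a real setup would use an embedding model and a proper vector store instead.

```python
import math
from collections import Counter


def trigram_vector(text):
    """Count character trigrams of the lower-cased, padded text."""
    padded = f"  {text.strip().lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """In-memory stand-in for a vector store: holds (name, vector) pairs."""

    def __init__(self, names):
        self.entries = [(name, trigram_vector(name)) for name in names]

    def nearest(self, query):
        """Return (similarity, name) of the closest stored entry."""
        query_vec = trigram_vector(query)
        return max((cosine(query_vec, vec), name) for name, vec in self.entries)


# Hypothetical dictionary values for illustration.
store = TinyVectorStore(["Germany", "France", "Brazil", "Argentina"])
for raw in ["Frnace", "Brasil"]:
    score, name = store.nearest(raw)
    print(f"{raw} -> {name} ({score:.2f})")
```

The lookup always returns some neighbour, so a minimum similarity would still be needed to decide when a value should be left unmapped.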
