pattern analysis

Correct. The count is then updated to include the strings added to the cluster, and the clustering is repeated.

Clustering is what you’d do when you have no list of “correct” values to match against. In your country names example, you may want to play around with the Similarity Search node using a Levenshtein distance (it’s a distance measure for strings).
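
If it helps to see what the Levenshtein distance is doing under the hood, here is a rough sketch in plain Python. This is not the node's implementation, just the textbook edit-distance algorithm, and `closest_match` plus the reference list are names I made up for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or substitutions
    needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def closest_match(value: str, reference: list[str]) -> str:
    """Hypothetical helper: return the reference entry with the smallest distance."""
    return min(reference, key=lambda ref: levenshtein(value, ref))


# Assumed example data: a short list of "correct" country names.
countries = ["Germany", "France", "Spain"]
print(closest_match("Germny", countries))
```

In practice you would probably also want a maximum distance cutoff so that a string that matches nothing well is left unmatched rather than forced onto the nearest name.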

One other general note that I don’t think we addressed in the thread: it’s probably a good idea to lowercase or uppercase all your strings before getting into the processing. a =/= A, unfortunately!
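
As a quick illustration of the case issue, here is a small Python sketch. The `normalize` helper is just a name I picked, and trimming the surrounding whitespace is an extra assumption on my part, not something we discussed in the thread.

```python
def normalize(s: str) -> str:
    # casefold() is a more aggressive lower() intended for caseless matching;
    # strip() removes leading/trailing whitespace (my own addition here).
    return s.strip().casefold()


# Assumed example data: the same country typed three different ways.
raw = ["Germany", "GERMANY", "germany "]

print(len(set(raw)))                         # distinct values before normalizing
print(len({normalize(s) for s in raw}))      # distinct values after normalizing
```

Doing this before any distance calculation means differences in case alone no longer count as edits.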

Best of luck!
