I have attached some example data and the expected outcome (second worksheet).
I want to compare the string values in one column to see whether there are similarities among them. Please note my original file has thousands of lines and a multitude of descriptions that I’m not really aware of.
My goal is a first attempt to categorize/group similar lines together for me to further analyze the content of the file.
What would be the best way to handle this?
I already tried the STRING MATCHER and SIMILARITY RESEARCH, using the same column as both the source and the comparison column. But they only pick up the exact same values. I want to find similar values, not exact matches. Recharge examples.xlsx (10.6 KB)
I tried that one, but I don’t know how this would help. I only have one column with data and I want to have that grouped somehow. The String similarity compares two columns, which I don’t have.
Hi @robvp
You could take @izaychik63's idea and send the same data into string similarity for both inputs, then increase the neighbor count.
Then you get more than just the identical value as a match.
If you feed the same data twice, you will always get 100% similarity for each record against itself. If you take more neighbors into account, you can filter out the 100% self-match and take the second one.
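To illustrate the idea outside the workflow, here is a minimal Python sketch. It is not the node's actual implementation: it uses `difflib.SequenceMatcher` from the standard library as a stand-in similarity measure, and the function names and sample descriptions are made up for the example. Each value is compared against the same column, the record itself is skipped (by position rather than by score, so genuine duplicates are still reported), and the closest remaining value is kept.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a similarity score between 0 and 1 (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_other_match(values):
    """For each value, return (value, closest other value, score).

    Assumes the list holds at least two values.
    """
    results = []
    for i, value in enumerate(values):
        # Compare against the same column, but skip the record itself,
        # since that comparison would always score 100%.
        candidates = [
            (similarity(value, other), other)
            for j, other in enumerate(values)
            if j != i
        ]
        score, match = max(candidates)
        results.append((value, match, round(score, 2)))
    return results


# Hypothetical sample descriptions, not taken from the attached file
descriptions = [
    "Recharge travel costs",
    "Recharge travel cost",
    "IT license fee",
    "IT licence fees",
]

for row in best_other_match(descriptions):
    print(row)
```

Rows whose score is above a threshold you choose could then be treated as belonging to the same group. Note that this compares every pair, so for thousands of lines it will be slow.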
br