Check similar or duplicate content using text analytics

omprakashjena · June 27, 2023, 2:33pm

Hi,

I have a dataset with a column of text comments. I want to check whether that column contains duplicate or similar content. By "similar or duplicate" I mean comments whose meaning is the same even though the sentence structure differs.
A sample workflow would be really helpful.

HansS · June 27, 2023, 2:48pm

Hi @omprakashjena

In that case, a sample dataset would be helpful. What is your input, and what output do you expect? The logic in your post is a little fuzzy to me.

gr. Hans

mlauber71 · June 27, 2023, 6:25pm

@omprakashjena you could check this example. It was built to bundle together similar addresses, but it could be adapted to other tasks. It does not need a ground truth and simply starts combining similar strings:

You could also explore more examples about string matching:
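
To make the idea of combining similar strings without a ground truth a bit more concrete, here is a minimal Python sketch. It is not the linked workflow and not a KNIME node. It is a stand-in that assumes plain word overlap (Jaccard similarity on token sets) is an acceptable notion of "similar". The function names, the sample comments, and the 0.6 threshold are all made up for illustration and would need tuning on real data. Word overlap catches reordered sentences but not synonyms or paraphrases, so for true semantic similarity something like text embeddings would be needed instead.

```python
import re
from itertools import combinations


def tokens(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets: shared tokens / all tokens."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_similar(comments, threshold=0.6):
    """Group comments whose token overlap reaches the threshold.

    A small union-find links every similar pair, so chains of similar
    comments end up in one group. The threshold is an assumed value.
    """
    parent = list(range(len(comments)))

    def find(i):
        # Follow parent links to the group representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    token_sets = [tokens(c) for c in comments]
    for i, j in combinations(range(len(comments)), 2):
        if jaccard(token_sets[i], token_sets[j]) >= threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i, comment in enumerate(comments):
        groups.setdefault(find(i), []).append(comment)
    return list(groups.values())


# Hypothetical sample data standing in for the comments column.
comments = [
    "The delivery was late and the box was damaged",
    "The box was damaged and the delivery was late",
    "Great product, works as expected",
    "Works as expected, great product",
    "Customer support never replied",
]
for group in group_similar(comments):
    print(group)
```

Every pair of rows is compared, so this only scales to a few thousand comments; for larger tables some blocking or indexing step would be needed first.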
