I have a dataset with a column of text (comments). I want to check whether that column contains duplicate or similar content. By "similar" I mean comments whose meaning is the same even though the sentence structure is different.
A sample workflow would be really helpful.

Hi @omprakashjena

In that case, a sample dataset would be helpful. What is your input, and what output do you expect? The logic in your post is a little bit fuzzy to me.

gr. Hans

@omprakashjena you could check this example. It has been used to bundle together similar addresses, but it might also be adapted for other tasks. It does not need a ground truth; it simply starts to combine similar strings:

You could also explore more examples about string matching:
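
To give a rough idea of what plain string matching does, here is a minimal Python sketch. It is not the workflow from the examples mentioned above. It uses `difflib.SequenceMatcher` from the standard library, and the helper names (`normalize`, `similarity`, `group_similar`), the sample comments and the 0.8 threshold are all assumptions made up for illustration. Note that this only catches surface similarity such as reordered words or small edits; comments that share a meaning but use entirely different words would need a different technique.

```python
from difflib import SequenceMatcher


def normalize(text):
    # Lowercase and sort the words so that word order matters less.
    return " ".join(sorted(text.lower().split()))


def similarity(a, b):
    # Ratio in [0, 1]; 1.0 means the normalized strings are identical.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def group_similar(comments, threshold=0.8):
    # Greedy grouping without ground truth: each comment joins the first
    # group whose first member is similar enough, else it starts a new group.
    # The threshold is an assumed value and would need tuning on real data.
    groups = []
    for comment in comments:
        for group in groups:
            if similarity(comment, group[0]) >= threshold:
                group.append(comment)
                break
        else:
            groups.append([comment])
    return groups


# Hypothetical sample comments, not taken from any real dataset.
comments = [
    "The delivery was late",
    "late was the delivery",
    "Great product, works fine",
]
groups = group_similar(comments)
for group in groups:
    print(group)
```

The first two comments end up in the same group because they contain the same words in a different order, while the third starts its own group. Comparing each comment only against the first member of a group keeps the sketch short, but the result depends on the order of the rows.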

