Hi, I would like to find similar URLs within a single column and mark them as duplicates (e.g. by creating a new column containing “Duplicate”). I have tried string similarity, but it compares two columns with each other. I would like the check to happen within one column.
A classic example is the same URL appearing once with and once without a trailing /. Another is a URL path that was minimally adjusted, e.g. with an additional pronoun, which created a duplicate.
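The trailing-slash case can often be caught without any fuzzy matching: reduce each URL to a canonical form and flag every URL whose canonical form has already appeared. Below is a minimal Python sketch. It assumes that lowercasing the host, stripping a trailing `/` and dropping the `#fragment` are acceptable for your data. The function names `normalize_url` and `mark_duplicates` are hypothetical. Only the first occurrence stays unflagged.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Reduce a URL to a canonical form so trivial variants compare equal.

    Assumption: host case, a trailing slash and the fragment are irrelevant.
    """
    parts = urlsplit(url.strip())
    netloc = parts.netloc.lower()      # "Example.com" == "example.com"
    path = parts.path.rstrip("/")      # "/blog/post/" == "/blog/post"
    # Keep scheme and query, drop the fragment.
    return urlunsplit((parts.scheme, netloc, path, parts.query, ""))


def mark_duplicates(urls: list[str]) -> list[str]:
    """Return "Duplicate" for each URL whose normalized form appeared earlier, else ""."""
    seen: set[str] = set()
    flags: list[str] = []
    for url in urls:
        key = normalize_url(url)
        flags.append("Duplicate" if key in seen else "")
        seen.add(key)
    return flags


urls = [
    "https://example.com/blog/post/",
    "https://example.com/blog/post",
    "https://example.com/about",
]
print(mark_duplicates(urls))  # ['', 'Duplicate', '']
```

This handles only exact duplicates after normalization. Paths that differ by an extra word still need a similarity measure.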
@qqilihq thanks for the hint. I already tried something like this, but what would you do next? (I tried Similarity Search, but then I can’t filter the matches out of the list…)
Hi, I have downloaded the workflow, but unfortunately I still don’t understand how to apply it to my case… I have a list of over 1000 URLs, sometimes with up to 3 or 4 similar URLs that I would like to filter out or possibly merge. I don’t quite understand which part of your workflow can help me.
@Juliane it allows deduplication without a ground truth, so it groups similar items within one list. I thought this might be similar to your case.
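The idea of grouping similar items within one list can be sketched as follows. This is not the method used in the linked workflow. It substitutes Python's `difflib.SequenceMatcher` ratio as the similarity measure and uses a union-find structure so that URLs connected by a chain of similar pairs end up in the same group. The threshold of `0.9` is an arbitrary assumption that you would need to tune on your own data. The names `group_similar` and `flag_groups` are hypothetical. The comparison is all-pairs, which for about 1000 URLs means roughly 500,000 comparisons. That is feasible but may take a while in pure Python.

```python
from difflib import SequenceMatcher


def group_similar(urls: list[str], threshold: float = 0.9) -> list[int]:
    """Give each URL a group id; URLs linked by a chain of similar pairs share an id.

    The group id is the index of the first URL in that group.
    """
    parent = list(range(len(urls)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        ri, rj = find(i), find(j)
        if ri != rj:
            # The smaller index becomes the root, so the root is always the first row.
            parent[max(ri, rj)] = min(ri, rj)

    for i in range(len(urls)):
        for j in range(i + 1, len(urls)):
            matcher = SequenceMatcher(None, urls[i], urls[j])
            # quick_ratio() is an upper bound on ratio(), so it cheaply skips clear non-matches.
            if matcher.quick_ratio() >= threshold and matcher.ratio() >= threshold:
                union(i, j)
    return [find(i) for i in range(len(urls))]


def flag_groups(urls: list[str], threshold: float = 0.9) -> list[tuple[str, int, str]]:
    """Return (url, group id, flag); every URL except the first of its group is a "Duplicate"."""
    groups = group_similar(urls, threshold)
    return [
        (url, g, "" if g == i else "Duplicate")
        for i, (url, g) in enumerate(zip(urls, groups))
    ]


urls = [
    "https://example.com/blog/my-post",
    "https://example.com/blog/my-new-post",
    "https://example.com/contact",
]
for row in flag_groups(urls):
    print(row)
```

To filter, keep only rows with an empty flag. To merge, aggregate all rows that share a group id.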