Duplication: I would like to get a similarity score (in percentage) for similar data rows

Hey folks, could anybody help me with this?
Here is some sample data. Row 1 and rows 6 and 7 are similar records. I would like an extra column showing the similarity score, e.g. 80% (the similarity of the data in rows 1 and 6). How can this be achieved using KNIME?
I am heartily thankful for your attention!

Warm Regards,
Sac

How do you define “similarity” and what are you basing it on? Can you share some sample data?

Thank you for your response! I am looking for fuzzy matching for deduplication. It could be customer data such as ‘John Smith’ and ‘John Smyth’ referring to a single customer. I want to develop a workflow that helps me find these duplicates.
I hope this message is informative; please let me know if anything is still unclear.
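
To make the "similarity score in percent" idea concrete, here is a minimal sketch in plain Python using the standard library's `difflib.SequenceMatcher`. This is only one possible definition of similarity, not a KNIME feature; the helper name `similarity_percent` and the lowercase/trim normalization are assumptions made for illustration. Something like this could, for instance, be run inside a Python Script node.

```python
from difflib import SequenceMatcher


def similarity_percent(a: str, b: str) -> float:
    """Return a 0-100 similarity score for two strings.

    Hypothetical helper: comparison is case-insensitive and ignores
    leading/trailing whitespace (an assumption, adjust as needed).
    """
    left = a.strip().lower()
    right = b.strip().lower()
    # ratio() is 2 * matches / (len(left) + len(right)), in the range 0..1
    ratio = SequenceMatcher(None, left, right).ratio()
    return round(ratio * 100, 1)


print(similarity_percent("John Smith", "John Smyth"))  # prints 90.0
```

With this definition, ‘John Smith’ and ‘John Smyth’ share 9 of 10 characters in order, which gives 90%. Other measures (for example edit distance) would give different numbers, so the threshold for "duplicate" has to be chosen per data set.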

What you just described doesn’t match the screenshot you posted. Can you post some sample data with a clear description of what you want?

How about this one?

  • You can use the “remove duplicates node” and select the columns you want to consider (maybe some string cleaning is needed beforehand)
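
As a rough illustration of the "clean the strings, then remove duplicates on selected columns" idea, here is a small sketch in plain Python. It does not use KNIME; the function names `normalize` and `drop_duplicates`, the column names, and the sample rows are all made up for the example.

```python
import re


def normalize(value: str) -> str:
    """Lowercase, trim, and collapse runs of whitespace (assumed cleaning rules)."""
    return re.sub(r"\s+", " ", value.strip().lower())


def drop_duplicates(rows, key_columns):
    """Keep the first row for each distinct normalized key."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(normalize(row[col]) for col in key_columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Hypothetical sample data, not taken from the original post
rows = [
    {"name": "John Smith", "city": "Berlin"},
    {"name": "  john   smith ", "city": "BERLIN"},
    {"name": "John Smyth", "city": "Berlin"},
]
unique = drop_duplicates(rows, ["name", "city"])
print(len(unique))  # prints 2
```

Note that this only catches rows that become identical after cleaning. ‘John Smyth’ survives as a separate record, so exact duplicate removal alone does not cover the fuzzy case asked about above.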
