Duplication: I would like to get a similarity score (as a percentage) for similar data rows

Sachin_Sunil_Pa · April 15, 2025, 2:12pm

Hey folks, could anybody help me with this?
Here is some sample data: row 1 and rows 6 and 7 are similar records. I would like to get an extra column that shows the similarity score, e.g. 80% (the similarity of the data in rows 1 and 6). How can this be achieved using KNIME?
I am heartily thankful for your attention!

Warm Regards,
Sac
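One way to produce the extra column described above is to compare every row with every other row and keep the best score. The sketch below is plain Python (not a KNIME node configuration). It assumes each row is a tuple of string fields and uses the standard library's `difflib.SequenceMatcher.ratio()` as the similarity measure. The sample rows and the function names `row_text`, `similarity_pct` and `add_best_match_column` are hypothetical.

```python
from difflib import SequenceMatcher


def row_text(row):
    # Join all fields of a row into one lowercase string so whole rows can be compared.
    return " ".join(str(v).strip().lower() for v in row)


def similarity_pct(a, b):
    # SequenceMatcher.ratio() returns 2*M/T in [0, 1]; scale it to a percentage.
    return round(SequenceMatcher(None, a, b).ratio() * 100, 1)


def add_best_match_column(rows):
    """Return (row, best_match_index, score_pct) for each row (indices are 0-based)."""
    texts = [row_text(r) for r in rows]
    result = []
    for i, t in enumerate(texts):
        best_j, best_score = None, 0.0
        for j, u in enumerate(texts):
            if i == j:
                continue  # never compare a row with itself
            score = similarity_pct(t, u)
            if score > best_score:
                best_j, best_score = j, score
        result.append((rows[i], best_j, best_score))
    return result


# Hypothetical sample rows, not the data from the original screenshot.
rows = [
    ("John Smith", "12 High St"),
    ("Mary Jones", "4 Park Rd"),
    ("John Smyth", "12 High Street"),
]
for row, j, score in add_best_match_column(rows):
    print(row, j, score)
```

With these hypothetical rows, the two "John" rows pick each other as best match at about 87%. Note that comparing all pairs is quadratic in the number of rows, so for large tables you would normally first group rows by a cheap blocking key (for example the first letters of the surname) and only compare within each group.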

rfeigel · April 15, 2025, 3:07pm

How do you define “similarity” and what are you basing it on? Can you share some sample data?

Sachin_Sunil_Pa · April 30, 2025, 12:06pm

Thank you for your response! I am looking for fuzzy matching for deduplication. For example, customer data might contain both ‘John Smith’ and ‘John Smyth’ for a single customer. I want to develop a workflow that helps me find such duplicates.
I hope this clarifies things; please let me know if anything is still unclear.
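For a name pair like ‘John Smith’ / ‘John Smyth’, a common fuzzy-match score is based on the Levenshtein (edit) distance: the number of single-character insertions, deletions and substitutions needed to turn one string into the other, normalized by the longer length. The plain-Python sketch below is one possible definition of a percentage score under that assumption, not a built-in KNIME function; `levenshtein` and `similarity_percent` are hypothetical names.

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance (insert, delete, substitute each cost 1),
    # keeping only the previous row so memory is O(len(b)).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters are equal)
            ))
        prev = curr
    return prev[-1]


def similarity_percent(a: str, b: str) -> float:
    # Case- and whitespace-insensitive score: 100 * (1 - distance / longer length).
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty strings are treated as identical
    return round(100 * (1 - levenshtein(a, b) / longest), 1)


print(similarity_percent("John Smith", "John Smyth"))  # 90.0
```

‘John Smith’ and ‘John Smyth’ differ by one substitution over 10 characters, giving 90%. A threshold such as "flag pairs above 85%" would then have to be tuned on your own data.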

rfeigel · April 30, 2025, 2:12pm

What you just described doesn’t match the screenshot you posted. Can you post some sample data with a clear description of what you want?

ActionAndi · April 30, 2025, 3:30pm

How about this one?

You can use the “remove duplicates” node and select the columns you want to consider (some string cleaning may be needed beforehand).
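The "clean first, then remove exact duplicates on selected columns" idea can be sketched in plain Python as below. This only illustrates the logic and is not how the KNIME node is implemented; the cleaning rules (strip accents, lowercase, drop punctuation, collapse whitespace), the function names and the sample rows are all assumptions.

```python
import re
import unicodedata


def clean(value: str) -> str:
    # Decompose accented characters and drop the combining marks ("é" -> "e").
    value = unicodedata.normalize("NFKD", value)
    value = "".join(ch for ch in value if not unicodedata.combining(ch))
    # Lowercase, remove punctuation, collapse runs of whitespace.
    value = re.sub(r"[^\w\s]", "", value.lower())
    return re.sub(r"\s+", " ", value).strip()


def remove_duplicates(rows, key_columns):
    # Keep the first row for each cleaned key; later rows with the same key are dropped.
    seen = set()
    kept = []
    for row in rows:
        key = tuple(clean(row[c]) for c in key_columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Hypothetical sample rows.
rows = [
    {"name": "John Smith", "city": "London"},
    {"name": "  john  SMITH.", "city": "london"},
    {"name": "José Pérez", "city": "Madrid"},
    {"name": "Jose Perez", "city": "Madrid "},
]
print(remove_duplicates(rows, ["name", "city"]))
```

Cleaning only catches records that become identical after normalization; a spelling variant like ‘John Smyth’ still needs a fuzzy similarity score on top of this step.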

system · July 29, 2025, 3:31pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.