Identifying Duplicates

Hello everyone,

I’m new to KNIME and built a workflow that identifies duplicates and incorporates fuzzy matching. Since I’ll be running this flow frequently, is there a node or another way to handle the following case: the fuzzy match flags John Doe vs. Johnn Doe as similar, but I have already determined that they are different people. Can I keep a record of this so that the flow doesn’t flag these names again when I rerun it? That way I don’t have to research them again. The same goes for duplicates: if I have determined that two records are not duplicates, I don’t want the flow to show them to me again on the next run.

Thanks for the help

Yes, it is possible. But as you mentioned, you need a place to store the exceptions. There is no standard mechanism for this. As an idea, take a look at the

node. You can apply it after fuzzy join to skip extra matches.
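To make the idea concrete outside of KNIME, here is a minimal Python sketch of the logic: keep a table of pairs you have already reviewed and ruled out, then drop those pairs from the fuzzy-match output on every run. The column names `name_a` and `name_b`, the helper names, and the sample data are assumptions for illustration only; in a workflow the same thing would be done with a stored table and a filter step after the fuzzy join.

```python
import csv
import io


def pair_key(a, b):
    """Build an order-independent, case-insensitive key for a pair of names."""
    return tuple(sorted((a.strip().lower(), b.strip().lower())))


def load_exceptions(csv_text):
    """Read reviewed pairs (hypothetical columns name_a, name_b) into a set of keys."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {pair_key(row["name_a"], row["name_b"]) for row in reader}


def drop_reviewed(candidates, exceptions):
    """Keep only candidate pairs that have not been reviewed and ruled out."""
    return [(a, b) for a, b in candidates if pair_key(a, b) not in exceptions]


# Stand-in for the stored exception table (normally a file or database table).
reviewed_csv = "name_a,name_b\nJohn Doe,Johnn Doe\n"

# Stand-in for the output of the fuzzy join.
candidates = [("Johnn Doe", "John Doe"), ("Jane Roe", "Jane Row")]

remaining = drop_reviewed(candidates, load_exceptions(reviewed_csv))
print(remaining)  # [('Jane Roe', 'Jane Row')]
```

The key is sorted so that "John Doe / Johnn Doe" and "Johnn Doe / John Doe" count as the same exception. After each review you would append the newly ruled-out pairs to the stored table so they are skipped on the next run.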


Hey, I’ve got a question about the GroupBy node.

I was wondering: “Alan Fournier” is shown as having a duplicate, but only one link is displayed. Is it possible to display “Alan Fournier” twice in the list?

@tyler123 you would have to include the URL as an additional reference.
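As a rough illustration of why that helps (a Python sketch, not KNIME itself): grouping by name alone collapses both entries into one row, while grouping by name and URL keeps one row per link. The URLs below are made-up placeholders, not the real data.

```python
from collections import Counter

# Hypothetical sample rows: (name, url). The URLs are placeholders.
rows = [
    ("Alan Fournier", "https://example.com/a"),
    ("Alan Fournier", "https://example.com/b"),
    ("Jane Roe", "https://example.com/c"),
]

# Grouping by name only: one row per name, with a count of 2 for the duplicate.
by_name = Counter(name for name, _ in rows)

# Grouping by name and URL: each (name, url) pair stays a separate row.
by_name_and_url = Counter(rows)

print(len(by_name))          # 2
print(len(by_name_and_url))  # 3
```

With the URL included in the grouping, “Alan Fournier” appears twice, once per link.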

In case you are interested, I have a collection about address deduplication:
