Identifying Duplicates

tyler123 · November 24, 2022, 2:41pm

Hello everyone,

I’m new to KNIME and have built a workflow that identifies duplicates and incorporates fuzzy matching. Since I’ll be running this workflow frequently, is there a node or some other way to handle the following: say the fuzzy match flags John Doe vs. Johnn Doe as similar, but I have already determined that they are different people. Can I keep a record of this so that when we rerun the workflow it no longer flags those names? That way I don’t have to research them again. The same goes for exact duplicates: if I have determined that two records are not duplicates and rerun the workflow, I don’t want them shown to me again.

Thanks for the help

izaychik63 · November 24, 2022, 3:01pm

Yes, it is possible. But as you mentioned, you need a place to store the exceptions. There is no standard mechanism for this. As an idea, take a look at the

node. You can apply it after fuzzy join to skip extra matches.
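To make the exception-store idea concrete outside of KNIME, here is a minimal Python sketch. It assumes the fuzzy join produces pairs of names and that reviewed "not a duplicate" pairs are kept in a two-column CSV file. The function names and the file layout are hypothetical, chosen only for illustration, and this is not a description of any particular KNIME node.

```python
import csv
from pathlib import Path


def pair_key(a, b):
    """Order-independent, case-insensitive key, so (A, B) equals (B, A)."""
    return tuple(sorted((a.strip().lower(), b.strip().lower())))


def load_exceptions(path):
    """Read reviewed pairs from a two-column CSV; empty set if no file yet."""
    p = Path(path)
    if not p.exists():
        return set()
    with p.open(newline="", encoding="utf-8") as f:
        return {pair_key(row[0], row[1]) for row in csv.reader(f) if len(row) >= 2}


def save_exceptions(path, exceptions):
    """Write the reviewed pairs back so the next run can skip them."""
    with Path(path).open("w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(sorted(exceptions))


def filter_reviewed(candidates, exceptions):
    """Keep only candidate pairs that have not been reviewed and rejected."""
    return [(a, b) for a, b in candidates if pair_key(a, b) not in exceptions]


# Hypothetical data: one pair was already reviewed as "different people".
exceptions = {pair_key("John Doe", "Johnn Doe")}
candidates = [("Johnn Doe", "John Doe"), ("Ann Lee", "Anne Lee")]
remaining = filter_reviewed(candidates, exceptions)
```

Here `remaining` contains only the Ann Lee / Anne Lee pair. In a workflow, the equivalent would be a stored table of reviewed pairs that is read on each run and used to exclude matching rows after the fuzzy join.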

tyler123 · November 24, 2022, 4:03pm

Hey, I’ve got a question about the GroupBy node.

I was wondering: “Alan Fournier” is shown as having a duplicate, but only one link is displayed. Is it possible to display “Alan Fournier” twice in the list?

mlauber71 · November 24, 2022, 4:37pm

@tyler123 you would have to include the URL as an additional reference.
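As a rough illustration of why that helps, here is a small Python sketch with made-up rows and URLs (the data and the `group_counts` helper are assumptions, not taken from the workflow). Grouping by name alone collapses both records into one line with a count of 2, while grouping by name and URL keeps one line per record.

```python
from collections import defaultdict

# Hypothetical input rows; the URLs are placeholders.
rows = [
    {"name": "Alan Fournier", "url": "https://example.com/a"},
    {"name": "Alan Fournier", "url": "https://example.com/b"},
    {"name": "Jane Roe", "url": "https://example.com/c"},
]


def group_counts(rows, keys):
    """Count rows per distinct combination of the given key columns."""
    counts = defaultdict(int)
    for row in rows:
        counts[tuple(row[k] for k in keys)] += 1
    return dict(counts)


# Name only: one group for "Alan Fournier" with count 2.
by_name = group_counts(rows, ["name"])

# Name plus URL: "Alan Fournier" appears twice, once per URL.
by_name_and_url = group_counts(rows, ["name", "url"])
```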

In case you are interested, I have a collection about address deduplication:
