Hello everyone, I have a rather unclean dataset in which the same person can appear in several rows with different IDs and differently spelt names. Is there a way KNIME can investigate these and flag those that are potentially duplicates? For example, the same dataset may contain “Julie Smith”, “JuIie Smith” (capital “i”, not an L!) and “Julie Jane Smith”. I have tried a merged field built from a short forename (first 3 characters) plus the surname to account for longer forenames and middle names, but that still does not catch names with swapped-out letters. I feel like KNIME probably has some data cleaning tools to help with this, but I am not familiar with them. Can anyone point me in the right direction? Thanks in advance!

Try downloading it manually and then opening it in KNIME.

Flag similar rows

mlauber71 July 20, 2022, 8:49pm 12

@JWebb here is an approach for grouping addresses that are similar. Some additional aspects are discussed as well.
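
To make the general idea concrete, here is a minimal sketch in Python of similarity-based grouping applied to names. It is not the linked workflow, just an illustration that could, for instance, be adapted for a KNIME Python Script node. It uses only the standard library (`difflib.SequenceMatcher`); the function names, the record IDs and the 0.8 threshold are assumptions made up for this example and would need tuning on real data.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    """Return a similarity score between 0 and 1 for two names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def flag_duplicates(records: dict[str, str], threshold: float = 0.8) -> dict[str, int]:
    """Map each record ID to a group number; similar names share a group.

    `threshold` is an illustrative cut-off, not a recommended value.
    """
    ids = list(records)
    parent = {rid: rid for rid in ids}  # union-find forest over record IDs

    def find(rid: str) -> str:
        # Follow parent links to the group representative (with path halving).
        while parent[rid] != rid:
            parent[rid] = parent[parent[rid]]
            rid = parent[rid]
        return rid

    # Compare every pair of names and merge the groups of similar ones.
    for a, b in combinations(ids, 2):
        if similarity(records[a], records[b]) >= threshold:
            parent[find(b)] = find(a)

    # Number the groups in order of first appearance.
    roots: dict[str, int] = {}
    return {rid: roots.setdefault(find(rid), len(roots)) for rid in ids}


# Hypothetical sample data based on the names from the question.
records = {
    "A1": "Julie Smith",
    "B7": "JuIie Smith",
    "C3": "Julie Jane Smith",
    "D9": "Bob Jones",
}
groups = flag_duplicates(records)
for rid, group in groups.items():
    print(rid, records[rid], group)
```

With this sample, the three “Julie” variants end up in one group and “Bob Jones” in another. Groups are merged transitively, so two names can share a group through a common neighbour even if they are not directly above the threshold. Every pair is compared, so the cost grows quadratically with the number of rows; for a large table it may help to compare only rows that share a coarse key, such as the surname.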
