@jmanuelml21 here is an example of how you could identify duplicates and bring them together without having a ‘ground truth’. The workflow will try to determine the groups for itself.
@DAmei I had another look and built a slightly over-engineered workflow to try to get what you want, combining some of the approaches already mentioned. In your case there might not be a ground truth against which to match, but rather a lot of possible combinations.
So some configuration will be necessary, and you may want to exclude some obvious cases in the first place. What the workflow does:
The address is turned into a standardised string. The composition of the string will influence the matching. So…
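To illustrate what "standardised string" could mean outside the workflow, here is a minimal Python sketch. It is not what the workflow nodes do internally; the function name `standardise` and the chosen rules (strip accents, drop punctuation, upper-case, collapse spaces) are my own assumptions.

```python
import re
import unicodedata


def standardise(*parts):
    """Join address components into one normalised string (hypothetical helper)."""
    # Skip empty or missing components, then join with single spaces.
    text = " ".join(p for p in parts if p)
    # Decompose accented characters and drop the combining marks (é -> e).
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Replace every run of non-alphanumeric characters with a space.
    text = re.sub(r"[^A-Za-z0-9]+", " ", text)
    # Upper-case and collapse repeated whitespace.
    return " ".join(text.upper().split())


print(standardise("Café  Müller GmbH", "Main-Str. 5", "12345 Berlin"))
# CAFE MULLER GMBH MAIN STR 5 12345 BERLIN
```

Which components go into the string, and in which order, changes what counts as "close", so it is worth trying a few variants.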
@DAmei you could use the slider to set a value between 0 and 100: the lower the value, the stricter the definition of a duplicate. In your example data there is a seemingly identical value, while the name in the field is very long. You will have to experiment with the settings.
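A rough Python sketch of the threshold idea, in case it helps to see it as code. The distance measure here (based on `difflib.SequenceMatcher`, scaled to 0 to 100) is an assumption for illustration and is probably not the same measure the workflow uses; `distance` and `group_duplicates` are made-up names.

```python
from difflib import SequenceMatcher


def distance(a, b):
    """Dissimilarity between 0 (identical) and 100 (nothing in common)."""
    return 100.0 * (1.0 - SequenceMatcher(None, a, b).ratio())


def group_duplicates(strings, max_distance):
    """Return one group id per string; strings linked by close pairs share an id."""
    parent = list(range(len(strings)))

    def find(i):
        # Follow parent links to the group root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(strings)):
        for j in range(i + 1, len(strings)):
            # A lower max_distance means a stricter definition of "duplicate".
            if distance(strings[i], strings[j]) <= max_distance:
                parent[find(j)] = find(i)
    return [find(i) for i in range(len(strings))]


rows = ["ACME LTD 1 MAIN ST", "ACME LTD 1 MAIN STR", "ZETA GMBH 9 PARK AVE"]
print(group_duplicates(rows, max_distance=10))
```

With a threshold of 0 only exact matches are grouped, and with 100 everything ends up in one group, so useful values sit somewhere in between. The pairwise loop is quadratic, which is fine for experimenting but not for large tables.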
[image]
You might also think about limiting the number of characters you use to construct the company name. If you want to stress the importance of a component you could use it a few times, to give it more weigh…
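As a small sketch of those two ideas (truncating the name and repeating a component), something like the following could be used to build the string. `build_key` and its parameters `name_chars` and `name_weight` are hypothetical names, not settings from the workflow.

```python
def build_key(name, street, city, name_chars=20, name_weight=1):
    """Build a matching string from address components (hypothetical helper)."""
    # Keep only the first name_chars characters so very long names
    # do not dominate the comparison.
    short_name = name[:name_chars].strip()
    # Repeating the name makes it count more in a string distance.
    return " ".join([short_name] * name_weight + [street, city])


print(build_key("ACME INTERNATIONAL HOLDINGS LIMITED", "1 MAIN ST", "LONDON",
                name_chars=4, name_weight=2))
# ACME ACME 1 MAIN ST LONDON
```

Repeating a component is a crude way of weighting, but it works with any plain string distance without changing the distance function itself.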
Then you can do more address matching and deduplication with these examples: