Finding duplicate companies

Grayfox · January 28, 2020, 2:42pm

Hi everyone,

I am working on a list of companies and I need to check whether there are duplicate companies in a table.

But it’s not easy to find those duplicates because some companies are written in different ways.
And the list is very long, so I need to optimize the process of finding duplicates.

I have tried to use the workflow for duplicate addresses, but it’s not working very well.

So, could someone help me build a workflow to find the duplicates?

Just as an example, here are some company names to show you what they look like.

A-2-Z Solutions
A&B
A&G
a2i
A2Z Solutions
ABSolute
Absolute Magic
AC Exchange
AC&E
ACS
ACS Shop
Active
Active Data
C3 Development
C3I
EDP
EDS
Harvey
HarveyOpolis

Thank you for your help

HansS · January 28, 2020, 3:01pm

Hi @Grayfox

Take a look at the String Similarity node.
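
To get a feel for what a string similarity comparison does with names like these, here is a minimal sketch in Python. It is not the KNIME node itself, only the same idea using `difflib` from the standard library; the helper names `normalize` and `similar_pairs` and the 0.85 threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations
import re


def normalize(name: str) -> str:
    """Lowercase the name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def similar_pairs(names, threshold=0.85):
    """Return (score, a, b) for every pair whose similarity reaches the threshold."""
    pairs = []
    for a, b in combinations(names, 2):
        score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if score >= threshold:
            pairs.append((round(score, 2), a, b))
    # Highest scores first
    return sorted(pairs, reverse=True)


# A few names taken from the list in the question
names = [
    "A-2-Z Solutions", "A2Z Solutions", "ABSolute", "Absolute Magic",
    "ACS", "ACS Shop", "EDP", "EDS", "Harvey", "HarveyOpolis",
]

for score, a, b in similar_pairs(names):
    print(score, a, "<->", b)
```

With this threshold only "A-2-Z Solutions" and "A2Z Solutions" are flagged. Lowering the threshold also pulls in pairs such as "EDP" and "EDS", which are probably different companies, so the cut-off needs tuning against your own data.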

gr. Hans

Rich_ard · January 28, 2020, 7:11pm

Hi Grayfox,

I had exactly the same requirement and, after trying various options, found the best solution (although it took me a while to work out how to configure it to fit my data structure) to be this example workflow:

Index and query addresses – wiswedel

Adapting this workflow helped me to identify a bunch of duplicate organisation records, albeit with some false positives, that it would have been hard and time-consuming to identify manually. If you combine this workflow with the xls formatting nodes, you can export the results into a nicely-formatted Excel sheet that groups potential duplicates into colour-coded blocks. Unfortunately I can’t share my workflow as it contains confidential data, but here’s an edited and non-confidential extract of the output.
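
The step of grouping potential duplicates into blocks can be sketched outside KNIME as well. The following is only an illustration of that grouping idea, not the linked workflow: it assumes a plain pairwise similarity from Python's `difflib` and a hypothetical helper `group_candidates` that merges matching names with a small union-find.

```python
from difflib import SequenceMatcher
from itertools import combinations


def group_candidates(names, threshold=0.85):
    """Cluster names whose pairwise similarity reaches the threshold."""
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the block representative (with path halving)
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold:
            parent[find(a)] = find(b)  # merge the two blocks

    blocks = {}
    for n in names:
        blocks.setdefault(find(n), []).append(n)
    # Keep only blocks that contain at least two names
    return [sorted(members) for members in blocks.values() if len(members) > 1]


# Names taken from the list in the question
names = ["A-2-Z Solutions", "A2Z Solutions", "AC Exchange", "ACS", "ACS Shop"]
blocks = group_candidates(names)
print(blocks)
```

Each returned block is a set of candidate duplicates to review by hand. As noted above, expect some false positives, especially with short names.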

Hope this is of some use,

R

mlauber71 · January 29, 2020, 5:06am

In addition to the approaches mentioned by @HansS and @Rich_ard (the workflow created by @wiswedel), you could check out these entries about the following concepts:

Fingerprinting

Compare strings and adresses
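
As a rough illustration of fingerprinting, here is a minimal Python sketch. It is an assumption about what the entries above cover, modelled on the key-collision "fingerprint" method known from OpenRefine: fold to ASCII, lowercase, strip punctuation, then sort the unique tokens. The function names `fingerprint` and `group_by_fingerprint` are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(name: str) -> str:
    """Build a normalized key so that spelling variants collide on the same value."""
    # Fold accented characters to plain ASCII where possible
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    # Lowercase and remove everything except letters, digits and whitespace
    cleaned = re.sub(r"[^a-z0-9\s]", "", ascii_name.lower())
    # Sort the unique tokens so that word order does not matter
    return " ".join(sorted(set(cleaned.split())))


def group_by_fingerprint(names):
    """Return only the fingerprints shared by more than one name."""
    groups = defaultdict(list)
    for name in names:
        groups[fingerprint(name)].append(name)
    return {key: members for key, members in groups.items() if len(members) > 1}


# Names taken from the list in the question
names = ["A-2-Z Solutions", "A2Z Solutions", "ABSolute", "Absolute Magic", "AC&E", "ACS"]
duplicates = group_by_fingerprint(names)
print(duplicates)
```

Because names are grouped by an exact key rather than compared pair by pair, this scales well to a long list. It only catches variants that differ in case, punctuation or word order, so it works best as a first pass before a fuzzy comparison.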

system · July 29, 2020, 5:06pm

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.