Rule Based Filter Question

mlauber71 · February 9, 2019, 1:57pm

The idea of fingerprinting in data is that if you do not have a unique ID you create your own from various features of your data.

The classic example: you have customers but no customer ID. You would take

name, surname
address
area code
phone number
…

and combine them. You could try several variations, such as using only the first 5 characters of a name and ‘cleaning’ it by removing special characters, or even using a phonetic encoding to counter different spellings. From an address you could extract the numbers and drop any special suffixes. For example, you might have these variants of a German address:

Hauptsraße 11a
Hauptstr. 11A
Haupt Strasse 11
Hauptstrasse 11/AB
Haupt-strasse 11 - á

=> you could shorten that to:
haupt 11
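
As a rough sketch of that address step in plain Python (not a KNIME node; the function name `address_key` and the 5-letter prefix are my own assumptions), you could lowercase the text, strip accents, and keep only the first letters of the street name plus the first run of digits:

```python
import re
import unicodedata

def address_key(raw: str, prefix_len: int = 5) -> str:
    """Reduce a street address to '<street prefix> <house number>'."""
    # Lowercase and strip accents (e.g. 'á' becomes 'a').
    text = unicodedata.normalize("NFKD", raw.lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # The first run of digits is taken as the house number;
    # everything before it is treated as the street name.
    match = re.search(r"\d+", text)
    number = match.group(0) if match else ""
    street_part = text[: match.start()] if match else text
    # Keep plain letters only, so spaces, dots, hyphens and 'ß' are dropped.
    letters = re.sub(r"[^a-z]", "", street_part)
    return f"{letters[:prefix_len]} {number}".strip()

VARIANTS = [
    "Hauptsraße 11a",
    "Hauptstr. 11A",
    "Haupt Strasse 11",
    "Hauptstrasse 11/AB",
    "Haupt-strasse 11 - á",
]

for variant in VARIANTS:
    print(variant, "=>", address_key(variant))
```

All five variants collapse to the same key here. How aggressive the prefix should be depends on your data: a short prefix catches more typos but also merges more streets that are really different.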

since this is the ‘major’ component of the entry, the part that distinguishes it from everything else. The next component is the area code:

50331
50330
5033

=> just use 503, since you would expect most people to get the first part right. The same goes for dates of birth:

1990-11-07
1990-11-01
Nov 1990 7th

=> maybe just use 1990-11 without the day, or just the month if that gives enough distinction
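
A minimal sketch of that date reduction in Python, assuming only two input styles (ISO-like `yyyy-mm-dd` and free text with an English month name); the helper name `birth_month_key` is hypothetical and real data will likely need more formats:

```python
import re

MONTH_NAMES = ["jan", "feb", "mar", "apr", "may", "jun",
               "jul", "aug", "sep", "oct", "nov", "dec"]
MONTHS = {name: number for number, name in enumerate(MONTH_NAMES, start=1)}

def birth_month_key(raw: str) -> str:
    """Reduce a date of birth to 'yyyy-mm'; return '' if it cannot be read."""
    text = raw.strip().lower()
    # Case 1: ISO-like input such as 1990-11-07.
    iso = re.match(r"(\d{4})-(\d{1,2})\b", text)
    if iso:
        return f"{iso.group(1)}-{int(iso.group(2)):02d}"
    # Case 2: free text such as 'Nov 1990 7th': a 4-digit year plus a month name.
    year = re.search(r"\b\d{4}\b", text)
    name = re.search(r"[a-z]{3,}", text)
    if year and name and name.group(0)[:3] in MONTHS:
        return f"{year.group(0)}-{MONTHS[name.group(0)[:3]]:02d}"
    return ""

for value in ["1990-11-07", "1990-11-01", "Nov 1990 7th"]:
    print(value, "=>", birth_month_key(value))
```

Returning an empty string for unreadable dates is just one possible choice; you might prefer to flag those rows for manual review instead.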

A combined fingerprint such as

haupt 11-503-1990-11

might be pretty unique (depending on your data), and it gives you a good chance of capturing typos and spelling variants. If your data is relatively clean you could just concatenate several items as in the example and use them. You will have to run a few tests and decide which approach you prefer.
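
To illustrate how the pieces could be put together, here is a simplified sketch in Python. The field names (`street`, `area_code`, `birth_date`), the sample records, and the assumption that dates are already in `yyyy-mm-dd` form are all mine, not taken from any real dataset. It builds one fingerprint per record and then groups records that share a fingerprint as duplicate candidates:

```python
import re
from collections import defaultdict

def fingerprint(record: dict) -> str:
    """Build a key like 'haupt 11-503-1990-11' from one customer record."""
    street = record["street"].lower()
    # Letters before the first digit, cut to 5 characters, plus the house number.
    name_part = re.split(r"\d", street, maxsplit=1)[0]
    letters = re.sub(r"[^a-z]", "", name_part)[:5]
    number = re.search(r"\d+", street)
    street_key = f"{letters} {number.group(0) if number else ''}".strip()
    # First 3 digits of the area code.
    area_key = re.sub(r"\D", "", record["area_code"])[:3]
    # Year and month only (assumes the date is already yyyy-mm-dd).
    birth_key = record["birth_date"][:7]
    return "-".join([street_key, area_key, birth_key])

# Hypothetical sample records for illustration only.
records = [
    {"id": 1, "street": "Hauptstr. 11A", "area_code": "50331", "birth_date": "1990-11-07"},
    {"id": 2, "street": "Haupt Strasse 11", "area_code": "50330", "birth_date": "1990-11-01"},
    {"id": 3, "street": "Bahnhofstrasse 3", "area_code": "80331", "birth_date": "1985-02-14"},
]

# Group record IDs by fingerprint; groups with more than one ID are candidates.
groups = defaultdict(list)
for rec in records:
    groups[fingerprint(rec)].append(rec["id"])

candidates = {key: ids for key, ids in groups.items() if len(ids) > 1}
print(candidates)
```

Records 1 and 2 end up with the same fingerprint and would be candidates for being the same customer. These are only candidates: a coarse key will also produce false matches, so it is worth checking a sample by hand.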

There are several KNIME blog entries about this, and I also created a small workflow to compare strings. KNIME provides several nodes to help with this task.

(the title is misleading since the question about addresses was added to another topic but the links there should work)