JanSKB
November 4, 2019, 10:52am
1
Hello everyone,
I would like to use KNIME to compare two columns of a table.
Specifically, only those rows should be shown in which the sender name (Auftraggebername) and the recipient name (Empfängername) are identical or nearly identical.
Example:
Hans Müller + Hans Müller -> show
Hans und Erika Müller + Hans Müller -> show
Hans Mueler + Hans Müller -> show
Max Mustermann + Hans Müller -> do not show
Thanks in advance for your help.
Best regards,
Jan Nickel
This sounds like a job for address deduplication or ‘fingerprinting’.
Concerning address deduplication, @wiswedel provided some very useful workflows:
Hi,
I have a question on how I can solve this using KNIME.
Document name
Campaign Name
APAC_SP-BusTran_EMM_16Q3_SD-WAN_Wave 1_AW_TY_ASEAN_Product
APAC_SP-BusTran_EMM_16Q3_SD-WAN_Wave 1_AW_EN_ASEAN
EMEA_ECT-SmrtCldInt_EVT_17Q4_SummitParis-Invite07.11
EMEA_ECT-SmrtCldInt_EVT_17Q4_Self-Driving-Network-Summits
EMEA_DC-Cross_EVT_16Q4_Hilversum-Attendee-Reminder
EMEA_DC_EVT_2016_Summit-NLv2
Q316 EMEA Tech Summit 12-14.07 - Lady’s cocktail party
2016-EMEA-Events-Partner-E10v2
2017Q3 O…
Fingerprinting using addresses (ignore the title)
The idea of fingerprinting in data is that if you do not have a unique ID you create your own from various features of your data.
A classic example: you have customers but no customer ID. You would take
name, surname
address
area code
phone number
…
and combine them. You could try several variations, such as using only the first 5 characters of a name and ‘cleaning’ it by removing special characters, or even using a phonetic encoding to counter different spellings. Or from an address you…
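To make the fingerprinting idea concrete outside of KNIME, here is a minimal Python sketch. The function name `fingerprint`, the choice of fields, and the `|` separator are assumptions for illustration only; they are not taken from the linked workflow. It keeps the first 5 cleaned characters of each name part and strips special characters, as described above.

```python
import re
import unicodedata


def clean(text: str) -> str:
    """Lowercase, strip accents, and drop everything except letters and digits."""
    # NFKD splits accented characters into base letter + combining mark,
    # so encoding to ASCII with "ignore" removes only the mark (ü -> u).
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]", "", ascii_only.lower())


def fingerprint(name: str, surname: str, area_code: str, phone: str,
                prefix_len: int = 5) -> str:
    """Hypothetical fingerprint: cleaned name prefixes plus cleaned codes."""
    parts = [
        clean(name)[:prefix_len],
        clean(surname)[:prefix_len],
        clean(area_code),
        clean(phone),
    ]
    return "|".join(parts)


# Two differently written records of the same customer get the same key.
key_a = fingerprint("Hans", "Müller", "12345", "030-123 456")
key_b = fingerprint("HANS ", "Muller", "12345", "030 123456")
print(key_a, key_a == key_b)
```

Records that share a fingerprint can then be treated as candidates for the same customer. A phonetic encoding could replace the simple prefix if spelling variants are a bigger problem than formatting.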
Compare string similarities (you may have to set a threshold)
I built a workflow with two approaches. One uses @ScottF's example and puts it into a loop, using BitVectors.
And one using the
This approach introduces an artificial ID (art_id), joins every string with every other one, and then calculates which string is the closest match.
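The "join everything with everything" idea can be sketched in plain Python as follows. This is only an illustration under assumptions: it uses `difflib.SequenceMatcher` from the standard library as the similarity measure, not the nodes from the workflow, and the function name `closest_matches` is made up for this example.

```python
from difflib import SequenceMatcher
from itertools import permutations


def closest_matches(strings):
    """Return {art_id: (art_id of closest other string, similarity score)}."""
    # The position in the list serves as the artificial id (art_id).
    indexed = list(enumerate(strings))
    best = {}
    # permutations(..., 2) yields every ordered pair of two different rows,
    # which corresponds to a cross join without the self-matches.
    for (id_a, a), (id_b, b) in permutations(indexed, 2):
        score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if id_a not in best or score > best[id_a][1]:
            best[id_a] = (id_b, score)
    return best


names = ["Hans Müller", "Hans Mueler", "Max Mustermann"]
for art_id, (match_id, score) in closest_matches(names).items():
    print(art_id, names[art_id], "->", names[match_id], round(score, 3))
```

Note that the number of pairs grows quadratically with the number of strings, so for large tables some kind of pre-grouping is usually needed.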
There are several ways to calculate these similarities. You might want to read about them and decide which one is best for your task.
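As a rough illustration of how two common measures differ, here is a small self-contained Python sketch of a normalized Levenshtein similarity and a bigram Jaccard similarity. These are generic textbook definitions written for this post, not the exact implementations used by the KNIME or Palladian nodes, and the function names are assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Scale the edit distance to 0..1, where 1 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def bigram_jaccard(a: str, b: str) -> float:
    """Share of distinct character bigrams that both strings have in common."""
    grams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    grams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not grams_a and not grams_b:
        return 1.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


pair = ("Hans Mueler", "Hans Müller")
print(levenshtein(*pair), levenshtein_similarity(*pair), bigram_jaccard(*pair))
```

Edit distance tends to handle typos well, while n-gram overlap is less sensitive to where in the string a difference occurs. Which one fits better depends on the kind of variation in your data.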
kn_example_similarity.k…
You would need Palladian for that:
Repository to install Palladian
http://download.nodepit.com/palladian/4.0
ipazin
November 4, 2019, 12:19pm
3
Hi there @JanSKB,
welcome to the KNIME Community!
In general, topics are posted in English, but if that presents a problem for you, German will work as well. As you can see
Br,
Ivan
JanSKB
November 4, 2019, 3:31pm
4
Hi there
thanks for your help.
I found the “String Similarity” node. It is perfect for my issue.
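For anyone who wants to see the same logic outside of KNIME, here is a minimal Python sketch of the row filter from the first post. It is not what the node does internally: the similarity measure (`difflib.SequenceMatcher`), the function names, and the threshold of 0.6 are all assumptions chosen so that the four example rows come out as requested, and the threshold would need tuning on real data.

```python
from difflib import SequenceMatcher


def name_similarity(sender: str, recipient: str) -> float:
    """Similarity in 0..1 between two names, ignoring case and outer spaces."""
    a = sender.strip().lower()
    b = recipient.strip().lower()
    return SequenceMatcher(None, a, b).ratio()


def similar_rows(rows, threshold: float = 0.6):
    """Keep only rows whose two names are at least `threshold` similar."""
    return [(s, r) for s, r in rows if name_similarity(s, r) >= threshold]


rows = [
    ("Hans Müller", "Hans Müller"),
    ("Hans und Erika Müller", "Hans Müller"),
    ("Hans Mueler", "Hans Müller"),
    ("Max Mustermann", "Hans Müller"),
]
for sender, recipient in similar_rows(rows):
    print(sender, "+", recipient)
```

With this threshold the first three rows are kept and the Max Mustermann row is dropped.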
system
Closed
May 5, 2020, 3:31am
5
This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.