Fuzzy Lookup/Matching/Joining...

Bringing up Fuzzy Lookup/Matching/Joining again.

I’ve done extensive work with Fuzzy Match in Excel, and it is frustratingly close to impossible to reproduce the same results efficiently in KNIME.

String Matcher problems:

  • Only provides a score for the first match, not for subsequent matches.
  • Scores are reported as distance, not similarity.
  • Only a single field can be matched. If I want to fuzzy-match company name, street address, city, state, postal code, and country, each field should incur its own penalty independently. Currently, I have to ram all of these values into a single field.
  • It would be great to have the ability to weight all of the match criteria and/or have similarity thresholds on each of the joins.
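
To make the request concrete, here is a rough sketch in plain Python (not a KNIME node, and not how String Matcher works internally) of per-field scoring with weights and per-field thresholds. The field names, weights, and threshold values are made up for illustration, and `difflib`'s ratio is just a stand-in for whatever similarity measure a real node would use.

```python
from difflib import SequenceMatcher

# Hypothetical per-field weights and minimum similarities (illustrative values only).
WEIGHTS = {"company": 0.4, "street": 0.3, "city": 0.1, "postal_code": 0.2}
THRESHOLDS = {"company": 0.6, "street": 0.5, "city": 0.7, "postal_code": 0.8}


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means identical after trimming and lowercasing."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def score_pair(left: dict, right: dict):
    """Return the weighted similarity, or None if any field misses its threshold."""
    total = 0.0
    for field, weight in WEIGHTS.items():
        s = similarity(left[field], right[field])
        if s < THRESHOLDS[field]:
            return None  # this field alone rejects the pair
        total += weight * s
    return total / sum(WEIGHTS.values())


def ranked_matches(record: dict, candidates: list, top_n: int = 3):
    """Score every candidate (not just the best one) and return the top_n."""
    scored = [(score_pair(record, c), c) for c in candidates]
    scored = [(s, c) for s, c in scored if s is not None]
    scored.sort(key=lambda sc: sc[0], reverse=True)
    return scored[:top_n]
```

Each field is penalised on its own, the result is a similarity rather than a distance, and every surviving candidate keeps its score, which covers the four points above.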

It is not that bad. Have a look at this example: 03_Example_for_Fuzzy_Address_Matching (NodePit)

Thanks for the suggestion, but that example uses Java and Index Query and has fixed column names in the code.

Yes, I could modify it for a single use case, but not for every use case I am presented with.

#lowcode

Here are more links:
https://www.knime.com/knime-applications/address-deduplication


Thanks, I like this one better! :stuck_out_tongue:

Aggregated distance could use a couple more ports. :wink: