๐—œ๐—ป๐˜๐—ฟ๐—ผ๐—ฑ๐˜‚๐—ฐ๐—ถ๐—ป๐—ด ๐˜๐—ต๐—ฒ ๐—ฒ๐˜…๐—ผ๐—ฟ๐—ฏ๐˜†๐˜๐—ฒ ๐— |๐—•๐—ข๐—ซ ๐—ฃ๐—ฎ๐—ฟ๐˜๐—ป๐—ฒ๐—ฟ ๐—˜๐˜…๐˜๐—ฒ๐—ป๐˜€๐—ถ๐—ผ๐—ป ๐—ณ๐—ผ๐—ฟ ๐—ž๐—ก๐—œ๐— ๐—˜

Approximate Matching & Index-based Retrieval in KNIME (exorbyte Nodes)

Exact joins are a bottleneck in many KNIME workflows.

As soon as data becomes even slightly inconsistent (typos, different encodings, missing fields), classical joins and rule-based pipelines start to break down.

We've released a set of KNIME nodes that address exactly this layer:

👉 exorbyte matchmaker toolbox (M|BOX)

What it actually does (technical view):

| Builds an in-memory index over structured or semi-structured data
| Executes approximate queries against the entire index
| Supports multi-attribute matching with configurable weighting

Returns:

| best match
| similarity score
| optional alignment information

๐—ก๐—ผ ๐—ฏ๐—น๐—ผ๐—ฐ๐—ธ๐—ถ๐—ป๐—ด, ๐—ป๐—ผ ๐—ฐ๐—ฎ๐—ป๐—ฑ๐—ถ๐—ฑ๐—ฎ๐˜๐—ฒ ๐—ฝ๐—ฟ๐—ฒ๐˜€๐—ฒ๐—น๐—ฒ๐—ฐ๐˜๐—ถ๐—ผ๐—ป, ๐—ป๐—ผ ๐—ฝ๐—ฟ๐˜‚๐—ป๐—ถ๐—ป๐—ด.

Core nodes:

Table Indexer
→ builds a multi-field index (e.g. name, address, id fragments)
Table Index Matcher
→ queries the index with fuzzy logic across all fields
Approximate String Matcher
→ pairwise similarity (Levenshtein, LCS, positional methods)
Character Mapper
→ normalization layer (diacritics, variants, encoding issues)
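
For readers unfamiliar with the two building blocks named above, pairwise Levenshtein similarity and diacritic normalization, the following Python sketch shows the general idea using only the standard library. It is a generic textbook version, not the code behind the Approximate String Matcher or Character Mapper nodes. The function names are hypothetical.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose characters (NFKD) and drop combining marks, e.g. 'é' becomes 'e'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) using two-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Map the distance to [0, 1]; 1.0 means identical after normalization."""
    a = strip_diacritics(a).casefold()
    b = strip_diacritics(b).casefold()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```

For example, `levenshtein("kitten", "sitting")` is 3, and `similarity("Müller", "muller")` is 1.0 because the umlaut is stripped before comparison. Characters without a Unicode decomposition, such as "ø" or "ß", pass through unchanged and would need an explicit mapping table.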

What's different compared to typical KNIME approaches:

Not a join → index-based retrieval problem
Not ML → deterministic, explainable scoring
Not preprocessing-heavy → works on dirty data directly

Where this becomes relevant:

Identity resolution across heterogeneous sources
KYC / sanctions screening pipelines
OCR / ICR post-processing (error-tolerant lookup)
Product or entity matching without stable identifiers

Typical workflow pattern:

Normalize input (optional)
Build index once
Query repeatedly
Post-process matches (thresholding, routing, enrichment)
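
The last step of the pattern above, thresholding and routing, can be sketched in a few lines of Python. Everything here is assumed for illustration: the `Match` structure, the two cut-off values, and the three bucket names. In a KNIME workflow the same logic would normally be built from standard row-splitting or rule nodes downstream of the matcher.

```python
from typing import Iterable, NamedTuple


class Match(NamedTuple):
    query_id: str
    matched_id: str
    score: float  # assumed to lie in [0, 1]


# Hypothetical cut-offs; sensible values depend on the data and the cost of errors.
ACCEPT_AT = 0.90
REVIEW_AT = 0.70


def route(match: Match) -> str:
    """Assign a match to a bucket based on its similarity score."""
    if match.score >= ACCEPT_AT:
        return "accept"
    if match.score >= REVIEW_AT:
        return "review"
    return "reject"


def partition(matches: Iterable[Match]) -> dict:
    """Group matches into accept / review / reject lists."""
    buckets = {"accept": [], "review": [], "reject": []}
    for m in matches:
        buckets[route(m)].append(m)
    return buckets
```

Keeping a middle "review" band, rather than a single cut-off, is a common choice in screening-style pipelines where borderline hits go to a human.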

If you're working on anything where:

joins are failing
rules are exploding
or preprocessing becomes the main workload

this might be a useful addition to your KNIME setup.

👉 Extension + example workflows:
https://lnkd.in/eqk8giDa

you may want to

  • fix your font
  • fix your list icons (|)
  • explain the advantage of your paid nodes compared to using String Similarity, String Cleaner and Value Lookup directly. The only thing I see straight away is that you enable processing of more than one column (pair), which should not be much more than a loop with assigned weights per column (pair).

Can't drag and drop the extension. Also, how is this different from the extension you previously posted (which did install)?