fuzzy Join two tables with GroupBy/Reference

Ok, so the Similarity Search should be much faster (while using less CPU and RAM). There is one difference in the final output, though: the node looks for the nearest neighbor, so if multiple measurements match the same reference value, only one of them ends up in the final output. Since I don’t know how you’d want to map those “duplicates” onto the reference table, I figured that’s OK. See the screenshot for what I mean.

How it works:

  1. find the nearest neighbor for each reference value
  2. join the nearest measurement value onto the reference table (SimSearch doesn’t do it for us)
  3. calculate the distance between each reference value and its matched measurement
  4. branch off and filter down to the A0 values, then filter those by distance
  5. use that branch to filter by reference, keeping only the groups that were not filtered out in the previous step
  6. remove false matches using the Rule Engine
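
For anyone who wants to see the idea outside of KNIME, here is a rough Python sketch of steps 1 to 3. It is not what the nodes do internally, just the same logic under some assumptions: the values are plain numbers, the measurement list is not empty, and the names `nearest`, `fuzzy_join` and `tolerance` are made up for this example. The simple tolerance check is only a stand-in for the filtering in steps 4 to 6, which depend on the A0 values and groups in the actual tables and are not modelled here.

```python
from bisect import bisect_left


def nearest(sorted_values, target):
    """Return the value in sorted_values closest to target.

    Assumes sorted_values is sorted ascending and not empty.
    On a tie, the smaller value wins.
    """
    i = bisect_left(sorted_values, target)
    # Only the neighbors directly left and right of the insertion point can be closest.
    candidates = sorted_values[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda v: abs(v - target))


def fuzzy_join(reference, measurements, tolerance):
    """Attach the nearest measurement and its distance to each reference value.

    If the nearest measurement is further away than `tolerance`
    (a hypothetical threshold), the row gets no match (None).
    """
    ordered = sorted(measurements)
    rows = []
    for ref in reference:
        match = nearest(ordered, ref)   # steps 1 and 2: find and attach the neighbor
        dist = abs(match - ref)         # step 3: calculate the distance
        if dist > tolerance:
            match, dist = None, None
        rows.append((ref, match, dist))
    return rows


# Made-up example data: 99.8 and 100.1 both sit close to the reference value 100.0.
reference = [100.0, 200.0, 300.0]
measurements = [99.8, 100.1, 201.5, 250.0]
rows = fuzzy_join(reference, measurements, tolerance=2.0)
for row in rows:
    print(row)
```

In this made-up data, 99.8 is within the tolerance of 100.0 as well, but only the closer 100.1 is kept. That is the “duplicates” behavior described above: one measurement per reference value.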



fuzzy Join two tables with GroupBy and Reference.knwf (111.3 KB)
