fuzzy Join two tables with GroupBy/Reference

Thyme · February 14, 2022, 10:08am

OK, so the Similarity Search should be much faster (while using less CPU and RAM). There’s one difference in the final output, though: the node only looks for the nearest neighbor. If multiple measurements match the same reference value, only one of them (the nearest) will be in the final output. Since I don’t know how you’d want to map those “duplicates” onto the reference table, I figured that’s OK. See the screenshot for what I mean.

How it works:

1. Find the nearest neighbor for each reference value.
2. Join the nearest measurement value onto the reference table (the Similarity Search node doesn’t do it for us).
3. Calculate the distance.
4. Branch off and filter down to the A0 values, then filter those by distance.
5. Use that branch to filter by reference, keeping only the groups that were not filtered out in the previous step.
6. Remove false matches using the Rule Engine.
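
For anyone who wants to see the core idea outside of KNIME, here is a rough Python sketch of the nearest-neighbor part only (find the nearest measurement, attach it to the reference row, calculate the distance, flag by distance). It is not what the nodes do internally. The function name `nearest_matches`, the `max_distance` threshold, and the sample numbers are all made up for illustration, and the A0/group filtering from the workflow is left out.

```python
def nearest_matches(reference, measurements, max_distance):
    """For each reference value, attach its nearest measurement and the distance.

    Hypothetical helper for illustration; max_distance is an assumed tolerance.
    Assumes `measurements` is not empty.
    """
    rows = []
    for ref in reference:
        # Nearest neighbor: the measurement with the smallest absolute difference.
        nearest = min(measurements, key=lambda m: abs(m - ref))
        distance = abs(nearest - ref)
        rows.append({
            "reference": ref,
            "measurement": nearest,
            "distance": distance,
            # Flag matches that are too far away so they can be removed later.
            "match": distance <= max_distance,
        })
    return rows


# Made-up sample data: 99.8 and 100.1 are both close to the reference 100.0.
reference = [100.0, 200.0, 300.0]
measurements = [99.8, 100.1, 205.0, 299.9]

rows = nearest_matches(reference, measurements, max_distance=0.5)
for row in rows:
    print(row)
```

With this sample data, 99.8 never shows up in the result because 100.1 is closer to 100.0. That is the “duplicates” behavior mentioned above: one row per reference value, nearest measurement only.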

fuzzy Join two tables with GroupBy and Reference.knwf (111.3 KB)