I often find myself having to join two datasets by a fuzzy match. I usually do a hard join on the fields where I can, then calculate a score based on distance, and then group by the minimum distance to arrive at a match between the two datasets. Is there a node in KNIME that does this in one step, where I can define the join fields and also set a threshold of deviation for the join? Thanks.
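For anyone reading along, here is a rough sketch of that manual approach in plain Python, outside KNIME. It is only an illustration under assumptions: the function name `fuzzy_join`, the field names `id` and `amount`, and the use of absolute difference as the distance are all hypothetical and not taken from any KNIME node.

```python
from collections import defaultdict


def fuzzy_join(left, right, key_field, value_field, threshold):
    """Match each left row to the closest right row that shares the same key.

    Rows are plain dicts. `key_field` is the exact-match (hard join) field and
    `value_field` is the numeric field compared by absolute difference.
    A left row whose best candidate is farther away than `threshold`
    stays unmatched (paired with None).
    """
    # Step 1: hard join. Index the right table by the exact-match key.
    by_key = defaultdict(list)
    for row in right:
        by_key[row[key_field]].append(row)

    matches = []
    for lrow in left:
        candidates = by_key.get(lrow[key_field], [])
        # Steps 2 and 3: score candidates by distance and keep the minimum.
        best = min(
            candidates,
            key=lambda r: abs(r[value_field] - lrow[value_field]),
            default=None,
        )
        # Step 4: drop the match if it deviates more than the threshold.
        if best is not None and abs(best[value_field] - lrow[value_field]) > threshold:
            best = None
        matches.append((lrow, best))
    return matches


# Hypothetical sample data, invented for illustration only.
left = [{"id": "A", "amount": 100.0}, {"id": "B", "amount": 50.0}]
right = [
    {"id": "A", "amount": 98.5},
    {"id": "A", "amount": 120.0},
    {"id": "B", "amount": 75.0},
]

result = fuzzy_join(left, right, "id", "amount", threshold=5.0)
for lrow, rrow in result:
    print(lrow, "->", rrow)
```

With this sample data, the "A" row pairs with the 98.5 candidate (distance 1.5), while the "B" row stays unmatched because its only candidate is 25 away, which exceeds the threshold of 5.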
Not with a single node, but it’s not too complicated. Take a look at this example featuring the Similarity Search node:
The KNIME Indexing and Searching extension is not a one-node solution in this case, but it can be used and is a pretty good approach. Here is a workflow example:
Hi @ScottF, thanks for this workflow. I see the example you used is for string columns. I mostly deal with numeric fields for such fuzzy mapping. I see that I can also do that with the Similarity Search node. Could you please provide an example workflow so I can get a grasp of the settings? Also, why do you use the Counter node? Can we not use column1 directly as the representative column instead of the counter?
What about converting your numerical fields to strings? Would that work?