I have to different tables of data.
Table 1 is human input
Table 2 is REAL name
I am wanting to try and pull the closest matches from Table 2 to Table 1. There will likely be multiple for each which is absolutely fine!
I was thinking I need to have each word in the cell enclosed in “” so that I can do string similarity but I am not sure if that is the best route or not either.
Any guidance would be much appreciated.
My suggested workflow
A - remove groups of chars from both columns in order to maximize (potential) similarity. Give the resulting columns the same name. For example, in Table 1 you could remove chars between parentheses and “inc.”, “co”, “llc”,… using the String Manipulation node
$Epic_Name_10004$, "^(.*?)\\(.*\\).*$", "$1"
"^(.+?) (co\\.*|llc\\.*|inc\\.*)(\\b| )*$", "$1"
In Table 2 you should remove everything after “-”
strip(regexReplace($soldto_txtsh$, "^(.*?)\\-.*$", "$1"))
B - Assuming you have now 2 columns both named “new_text”, you can use them in a Similarity Search node configured for example like this
and this is the resulting table
This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.