I have to different tables of data.
Table 1 is human input
Table 2 is REAL name
I am wanting to try and pull the closest matches from Table 2 to Table 1. There will likely be multiple for each which is absolutely fine!
I was thinking I need to have each word in the cell enclosed in “” so that I can do string similarity but I am not sure if that is the best route or not either.
Any guidance would be much appreciated.
My suggested workflow
A - remove groups of chars from both columns in order to maximize (potential) similarity. Give the resulting columns the same name. For example, in Table 1 you could remove chars between parentheses and “inc.”, “co”, “llc”,… using the String Manipulation node
$Epic_Name_10004$, "^(.*?)\\(.*\\).*$", "$1"
"^(.+?) (co\\.*|llc\\.*|inc\\.*)(\\b| )*$", "$1"
In Table 2 you should remove everything after “-”
strip(regexReplace($soldto_txtsh$, "^(.*?)\\-.*$", "$1"))
B - Assuming you have now 2 columns both named “new_text”, you can use them in a Similarity Search node configured for example like this
and this is the resulting table