String manipulation and similarity comparing


I have to different tables of data.
Table 1 is human input
Table 1

Table 2 is REAL name

I am wanting to try and pull the closest matches from Table 2 to Table 1. There will likely be multiple for each which is absolutely fine!
I was thinking I need to have each word in the cell enclosed in β€œβ€ so that I can do string similarity but I am not sure if that is the best route or not either.

Any guidance would be much appreciated.

hi @ahortonmilsig
My suggested workflow

A - remove groups of chars from both columns in order to maximize (potential) similarity. Give the resulting columns the same name. For example, in Table 1 you could remove chars between parentheses and β€œinc.”, β€œco”, β€œllc”,… using the String Manipulation node

			$Epic_Name_10004$, "^(.*?)\\(.*\\).*$", "$1"
		"^(.+?) (co\\.*|llc\\.*|inc\\.*)(\\b| )*$", "$1"

In Table 2 you should remove everything after β€œ-”

strip(regexReplace($soldto_txtsh$, "^(.*?)\\-.*$", "$1"))

B - Assuming you have now 2 columns both named β€œnew_text”, you can use them in a Similarity Search node configured for example like this

and this is the resulting table


This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.