String manipulation and similarity comparing

Hi,

I have to different tables of data.
Table 1 is human input
Table 1

Table 2 is REAL name

I am wanting to try and pull the closest matches from Table 2 to Table 1. There will likely be multiple for each which is absolutely fine!
I was thinking I need to have each word in the cell enclosed in β€œβ€ so that I can do string similarity but I am not sure if that is the best route or not either.

Any guidance would be much appreciated.

hi @ahortonmilsig
My suggested workflow

A - remove groups of chars from both columns in order to maximize (potential) similarity. Give the resulting columns the same name. For example, in Table 1 you could remove chars between parentheses and β€œinc.”, β€œco”, β€œllc”,… using the String Manipulation node

strip(
	regexReplace(
		regexReplace(
			$Epic_Name_10004$, "^(.*?)\\(.*\\).*$", "$1"
		),
		"^(.+?) (co\\.*|llc\\.*|inc\\.*)(\\b| )*$", "$1"
	)
)

In Table 2 you should remove everything after β€œ-”

strip(regexReplace($soldto_txtsh$, "^(.*?)\\-.*$", "$1"))

B - Assuming you have now 2 columns both named β€œnew_text”, you can use them in a Similarity Search node configured for example like this

and this is the resulting table

3 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.