Topic Extractor Node

victor_palacios · April 5, 2022, 7:11pm

First we lowercase to get the best possible matches as upper and lowercase letters are not the same for computers.

Then we use a little regex joining all the search terms together with | (which means “OR” in regex).

Search for the word apple or dog ==> apple|dog

This solution actually works better on this sample of data, but if you have a lot of misspellings, this won’t work that well unless you account for those misspellings.

If there are common misspellings then this just add them to your list of words to search, if not, you may have accept some loss of information in exchange for automation.

Notice this method is more convenient because you can filter non-matches using missing rows from the Split Value 1 column.