I was hoping someone could help here with labeling topics by keywords into this format:
ID / TEXT
1 / I really liked the flavor, but the smell was bad.
2 / I loved the taste but I’m worried about the SLS
LOOKUP TABLE - topics with lists of keywords to search for below
Taste / Odor / Ingredients
flavor / smell / SLS
taste / odor / ingredients
yucky / fragrance / sulfate
DESIRED OUTPUT TABLE
ID / Topics
1 / Taste
1 / Odor
2 / Taste
2 / Ingredients
This workflow shows you two options.
Option 1: Without the text processing extension
Option 2: With the text processing extension
In option 2, the text processing extension lets you make your workflow more robust, e.g. by removing punctuation and by lowercasing all words. Of course, all these steps could also be performed by a sequence of String Manipulation nodes.
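To illustrate why the punctuation removal and lowercasing matter, here is a minimal sketch in plain Python (not KNIME, and not the attached workflow). The function name `normalize` is made up for this example, and the keywords would need to be lowercased the same way before matching.

```python
import string

def normalize(text):
    """Lowercase the text and replace punctuation with spaces."""
    # Assumption: the typographic apostrophe is treated like punctuation too.
    table = str.maketrans({ch: " " for ch in string.punctuation + "’"})
    return text.lower().translate(table)

raw = "I really liked the flavor, but the smell was bad."

# Splitting the raw text keeps the comma attached to the word ("flavor,"),
# so an exact match against the keyword "flavor" would fail.
raw_tokens = raw.split()

# After normalizing, the keyword matches exactly.
clean_tokens = normalize(raw).split()

print("flavor" in raw_tokens)    # False
print("flavor" in clean_tokens)  # True
```

The same applies to case: "SLS" in the text only matches a lowercased keyword "sls" once both sides have been lowercased.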
Please have a look at the two examples and let me know in case of any questions.
Thanks Kathrin! I didn’t like using the dictionary tagger since it doesn’t label how I want. However, I figured I could combine your approaches and just JOIN based on the bag of words and the keywords converted to Terms.
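The join idea can be sketched outside KNIME roughly like this. It is plain Python with the sample data from the first post; `bag_of_words` and `join_topics` are hypothetical helper names, and the tokenization (lowercase, letters a-z only) is an assumption, not what the KNIME nodes do internally.

```python
import re

TEXTS = [
    (1, "I really liked the flavor, but the smell was bad."),
    (2, "I loved the taste but I’m worried about the SLS"),
]

# Lookup table in long format: one (keyword, topic) pair per row.
KEYWORDS = [
    ("flavor", "Taste"), ("taste", "Taste"), ("yucky", "Taste"),
    ("smell", "Odor"), ("odor", "Odor"), ("fragrance", "Odor"),
    ("SLS", "Ingredients"), ("ingredients", "Ingredients"),
    ("sulfate", "Ingredients"),
]

def bag_of_words(rows):
    """One (id, term) pair per distinct lowercase word in each text."""
    bag = []
    for row_id, text in rows:
        terms = dict.fromkeys(re.findall(r"[a-z]+", text.lower()))
        bag.extend((row_id, term) for term in terms)
    return bag

def join_topics(bag, keywords):
    """Inner join on term == keyword, then drop duplicate (id, topic) pairs."""
    index = {}
    for keyword, topic in keywords:
        index.setdefault(keyword.lower(), []).append(topic)
    joined = dict.fromkeys(
        (row_id, topic) for row_id, term in bag for topic in index.get(term, [])
    )
    return list(joined)

TOPICS = join_topics(bag_of_words(TEXTS), KEYWORDS)
for row_id, topic in TOPICS:
    print(row_id, "/", topic)
```

For the two sample rows this yields the four ID / Topics pairs from the desired output table. The final deduplication matters when a text contains two keywords of the same topic, e.g. both "taste" and "flavor".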
But each step of the text processing nodes takes a long time to run on 200,000 rows, so I’m going to test whether the non-text-processing approach with the Cell Splitter is quicker.
@Kathrin - Originally I had been using a series of Rule Engine nodes linked to a Table Creator, and each table had rules for a single topic. This seemed quicker than the approaches here, but it wasn’t easy to edit the keywords or topic lists. Any suggestions for going down that route?
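This is not an answer from the thread, only a hedged sketch of one possible direction: keep the keywords in a single editable table and generate one rule per topic from it, so the per-topic rules stay but the keyword lists live in one place. The sketch is plain Python, not KNIME nodes; `parse_lookup`, `compile_rules` and `topics_for` are hypothetical names, and a "rule" here is assumed to be one case-insensitive whole-word regex per topic.

```python
import re

# Editable lookup table, one column per topic (copied from the first post).
LOOKUP_TEXT = """\
Taste / Odor / Ingredients
flavor / smell / SLS
taste / odor / ingredients
yucky / fragrance / sulfate
"""

def parse_lookup(text):
    """Turn the slash-separated table into {topic: [keywords]}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    topics = [cell.strip() for cell in lines[0].split("/")]
    lookup = {topic: [] for topic in topics}
    for ln in lines[1:]:
        for topic, cell in zip(topics, ln.split("/")):
            if cell.strip():
                lookup[topic].append(cell.strip())
    return lookup

def compile_rules(lookup):
    """Build one case-insensitive, whole-word regex per topic."""
    rules = {}
    for topic, keywords in lookup.items():
        alternatives = "|".join(re.escape(k) for k in keywords)
        rules[topic] = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return rules

def topics_for(text, rules):
    """Return every topic whose rule matches the text, in table order."""
    return [topic for topic, rule in rules.items() if rule.search(text)]

RULES = compile_rules(parse_lookup(LOOKUP_TEXT))
print(topics_for("I really liked the flavor, but the smell was bad.", RULES))
print(topics_for("I loved the taste but I’m worried about the SLS", RULES))
```

Editing a keyword or adding a topic then only means changing the table; the rules are rebuilt from it. Whether this is faster than the other approaches on 200,000 rows would still need to be measured.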