I have a table with one column and multiple rows full of string values like the following example:
Row1 john goes to school everyday but sunday
Row2 emily likes to eat icecream with strawberry flavor
Row3 ashley works very hard for her test on calculus
What I would like to accomplish is to create filter words (“john, flavor, test”) test them against each row. If there is a match, then the matching word will appear in a new column of the same row:
Row1 john goes to school everyday but sunday john
Row2 emily likes to eat icecream with strawberry flavor flavor
Row3 ashley works very hard for her test on calculus test
I want to test all filter words at the same time.
What should happen if you have more than one hit? What do you mean by "all at the same time", are loops an option? How big can your lists of words & rows become?
The best single-node solution I could think of would be the Rule Engine node or a Java Snippet, but for multi-node approaches there's a lot more than one option (with and without loops)...
There is no chance of having more than one hit per row, because always just one of the filter words exists in a cell. Loop doesn't look like an option. I have just one column and 2000 rows to be filtered. However, there is one twist of my document I forgot to mention in my first post: it is updated on daily basis via database and I need a solid solution which doesn't require my further attention every day.
I tried several things in Rule Engine node, but couldn't succeed at all. My knowledge of java is also iffy. If you have any suggestion, I would really appreciate.
As your example seems like a simple one (ie you will only see one match per input), I think you can stick with your attempt at using the Rule Engine node. However, if you have *lots* of filter words to look for I would probably change the strategy or generate the Rule Engine text programmatically and pass it in via a flow variable...
Anyway for your example I have attached a workflow that shows how to tackle the problem by first converting the input sentences into word sets (using the cell splitter node with a " " character as the split), then using the Rule Engine IN clause (eg "john" IN "$set" => "john").
I hope this helps for your full use case.
Thanks a lot James.
I have never thought of using cell splitter in the combination. You saved my day.
The task seems to be solved, but if you ungroup the word set, while keeping the original sentence, you can then join with a table of the words you want to look up, giving you the required result. That way, no FlowVar or fiddly rule-building is necessary, the word list can easily be supplied via csv or the like, and it's also easy to protect against cases where double matches or no matches do happen. (Better check than trust...)
is there a reasons you are not using the text processing nodes?
With them it is very easy to filter only the chosen terms. And you don't need to manual edit the rule engine for new words. I attach you an example.