Check a set of strings inside a set of strings

Hi there,

I'm working with n-grams, and I want to find 3-word n-grams inside 4-word n-grams. After some research, I found a workflow that uses String Manipulation to create rules from the 3-grams (rules like $Ngram4words$ LIKE "*word1 word2 word3*"), and then I used the Rule Engine Dictionary to apply those rules to the 4-grams.

It does work; however, it returns only one match per 4-gram, usually the 3-gram that appears at the start of the phrase. I would like to get all matches. Any tips would be helpful!
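The behaviour described above is consistent with first-match-wins rule evaluation. The following is a minimal Python sketch under that assumption, not KNIME's actual implementation. It turns each 3-gram into a `*word1 word2 word3*` pattern and stops at the first pattern that matches. `fnmatchcase` only approximates the Rule Engine's LIKE: it treats `[...]` as a character class. The helper names `make_rule` and `first_matching_rule` are hypothetical.

```python
from fnmatch import fnmatchcase


def make_rule(trigram: str) -> str:
    """Build a LIKE-style pattern such as '*word1 word2 word3*' (hypothetical helper)."""
    return f"*{trigram}*"


def first_matching_rule(fourgram: str, trigrams: list) -> str:
    """Return the first 3-gram whose pattern matches, mimicking first-match-wins rules."""
    for trigram in trigrams:
        if fnmatchcase(fourgram, make_rule(trigram)):
            return trigram  # evaluation stops here; later matching rules are never reported
    return None


trigrams = ["the quick brown", "quick brown fox"]
print(first_matching_rule("the quick brown fox", trigrams))
```

Both 3-grams occur in "the quick brown fox", but only "the quick brown" is returned, because it is the first rule in the list.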

Thanks!

Gustavo

Hi Gustavo,

Currently, the Rule Engine always outputs only one matching rule per row (the first one that matches). I will add a +1 to the feature request for you.

What you can do instead: use a Chunk Loop Start on the rules and apply one rule to the whole data set per iteration. Then, after the Loop End, filter the output so that it contains only the rows where the rule matched.
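The loop described above can be sketched in plain Python to show why it returns every match. This is an illustrative analogue, not a KNIME workflow. The outer loop stands in for one chunk (one rule) per iteration. The `collected` list stands in for the Loop End concatenating each iteration's output. The final list comprehension stands in for the filter step. The function name `apply_rules_in_loop` is hypothetical, and `fnmatchcase` only approximates LIKE.

```python
from fnmatch import fnmatchcase


def apply_rules_in_loop(fourgrams: list, trigrams: list) -> list:
    """Apply one rule per iteration to all 4-grams, then keep only the matches."""
    collected = []  # analogue of the loop end concatenating every iteration's rows
    for trigram in trigrams:  # one rule per iteration
        pattern = f"*{trigram}*"
        for fourgram in fourgrams:  # the single rule is applied to the whole data set
            collected.append((fourgram, trigram, fnmatchcase(fourgram, pattern)))
    # Filter after the loop: keep only rows where the rule matched
    return [(fourgram, trigram) for fourgram, trigram, matched in collected if matched]


fourgrams = ["the quick brown fox", "a lazy dog sleeps"]
trigrams = ["the quick brown", "quick brown fox", "lazy dog sleeps"]
for pair in apply_rules_in_loop(fourgrams, trigrams):
    print(pair)
```

Because each rule is evaluated on its own, "the quick brown fox" is now reported twice, once for each 3-gram it contains.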

If you upload your existing workflow, I can add the looping part I just described.

Cheers, Iris 

Thanks, Iris! I tried the loop, but at first I was processing one row at a time. Then I found a better solution: I created two rules, applied both to the same data set, and then merged the results. It worked.

Gustavo
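The post above does not spell out the two rules. One plausible reading, which is an assumption and not necessarily the solution used here, relies on a simple property: a 4-gram contains exactly two contiguous 3-word windows, words 1 to 3 and words 2 to 4. Each window can then be looked up directly in the set of 3-grams, and the two results merged. Matching whole windows also avoids substring false positives that a `*...*` wildcard can produce, such as "bathe quick brown fox" matching "*the quick brown*". The function name `contained_trigrams` is hypothetical.

```python
def contained_trigrams(fourgram: str, trigram_set: set) -> list:
    """Return every 3-gram from trigram_set that occurs as a contiguous window of fourgram."""
    words = fourgram.split()
    # A 4-gram has exactly two 3-word windows: words[0:3] and words[1:4]
    windows = [" ".join(words[i:i + 3]) for i in range(len(words) - 2)]
    return [window for window in windows if window in trigram_set]


trigram_set = {"the quick brown", "quick brown fox", "lazy dog sleeps"}
fourgrams = ["the quick brown fox", "a lazy dog sleeps", "bathe quick brown fox"]
pairs = [(f, t) for f in fourgrams for t in contained_trigrams(f, trigram_set)]
for pair in pairs:
    print(pair)
```

In a workflow, this roughly corresponds to building the two 3-word columns from each 4-gram, matching each column against the 3-gram table, and concatenating the two results.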


It's been a long time, but I have the same problem. Could you please share your solution?