Concatenate a flow variable inside a RegEx


I'm working with Text Processing in KNIME and have created nGrams for several documents. Now I would like to filter the documents based on an nGram, but I haven't figured out how. Right now, I'm testing with the Rule-based Row Filter like this:

$Text$ MATCHES /^.*nGram.*/ => TRUE

where $Text$ is the string with the content of a document. If I type the text directly into the expression, it works.

My problem is that I want to use a flow variable instead of typing the text directly. I already created the domains, I can also supply my variable through a quickform, and I tested that the variable is built correctly. The problem is how to concatenate it into the RegEx above. I tried the following, and it still didn't work:

$Text$ MATCHES /^.*$${SNgram}$$.*/ => TRUE

Appreciate any tips!


Gustavo Velho 

If you have the flow variable values in a table, you can use the Rule-based Row Filter (Dictionary) and supply it with a table of rules built by concatenation in String Manipulation.

Alternatively, you can create the rule text using Java Edit Variable (concatenate the fixed text and the flow variable value with +) and use that as the first array element in the Rule-based Row Filter node's rules array. (I did this before the dictionary versions of the nodes were developed.)

Be careful though: you might need to add \Q and \E around your flow variable value (regex quoting) to prevent unwanted matches.
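
To illustrate the quoting point, here is a minimal plain-Java sketch that runs outside KNIME. The class name, method name, and sample values are made up for this example. It only shows how concatenating a value into a regex behaves with and without \Q...\E; it is not the code of any KNIME node.

```java
import java.util.regex.Pattern;

class NGramRegexSketch {
    // Hypothetical helper: builds the text of a regex that matches any
    // string containing the given value literally.
    static String containsRegex(String value) {
        // Pattern.quote wraps the value in \Q...\E, so regex
        // metacharacters inside it are treated as plain characters.
        return "^.*" + Pattern.quote(value) + ".*";
    }

    public static void main(String[] args) {
        String value = "a.b"; // contains the regex metacharacter '.'

        String unquoted = "^.*" + value + ".*";
        String quoted = containsRegex(value);

        System.out.println("unquoted regex: " + unquoted);
        System.out.println("quoted regex:   " + quoted);

        // Without quoting, '.' matches any character, so "aXb" also matches.
        System.out.println("xxaXbxx".matches(unquoted));
        // With quoting, only the literal text "a.b" matches.
        System.out.println("xxaXbxx".matches(quoted));
        System.out.println("xxa.bxx".matches(quoted));
    }
}
```

In a Java Edit Variable node the same idea would be a single concatenation assigned to an output variable, assuming the rule syntax accepts the resulting text.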

Cheers, gabor

Thanks Gabor! I think the answer is through Rule-Based Filter, but still not quite what you proposed (or I didn't quite understand what you explained... :) ).

My flow variable comes from a Value Selection Quick Form, based on a list of nGrams created earlier. This way I can use it to filter manually as needed. This part works the way I need.

The second step is to use the value I select in the Value Selection Quick Form in a Rule-Based Filter (or perhaps in a Java Snippet?) to collect documents from another table. But the other table contains the full texts.

For example:

Black Panther is the best character in Civil War.
Nice fight scene with Black Panther and Winter Soldier.
Civil War movie is pretty good, perhaps one of the best on MUC.

I would like to keep only the documents containing "black panther" in their content, and "black panther" is one of the possible values of my variable. I've seen that the Rule-based Row Filter can find an expression like "black panther" inside a document, but I don't know how to do the same thing when the value is a flow variable.

Again, thanks for your help!

Gustavo Velho 

OK, after spending a bit more time playing with Java Edit Variable, I achieved what I wanted. Thanks, Gabor!

Value Selection Quickform -> Java Edit Variable -> Rule-Based Row Filter

The value selection lets me pick the value I want to filter on. Then, using the Java Edit Variable, I edit the value like this:

out_NGramEdited = "*"+v_Ngram+"*";

Then I filter the rows I want using the Rule-based Row Filter like this:

$Text$ LIKE $${SNGramEdited}$$ => TRUE

It works! Although I'm not sure it's the best way. :)
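
For anyone who wants to check the wildcard logic outside KNIME, here is a rough plain-Java sketch of the same idea. It assumes that LIKE treats * as "any sequence of characters" and ? as "any single character"; the class and method names are invented for this example, and the matching here is case-sensitive, which may or may not match the node's behavior.

```java
import java.util.regex.Pattern;

class WildcardLikeSketch {
    // Mirrors the Java Edit Variable step: wrap the selected value in '*'.
    static String wrap(String nGram) {
        return "*" + nGram + "*";
    }

    // Hypothetical translation of a wildcard pattern into a regex:
    // '*' becomes ".*", '?' becomes ".", every other character is literal.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        // DOTALL lets '.' also match line breaks inside a document.
        return Pattern.compile(sb.toString(), Pattern.DOTALL);
    }

    // True when the whole text matches the wildcard pattern.
    static boolean like(String text, String wildcard) {
        return toRegex(wildcard).matcher(text).matches();
    }

    public static void main(String[] args) {
        String pattern = wrap("Black Panther");
        String[] docs = {
            "Black Panther is the best character in Civil War.",
            "Nice fight scene with Black Panther and Winter Soldier.",
            "Civil War movie is pretty good, perhaps one of the best on MUC."
        };
        for (String doc : docs) {
            // Prints whether each document contains the selected value.
            System.out.println(like(doc, pattern) + "  " + doc);
        }
    }
}
```

In this sketch the first two sample documents match and the third does not. A lowercase value such as "black panther" would match none of them unless case is normalized first.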

Anyway, thanks again! This is what I need.

Gustavo Velho