Text Analytics: Tag Cloud Filtered by Rule-based Row Filter

I have completed a text analytics "Tag Cloud" workflow (as in the attachment).

Now I realised I want to run the whole workflow, but ONLY for rows that meet a specific rule condition.

To define this condition, I use the "Rule-based Row Filter" node on another column of my database.

My question is: where in the workflow do I need to place the "Rule-based Row Filter" node to achieve this?

Does it need to be placed before the "String to Document" node?



So, I put the Rule-based Row Filter node at the beginning of the workflow, before the String to Document node. RESOLVED: it works as I want.

Now my problem is as follows:

I have a column appt_name, which is open text. If I use the GroupBy node, I get 45 groups in total.

I want to "bin" or "group" these 45 groups into fewer groups, only 4. How can I do that?


Next to the appt_name column, I also have a column Open notes that contains open text (whole sentences).

This text sometimes contains an indication such as 3w, 4w or 6w (w = weeks). For each of the 4 groups I will create, I want to count how many 3w, 4w, etc. indications there are and which ones occur most often.

All in all, the challenges I am facing are:

1) I don't know where in the workflow to place the GroupBy node.

2) After the GroupBy (45 groups), how can I bin/group them into fewer groups, e.g. 4? (Reminder: the column is open text.)

3) How can I get the 4 groups in the Color Manager, combined with the POS Tagger?

4) How can I keep the numbers (e.g. 3w, 4w, 6w)? Do I need to remove the Number Filter from my workflow?

Hi Atzitzi,

Regarding your first question: binning is only possible on numerical values. Reducing a number of groups/classes to a smaller number basically means mapping a set of classes onto a smaller set of classes. You could do this manually with the Cell Replacer node, specifying a dictionary that defines which class is replaced by (mapped to) which. Or you could do it automatically, e.g. by hashing the class names and taking the hash code modulo the number of groups you want. For that you need the Java Snippet node.
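To illustrate the two options, here is a minimal standalone Java sketch (plain Java, not KNIME's own code). The appointment names and target group labels are hypothetical; inside a Java Snippet node, the hash variant would reduce to a single expression on the appt_name value, assuming the column is available to the snippet as a String.

```java
import java.util.HashMap;
import java.util.Map;

public class GroupMapping {

    // Manual mapping: look the class up in a dictionary, similar in spirit to a
    // Cell Replacer lookup table. Unmapped classes fall back to a default group.
    public static String mapByDictionary(String apptName, Map<String, String> dictionary, String fallback) {
        return dictionary.getOrDefault(apptName, fallback);
    }

    // Automatic mapping: hash the class name and take it modulo the number of groups.
    // Math.floorMod keeps the result in [0, numGroups) even when hashCode() is negative.
    public static int mapByHash(String apptName, int numGroups) {
        return Math.floorMod(apptName.hashCode(), numGroups);
    }

    public static void main(String[] args) {
        // Hypothetical appt_name values and target groups, for illustration only.
        Map<String, String> dictionary = new HashMap<>();
        dictionary.put("Follow-up visit", "Follow-up");
        dictionary.put("Follow up", "Follow-up");
        dictionary.put("First consultation", "Consultation");

        String[] names = {"Follow-up visit", "Follow up", "First consultation", "Lab results"};
        for (String name : names) {
            System.out.println(name + " -> " + mapByDictionary(name, dictionary, "Other")
                    + " / hash group " + mapByHash(name, 4));
        }
    }
}
```

The hash variant is deterministic (Java specifies the String.hashCode formula), but the assignment is arbitrary: unrelated appointment types can land in the same group and the 4 groups can be uneven. If the groups need to mean something, the dictionary approach is the better fit.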

Do the grouping and cell replacing before creating the documents. Use the mapped classes as document categories in the String to Document node. After you have created a bag of words, extract this information using the Document Data Extractor node. Use the extracted column as the color column in the Color Manager.

To count terms like 3w, 4w, etc. in each group, you can use the Wildcard Tagger and tag terms with the regex "\dw". Then filter out all other terms and create a bag of words. Extract the group information (as category info) and group by class and term.
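The counting logic behind the tag, filter and group-by steps can be sketched in plain Java. This is only an illustration, not what the KNIME nodes run internally. Assumptions: each row is a (group, note) pair after the mapping to 4 groups; the regex \b\d+w\b is a slightly stricter variant of "\dw" that also accepts two-digit values such as 12w and requires word boundaries, so it does not fire inside a word like "2way"; the group names and notes are made up.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WeekMentionCounter {

    // Matches tokens such as "3w", "4w", "12w" (case-insensitive), but not "2way".
    private static final Pattern WEEKS = Pattern.compile("\\b\\d+w\\b", Pattern.CASE_INSENSITIVE);

    // Adds every week indication found in one note to the counts of its group.
    public static void countNote(Map<String, Map<String, Integer>> counts, String group, String note) {
        Matcher m = WEEKS.matcher(note);
        while (m.find()) {
            String term = m.group().toLowerCase();
            counts.computeIfAbsent(group, g -> new TreeMap<>()).merge(term, 1, Integer::sum);
        }
    }

    // Each row is {group, note}; returns group -> (term -> count), both sorted by key.
    public static Map<String, Map<String, Integer>> countAll(String[][] rows) {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (String[] row : rows) {
            countNote(counts, row[0], row[1]);
        }
        return counts;
    }

    // Returns the most frequent term; ties go to the term that comes first in key order.
    public static String mostFrequent(Map<String, Integer> termCounts) {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : termCounts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical rows for illustration only.
        String[][] rows = {
            {"Group A", "Next appointment in 3w, then 6w"},
            {"Group A", "Review in 3w"},
            {"Group B", "Check again in 4w"},
        };
        Map<String, Map<String, Integer>> counts = countAll(rows);
        System.out.println(counts); // {Group A={3w=2, 6w=1}, Group B={4w=1}}
        for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
            System.out.println(e.getKey() + ": most frequent = " + mostFrequent(e.getValue()));
        }
    }
}
```

Lowercasing the matched term means "4W" and "4w" are counted as the same indication.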

I hope this helps.

Cheers, Kilian

Many thanks, Kilian!