Sentiment analysis - term frequency

I am trying to calculate sentiment scores for a set of articles. I have created a bag of words with the sentiment-tagged terms (using the "dictionary tagger" node), and now I want to calculate the term frequency of positive and negative words in each article.

I filtered the terms to only positive using the 'row filter' node, then calculated the term frequency, and did the same for the negative words (see screenshot of my workflow attached). But now how do I relate them to my original document? There doesn't seem to be any way of joining the two forks together, or grouping them by document?

You can use a Concatenate node to merge the two final tables into a single one (appending the respective rows).

As for grouping by document, it is difficult to advise without having seen the rest of the workflow and what your tables look like.

Speaking in generic terms, one possibility would be to give each document a unique numeric ID, carry that ID along so that it also identifies the terms derived from that document (you may end up with a collection of IDs for each term, depending on how many documents contain that term), and then use it as the grouping factor at the end.
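
To make the idea concrete, here is a rough sketch of the same logic in plain Python rather than KNIME nodes. The rows, the document IDs, and the function names `concatenate` and `group_by_document` are all made up for illustration, and the final "positive minus negative" score is just one possible choice, not something your workflow necessarily uses. The point is only that once every row carries its document ID, appending the two tables and grouping on that ID brings the two forks back together.

```python
from collections import defaultdict

# Hypothetical rows standing in for the two term-frequency tables.
# Each row is (document ID, term, term frequency).
positive_rows = [
    (1, "good", 0.25),
    (1, "great", 0.25),
    (2, "good", 0.5),
]
negative_rows = [
    (1, "bad", 0.25),
    (3, "awful", 0.5),
]


def concatenate(positive, negative):
    """Append both tables, tagging each row with its sentiment."""
    tagged = [(doc, term, tf, "positive") for doc, term, tf in positive]
    tagged += [(doc, term, tf, "negative") for doc, term, tf in negative]
    return tagged


def group_by_document(rows):
    """Sum the term frequencies per document ID and sentiment."""
    totals = defaultdict(lambda: {"positive": 0.0, "negative": 0.0})
    for doc, _term, tf, sentiment in rows:
        totals[doc][sentiment] += tf
    return dict(totals)


scores = group_by_document(concatenate(positive_rows, negative_rows))

# One line per document: ID, positive total, negative total, difference.
for doc_id in sorted(scores):
    pos = scores[doc_id]["positive"]
    neg = scores[doc_id]["negative"]
    print(doc_id, pos, neg, pos - neg)
```

Tagging each row with its sentiment before appending matters: without that extra column, the positive and negative rows would be indistinguishable after the merge.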

Hope this helps for the moment.

Cheers,
Marco.

Very helpful, thanks again Marco!