Sentiment Analysis - Converting Term Tags into Numbers

Hello,

I am trying to reproduce parts of the workflow shown in the official KNIME webinar on text analysis (link here). In this section the instructor explains that Term tags should be converted into numbers so that the numbers can then be aggregated to determine the overall sentiment of the document. I understand the concept, but carrying it out is giving me trouble, and the video does not show this step explicitly.

I attempted to use a Rule Engine node, as shown in the video, but using a wildcard operator such as 'LIKE' on a Term column produces the error "Invalid settings: line 5, col 0: Expression before 'LIKE' is not a string." I then tried a rule that uses the equals sign '=' with a wildcard pattern: "$Term$ = "*NEGATIVE*" => -1". The Rule Engine accepted this rule, but it did not produce the expected result, presumably because '=' looks for an exact match and does not evaluate wildcards.

The outcome I am hoping for is a new column called "Sentiment" that contains 1 or -1, depending on the Term tag that was applied by a Dictionary Tagger. Is a Rule Engine the right tool, or is there something else I should be trying?

I appreciate any help I can get.

Thanks,

Sam

Hi Sam,

As far as I understand your problem, I would suggest using the Dictionary Tagger to tag the words as POSITIVE or NEGATIVE, then using a Bag of Words node to extract the individual terms, and feeding these terms into the Tags to String node.

Then use the String Replacer to convert POSITIVE into 1 and NEGATIVE into -1, and the String to Number node to convert the resulting strings into integer values.

With the GroupBy node you can then sum the sentiment values per document and join the resulting column back to your original table.
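
To make the logic of these steps concrete, here is a minimal Python sketch of the same idea outside KNIME. It is not KNIME code and only mirrors the mapping and aggregation. The row layout and the names rows, TAG_TO_SCORE and sentiment_per_document are made up for illustration, and I am assuming that terms without a sentiment tag should count as 0.

```python
from collections import defaultdict

# Hypothetical rows, roughly what the table might look like after the
# Bag of Words and Tags to String steps: (document id, term, tag as string).
rows = [
    ("doc1", "great", "POSITIVE"),
    ("doc1", "awful", "NEGATIVE"),
    ("doc1", "lovely", "POSITIVE"),
    ("doc2", "broken", "NEGATIVE"),
    ("doc2", "table", ""),  # term without a sentiment tag
]

# Plays the role of String Replacer + String to Number: tag string -> integer.
TAG_TO_SCORE = {"POSITIVE": 1, "NEGATIVE": -1}


def sentiment_per_document(rows):
    """Sum the numeric sentiment of all terms per document (the GroupBy step)."""
    totals = defaultdict(int)
    for doc_id, _term, tag in rows:
        # Assumption: untagged or unknown tags contribute 0 to the sum.
        totals[doc_id] += TAG_TO_SCORE.get(tag, 0)
    return dict(totals)


scores = sentiment_per_document(rows)
print(scores)  # {'doc1': 1, 'doc2': -1}
```

A positive sum means the document contains more POSITIVE than NEGATIVE terms, a negative sum the opposite.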

There is surely a more elegant way, but this should do the trick for now.

 

Regards,

Tim

Thanks very much Tim! The Tags to String node is exactly what was needed.

 

Thanks,

Sam