I am a new KNIME user working on a sentiment analysis project. I am doing text categorization using a sentiment (positive/negative) dictionary, BoW, and then TF-IDF. I would like to know how to handle negation (not good, didn't like, etc.). Which nodes would be useful to account for negation when counting the frequencies of terms (n-grams)? I have seen that parse trees can solve this problem, but I am not sure how to do that in KNIME. Any guidelines would be very useful.
So far the Textprocessing plugin does not provide deep parsing. With the Wildcard Tagger node it is possible to tag terms based on regular expressions. Negations matching these expressions can be found and given a specific tag, which can be filtered or counted later on.
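To make the regular-expression idea concrete outside of KNIME, here is a minimal Python sketch. It is not what the Wildcard Tagger does internally; the cue list, the "NEG" tag name, and the function name tag_negated_terms are all assumptions made for illustration. It simply tags the word that directly follows a negation cue.

```python
import re

# Hypothetical list of negation cues; extend it for your own corpus.
NEGATION = re.compile(
    r"\b(?:not|no|never|\w+n't|didnt|dont|doesnt|isnt|wasnt)\s+(\w+)",
    re.IGNORECASE,
)

def tag_negated_terms(text):
    """Return (term, "NEG") pairs for each word that directly follows a negation cue."""
    return [(match.group(1).lower(), "NEG") for match in NEGATION.finditer(text)]

if __name__ == "__main__":
    print(tag_negated_terms("The food was not good and I didn't like it"))
```

A pattern like this only catches the word immediately after the cue, so "not very good" would tag "very" rather than "good". That limitation is the reason dependency parsing was asked about in the first place.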
To count term frequencies, the TF node is the right node. In its dialog you can specify whether relative or absolute frequencies are counted. The N-Gram node can be used to find n-grams and count their frequencies.
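As a rough illustration of what these counts mean, the following Python sketch computes absolute and relative term frequencies and n-gram counts for a single tokenized document. It assumes that "relative" means the term count divided by the total number of terms in the document. The function names are hypothetical and this is not the nodes' actual implementation.

```python
from collections import Counter

def term_frequencies(tokens, relative=False):
    """Count each term; with relative=True, divide by the document's total term count."""
    counts = Counter(tokens)
    if not relative:
        return dict(counts)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}

def ngram_counts(tokens, n=2):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

if __name__ == "__main__":
    doc = "not good not bad not good".split()
    print(term_frequencies(doc))
    print(term_frequencies(doc, relative=True))
    print(ngram_counts(doc, n=2))
```

Counting bigrams is one simple way to keep "not good" apart from "good" without any parsing.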
Thanks Killian. I will try the Wildcard Tagger then. Is there any way to integrate something like the Stanford Parser (you already have the Stanford Tagger node)? This would help me detect word dependencies associated with negation.
Integrating deep parsing is not planned for the near future (the next few months). But of course it is possible to write your own node and use the Stanford library internally. If you are interested in writing a tagger node, I can give you some tips.
Hi Killian, yes, that would be great; I am very interested. You can give us some tips and I can start trying and see how far I get.
I also wanted to ask a simpler question. We are using the Dictionary Tagger to tag positive and negative words, but the node offers no tag type or value for positive or negative, so until now I have just been using tag values that already exist. Is there any way to create a tag type and values such as sentiment: positive and negative?
There is a TagSet extension point that allows you to implement and integrate your own tag sets. You have to write some Java code to do that, but it is possible to create your own tag sets.
For the positive/negative tagging, a quick and easy way would be to use two arbitrary tag values and filter or count them later on with the corresponding filter node.
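The workaround can be pictured with a small Python sketch: two placeholder tag values stand in for "positive" and "negative", terms are tagged by dictionary lookup, and the tags are counted afterwards. The tag names TAG_A and TAG_B, the word lists, and the function names are assumptions for illustration only and do not correspond to actual KNIME tag values.

```python
from collections import Counter

# Hypothetical placeholder tags standing in for existing tag values.
POSITIVE_TAG = "TAG_A"
NEGATIVE_TAG = "TAG_B"

def tag_terms(tokens, positive_words, negative_words):
    """Attach the placeholder tag to each token found in one of the dictionaries."""
    tagged = []
    for token in tokens:
        word = token.lower()
        if word in positive_words:
            tag = POSITIVE_TAG
        elif word in negative_words:
            tag = NEGATIVE_TAG
        else:
            tag = None  # token is in neither dictionary
        tagged.append((token, tag))
    return tagged

def count_tags(tagged):
    """Count tagged tokens per tag, ignoring untagged ones."""
    return Counter(tag for _, tag in tagged if tag is not None)

if __name__ == "__main__":
    tokens = "Great plot but terrible acting and awful sound".split()
    tagged = tag_terms(tokens, {"great", "good"}, {"terrible", "awful"})
    print(count_tags(tagged))
```

The meaning of each placeholder has to be remembered downstream, which is the price of not defining a dedicated sentiment tag set.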
In the next few days I will write a small tutorial on how to integrate your own tag set and your own tagger node, and publish it on the Textprocessing site.