Hi,
I’m carrying out a sentiment analysis with the KNIME platform using tweets from Twitter. I have two questions for you. The first is about categorizing the tweets: I don’t understand how to create the positive and negative labels. I saw a sentiment analysis example on your forum, but the comments read by the File Reader already come with positive and negative labels. Do I need to link to an R database (which I don’t know how to use)?
The second question is how to filter out tweets in languages that don’t interest me, such as French, Spanish, Chinese, etc.
Could you help me, please?
Thank you!!
Hi @millainthesky welcome to the KNIME forum,
To label your selected tweets as positive or negative, you need a pre-labeled dataset in which words are already classified as positive or negative. Search for something like “positive and negative words list for sentiment analysis”. Keep in mind that which words count as negative or positive can be a matter of taste and is domain-specific.
For the second question I don’t have a straightforward answer. The tweet itself also comes with information about language and country, but in my experience these columns are empty most of the time. Cleaning up your tweets will be a more iterative process, depending on the results of your query. Maybe some row filtering (on words from the languages you are not interested in) can help?
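As a rough illustration of the word-list idea, here is a minimal Python sketch that counts positive and negative words in a tweet and derives a label from the difference. The two tiny word sets and the function name `label_tweet` are made up for this example; a real run would load a published positive/negative word list instead, and the same counting could be done with KNIME nodes.

```python
import re

# Illustrative word lists only (assumption): replace them with a real
# positive/negative lexicon suited to your domain.
POSITIVE = {"good", "great", "love", "happy", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "sad", "awful"}


def label_tweet(text):
    """Return 'positive', 'negative' or 'neutral' based on word counts."""
    # Lowercase the tweet and split it into simple word tokens.
    words = re.findall(r"[a-z']+", text.lower())
    # Score = number of positive words minus number of negative words.
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for tweet in ["I love this great phone", "What a terrible, awful day"]:
        print(tweet, "->", label_tweet(tweet))
```

Tweets with no matching words, or with equally many positive and negative words, come out as neutral here, so the result depends heavily on the word list you choose.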
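The filtering idea above could look roughly like this in Python: use the language column when it is filled in, and otherwise fall back to counting typical words from the unwanted languages. The field names `lang` and `text`, the marker words, and the threshold are assumptions for this sketch, not something taken from a specific KNIME node.

```python
import re

# A few common French/Spanish words used as markers (assumption): extend
# this set iteratively, depending on what shows up in your query results.
FOREIGN_MARKERS = {"les", "est", "avec", "pour", "los", "muy", "pero", "porque"}


def keep_tweet(tweet, wanted_lang="en", max_foreign=1):
    """Decide whether a tweet (a dict with 'lang' and 'text') should be kept."""
    lang = (tweet.get("lang") or "").strip().lower()
    if lang:
        # The language column is filled in, so trust it.
        return lang == wanted_lang
    # The language column is empty: count marker words from unwanted languages.
    words = re.findall(r"[^\W\d_]+", tweet.get("text", "").lower())
    foreign = sum(w in FOREIGN_MARKERS for w in words)
    return foreign <= max_foreign


if __name__ == "__main__":
    tweets = [
        {"lang": "fr", "text": "Bonjour tout le monde"},
        {"lang": "", "text": "Los precios son muy altos pero compramos"},
        {"lang": None, "text": "The weather is great today"},
    ]
    print([t["text"] for t in tweets if keep_tweet(t)])
```

A marker-word check like this does nothing for languages such as Chinese unless you add suitable markers, which is one reason the cleanup stays iterative.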
gr. Hans
You could also use text classification for language detection. We have a ready-to-use workflow using the Palladian text classification nodes, and there’s even a pre-trained model:
https://www.knime.com/book/text-classifier
Not sure how well it will perform on Tweets, but it’s worth a try.
– Philipp
PS: By the way, we (and some of our users/customers) have been using the Palladian text classifier successfully for sentiment classification as well. This of course requires a pre-labeled training set.
Thank you for your answer!
Unfortunately, the “Row Filter” node cannot filter on the words I entered, so it does not solve my problem.
I think the solution is to use an R node in KNIME and configure it with an English dictionary. In the meantime, I’ll try the Palladian text classification nodes and hope that this works.
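A minimal sketch of the dictionary idea, written in Python rather than R: keep a tweet only if a large enough share of its words appears in an English word list. The small `ENGLISH_WORDS` set, the function names, and the 0.5 threshold are placeholders for this example; a real dictionary file would replace the set.

```python
import re

# Placeholder dictionary (assumption): load a full English word list instead.
ENGLISH_WORDS = {
    "the", "is", "a", "and", "to", "of", "in", "this", "i", "it",
    "for", "on", "you", "my", "was", "great", "today",
}


def english_ratio(text):
    """Fraction of the tweet's word tokens found in the English dictionary."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        # No Latin-letter words at all (e.g. Chinese text): treat as non-English.
        return 0.0
    return sum(w in ENGLISH_WORDS for w in words) / len(words)


def looks_english(text, threshold=0.5):
    """Keep the tweet if at least `threshold` of its words are in the dictionary."""
    return english_ratio(text) >= threshold


if __name__ == "__main__":
    for tweet in ["This is a great day for you", "Je ne sais pas pourquoi"]:
        print(tweet, "->", looks_english(tweet))
```

Hashtags, user names and slang will lower the ratio for genuine English tweets, so the threshold probably needs some tuning on your own data.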
If you think of anything else, please write again.
Camilla