Sentiment Analysis with n-gram

Hi,

I want to apply the workflow for sentiment analysis with n-grams (https://www.knime.org/blog/sentiment-analysis-with-n-grams) to my Twitter data, but I don't know how to predefine the sentiment labels. Could someone help?

Thank you.

 

Hi saip,

In the type of sentiment analysis described in the KNIME blog post you referenced, it is up to you to determine the sentiment labels. They could be binary, like "positive" vs. "negative", or multi-class, like "happy", "neutral", and "sad".

The idea is that you manually label your training data set with the proper sentiment, then use it to train a classification model that associates certain features (n-grams) in the Tweets with the given sentiment.

Once the model is trained, you apply it to a test data set and have it predict the "unknown" sentiment of each Tweet based on the features (n-grams) that appear in it.
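
Outside of KNIME, the label / train / predict cycle described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the four labeled Tweets are made up, the function names (ngrams, train, predict) are hypothetical, and the Naive Bayes classifier with add-one smoothing is my choice for the sketch, not necessarily the learner used in the blog post's workflow.

```python
import math
from collections import Counter, defaultdict

def ngrams(text, n_max=2):
    """Return word n-grams (sizes 1..n_max) of a lower-cased, whitespace-split text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def train(labeled_tweets):
    """Collect per-label n-gram counts (the statistics a Naive Bayes model needs)."""
    feature_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labeled_tweets:
        label_counts[label] += 1
        feature_counts[label].update(ngrams(text))
    vocabulary = {g for counts in feature_counts.values() for g in counts}
    return feature_counts, label_counts, vocabulary

def predict(model, text):
    """Return the label with the highest log-probability for an unlabeled text."""
    feature_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        total = sum(feature_counts[label].values())
        score = math.log(label_counts[label] / total_docs)  # label prior
        for g in ngrams(text):
            if g in vocabulary:  # n-grams never seen in training are ignored
                # add-one smoothing so unseen (label, n-gram) pairs are not zero
                score += math.log((feature_counts[label][g] + 1)
                                  / (total + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)

# Manually labeled training set (made-up examples; the labels are your choice).
training_set = [
    ("I love this new phone", "positive"),
    ("what a great day", "positive"),
    ("I hate waiting in line", "negative"),
    ("this is a terrible idea", "negative"),
]

model = train(training_set)
print(predict(model, "I love this great day"))  # positive
```

The labels are nothing more than the strings you attach to the training rows, so switching from binary to multi-class only means using more distinct label values in the training set.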

The more labels you define, the more difficult it becomes for the classifier to "learn" the differences between them and therefore for the predictor to assign the right label to an unlabeled Tweet. 

There are a number of already labeled Twitter data sets that you can play with. Most of them are labeled simply as positive/negative.

Also, this type of technique, based solely on n-grams, does not really analyze the context of what is being said in the Tweet, nor can it easily deal with double-sentiment cases like "I love KNIME but I hate R".
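
To see why, here is a small Python sketch (the helper word_ngrams is a hypothetical name, and splitting on whitespace is a simplifying assumption) that lists the bigrams of that Tweet. Both a "positive-looking" and a "negative-looking" bigram come out, with nothing tying each one to its target.

```python
def word_ngrams(text, n):
    """Return the word n-grams of size n for a lower-cased, whitespace-split text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tweet = "I love KNIME but I hate R"
bigrams = word_ngrams(tweet, 2)
print(bigrams)
# ['i love', 'love knime', 'knime but', 'but i', 'i hate', 'hate r']
```

A classifier that sees only this bag of features gets evidence for both sentiments at once and still has to output a single label for the whole Tweet.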

Hope this helps.

Cheers,
Marco.

 

Hi Marco,

Thank you very much for the explanation. It really helped me understand the concept.

Best wishes,

saip

Hi all,

I am using n-grams to train my model with a dataset of 1,500,000 Tweets classified as positive or negative. I'd like to apply my trained model to my own database of Tweets. My question is: how do I do this? How do I apply the trained model to my own dataset? Is there a way to save the trained node so I can use it in another workflow? I am very lost at this point. Any help is very welcome. Thanks!