Hi, I’m completely new to KNIME and to ML in general. I want to do sentiment analysis on tweets about the Afghanistan crisis with ML/DL. I have access to a Twitter Developer Account and was able to visualize tweets from different users using the Twitter API Connector. My goal is multiclass classification of positive, negative, and neutral sentiments using ML techniques. My question is how to proceed with labeling these tweets. Do I have to label each tweet manually, one by one, or is there a more convenient way? Do I have to create a CSV for labeling? Excuse me, but I’m very new to this software and don’t understand how to proceed.
As I understand it, you want to tag the tweets.
There are two ways to do that: either using a tagger node such as the Stanford Tagger, or using the Dictionary Tagger, which tags words based on a file you provide.
Please check the text mining self-paced course in the link below. You need to do many steps in order to tag the words and to apply machine learning algorithms later; the steps include, but are not limited to, reading, cleaning, and transformation.
We have several example workflows available for sentiment analysis, and some dealing with tweets directly. You might want to check out this space on the KNIME Hub that provides examples for several different approaches to sentiment analysis, of varying complexity:
Maybe give one of these a try, and come back with any follow-up questions you might have.
Thank you for your suggestion. But I don’t have permission to view the source on the link, I get this message “You are not authorized to view this resource”.
Hi, I still don’t understand what I’m doing wrong. I extracted tweets on my topic and added enrichment and preprocessing steps. Next, I will use SVM, Decision Tree, and some other machine learning algorithms for sentiment classification. But I don’t get any results; I believe I’m doing something wrong, but I don’t know where. I also don’t understand the labeling process. What should I do? This is my current workflow:
Any chance you can upload your actual workflow instead of just a screenshot? I assume this would be OK since it’s just publicly available Twitter data.
Also, what specifically do you feel is going wrong?
When I create the Bag of Words, URLs are included (for example "http") even though I use the RegEx "?!/()=#:;". I feel like the Document/text column for the tweets is not being filtered correctly. I have uploaded the actual workflow now.
When I plug your RegEx into regex101.com I get a syntax error, so maybe check there. (I’m no RegEx pro by any means, though many others here are very good with it.) What do you intend for your RegEx to filter out?
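For what it’s worth, the syntax error is likely because `?`, `(`, and `)` are regex metacharacters. If the intent is to strip those punctuation characters (and the URLs that "http" comes from), a sketch of the idea in Python’s `re` module looks like this. The patterns and sample tweet here are illustrative assumptions, not taken from the workflow:

```python
import re

# Putting the characters inside a character class [...] makes them
# literal, so the pattern is valid. URLs need their own pattern,
# since stripping punctuation alone still leaves "http" behind.
punct_pattern = re.compile(r"[?!/()=#:;]")
url_pattern = re.compile(r"https?://\S+")

tweet = "Check this out: https://example.com #crisis (so sad!)"
cleaned = punct_pattern.sub("", url_pattern.sub("", tweet))
print(cleaned)
```

The same two-step logic applies inside KNIME: remove URLs first, then punctuation, so no "http" tokens survive into the Bag of Words.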
I did notice that starting at your Number Filter node, you begin to append a new column to your dataset, instead of replacing the existing Document column as you did before. This is going to cause problems when you get to the Bag of Words because only some of the preprocessing steps will have been applied to the column that you end up selecting. Carefully go through your preprocessing nodes and make sure the replace column option is being applied consistently and correctly.
When it comes to labeling, do I have to label some of my tweets manually? Or can I, for example, use the Sentiment140 or Kaggle Airline Review dataset to train an ML model and then deploy that trained model on new tweets, which in this case are my tweets? I also see a lot of the examples using the Category to Class node; is this necessary when working with supervised learning?
If you don’t have labels for the individual tweets, one approach is to apply the positive and negative dictionaries to the documents, and then calculate a score based on how many positive versus negative words show up.
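That dictionary-scoring idea can be sketched in a few lines of Python. The word lists below are placeholders, not a real sentiment lexicon; in KNIME this is roughly what the Dictionary Tagger plus a scoring step accomplish:

```python
# Illustrative word lists -- replace with a real sentiment lexicon.
POSITIVE = {"good", "great", "hope", "safe", "help"}
NEGATIVE = {"bad", "crisis", "war", "fear", "sad"}

def label_tweet(text: str) -> str:
    """Score = (# positive words) - (# negative words)."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(label_tweet("There is hope that help is coming"))  # leans positive
```

The neutral class falls out naturally when positive and negative counts cancel, which matches the three-class setup you described.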
If you do have labels, then it’s useful to store them in the document early via the Category field of the Strings to Document node, so you can pull them back out later with the Category to Class node prior to running your classification algorithm.
Thanks for the information @ScottF, it was very helpful. Is it also possible to use the Amazon Comprehend Sentiment Analysis node, which labels the tweets into positive, negative, neutral, and mixed sentiments, and then implement a classification algorithm?
Well, the Comprehend service is essentially doing the classification for you, so there would be no need to implement another classification model afterwards. Also, note that while the node itself is free, use of the Comprehend service is not.
Excuse me for asking so many questions, but can I train my model using tweets that are labeled by, let’s say, the Comprehend service, and then test on new unlabeled tweets on the same topic?
I also saw an example where the labeling process was handled by the Java Snippet node. The node was programmed to specify three different categories with relevant keywords for each category. Is it possible to label tweets in this way? Sorry in advance for asking many questions as I’m very new to ML and especially text processing.
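Yes, that kind of keyword-based labeling is possible. A minimal sketch of what such a Java Snippet does per row, written here in Python with made-up keyword lists (you would supply your own):

```python
# Placeholder keyword lists for each category -- illustrative only.
CATEGORIES = {
    "positive": ["rescue", "support", "aid"],
    "negative": ["attack", "violence", "collapse"],
}

def categorize(text: str) -> str:
    """Assign the first category whose keywords appear in the text.

    Note: simple substring matching has pitfalls ("aid" also matches
    "said"); matching on whole tokens would be more robust.
    """
    lowered = text.lower()
    for label, keywords in CATEGORIES.items():
        if any(k in lowered for k in keywords):
            return label
    return "neutral"

print(categorize("Aid workers arrive in Kabul"))
```

Keep in mind this is a heuristic labeler, like the dictionary approach mentioned earlier: it gives you training labels cheaply, but the labels are only as good as the keyword lists.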