Hi, I’m completely new to KNIME and to ML in general. I want to do sentiment analysis on tweets about the Afghanistan crisis with ML/DL. I have access to a Twitter Developer Account and was able to visualize tweets from different users using the Twitter API Connector. My goal is multiclass classification of positive, negative, and neutral sentiments using ML techniques. My question is how to proceed with labeling these tweets. Do I have to label each tweet manually, one by one, or is there a more convenient way? Do I have to create a CSV for labeling? Excuse me, but I’m very new to this software and don’t understand how to proceed.
As I understand it, you want to tag the tweets.
There are two ways to do that: either using a tagger node such as the Stanford Tagger, or using the Dictionary Tagger, which tags words based on a file you provide.
Please check the text mining self-paced course in the link below. You need to do many steps in order to tag the words and to apply machine learning algorithms later; the steps include, but are not limited to, reading, cleaning, and transformation.
We have several example workflows available for sentiment analysis, and some dealing with tweets directly. You might want to check out this space on the KNIME Hub that provides examples for several different approaches to sentiment analysis, of varying complexity:
Maybe give one of these a try, and come back with any follow-up questions you might have.
Thank you for your suggestion. But I don’t have permission to view the source on the link, I get this message “You are not authorized to view this resource”.
Hi, I still don’t understand what I’m doing wrong. I extracted tweets on my topic and added enrichment and preprocessing steps. Next, I will use SVM, Decision Tree, and some other machine learning algorithms for sentiment classification. But I don’t get any results; I believe I’m doing something wrong, but I don’t know where. I also don’t understand the labeling process. What should I do? This is my current workflow:
Any chance you can upload your actual workflow instead of just a screenshot? I assume this would be OK since it’s just publicly available Twitter data.
Also, what specifically do you feel is going wrong?
When I create the Bag of Words, URLs are included (for example "http") even though I use the RegEx "?!/()=#:;". I feel like the Document/text column for the tweets is not being filtered correctly. I have uploaded the actual workflow now.
When I plug your RegEx into regex101.com I get a syntax error, so maybe check there. (I’m no RegEx pro by any means, though many others here are very good with it.) What do you intend for your RegEx to filter out?
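For what it’s worth, the syntax error is likely because `?`, `(`, and `)` are regex metacharacters. If the intent is to strip those punctuation characters (and the URLs that "http" comes from), a sketch of the idea in Python’s `re` module looks like this. The patterns and sample tweet here are illustrative assumptions, not taken from the workflow:

```python
import re

# Putting the characters inside a character class [...] makes them
# literal, so the pattern is valid. URLs need their own pattern,
# since stripping punctuation alone still leaves "http" behind.
punct_pattern = re.compile(r"[?!/()=#:;]")
url_pattern = re.compile(r"https?://\S+")

tweet = "Check this out: https://example.com #crisis (so sad!)"
cleaned = punct_pattern.sub("", url_pattern.sub("", tweet))
print(cleaned)
```

The same two-step logic applies inside KNIME: remove URLs first, then punctuation, so no "http" tokens survive into the Bag of Words.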
I did notice that starting at your Number Filter node, you begin to append a new column to your dataset, instead of replacing the existing Document column as you did before. This is going to cause problems when you get to the Bag of Words because only some of the preprocessing steps will have been applied to the column that you end up selecting. Carefully go through your preprocessing nodes and make sure the replace column option is being applied consistently and correctly.
When it comes to labeling, do I have to label some of my tweets manually? Or can I, for example, use the Sentiment140 or Kaggle Airline Review dataset to train an ML model and then deploy that trained model on new tweets, which in this case are my tweets? I also see a lot of the examples using the Category to Class node; is this necessary when working with supervised learning?
If you don’t have labels for the individual tweets, one approach is to apply the positive and negative dictionaries to the documents, and then calculate a score based on how many positive versus negative words show up.
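That dictionary-scoring idea can be sketched in a few lines of Python. The word lists below are placeholders, not a real sentiment lexicon; in KNIME this is roughly what the Dictionary Tagger plus a scoring step accomplish:

```python
# Illustrative word lists -- replace with a real sentiment lexicon.
POSITIVE = {"good", "great", "hope", "safe", "help"}
NEGATIVE = {"bad", "crisis", "war", "fear", "sad"}

def label_tweet(text: str) -> str:
    """Score = (# positive words) - (# negative words)."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(label_tweet("There is hope that help is coming"))  # leans positive
```

The neutral class falls out naturally when positive and negative counts cancel, which matches the three-class setup you described.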
If you do have labels, then it’s useful to store them in the document early via the Category field of the Strings to Document node, so you can pull them back out later with the Category to Class node prior to running your classification algorithm.
Thanks for the information @ScottF, it was very helpful. Is it also possible to use the Amazon Comprehend Sentiment Analysis node, which labels the tweets into positive, negative, neutral, and mixed sentiments, and then implement a classification algorithm?
Well, the Comprehend service is essentially doing the classification for you, so there would be no need to implement another classification model afterwards. Also, note that while the node itself is free, use of the Comprehend service is not.
Excuse me for asking so many questions, but can I train my model using tweets that are labeled by, let’s say, the Comprehend service, and then test on new unlabeled tweets on the same topic?
I also saw an example where the labeling process was handled by the Java Snippet node. The node was programmed to specify three different categories with relevant keywords for each category. Is it possible to label tweets in this way? Sorry in advance for asking many questions as I’m very new to ML and especially text processing.
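Yes, that kind of keyword-based labeling is possible. A minimal sketch of what such a Java Snippet does per row, written here in Python with made-up keyword lists (you would supply your own):

```python
# Placeholder keyword lists for each category -- illustrative only.
CATEGORIES = {
    "positive": ["rescue", "support", "aid"],
    "negative": ["attack", "violence", "collapse"],
}

def categorize(text: str) -> str:
    """Assign the first category whose keywords appear in the text.

    Note: simple substring matching has pitfalls ("aid" also matches
    "said"); matching on whole tokens would be more robust.
    """
    lowered = text.lower()
    for label, keywords in CATEGORIES.items():
        if any(k in lowered for k in keywords):
            return label
    return "neutral"

print(categorize("Aid workers arrive in Kabul"))
```

Keep in mind this is a heuristic labeler, like the dictionary approach mentioned earlier: it gives you training labels cheaply, but the labels are only as good as the keyword lists.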