I’m working on sentiment analysis of tweets. I have a little more than 17,000 tweets, which are categorized into three classes (pos, neg, neutral). My goal is to train several different supervised models, evaluate the accuracy of each model, and add data visualization views. My current workflow is attached below (the dataset, with a label for each individual tweet, is included in the CSV file): Project Sentiment Analysis SML_export.knwf (434.7 KB)
Is there anything I can do to improve my workflow, whether by improving the model accuracies, adding visualizations, or adding data preparation/preprocessing steps?
Thank you! Do you know why KNIME stops responding when I run the Random Forest trainer? In addition, should I change my ML parameters or just use the standard configuration as it is? Since the majority of the tweets are negative, I tried the Equalize Size Sampling node for the SVM trainer, but I got worse accuracy. Is it worth trying this for the other algorithms? I am also wondering whether there are graph plots for evaluating multiclass predictions. I know that the ROC curve can be used for binary classification problems, but are there similar alternatives for multiclass classification problems (e.g. the F1 score)? Finally, are there any other visualisation nodes that could come in handy for such a workflow? I will try to integrate Tableau.
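To make the multiclass F1 idea concrete, here is a minimal sketch in plain Python (outside KNIME) of per-class precision, recall and F1, plus the macro-averaged F1. The function names `per_class_f1` and `macro_f1` and the toy labels are made up for illustration and are not taken from the workflow. Macro F1 weights the three classes equally, so with a majority of negative tweets it can tell a different story than plain accuracy.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label that occurs."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1          # correct prediction for this class
        else:
            fp[pred] += 1           # predicted class gets a false positive
            fn[truth] += 1          # true class gets a false negative
    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Toy labels, invented for illustration only (not from the tweet dataset).
y_true = ["neg", "neg", "neg", "neg", "pos", "pos", "neutral", "neutral"]
y_pred = ["neg", "neg", "neg", "pos", "pos", "neg", "neutral", "pos"]

for label, (p, r, f1) in per_class_f1(y_true, y_pred).items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

The per-class values could be plotted as a grouped bar chart (one group per class, one bar per model) to compare models side by side, which is one possible stand-in for a ROC curve in the three-class case.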
Thank you for your valuable input!
I have a question about adding color to my bar chart. I want to use colors to differentiate my three classes, but even when I use the Color Manager node, the bars all appear in one color. I tried several methods suggested to other users in this forum for the same problem, but I haven’t succeeded in applying them to my workflow. The bar chart in the Data Visualization node is connected to my CSV dataset. The sentiment column contains the class (one of my three) for each row. Is there a simple solution to this problem?