I am experiencing some problems with this workflow for carrying out sentiment analysis.
The main problem is that I think I have configured everything correctly, but I can't get it to work with Italian. I need to analyse tweets in Italian, but the end result is UNDEFINED. With the same procedure in English, everything works perfectly.
Thanks for the help!
The preprocessing steps normally apply operations such as tokenization and stemming based on the English language, and so on. You might want to check the preprocessing steps.
Thanks for the suggestion. Wherever a node allowed it, I switched the language setting from English to Italian. There still seems to be some kind of problem with recognising Italian words.
Can you upload the workflow itself, rather than a screenshot, along with some sample Italian data? Then we may be better able to check where things might be going wrong.
Yes, thanks! Here is the workflow:
Sentiment Analysis-2.knar (1.9 MB)
As a complete aside, is there a tutorial for performing this in English?
I took a brief look. Your input file doesn’t have any labels, so there isn’t anything for the model to train on. Can you explain a bit more what your three Excel files are supposed to represent?
We offer a free online course for performing text processing with KNIME. Search for L4-TP here:
These are tweets by Italian politicians during the last elections, divided into three macro-topics. I wanted to run a separate analysis for each one.
Are you just trying to determine whether the tweets are positive or negative? If so, with the supervised approach in the workflow, you still have to provide a label for the model to train on. Your datasets don’t seem to have that.
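To illustrate why labels matter, here is a minimal sketch of a supervised classifier (multinomial Naive Bayes with Laplace smoothing) in plain Python. It is not what the KNIME nodes do internally, and the tweets and labels are hypothetical examples, not taken from the uploaded data. The point is that training consumes (text, label) pairs: with no labelled examples there is nothing to learn from, and the sketch returns `None`, a loose analogy to the undefined result.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on whitespace; a real pipeline would use an Italian-aware tokenizer.
    return text.lower().split()


def train_naive_bayes(examples):
    """examples: list of (text, label) pairs. Without labels there is nothing to learn."""
    label_counts = Counter()            # number of documents per label
    word_counts = defaultdict(Counter)  # word frequencies per label
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab


def predict(model, text):
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            # Laplace smoothing so unseen words do not zero out the probability
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label  # None if the model was trained on no labelled examples


# Hypothetical hand-labelled tweets (not from the original data set)
training = [
    ("ottimo lavoro del governo", "positive"),
    ("grande risultato per il paese", "positive"),
    ("pessima scelta del governo", "negative"),
    ("risultato deludente e vergognoso", "negative"),
]
model = train_naive_bayes(training)
print(predict(model, "ottimo risultato"))             # positive
print(predict(model, "scelta vergognosa e pessima"))  # negative
print(predict(train_naive_bayes([]), "ciao"))         # None: no labels, no prediction
```

Note that "vergognosa" is never matched to "vergognoso" here, because nothing normalises Italian inflections; that is the kind of gap a language-specific lemmatizer or stemmer is meant to close.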
Have you looked at the spaCy nodes? They might help you, since they use a more modern framework and have better support for multiple languages (for example, a lemmatizer is available for Italian). You can read more about the capabilities here: https://spacy.io/models/it
You can also use this workflow as a reference for your work:
It processes Portuguese texts, but it is easy to select one of the Italian models instead and provide your own texts.