Sentiment analysis in Italian

ElioSimone2022 · January 9, 2023, 9:19am

Good morning!
I am experiencing some problems with this workflow for carrying out sentiment analysis.
The main problem is that I think I have configured everything correctly but I can’t get it to work with the Italian language. I have to analyse tweets in Italian but the end result is UNDEFINED. With the same procedure but in English everything works perfectly.

Thanks for the help!

Daniel_Weikert · January 9, 2023, 5:32pm

The preprocessing steps (stemming, tokenizing and so on) are normally configured for the English language. You might want to check whether each preprocessing step in your workflow is set up for Italian.
br
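
To illustrate why the preprocessing language matters, here is a minimal Python sketch (not KNIME code). The two stop word lists and the sample tweet are made up for illustration. It shows that an English stop word filter leaves an Italian sentence untouched, while an Italian one removes the function words:

```python
import re

# Tiny illustrative stop word lists (assumptions, not the lists KNIME ships with).
ENGLISH_STOP_WORDS = {"the", "is", "a", "of", "and", "to", "in", "for"}
ITALIAN_STOP_WORDS = {"il", "la", "è", "un", "una", "di", "e", "per", "che", "non", "del"}


def tokenize(text):
    """Lowercase the text and return runs of letters (accented letters included)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def remove_stop_words(tokens, stop_words):
    """Drop every token that appears in the given stop word set."""
    return [t for t in tokens if t not in stop_words]


# Hypothetical sample tweet.
tweet = "Il futuro del paese è una priorità per tutti"
tokens = tokenize(tweet)

english_filtered = remove_stop_words(tokens, ENGLISH_STOP_WORDS)
italian_filtered = remove_stop_words(tokens, ITALIAN_STOP_WORDS)

print(english_filtered)  # the English filter removes nothing here
print(italian_filtered)  # the Italian filter keeps only the content words
```

The same mismatch applies to stemmers and taggers: a component built for English will not normalize Italian word forms in a useful way.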

ElioSimone2022 · January 9, 2023, 6:19pm

Thanks for the suggestion. I switched every node that offers a language setting from English to Italian. There still seems to be some kind of problem with the recognition of Italian words.

ScottF · January 11, 2023, 3:29pm

Can you upload the workflow itself, as opposed to a screenshot, along with some sample Italian data? Then we will be better able to check where things might be going wrong.

ElioSimone2022 · January 11, 2023, 8:15pm

Yes, thanks! Here is the workflow:

Sentiment Analysis-2.knar (1.9 MB)

JWebb · January 12, 2023, 9:27am

As a complete aside, is there a tutorial for performing this in English?

ScottF · January 12, 2023, 7:35pm

I took a brief look. Your input file doesn’t have any labels, so there isn’t anything for the model to train on. Can you explain a bit more what your three Excel files are supposed to represent?

ScottF · January 12, 2023, 7:36pm

We offer a free online course for performing text processing with KNIME. Search for L4-TP here:

ElioSimone2022 · January 12, 2023, 7:39pm

These are the texts of tweets posted by Italian politicians during the last elections, divided into 3 macro topics. I wanted to run a separate analysis for each one.

ScottF · January 12, 2023, 8:17pm

Are you just trying to determine whether the tweets are positive or negative? If so, with the supervised approach in the workflow, you still have to provide a label for the model to train on. Your datasets don’t seem to have that.
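
To make the point about labels concrete, here is a minimal sketch in plain Python (not the KNIME workflow) of a supervised classifier. I am using a tiny Naive Bayes model as a stand-in; the example tweets and their labels are invented for illustration. The key thing is that every training row carries a label, which is the column your files are missing:

```python
import math
from collections import Counter, defaultdict

# Hypothetical hand-labelled examples; real training data needs far more rows.
labelled_tweets = [
    ("ottimo risultato grande vittoria", "positive"),
    ("bellissima giornata grazie a tutti", "positive"),
    ("pessima gestione vergogna", "negative"),
    ("disastro totale che vergogna", "negative"),
]


def train(examples):
    """Count words per label; these counts are all the model learns."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocabulary


def predict(text, model):
    """Return the label with the highest Naive Bayes log score."""
    word_counts, class_counts, vocabulary = model
    total = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        score = math.log(class_counts[label] / total)  # class prior
        n_words = sum(word_counts[label].values())
        for word in text.split():
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Laplace (add-one) smoothing avoids log(0) for unseen pairs.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(labelled_tweets)
prediction_positive = predict("grande vittoria", model)
prediction_negative = predict("che vergogna", model)
print(prediction_positive, prediction_negative)
```

Without the second element of each tuple there is nothing to count per class, so a supervised learner has nothing to fit. If labelling tweets by hand is not feasible, a lexicon-based approach would be the alternative to look at.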

Artem · January 13, 2023, 8:55am

Hello @ElioSimone2022

Have you looked at the spaCy nodes? They might help you, since they use a more modern framework and have better support for multiple languages (for example, there is a lemmatizer available for Italian). You can read more about the capabilities here: https://spacy.io/models/it

Also you can take this workflow as a reference for your work:

It processes Portuguese texts, but it is easy to select one of the Italian models instead and provide your own texts.
