Sentiment Analysis Labeling // Punctuation Erasure

SvenjaMz · May 24, 2021, 12:47pm

Hello,

we would like to run a sentiment analysis on Tweets. The API connection and everything related to importing the Tweets work fine.

This article has already helped us a lot. However, we still struggle to correctly label the words as positive or negative with the Dictionary Tagger node (is there a dictionary with labels?). Also, the # cannot be removed with the Punctuation Erasure node (in fact, all kinds of punctuation are still there).

Maybe you can help @kilian.thiel ! We saw you are familiar with these processes.

Thanks in advance

Eva, Sara & Svenja

julian.bunzel · May 25, 2021, 9:31am

Hey @SvenjaMz,

welcome to the KNIME forum!

I don’t have a list of labels at hand, but there are several available on the internet that you could use to label words with the Dictionary Tagger.

When it comes to removing punctuation, I’d recommend having a look at your node settings. The Punctuation Erasure node (or any other pre-processing node) will most likely create a new column called Preprocessed Documents (unless there is already one, in which case it will use that one). When checking the terms/documents with the Bag Of Words Creator node or the Document Viewer node, make sure that you have selected the column with the preprocessed documents instead of the original document column.
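
To illustrate the point outside of KNIME, here is a minimal Python sketch (not the node's actual implementation). It assumes a single example row and mirrors the column names mentioned above; the helper `erase_punctuation` and the sample text are made up for illustration. The cleaned text ends up in a separate column, so looking at the original column still shows the # and all other punctuation.

```python
import string


def erase_punctuation(text: str) -> str:
    """Remove ASCII punctuation characters (including '#') from a string."""
    return text.translate(str.maketrans("", "", string.punctuation))


# Hypothetical example row: the original text is kept next to the cleaned text.
row = {"Document": "Loving the new #KNIME release!!!"}
row["Preprocessed Documents"] = erase_punctuation(row["Document"])

print(row["Document"])                # original column, punctuation still there
print(row["Preprocessed Documents"])  # cleaned column, punctuation removed
```

If the viewer or a downstream step reads `Document` instead of `Preprocessed Documents`, it looks as if nothing was removed.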

I hope this helps.

Best,
Julian

SaraFu · May 31, 2021, 2:53pm

Hi @julian.bunzel,

thank you very much for your help. We found a dictionary that we would like to use.
However, when we run the Dictionary Tagger node, we cannot see the results.

Our workflow looks exactly like this one. The only difference is that we do not have a column that says whether a document is positive or negative. We tried using the Document Viewer, which didn’t help.

Thank you for your help.

Best,
Svenja, Eva and Sara

julian.bunzel · May 31, 2021, 3:54pm

Hi @SaraFu,

the Dictionary Tagger node will tag the documents, but you will not be able to see the tags by just viewing the result of the Dictionary Tagger node, as the table only shows the title of each document (for the sake of clarity).

If you want to check if the Dictionary Tagger did its work, you can use a Bag of Words Creator node afterwards which will create a new table listing every term that is available within a document plus its assigned tag. Make sure that you select the correct document column in the configuration of the Bag Of Words Creator.
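
As a rough picture of what such a term table contains, here is a simplified Python sketch. It is not how the KNIME nodes work internally: the dictionary entries, the tag names `POSITIVE`/`NEGATIVE`, and the function `tag_terms` are assumptions for illustration, and it lists every term occurrence rather than building a true bag of words.

```python
# Hypothetical sentiment dictionary; a real one would be loaded from a file.
SENTIMENT_DICTIONARY = {
    "great": "POSITIVE",
    "love": "POSITIVE",
    "slow": "NEGATIVE",
    "broken": "NEGATIVE",
}


def tag_terms(documents: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return one (document title, term, tag) row per term; untagged terms get ''."""
    rows = []
    for title, text in documents.items():
        for term in text.lower().split():
            rows.append((title, term, SENTIMENT_DICTIONARY.get(term, "")))
    return rows


documents = {"tweet 1": "love the great update", "tweet 2": "app is slow and broken"}
for row in tag_terms(documents):
    print(row)
```

Terms that are not in the dictionary simply come out with an empty tag, which makes it easy to spot whether the dictionary matched anything at all.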

However, using the workflow you linked should be just fine. Everything up to the Color Manager node should work. What kind of issues do you encounter?

Best,
Julian

SaraFu · June 8, 2021, 10:27am

Hey @julian.bunzel,

thanks a lot for your help. The tagging works.
Now there is one problem left: the Scorer isn’t working because we cannot define a document category.
The problem must lie somewhere between the Rule Engine and the Category To Class node. The settings are identical to the ones in the workflow linked above.

Our sentiment score, which we want to categorize using the document category node, is a double instead of an integer. Is this the problem?

See this screenshot for an insight into our problem:

Thank you for your help.

Best,
Sara

SvenjaMz · June 11, 2021, 12:39pm

Hi @julian.bunzel ,

we figured out that our workflow has the following problem: the document should be classified as positive if the sentiment score > threshold and as negative otherwise. However, we cannot find the right settings to do so (or even where to do so). This is the problem we’ve been facing.

Thank you so much for all your help.

Regards

Svenja

julian.bunzel · June 11, 2021, 12:47pm

Hi @SaraFu and @SvenjaMz,

this happens within the Rule Engine node, which resides within the Calculate Score metanode. You can open a metanode by double-clicking it. I would check the output of each node to see where the data does not look the way you would expect. Then you might be able to change a setting.
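
For reference, the rule itself (positive if the sentiment score is above a threshold, negative otherwise) can be sketched in a few lines of Python. This is not Rule Engine syntax; the function name `classify`, the sample scores, and the choice of the mean as threshold are assumptions for illustration only.

```python
def classify(sentiment_score: float, threshold: float) -> str:
    """Label a document POS if its score is above the threshold, else NEG."""
    return "POS" if sentiment_score > threshold else "NEG"


# Hypothetical scores; the threshold is assumed here to be their mean.
scores = [0.4, -0.2, 0.1, -0.5]
threshold = sum(scores) / len(scores)

for score in scores:
    print(score, classify(score, threshold))
```

In this sketch the comparison behaves the same for decimal scores as for whole numbers, so comparing each node's output against a hand-calculated example like this one can help narrow down where the values go wrong.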

Best,
Julian

SvenjaMz · June 11, 2021, 12:54pm

Hi @julian.bunzel

thanks for pointing that out. Actually, we were already in the settings and cannot find our mistake.

With these settings, after executing the Rule Engine node and the following nodes in the above-mentioned workflow, we get the same results that Sara posted here.

Thanks again and sorry for all the trouble caused.

Best,

Svenja

julian.bunzel · June 11, 2021, 1:28pm

What does the output of the preceding node look like?

SaraFu · June 11, 2021, 2:05pm

Hi @julian.bunzel,

thanks for your quick feedback.

So this is the part of the workflow that we are talking about:

This is the output of the Table Row to Variable node:

And this is the output of the Math Formula node:

Hope that helps! Thank you very much for your help so far.

Have a great weekend,
Sara

julian.bunzel · June 15, 2021, 8:48am

Hi @SaraFu, @SvenjaMz,

it actually looks correct up to that point. I assume the output of the Rule Engine node is correct as well. There should be a new column containing either POS or NEG as its value.
Which columns are selected in the Scorer node?

Best,
Julian
