Use of "Term co-occurrence counter" for a tag cloud?

Hi everybody, 

is there a possibility to use the "Term co-occurrence counter" node for a tag cloud, so that e.g. the tag cloud shows not only the term "weather" but also "good weather" and "bad weather", because those two other terms (good and bad) are frequently used together with "weather". 

Thanks for any reply, or any ideas on how to do this. :)

Jasmin 

Yes, you just need to extract both terms to strings (Term to String), then combine the two strings (Column Combiner) and finally convert the combined string back into a term (String to Term), to which you can then apply the Tag Cloud. :)
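Outside KNIME, the idea of turning each co-occurring pair into one string and then counting it can be sketched in a few lines of Python. This is only an illustration of the logic, assuming each co-occurrence row is a plain `(term1, term2)` tuple of strings; the helper name `combine_pairs` is hypothetical and does not correspond to any KNIME node.

```python
from collections import Counter


def combine_pairs(pairs, delimiter=" "):
    """Join each (term1, term2) pair into a single string (the Column Combiner idea)."""
    return [delimiter.join(pair) for pair in pairs]


# Hypothetical co-occurrence rows: one (term1, term2) tuple per sentence occurrence
pairs = [("good", "weather"), ("bad", "weather"), ("good", "weather")]

combined = combine_pairs(pairs)
counts = Counter(combined)  # roughly what a value counter reports per combined term
print(counts.most_common())  # [('good weather', 2), ('bad weather', 1)]
```

Once each pair is a single string, it can be treated like any other term, which is why the String to Term step makes it usable for the Tag Cloud.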

Cheers, Iris

Hi Iris, 

thanks a lot for the quick reply :) 

It works until it comes to the "Tag Cloud"... 

After the second "String to Term" node I used the "Column Filter" to continue with the "Document" column and the new columns with the combined terms. After this I tried to apply the frequency nodes (TF, IDF). However, the only value I get is "0" (in all of the frequency nodes). The Tag Cloud then looks completely strange: it doesn't use "normal" words but the words with their tags... 

When I use the Value Counter after the second "String to Term" node, I get a table like this (the RowID values actually contain the tags as well): 

RowID count
good weather 48
bad weather 15
stormy day 13
sunny day 9

Can't I just get a tag cloud out of this? Or is there any other possibility? 

Thanks a lot for your advice and your time :) 

Jasmin

Hi,

No, the Tag Cloud always needs a term column to work on.

You can extract your RowID into a string column with the RowID node and afterwards convert the string into a term with the String to Term node.

Best, Iris

Hi Iris, 

this is exactly what I did. After the "String to Term" node I do get values with the IDF node (I have many documents with only a few words; with the TF node I only get the value 0). However, when I apply the Tag Cloud, it does not use "good weather" or "stormy wind" but "good [ADJD(STTS)]", "weather [NN(STTS)]" and "stormy [ADJD(STTS)]", "wind [NN(STTS)]", i.e. the terms with their tags... 

Do you have a suggestion how I can get rid of this?

Thanks a lot

Jasmin

Hm, can you post a screenshot of your workflow, or maybe just attach the workflow?

Hi Iris,

I solved the problem with the frequencies by using the "GroupBy" node to sum up the sentence occurrences created by the "Term co-occurrence counter" node. 

However, the tag cloud still contains punctuation. Do you know how to get rid of it? 
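Summing the sentence occurrences per pair gives a corpus-wide weight, which helps when many documents are too short for a per-document frequency to be informative. A minimal Python sketch of that GroupBy-with-sum step, assuming a hypothetical row layout of `(document id, term1, term2, sentence occurrences)`; the real column names of the co-occurrence table may differ.

```python
from collections import defaultdict

# Hypothetical rows from the co-occurrence table:
# (document id, term 1, term 2, sentence occurrences in that document)
rows = [
    ("doc1", "good", "weather", 2),
    ("doc2", "good", "weather", 1),
    ("doc2", "stormy", "day", 1),
]


def sum_occurrences(rows):
    """Group by (term1, term2) and sum the sentence occurrences across documents."""
    totals = defaultdict(int)
    for _doc, term1, term2, occurrences in rows:
        totals[(term1, term2)] += occurrences
    return dict(totals)


totals = sum_occurrences(rows)
print(totals)  # {('good', 'weather'): 3, ('stormy', 'day'): 1}
```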

Find my workflow attached. 

I appreciate your time and help :) 

Jasmin

Yes: in the Column Combiner, just delete the quote character and maybe the delimiter as well, and then use the "Replace delimiter by" option. Afterwards there is no more punctuation.
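To see why the quotes and commas end up in the tag cloud, here is a small Python sketch of a column combiner that wraps each value in a quote character and joins the values with a delimiter. The default `", "` delimiter and `"` quote below are assumptions chosen to reproduce the output reported earlier in this thread, not necessarily the node's actual defaults, and `combine` is a hypothetical helper.

```python
def combine(values, delimiter=", ", quote='"'):
    """Join values with a delimiter, wrapping each value in the quote character (if any)."""
    return delimiter.join(f"{quote}{value}{quote}" for value in values)


# With a quote character and a comma delimiter, the punctuation becomes part of the term
with_quotes = combine(["good", "weather"])  # '"good", "weather"'

# With no quote character and a plain space as delimiter, the result reads naturally
plain = combine(["good", "weather"], delimiter=" ", quote="")  # 'good weather'
print(with_quotes)
print(plain)
```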

Iris
 

Oh yes! I had not noticed this option! Thanks for the clue!

Do you know if it is also possible to combine sentiment analysis with the "Term co-occurrence counter"? For example, so that the negative words in a word pair can be distinguished from the positive ones? 

Jasmin

Hi Jasmin,

I did not fully understand your question about the term co-occurrence counter and sentiment scores. You can of course assign sentiment labels to the terms extracted by the co-occurrence counter. If you have a dictionary with terms and corresponding sentiment labels, you can simply join the sentiments to the term co-occurrence table, using the terms as the join key.
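A minimal Python sketch of that join, assuming a hypothetical dictionary mapping each term to a label and a co-occurrence table of `(term1, term2, occurrences)` rows. Each term column is looked up separately, before any columns are combined, and terms not in the dictionary get the label `NEUTRAL` (an assumption, similar to a left outer join with missing values filled in).

```python
# Hypothetical sentiment dictionary: term -> label
sentiment = {"good": "POSITIVE", "bad": "NEGATIVE", "stormy": "NEGATIVE"}

# Hypothetical co-occurrence table: (term1, term2, sentence occurrences)
cooccurrences = [("good", "weather", 48), ("bad", "weather", 15), ("sunny", "day", 9)]


def join_sentiment(rows, dictionary, default="NEUTRAL"):
    """Attach a sentiment label to each term column by looking the term up in the dictionary."""
    return [
        (t1, t2, n, dictionary.get(t1, default), dictionary.get(t2, default))
        for t1, t2, n in rows
    ]


joined = join_sentiment(cooccurrences, sentiment)
for row in joined:
    print(row)
```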

Cheers, Kilian

Hi Kilian, 

what I want is a tag cloud that colors positive words green and negative words red (keeping neutral words grey). But when I combine the terms, the previous sentiment tags are erased. I tried to join the previous table with the sentiment tags, but because of the "Column Combiner" the two tables don't match anymore... 

Do you have any suggestions to do so?

Jasmin

Hi Jasmin,

I assume you have a dictionary with positive and negative words. Use this dictionary for tagging with the Dictionary Tagger. You need two tagger nodes: one with the positive list, assigning positive labels, and one with the negative list, assigning negative labels. Do the filtering and other preprocessing. Create a bag of words. Extract the tags as strings with the Tags to String node. Compute the term frequency with the TF node. Assign colors based on the tags with the Color Manager. Then use the Tag Cloud node.
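The tagging, frequency and coloring steps can be sketched for a single document in plain Python. The word lists, the `NEUTRAL` fallback and the green/red/grey mapping are hypothetical stand-ins for the two Dictionary Tagger nodes and the Color Manager; the frequency is computed here as a relative term frequency, which is also an assumption.

```python
from collections import Counter

# Hypothetical dictionaries (one per Dictionary Tagger node)
POSITIVE = {"good", "sunny"}
NEGATIVE = {"bad", "stormy"}
COLORS = {"POSITIVE": "green", "NEGATIVE": "red", "NEUTRAL": "grey"}


def tag(term):
    """Return the sentiment tag for a term; untagged terms count as neutral."""
    if term in POSITIVE:
        return "POSITIVE"
    if term in NEGATIVE:
        return "NEGATIVE"
    return "NEUTRAL"


def cloud_entries(tokens):
    """Return (term, relative tf, color) for each distinct term of one document."""
    counts = Counter(tokens)
    total = len(tokens)
    return [(term, n / total, COLORS[tag(term)]) for term, n in counts.items()]


entries = cloud_entries(["good", "weather", "bad", "weather"])
print(entries)
```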

Cheers, Kilian

Hi Kilian, 

what you describe is the sentiment analysis I already used. However, I want to combine this sentiment with the "Term co-occurrence counter". Is this possible? 

Maybe you have an idea. Find my example workflow attached. 

Thanks a lot for your help! 

Jasmin

Hi Jasmin,

Not sure if I got that right: I assume you want to combine the sentiments of single terms into a combined sentiment for the pair of terms. You could extract the tags (Tags to String) and then specify a rule for combining the sentiments (Rule Engine). To group equal pairs of terms, you can create a unique ID that is independent of the terms' positions, i.e. the pair A - B is equal to the pair B - A.

Attached is an example workflow.
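Both ideas can be sketched in Python: an order-independent key built by sorting the two terms, and a rule that merges two single-term sentiments into one label. The rule shown (negative wins, then positive, otherwise neutral) is only one possible choice, and the row layout `(term1, sentiment1, term2, sentiment2, occurrences)` is hypothetical.

```python
from collections import defaultdict


def pair_key(term1, term2):
    """Order-independent ID: (A, B) and (B, A) map to the same key."""
    return " | ".join(sorted((term1, term2)))


def combine_sentiment(s1, s2):
    """One possible rule: negative wins, then positive, otherwise neutral."""
    if "NEGATIVE" in (s1, s2):
        return "NEGATIVE"
    if "POSITIVE" in (s1, s2):
        return "POSITIVE"
    return "NEUTRAL"


def group_pairs(rows):
    """Sum occurrences per unordered pair and attach the combined sentiment."""
    totals = defaultdict(int)
    labels = {}
    for t1, s1, t2, s2, n in rows:
        key = pair_key(t1, t2)
        totals[key] += n
        labels[key] = combine_sentiment(s1, s2)
    return {key: (totals[key], labels[key]) for key in totals}


rows = [
    ("good", "POSITIVE", "weather", "NEUTRAL", 3),
    ("weather", "NEUTRAL", "good", "POSITIVE", 2),
    ("stormy", "NEGATIVE", "day", "NEUTRAL", 1),
]
result = group_pairs(rows)
print(result)
```

Sorting makes the key stable regardless of term order, but it can put the terms in an unnatural reading order (for example `day | stormy`).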

Cheers, Kilian

Hi Kilian, 

this is exactly what I meant! Thank you so much for this clever solution! 

However, the Column Combiner makes the result look nicer than the unique ID does. 

Thank you so much for your help!

Jasmin
