Text processing and dictionary tagger question

Hi all, 

I am working on a sentiment analysis project. I am loading a large number of opinions into my KNIME workflow and using a custom-made dictionary to tag terms and phrases. However, when the tagger does not find any of the terms or phrases in an opinion, it discards the opinion. I would like to keep all the opinions in my workflow, including those that don't contain any matching terms or phrases from the dictionary, because I need to know that they have been loaded and passed through the workflow.

Is there a way to keep all text opinions, even if they don't contain any term or phrase from the dictionary? After the Dictionary Tagger node I lose them; I would like to keep them and assign them a special value.

Thank you in advance, 

Ana

Hi Ana,

 

I am not quite sure what you are planning to do with the dictionary words that no terms match.

The Wildcard Tagger and Dictionary Tagger nodes tag terms that match expressions or words from a dictionary. The expressions or words themselves are not passed through the workflow, only the terms matching them. I assume that you want to pass through all expressions, even those that no terms match. What do you want to do with the expressions that no terms match?

 

Cheers, Kilian

Hello Kilian,

firstly, thank you for the answer. 

Yes, I plan to pass all the expressions through the workflow, even if they don't contain any term from the dictionary. Each such expression would be written to the database with a special flag, meaning that it has passed through the workflow but had no matching terms from the dictionary.
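
To make the intended outcome concrete, here is a minimal sketch in plain Python. It is not KNIME code and does not use any KNIME API; the function name `tag_opinions`, the flag values `TAGGED` and `NO_MATCH`, and the sample data are all hypothetical and only illustrate the behaviour described above: every expression is kept, and those without a dictionary match get a special flag.

```python
import re

# Hypothetical flag values; any marker the database can store would do.
TAGGED_FLAG = "TAGGED"
NO_MATCH_FLAG = "NO_MATCH"


def tag_opinions(opinions, dictionary):
    """Return one record per opinion; opinions without matches are kept and flagged."""
    # Longest entries first, so a phrase wins over the single words it contains.
    entries = sorted(dictionary, key=len, reverse=True)
    pattern = None
    if entries:
        pattern = re.compile(
            r"\b(?:" + "|".join(re.escape(e) for e in entries) + r")\b",
            re.IGNORECASE,
        )

    records = []
    for opinion_id, text in opinions:
        matches = []
        if pattern is not None:
            matches = [m.group(0).lower() for m in pattern.finditer(text)]
        records.append(
            {
                "id": opinion_id,
                "text": text,
                "matches": matches,
                # The opinion is never dropped; it only gets a different flag.
                "flag": TAGGED_FLAG if matches else NO_MATCH_FLAG,
            }
        )
    return records


# Made-up sample data for illustration only.
opinions = [
    (1, "The battery life is great."),
    (2, "Delivered on Tuesday."),
]
dictionary = {"great", "battery life", "terrible"}
records = tag_opinions(opinions, dictionary)
```

With this sample input, both opinions come out of the function: the first with its matched terms, the second with an empty match list and the `NO_MATCH` flag, so it could be written back to the database as processed.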

The reason I want to do this is that I load new expressions every day from the database (HBase, Hadoop, meaning a very large amount of data). But expressions with no matching terms are also loaded every time, again and again, and they are piling up in the database. So after some time, when I try to load a small batch of new expressions, I will get only a bunch of expressions that have no matching terms. I want to pass them through the workflow anyway, so that they don't pile up in the database and I get new expressions every time I load the data.

I currently don't have time to improve the dictionary, so this would be my temporary solution. Is there a way to add some value to the dictionary that occurs in every expression and associate it with a flag? Can KNIME tag characters such as a dot, comma, or space? Any temporary solution is welcome.

Thank you in advance, 

Ana

 

Hi Ana,

 

The tagging process tags terms that match the entries of the dictionary. If no terms match certain dictionary entries, nothing is tagged. The dictionary words themselves are never passed through the workflow, only the matching terms.

 

One solution would be to concatenate the words of the dictionary afterwards to the (grouped?) bag of words. With the Term to String node, terms can be transformed into strings, and with the Reference Row Filter node, terms that have been tagged can be filtered out. At the end, the found and tagged terms and the dictionary entries that have not been matched have to be concatenated.
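
The logic of these steps can be sketched in plain Python. This is only an illustration of the filter-and-concatenate idea, not the attached workflow and not KNIME API; the function name `concat_tagged_and_unmatched`, the status labels, and the sample values are assumptions made for the example.

```python
def concat_tagged_and_unmatched(tagged_terms, dictionary_entries):
    """Combine tagged terms with the dictionary entries that matched nothing."""
    # Roughly the "Term to String" step: compare terms as plain strings.
    tagged_strings = {term.lower() for term in tagged_terms}

    # Roughly the "Reference Row Filter" step: drop dictionary entries
    # that already appear among the tagged terms.
    unmatched = [
        entry for entry in dictionary_entries
        if entry.lower() not in tagged_strings
    ]

    # Concatenate both tables, with a column saying where each row came from.
    rows = [(term, "TAGGED") for term in tagged_terms]
    rows += [(entry, "UNMATCHED") for entry in unmatched]
    return rows


# Made-up sample data for illustration only.
tagged_terms = ["great", "battery life"]
dictionary_entries = ["great", "battery life", "terrible"]
rows = concat_tagged_and_unmatched(tagged_terms, dictionary_entries)
```

The same pattern should carry over to whole opinions instead of dictionary entries: filter out the documents that received at least one tag, then concatenate the remaining ones with a special flag.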

 

Attached you will find an example workflow. I hope this helps.

 

Cheers, Kilian

 

 

Hello Kilian,

thank you for your answer, it was helpful with the modifications to my workflow!