Keyword Weightage

Hi,

I am very new to KNIME, especially to text processing. I have multiple documents, and from these I filtered out a subset based on a keyword such as "ABC". I then split the filtered documents into sentences using the "Sentence Extractor" node. Now I want to keep only those sentences that carry more weight or context for "ABC" than the other sentences.


Document | Sentence                                                               | Sentence No
Doc 1    | love yours new product ABC.                                            | 1
Doc 1    | I really admired your new product ABC which has more option as before | 2

From the example above, I want to get only sentence 2, which has more context for product ABC than sentence 1.



Hi Oraware,

You could treat this as a classification problem with class labels ranging from very negative to very positive. Each sentence could then be weighted, e.g. based on its assigned class. Of course, you need some training data to build the model. Alternatively, you could take a dictionary approach: keep a dictionary of words that indicate affection toward something, e.g. "ABC", and count how many of these words occur in each sentence.
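
Outside of KNIME, the dictionary approach can be sketched in a few lines of Python. This is only a minimal illustration: the word list AFFECTION_WORDS and the helper names tokenize and dictionary_score are assumptions made up for this example, not part of any KNIME node.

```python
import re

# Hypothetical dictionary of words that signal affection toward the product.
# A real dictionary would be curated or extracted from documents.
AFFECTION_WORDS = {"love", "admired", "really", "more"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def dictionary_score(sentence, dictionary=AFFECTION_WORDS):
    """Count how many tokens of the sentence occur in the dictionary."""
    return sum(1 for token in tokenize(sentence) if token in dictionary)

sentences = [
    "love yours new product ABC.",
    "I really admired your new product ABC which has more option as before",
]
scores = [dictionary_score(s) for s in sentences]

# Keep the sentence with the highest dictionary count.
best_sentence = sentences[scores.index(max(scores))]
```

With this made-up word list, the second sentence gets the higher count and would be the one kept. The result depends entirely on which words are in the dictionary.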

Cheers, Kilian


Hi Kilian,

Thanks for that. I want to use the dictionary approach. How can I create a dictionary of such words? Perhaps from existing documents?

You can simply write a txt file manually, with one word per line. To extract words from documents and use them, you need a set of documents. Preprocess and filter them as you like, then create a bag of words (BoW), group on the terms to get a list of unique terms, and convert the terms into strings. This string column can be used as the dictionary.
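
The same steps (preprocess, collect terms, keep the unique ones, store them one per line) can be sketched in plain Python. The stop-word list and the function name build_dictionary are assumptions for illustration only; in KNIME the equivalent work is done by the preprocessing and grouping nodes.

```python
import re

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"i", "your", "yours", "which", "has", "as", "the", "a"}

def build_dictionary(documents, stop_words=STOP_WORDS):
    """Return the sorted unique terms of all documents, minus stop words."""
    terms = set()
    for doc in documents:
        for token in re.findall(r"[a-z]+", doc.lower()):
            if token not in stop_words:
                terms.add(token)
    return sorted(terms)

documents = [
    "love yours new product ABC.",
    "I really admired your new product ABC",
]
dictionary = build_dictionary(documents)

# One word per line, the layout of a manually written txt dictionary.
dictionary_text = "\n".join(dictionary)
```

The resulting text could be saved to a file and reviewed by hand, since an automatically extracted list usually contains terms (here "abc" or "product") that do not belong in the final dictionary.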

Cheers, Kilian

Thanks Kilian,

Is there a sample workflow that covers weighting keywords in different sentences?

Please find an example workflow attached.

Cheers, Kilian

Thanks Kilian :)


Hi Kilian,

I looked at the workflow you provided and found that the raw data already contains +ve and -ve tags. My raw data has no such tags. In this scenario, how can I create +ve and -ve dictionaries based on the raw documents alone, without tags?



The workflow shows how to classify documents based on dictionaries of words.

To create dictionaries from documents, you need to extract terms from specific documents. Filter for those documents that you know you want to extract terms from. Then perhaps apply POS tagging to extract only adjectives or nouns, filter stop words, etc. To extract important terms, you could also try the Topic Extractor and Keyword Extractor nodes.
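
As a rough illustration of extracting candidate terms from a hand-picked set of documents, here is a Python sketch that filters stop words and ranks the remaining terms by frequency. It is a simple frequency stand-in, not the algorithm used by the Topic Extractor or Keyword Extractor nodes, and it does no POS tagging. The sample documents, stop-word list, and function name top_terms are all made up for this example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "is", "and", "a", "with"}

def top_terms(documents, n=3, stop_words=STOP_WORDS):
    """Return the n most frequent non-stop-word terms across the documents."""
    counts = Counter()
    for doc in documents:
        tokens = re.findall(r"[a-z]+", doc.lower())
        counts.update(t for t in tokens if t not in stop_words)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:n]]

# Documents selected by hand because they are known to be positive (made-up data).
selected = [
    "The new option is great and the design is great",
    "Great product with a useful option",
]
candidates = top_terms(selected, n=2)
```

Running the same function on a second hand-picked set of negative documents would give candidates for the -ve dictionary. Either list still needs manual review before use.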

Cheers, Kilian
