Tag sentences by their purpose

Dear community,

I am trying to analyze documents beyond keywords by working at the sentence level: I want to determine the purpose of each sentence, so that I can tag e.g. all words of that sentence with it and compare documents better. My thought is to look at the verbs first:

E.g. “The sky is blue” describes a state indicated by the use of “is”. The sentence “After looking at past summers, we find that the sky is mostly blue in summer” describes a (contextual) result as indicated by “find” and includes a process by “after looking”.
My hope is that there are already verb libraries I can use. I have not yet found anything good.
The grammatical approach (e.g. https://en.wikipedia.org/wiki/Verb) may be very difficult to exploit, since it takes a linguistic point of view rather than a content-based one.
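
To make the verb idea concrete, here is a minimal sketch in plain Python. It assumes a small hand-written cue lexicon; the verb entries, the label names (`state`, `result`) and the helper names are made up for illustration and do not come from any existing verb library. It only handles the "state" and "result" cases from the example above, not the "process" part.

```python
import re

# Hypothetical cue lexicon: verb form -> sentence purpose.
# These entries are illustrative assumptions, not an existing verb library.
VERB_CUES = {
    "is": "state", "are": "state", "was": "state", "were": "state",
    "find": "result", "found": "result", "conclude": "result", "show": "result",
}

# Labels ordered from most to least specific; the first one present wins.
PRIORITY = ["result", "state"]


def tokenize(sentence):
    """Return lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z']+", sentence.lower())


def sentence_purpose(sentence):
    """Return the highest-priority label whose cue verb occurs in the sentence."""
    labels = {VERB_CUES[tok] for tok in tokenize(sentence) if tok in VERB_CUES}
    for label in PRIORITY:
        if label in labels:
            return label
    return "unknown"


def tag_words(sentence):
    """Attach the sentence-level label to every word of the sentence."""
    label = sentence_purpose(sentence)
    return [(tok, label) for tok in tokenize(sentence)]
```

With this lexicon, `sentence_purpose("The sky is blue")` gives `"state"`, while the longer summer sentence gives `"result"` because "find" outranks "is". A plain lookup like this ignores word sense and sentence structure, which is exactly why a ready-made verb resource would be preferable.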

Any ideas on verb libraries, or would you approach the question of extracting semantic meaning at the sentence level differently?

Thanks in advance for the discussion.
Frank

Hi @FH20

Welcome to the KNIME Community!

Once your documents are loaded in KNIME, you can tag them. There is a tagger node called POS Tagger that assigns part-of-speech tags from the following tag set:
https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html

If you open the link to the POS Tagger, you will get to the Hub, where you can also see some example workflows that show how the node can be used and how the data has been prepared and preprocessed.

Best,
Martyna

Hi Martyna,

thanks for the answer.

The POS tagger is not sufficient here. It marks verbs only by their tense forms. What I am talking about is the next level: categorizing sentences by their general purpose, such as making a statement, suggesting a certain approach, or presenting a conclusion.
Think e.g. about a discussion: there is a (common and sometimes unmentioned) state, 1+ hypotheses like “the weather will be fine today”, 1+ arguments, and 1+ conclusions.
When we are analyzing written text on forums, in articles etc., it would be enormously helpful for deeper analysis to differentiate the sentences by their purpose.
I can imagine a convoluted and very burdensome manual approach using domain-specific dictionaries for a specific problem, but maybe there is a better and smarter way.
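
A minimal sketch of that manual, dictionary-based approach in plain Python, to show what it would involve. The cue phrases, the purpose labels and the function names are all assumptions chosen for illustration; a real dictionary would have to be built per domain, which is the burdensome part.

```python
from collections import Counter

# Hypothetical, hand-written cue phrases per sentence purpose.
# A real dictionary would be domain-specific and much larger.
CUES = {
    "suggestion": ["we should", "i suggest", "you could", "i recommend"],
    "conclusion": ["therefore", "we conclude", "in conclusion", "as a result"],
    "hypothesis": ["will be", "might be", "i expect", "probably"],
}
DEFAULT = "statement"  # fallback when no cue phrase matches


def classify(sentence):
    """Return the purpose with the most matching cue phrases."""
    text = sentence.lower()
    scores = {}
    for purpose, phrases in CUES.items():
        # Naive substring matching; no word boundaries or negation handling.
        hits = sum(1 for phrase in phrases if phrase in text)
        if hits:
            scores[purpose] = hits
    if not scores:
        return DEFAULT
    # On a tie, the purpose listed first in CUES wins.
    return max(scores, key=scores.get)


def purpose_profile(sentences):
    """Count how often each purpose occurs, e.g. to compare two documents."""
    return Counter(classify(s) for s in sentences)
```

For example, `classify("The weather will be fine today")` returns `"hypothesis"`, and `purpose_profile` turns a list of sentences into per-purpose counts that could be compared across documents. Every new domain needs its own cue lists, so this does not scale well.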

Please feel free to drop your ideas here.
Best regards,
Frank