How to count specific tagged terms within a sentence

Hi KNIME Team,

Is it possible to count how many terms with a specific tag occur within a sentence of a document? For example, we would like to find how many terms tagged NN (POS) co-occur within a sentence in a document.

Thanks,

Sudha

Hi Sudha,

This is possible with a little detour. You can use the Sentence Extractor node to extract the sentences from your documents. Convert these sentences back to documents and apply the tagging. Then create a bag of words (BoW), compute the term frequency (TF node), and convert the tags to strings ("Tags to string" node). Finally, group by the tag string (GroupBy node) and sum up the TF values.
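Outside of KNIME, the counting logic behind these steps can be sketched in a few lines of Python. This is only an illustration, not the KNIME API: it assumes the sentences have already been extracted and tagged, and the input data and the function name `count_tags_per_sentence` are made up for the example. It counts absolute occurrences per tag, which is the per-sentence equivalent of summing the TF values per tag.

```python
from collections import Counter

# Hypothetical input: one list of (term, tag) pairs per sentence,
# as a POS tagger might produce after sentence extraction.
tagged_sentences = [
    [("The", "DT"), ("parser", "NN"), ("reads", "VBZ"), ("the", "DT"), ("file", "NN")],
    [("It", "PRP"), ("works", "VBZ")],
]


def count_tags_per_sentence(sentences):
    """Return one Counter per sentence, mapping tag -> number of terms."""
    return [Counter(tag for _term, tag in sentence) for sentence in sentences]


counts = count_tags_per_sentence(tagged_sentences)

# Report how many NN terms each sentence contains.
for index, tag_counts in enumerate(counts):
    print(f"sentence {index}: NN count = {tag_counts.get('NN', 0)}")
```

Each `Counter` plays the role of one group in the GroupBy step, so any other tag can be looked up the same way.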

Cheers, Kilian

Hi Kilian,

Thank you for the immediate reply. Is there any way to parse PDF documents by section? I read this thread in the forum https://tech.knime.org/forum/knime-textprocessing/parsing-sections-of-pdf-file-separately, but it looks like we have to know the section names in advance, as in the example provided there, and they may not be known in the problem I am currently working on. There are hundreds of PDFs to be analyzed, and it would be great if it were possible to extract the sections and look for terms of interest within them.

Thanks,

Sudha

Hi Sudha,

The PDF Parser node extracts the complete text from PDFs; sections cannot be extracted separately. You could extract the text as a string using the Document Data Extractor and extract substrings from that. However, you have to know beforehand which substrings to extract.
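As a rough sketch of the substring approach, the Python snippet below splits an already extracted text string at headings that are known in advance. It is not a KNIME node or API; the function name `extract_sections` and the sample text are hypothetical, and it assumes every heading sits on a line of its own, which real PDF text may not guarantee.

```python
import re


def extract_sections(text, headings):
    """Split text into {heading: body} for headings known in advance."""
    # Assumption: each known heading appears alone on its own line.
    alternatives = "|".join(re.escape(heading) for heading in headings)
    pattern = re.compile(r"^(%s)[ \t]*$" % alternatives, re.MULTILINE)

    matches = list(pattern.finditer(text))
    sections = {}
    for i, match in enumerate(matches):
        # A section body runs until the next heading or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[match.group(1)] = text[match.end():end].strip()
    return sections


# Hypothetical text, standing in for the string extracted from one PDF.
sample_text = (
    "Abstract\nWe count terms.\n"
    "Methods\nWe tag sentences.\n"
    "Results\nNN is frequent.\n"
)
sections = extract_sections(sample_text, ["Abstract", "Methods", "Results"])
print(sections["Methods"])
```

Headings that are missing from a document are simply skipped, so the same list of headings can be applied to many files.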

Cheers, Kilian

 
