Queries on locating keywords in different sections of a document

Hi all

May I ask for your support with the following:

I have several hundred documents with hundreds of pages each. Some keywords appear in different paragraphs with different meanings, and these meanings can be told apart by the section title each paragraph falls under. I therefore want to apply BoW separately to each section. Given that the files are in plain-text format, my questions are:
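Once the sections are separated, applying BoW per section just means keeping one word count per section instead of one per document. A minimal Python sketch, assuming the sections are already available as a dict mapping each section title to its text; the tokenizer (lowercased runs of letters, digits and apostrophes) is a simplifying assumption, not what any particular parser node does:

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count tokens made of letters, digits and apostrophes.

    This tokenizer is a simplifying assumption chosen for illustration.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)


def bow_by_section(sections):
    """Build one bag of words per section.

    `sections` is assumed to map a section title to that section's body text.
    """
    return {title: bag_of_words(body) for title, body in sections.items()}
```

Because each section gets its own `Counter`, the same keyword is counted separately under each section title, which keeps its different meanings apart.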

  1. How can I locate the different sections by finding their section titles? (Please note that the number of paragraphs within a section differs from document to document.)

  2. Some of the documents may not have section titles. What should I do with these documents?
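For question 1, one way to do this outside the parser nodes is to treat every line that looks like a section title as a boundary, so that a section runs from one title to the next no matter how many paragraphs it contains. A minimal Python sketch follows; the pattern `SECTION_TITLE` (a short number such as "1." or "2.3" followed by a capitalised heading) is a hypothetical assumption and must be adapted to how titles really look in your documents. For question 2, the sketch simply falls back to treating an untitled document as a single section.

```python
import re

# Hypothetical title pattern: a line such as "1. Introduction" or "2.3 Risk Factors".
# Adjust it to the real title format; body lines starting with a small number
# followed by a capitalised word would also match and may need a stricter rule.
SECTION_TITLE = re.compile(r"^\d{1,2}(?:\.\d{1,2})*\.?[ \t]+[A-Z].*$", re.MULTILINE)


def split_sections(text, pattern=SECTION_TITLE):
    """Return a list of (title, body) pairs, using title lines as boundaries.

    A section runs from one title line to the next (or to the end of the text),
    so the number of paragraphs per section does not matter. Text before the
    first title is kept as "(preamble)"; a document with no matching title is
    returned as one "(untitled)" section.
    """
    matches = list(pattern.finditer(text))
    if not matches:
        return [("(untitled)", text.strip())]
    sections = []
    preamble = text[: matches[0].start()].strip()
    if preamble:
        sections.append(("(preamble)", preamble))
    for i, m in enumerate(matches):
        # The body ends where the next title starts, or at the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections.append((m.group(0).strip(), text[m.end():end].strip()))
    return sections
```

The returned `(title, body)` pairs can be turned into a dict and fed into any per-section counting step.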


Hi Lawson,

Detecting different sections in a document is not possible with the existing parser nodes. The whole text is treated as one section plus the title. To split a document up, you need to split the file into one file per section.
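A minimal sketch of that pre-processing step in Python, assuming section titles can be recognised by a regular expression. The `TITLE` pattern and the `write_section_files` helper are hypothetical and should be adapted to your documents. Writing one file per section lets each section then be parsed as a separate document:

```python
import re
from pathlib import Path

# Hypothetical title pattern ("1. Introduction", "2.3 Risk Factors"); adapt as needed.
# The single capturing group makes re.split keep the titles in its result.
TITLE = re.compile(r"^(\d{1,2}(?:\.\d{1,2})*\.?[ \t]+[A-Z].*)$", re.MULTILINE)


def write_section_files(src, out_dir):
    """Write each section of the text file `src` to its own file in `out_dir`.

    Text before the first title (or the whole file, if no title matches)
    goes to a file numbered 000. Returns the list of paths written.
    """
    text = Path(src).read_text(encoding="utf-8")
    # re.split with one group alternates: [before, title1, body1, title2, body2, ...]
    parts = TITLE.split(text)
    chunks = [("", parts[0])] + list(zip(parts[1::2], parts[2::2]))
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for i, (title, body) in enumerate(chunks):
        if not (title.strip() or body.strip()):
            continue  # skip an empty preamble
        # The title, if any, becomes the first line of the section file.
        content = f"{title}\n{body.strip()}\n" if title else f"{body.strip()}\n"
        path = out / f"{Path(src).stem}_{i:03d}.txt"
        path.write_text(content, encoding="utf-8")
        written.append(path)
    return written
```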

I hope this helps.

Cheers, Kilian