I'm completely new to KNIME and I'm desperately trying to extract some information from patent PDF files downloaded from the DPMA homepage. The PDF Parser works and generates a list of documents, with the first column being the row number and the second column the path to the document(s). But so far I haven't been able to extract any information from the document text itself (for example, a bag of words, BoW). The document view node also works, but displays only the information mentioned above... Could anybody give me a hint on how to configure the PDF Parser properly and how to proceed from its output port?
Thank you very much for your quick reply. The document was very helpful, and now I have an idea of the working steps involved. I was able to set up all the nodes and they ran. Nevertheless, in the end, again only the "text" of the path to the directory and the names of the document files were processed, not the content of the documents themselves.
What gets processed is, for example,
C:\Data\Documents\Patents\DE 2954309.pdf.
This appears in the document table and is further processed by the succeeding nodes, but the information in the document is not accessed...
I'm sure that I'm doing something wrong on an elementary level, but I don't know what it is :-(