Newbie issue: What is a document, and how do I extract keywords from one?

zippy_242 · August 4, 2016, 3:15pm

Hello, I have a workflow (wf.png) that crawls URLs and outputs a single column table of page content in XML (filtered-table.png).

How do I convert the XML cells of the above table to documents?
How do I use the Keygraph Keyword Extractor?
What other basic things might I be missing?
There are many tutorials and example workflows. Are there any in particular that reflect my issue?

Thank you

kilian.thiel · August 17, 2016, 5:48pm

Hi Zippy,

First, extract the text that you want to use as document text with the XML Path node (or other XML processing nodes). Ideally, you create a table with a title column and a text column.
Convert this table (title and text) into documents using the Strings to Document node.
Filter the documents (optional).
Apply the Keygraph Keyword Extractor node to extract keywords.
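
If it helps to see the idea outside of KNIME, here is a rough Python sketch of the same flow: pull a title and text out of one XML cell, treat that pair as a "document", then rank terms. Everything in it is an assumption for illustration: the sample XML, the element paths, the tiny stop-word list, and the function names `xml_to_document` and `top_keywords` are all made up. The ranking is plain term frequency, used as a stand-in; it is not the Keygraph algorithm. Real crawled XHTML will usually carry namespaces, which the paths here do not handle.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample of one XML cell; real crawled pages will differ.
SAMPLE_XML = """<html>
  <head><title>Tea guide</title></head>
  <body>
    <p>Green tea and black tea come from the same plant.</p>
    <p>Tea leaves are dried, and green tea is not oxidized.</p>
  </body>
</html>"""

# Tiny illustrative stop-word list (an assumption, not KNIME's list).
STOP_WORDS = {"and", "the", "from", "are", "is", "not", "come", "same"}


def xml_to_document(xml_string):
    """Extract a title and body text from one XML cell.

    Roughly what XML Path followed by Strings to Document gives you:
    one record with a title field and a text field.
    """
    root = ET.fromstring(xml_string)
    title = root.findtext("./head/title", default="")
    # Join the text of every <p> element, including nested inline tags.
    paragraphs = ["".join(p.itertext()).strip() for p in root.iter("p")]
    return {"title": title, "text": " ".join(paragraphs)}


def top_keywords(document, n=3):
    """Rank terms by raw frequency (a simple stand-in, not Keygraph)."""
    tokens = re.findall(r"[a-z]+", document["text"].lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return [term for term, _ in counts.most_common(n)]


if __name__ == "__main__":
    doc = xml_to_document(SAMPLE_XML)
    print(doc["title"])
    print(top_keywords(doc, 2))
```

The stop-word filtering before counting plays the role of the optional document filtering step: without it, words like "and" would crowd out the useful terms.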

For example workflows you can browse the KNIME example server. You can connect to it as guest without any credentials. Browse the "Textprocessing" folder in the "Other data types" folder to see the available workflows of the text processing extension.

Cheers, Kilian
