Newbie issue: What is a document, and how do I extract keywords from one?

Hello, I have a workflow (wf.png) that crawls URLs and outputs a single-column table of page content in XML (filtered-table.png).

  1. How do I convert the XML cells of the above table to documents? 
  2. How do I use the Keygraph keyword extractor?
  3. What other basic things might I be missing?
  4. There are many tutorials and example workflows. Are there any in particular that reflect my issue?

Thank you

Hi Zippy,

  1. First, extract the text that you want to use as the document text with the XPath node (or other XML processing nodes). Ideally, you create a table with a title column and a text column.
  2. Convert this table (title and text) into documents using the Strings to Document node.
  3. Filter the documents (optional).
  4. Apply the Keygraph Keyword Extractor node to extract keywords.
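
If it helps to see the data flow of the steps above outside of KNIME, here is a rough Python sketch. It is only an illustration, not what the nodes do internally. The sample XML, the paths `./head/title` and `.//p`, and the helper names `xml_to_row` and `top_terms` are all made up for this example, and the keyword step is a plain term-frequency count standing in for the extractor; it is not the KeyGraph algorithm.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample cell: well-formed, XHTML-like page content.
SAMPLE_XML = """<html>
  <head><title>Text mining basics</title></head>
  <body>
    <p>Keyword extraction finds the important terms in a document.</p>
    <p>A document combines a title with the text of the page.</p>
  </body>
</html>"""

# Small illustrative stop word list, not a complete one.
STOP_WORDS = {"a", "the", "in", "of", "with", "and", "to", "is"}


def xml_to_row(xml_string):
    """Step 1: pull a title and a text value out of one XML cell."""
    root = ET.fromstring(xml_string)
    # The paths below are assumptions about the page layout.
    title = root.findtext("./head/title", default="")
    paragraphs = ["".join(p.itertext()).strip() for p in root.findall(".//p")]
    return {"title": title.strip(), "text": " ".join(paragraphs)}


def top_terms(text, n=3):
    """Step 4 stand-in: rank terms by raw frequency (NOT KeyGraph)."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [term for term, _ in counts.most_common(n)]


# Step 2 equivalent: one row with a title and a text becomes one "document".
row = xml_to_row(SAMPLE_XML)
keywords = top_terms(row["text"])
print(row["title"])
print(keywords)
```

In the workflow itself, the two paths would correspond to two queries in the XPath node, and the resulting title and text columns are what you feed into Strings to Document.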

For example workflows, you can browse the KNIME example server. You can connect to it as a guest without any credentials. Browse the "Textprocessing" folder in the "Other data types" folder to see the available workflows of the text processing extension.

Cheers, Kilian