Category Extraction

In chapter 4, example 1 of “From Words to Wisdom”, an ICF node is used. How can I find out what the document categories are and how many there are? And how can I tell which category a document belongs to, even though I did not set it in the Strings to Document node?

Hi @ahmed_gomaa -

In this case, the documents are assigned categories using the Strings to Documents node earlier in the workflow. In particular, since this corpus consists of posts in the KNIME forum, the category assigned to each document is the title of the thread it came from.

If you want to see what the categories actually are, you can use the Document Viewer node. You can also use the Document Data Extractor node to pull out the categories, and a subsequent GroupBy node to count how many documents are associated with each topic.
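
If it helps to see the counting step outside of KNIME, here is a rough Python sketch of what the GroupBy node does with the extracted categories. This is not KNIME code: the `rows` data and the `count_by_category` helper are made up for illustration, and I am assuming the Document Data Extractor gives you one title/category pair per document.

```python
from collections import Counter

# Hypothetical rows, standing in for the Document Data Extractor output:
# one (document title, category) pair per document.
rows = [
    ("Post 1", "Thread A"),
    ("Post 2", "Thread A"),
    ("Post 3", "Thread B"),
]

def count_by_category(rows):
    """Return a Counter mapping each category to its number of documents."""
    return Counter(category for _title, category in rows)

counts = count_by_category(rows)

# Number of distinct categories in the corpus.
print(len(counts))

# Documents per category, most frequent first.
for category, n in counts.most_common():
    print(category, n)
```

The length of the result is the number of categories, and each entry is the per-category document count that the GroupBy node would give you.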

So the Strings to Documents node assigns the title as the default category. If I apply a topic detection scenario, will the most probable category be detected and overwrite the existing one?

If you do some topic detection task and would like to apply these topics as categories later in the workflow, you can use the Document Data Assigner node for this purpose. Categories won’t be updated automatically - you need to explicitly set them.

Thanks. I will try it.