Topic Extractor: topic proportions per document, not per term


I am trying to extract the topic proportions for articles by using the topic extractor node. After preprocessing the articles, the topic proportions output is presented at the level of terms rather than articles.

I aggregated the proportions by using the "group by" node; however, the summarized topics group together articles that are not closely related. I also tried running the topic extractor node before preprocessing, which gave a better representation of the topic proportions at the level of articles. Unfortunately, that output includes terms that carry no content, and all of them end up assigned to one particular topic. (My workflow is attached.)

I wonder whether you could suggest a way to obtain the topic proportions at the level of documents after preprocessing.

Thank you,


Hi Julio,

Are you applying the Topic Extractor node on a bag of words?

You can simply apply all preprocessing nodes directly to the document list (as with the tagger nodes). This direct preprocessing has been possible since 2.9. After preprocessing, use the Topic Extractor. Each row then contains a document and its assigned topic. Then simply use, e.g., a Pie Chart to visualize the proportions of the assigned topics.

Attached is a workflow in which documents are POS tagged, preprocessed, and assigned one of two topics. Finally, the proportions of these two topics are visualized in a pie chart.
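
Outside of KNIME, the last step (turning one assigned topic per document into topic proportions) can be sketched in a few lines of Python. This is only an illustration of the counting, not what the nodes do internally; the document IDs, topic names, and the `topic_proportions` helper are made up for the example.

```python
from collections import Counter

# Hypothetical result of the topic-assignment step: one entry per document,
# each with the single topic assigned to it (IDs and names are invented).
assigned_topics = {
    "doc_1": "topic_0",
    "doc_2": "topic_1",
    "doc_3": "topic_0",
    "doc_4": "topic_0",
}


def topic_proportions(assignments):
    """Return the share of documents assigned to each topic."""
    counts = Counter(assignments.values())
    total = sum(counts.values())
    return {topic: count / total for topic, count in counts.items()}


proportions = topic_proportions(assigned_topics)
for topic, share in sorted(proportions.items()):
    print(f"{topic}: {share:.0%}")
```

Because the input has one row per document rather than one row per term, the shares are proportions of documents, which is what a pie chart over the assigned topics would show.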

Cheers, Kilian

Hi Kilian,

Yes, I was applying the topic extractor node on a BoW. I thought that I needed a BoW before preprocessing the data. That assumption was my problem.

With your suggestion and the workflow provided, I fixed my model, and it is working as I wanted!

Thank you for your kind support,