Topic Extraction - How to have better results?

gustavo.velho · November 1, 2016, 1:10pm

Hey there,

I'd like to share some ideas around Topic Extraction. I've been testing this node, but I'm not really satisfied with the results, so I'm looking for some tips.

When I look at the documents that are assigned to a specific topic (and here I'm already filtering for documents that have a weight of more than 0.9 for that topic), the results still don't look right. Right now I'm working with 4 words per topic, and here's an example:

word1, word2, word3, word4

I would expect all 4 words to appear in the document, but there are lots of documents that contain only word1 or only word2.

Any tips for better results? More words per topic? Change the Alpha and Beta variables?

Thanks! :)

Gustavo Velho

RolandBurger · November 29, 2016, 4:07pm

Hi Gustavo,

The documents are assigned to a topic based on their similarity to other documents. This means that in your case, not every document has to include each of the four words that were extracted for each topic.

By increasing the number of words per topic, you increase the chance that the extracted words appear in a given document.
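
To see how many of a topic's extracted words each document actually contains, a minimal Python sketch (outside KNIME) could look like the one below. The helper `top_word_coverage` and the toy documents are hypothetical and only illustrate the idea; they are not part of the Topic Extractor node.

```python
def top_word_coverage(doc_tokens, top_words):
    """Return the fraction of a topic's top words that occur in one document."""
    present = set(doc_tokens)
    hits = [w for w in top_words if w in present]
    return len(hits) / len(top_words)


# Hypothetical toy data: three tokenized documents assigned to the same topic.
docs = [
    "word1 word1 word5 word6".split(),
    "word2 word5 word7 word1".split(),
    "word3 word4 word1 word2".split(),
]
top4 = ["word1", "word2", "word3", "word4"]

# One coverage value per document, between 0.0 (no top word) and 1.0 (all of them).
coverage = [top_word_coverage(d, top4) for d in docs]
print(coverage)
```

A document can sit firmly in a topic and still score well below 1.0 here, which matches what Gustavo observed.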

I hope that helps!

Best,

Roland

Oscar · December 9, 2016, 6:10pm

You may want to try this tool: http://elcid.demon.nl/form.html. It organizes a text into a tree of topics and subtopics, plus an automatic summary.

Geo · December 11, 2016, 1:12am

How clean is your data and how did you preprocess it? That’s as important as the algorithm parameters.
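
As a rough illustration of what basic cleaning might involve, here is a small Python sketch. The stop word list, the `min_length` cutoff, and the function name `preprocess` are assumptions made for the example, not settings from any KNIME node.

```python
import re

# Tiny illustrative stop word list (an assumption); real lists are much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "it"}


def preprocess(text, min_length=3):
    """Lowercase, keep alphabetic tokens, drop stop words and very short tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if len(t) >= min_length and t not in STOP_WORDS]


tokens = preprocess("The price of the stock is rising, and the MARKET reacts to it!")
print(tokens)
```

Stemming or lemmatization would typically follow, but that needs a dedicated library and is left out here.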

kilian.thiel · December 14, 2016, 9:40am

Hi Gustavo,

If you want your topics to be represented by terms that occur in almost all documents of one topic, you can filter the documents before applying topic extraction:

1. Apply standard preprocessing and topic extraction to find groups of documents that belong to the same topic.

2. Loop over each group (= the documents belonging to one topic), count the term frequencies, then filter based on those frequencies.

3. Concatenate the filtered documents and apply topic extraction again.
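
Step 2 can be sketched in plain Python as follows. This assumes that "filter based on frequencies" means keeping only the terms that occur in at least a given fraction of a group's documents; the function name `filter_terms_by_topic`, the `min_doc_fraction` parameter, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def filter_terms_by_topic(docs, topics, min_doc_fraction=0.8):
    """docs: list of token lists; topics: topic label per document (same order)."""
    groups = defaultdict(list)
    for idx, topic in enumerate(topics):
        groups[topic].append(idx)

    filtered = [None] * len(docs)
    for indices in groups.values():
        # Document frequency: in how many documents of this group a term occurs.
        doc_freq = Counter()
        for i in indices:
            doc_freq.update(set(docs[i]))
        threshold = min_doc_fraction * len(indices)
        keep = {term for term, n in doc_freq.items() if n >= threshold}
        # Drop every term that is not frequent enough within its own group.
        for i in indices:
            filtered[i] = [t for t in docs[i] if t in keep]
    return filtered


# Hypothetical output of step 1: tokenized documents plus their assigned topic.
docs = [
    ["price", "market", "stock", "apple"],
    ["price", "market", "bank"],
    ["price", "stock", "market"],
    ["goal", "team", "match"],
    ["team", "goal", "coach"],
]
topics = ["finance", "finance", "finance", "sport", "sport"]

filtered = filter_terms_by_topic(docs, topics, min_doc_fraction=0.6)
print(filtered)
```

The returned list keeps the original document order, so it can be passed directly into the second topic extraction run in step 3.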

Cheers, Kilian

gustavo.velho · December 23, 2016, 12:12pm

Thanks, guys! I'll try your suggestions and see how the results look. I appreciate your help!

Gustavo
