Topic Extractor and Keyword Extractor


1) When I run the Topic Extractor (Parallel LDA) node, it does not always provide iteration statistics. Is there a limitation of this node that I have missed, or could I be doing something wrong during pre-processing?


2) When using the Topic Extractor (Parallel LDA) node, is it appropriate to use log likelihood to determine the number of words per topic, in the same way it is used to determine the number of topics to extract?


3) Is there a recommended rule of thumb to determine the number of keywords to extract using the Keygraph keyword extractor node?


Many thanks,


Hi Ben,

1.) The node should create iteration statistics for every iteration. Do you have an example workflow (executed, exported with data) with a Topic Extractor that does not provide the iteration statistics? That would be useful to reproduce the problem.

2.) The second output table of the node contains the topic ids, the terms, and a weight that represents how strongly a term is related to that topic. Based on this weight (in comparison to the maximum weight within the topic), you can choose how many terms to consider as representing a topic. I don't know if log likelihood is appropriate here.

3.) I don't know of any rule of thumb for the Keygraph keyword extractor.
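
To illustrate point 2, here is a rough Python sketch of the weight-based selection, e.g. for use on an exported copy of the second output table. It is only one possible approach: the function name `select_topic_terms`, the `ratio` threshold of 0.5, and the sample rows are made up for illustration and are not part of the node.

```python
from collections import defaultdict


def select_topic_terms(rows, ratio=0.5):
    """Keep, per topic, the terms whose weight is at least
    ratio * (maximum weight within that topic).

    rows: iterable of (topic_id, term, weight) tuples.
    ratio: hypothetical cutoff, a fraction of the topic's top weight.
    """
    by_topic = defaultdict(list)
    for topic_id, term, weight in rows:
        by_topic[topic_id].append((term, weight))

    selected = {}
    for topic_id, terms in by_topic.items():
        max_weight = max(weight for _, weight in terms)
        cutoff = ratio * max_weight
        kept = [(term, weight) for term, weight in terms if weight >= cutoff]
        # Strongest terms first
        kept.sort(key=lambda pair: pair[1], reverse=True)
        selected[topic_id] = [term for term, _ in kept]
    return selected


# Made-up sample rows in the shape (topic id, term, weight)
rows = [
    ("topic_0", "engine", 120),
    ("topic_0", "car", 95),
    ("topic_0", "road", 30),
    ("topic_1", "flour", 80),
    ("topic_1", "oven", 20),
]

print(select_topic_terms(rows, ratio=0.5))
```

With a relative cutoff like this, the number of terms kept can differ from topic to topic. Which `ratio` is sensible depends on the data, so it is worth trying a few values and checking whether the remaining terms still describe the topic well.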

Cheers, Kilian