Measuring Topic-coherence score & optimal number of topics in LDA Topic Modeling

Hi KNIMErs… :wave:
I would like to ask how to use KNIME to:

  1. Estimate the optimal number of topics when performing LDA topic modeling (i.e., Topic Extraction) on a large set of text documents (.CSV dataset), using KNIME’s LDA node :interrobang:

  2. Measure the topic-coherence score of an LDA topic model, in order to evaluate the quality of the extracted topics and the relationships between them (if any), so that useful information can be extracted :interrobang:

Is there a simple way (e.g., a ready-made node or a component) to accomplish these tasks :interrobang:

The workflow I’m intending to build will be useful for academic researchers, who are not necessarily professional coders in R or Python, the languages commonly used for such purposes. Topic models are normally used to analyze large text corpora, such as collections of scientific articles, in order to discover their trending topics (themes) :white_check_mark:

The following link explains the traditional solution for calculating the topic coherence score, using Python in a Jupyter notebook ✅

https://towardsdatascience.com/end-to-end-topic-modeling-in-python-latent-dirichlet-allocation-lda-35ce4ed6b3e0
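To make concrete what I mean by a coherence score, here is a minimal sketch of one common variant, UMass coherence, in plain Python with no external libraries. It is not the code from the linked article; the function names `umass_coherence` and `mean_coherence` and the toy documents are my own assumptions for illustration. The score rewards topics whose top words tend to appear in the same documents, and values closer to zero (less negative) indicate more coherent topics.

```python
import math


def umass_coherence(top_words, documents):
    """UMass coherence of one topic (sum over ordered pairs of its top words).

    top_words: the topic's words, most probable first.
    documents: list of tokenized documents (lists of words).
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            d_j = doc_freq(top_words[j])
            if d_j == 0:
                continue  # skip words that never occur in the corpus
            co = doc_freq(top_words[i], top_words[j])
            # +1 smoothing avoids log(0) for pairs that never co-occur.
            score += math.log((co + 1) / d_j)
    return score


def mean_coherence(topics, documents):
    """Average UMass coherence over all topics of a model."""
    return sum(umass_coherence(t, documents) for t in topics) / len(topics)


# Toy corpus (hypothetical data, only to show the call pattern).
docs = [["cat", "dog"], ["cat", "dog"], ["car", "road"], ["car", "road"]]
good_topic = ["cat", "dog"]   # words that co-occur
bad_topic = ["cat", "car"]    # words that never co-occur
print(umass_coherence(good_topic, docs), umass_coherence(bad_topic, docs))
```

To compare candidate numbers of topics, one would train one LDA model per candidate, take the top words of each topic, and compare the `mean_coherence` values across models.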

I assembled the code cells into a single file, attached below:

Jupyter-Python File:

Looking forward to your suggestions.
Thanks in advance, KNIME community :+1:

Take a look at example

and component

and elbow method for topic extraction
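For readers who want the general idea behind an elbow method, here is a minimal Python sketch. It is a generic heuristic, not the implementation used in the KNIME example or component: given one quality or error score per candidate number of topics, it picks the point farthest from the straight line joining the first and last points of the curve. The function name `find_elbow` and the sample scores are assumptions made up for illustration, and `ks` is assumed to be strictly increasing.

```python
import math


def find_elbow(ks, scores):
    """Return the k at the 'elbow' of the (k, score) curve.

    Both axes are rescaled to [0, 1]; the elbow is the point with the
    largest perpendicular distance to the line through the end points.
    """
    if len(ks) != len(scores) or len(ks) < 3:
        raise ValueError("need at least three (k, score) pairs")

    lo, hi = min(scores), max(scores)
    span_y = (hi - lo) or 1.0  # guard against a flat curve
    xs = [(k - ks[0]) / (ks[-1] - ks[0]) for k in ks]
    ys = [(s - lo) / span_y for s in scores]

    # Direction of the line from the first to the last point.
    dx, dy = xs[-1] - xs[0], ys[-1] - ys[0]
    norm = math.hypot(dx, dy)

    distances = [
        abs(dy * (x - xs[0]) - dx * (y - ys[0])) / norm
        for x, y in zip(xs, ys)
    ]
    return ks[distances.index(max(distances))]


# Hypothetical error scores for 2, 4, 6, 8 and 10 topics.
candidate_ks = [2, 4, 6, 8, 10]
errors = [100, 60, 30, 25, 22]
print(find_elbow(candidate_ks, errors))
```

The rescaling step matters because the number of topics and the score usually live on very different scales; without it the distance would be dominated by one axis.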
