Measuring topic-coherence score in LDA Topic Modeling

Hi KNIMErs…
:pray: I would like to inquire about measuring the topic-coherence score in topic modeling (i.e., topic extraction) using KNIME Analytics Platform. Is there a ready-made node or component that accomplishes this task, so that the topics extracted by the LDA algorithm node can be evaluated? :pray:

The workflow that I’m intending to build is for the experimental part of my Master’s thesis, “Mapping E-learning Research Themes and Trends from its existence to today; A Topic Modeling Based Review”.
Thanks, KNIME community, in advance :+1:

Hi @Salah_Online and welcome to the forum.

I don’t believe this is supported yet in KNIME. It looks like @AngusVeitch is thinking about adding coherence metrics to his TopicKR workflow available on the Hub here:

Maybe I am wrong though and someone else has developed something already? I can also ask internally to see if some of our developers might have more information - although as far as I know it hasn’t been discussed yet.

Hi @Salah_Online. Scott is right - I have been experimenting with topic coherence metrics, and some of those experiments are in an unreleased draft of that workflow, which I just haven’t had the time to revise recently. I will try to dig it up in the next few days to see what I can share.

Hi Scott, and thanks for the reply, bro. Let’s keep in touch on this issue, for the benefit of the entire KNIME community :+1:

Hi Angus, and thanks for the reply, bro. Let’s keep in touch on this issue, for the benefit of the entire KNIME community :+1:

Hi Angus…
The following link provides a solution for measuring topic coherence using Jupyter/Python code; it computes a topic-coherence value in order to evaluate the topics extracted by the LDA algorithm.
I assembled the code cells into a single file, attached to this reply.

https://towardsdatascience.com/end-to-end-topic-modeling-in-python-latent-dirichlet-allocation-lda-35ce4ed6b3e0

Therefore, would it be possible to follow the same steps in KNIME, so that the code cells are transformed into KNIME nodes and components? That would benefit researchers who are not skilled coders but still want to build their own topic models.

Jupyter/Python file:

Looking forward to your suggestions.
Thanks in advance, bro :wave:

Hi Scott…
The following link provides a solution for measuring topic coherence using Jupyter/Python code; it computes a topic-coherence value in order to evaluate the topics extracted by the LDA algorithm.
I assembled the code cells into a single file, attached to this reply.

https://towardsdatascience.com/end-to-end-topic-modeling-in-python-latent-dirichlet-allocation-lda-35ce4ed6b3e0

Therefore, would it be possible to follow the same steps in KNIME, so that the code cells are transformed into KNIME nodes and components? That would benefit researchers who are not skilled coders but still want to build their own topic models.

Jupyter/Python file:

Looking forward to your suggestions.
Thanks in advance, bro :wave:

Hello @Salah_Online,

You don’t need to reply separately to each user in a topic (especially if it’s the same reply). If you want to draw someone’s attention to a specific topic or reply, simply tag them the way I tagged you (@user_name).

Br,
Ivan

Thank you, this would be really useful. It would be an excellent upgrade to the topic-modeling functionality in KNIME, which is currently missing a good way to evaluate topic models. Another suggestion (I can give more details in a separate post) is expanding to other topic-model alternatives available in R, such as structural topic models (STM), correlated topic models (CTM), traditional LDA, and VEM. It would be a big help for researchers using LDA.

Hi @Salah_Online. I just updated my TopicKR workflow on the KNIME Hub. It now includes components that calculate topic coherence via normalised pointwise mutual information (NPMI) and via the conditional probability of successive topic terms. The latter approach is described in a paper by Mimno et al. I can’t remember right now where the NPMI method is described, but I think I have seen it mentioned in a few places. It makes intuitive sense, and in the few tests that I have done, I think it produced better results than the conditional probability method. (Note also that I did some experiments to normalise the conditional method, but I am not sure if they are mathematically sound!)
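For anyone curious what these two metrics actually do, here is a minimal from-scratch sketch (my own illustration, not the actual logic of the TopicKR components): both are computed from document co-occurrence counts of a topic’s top terms.

```python
import math
from itertools import combinations

def doc_freqs(docs, words):
    """Count how many documents contain each word and each word pair."""
    single = {w: 0 for w in words}
    pair = {p: 0 for p in combinations(words, 2)}
    for doc in docs:
        present = set(doc) & set(words)
        for w in present:
            single[w] += 1
        for a, b in pair:
            if a in present and b in present:
                pair[(a, b)] += 1
    return single, pair

def umass_coherence(docs, top_words):
    """Conditional-probability coherence (Mimno et al. style):
    sum of log P(w2 | w1) over word pairs, smoothed with +1 to avoid
    log(0). (The original orders words by frequency; here the given
    list order is used.)"""
    single, pair = doc_freqs(docs, top_words)
    score = 0.0
    for (a, b), co in pair.items():
        if single[a] > 0:
            score += math.log((co + 1) / single[a])
    return score

def npmi_coherence(docs, top_words):
    """Mean normalised PMI over all top-word pairs; ranges -1 to 1."""
    n = len(docs)
    single, pair = doc_freqs(docs, top_words)
    scores = []
    for (a, b), co in pair.items():
        if co == 0:
            scores.append(-1.0)  # never co-occur: minimum NPMI
            continue
        p_a, p_b, p_ab = single[a] / n, single[b] / n, co / n
        pmi = math.log(p_ab / (p_a * p_b))
        scores.append(pmi / (-math.log(p_ab)))
    return sum(scores) / len(scores)

# toy corpus: one coherent topic, one incoherent one
docs = [
    ["learning", "online", "course"],
    ["learning", "online", "student"],
    ["learning", "course", "student"],
    ["banana", "fruit"],
    ["banana", "course"],
]
good = ["learning", "online", "course"]
bad = ["banana", "student", "online"]
print(npmi_coherence(docs, good) > npmi_coherence(docs, bad))  # True
print(umass_coherence(docs, good) > umass_coherence(docs, bad))  # True
```

Both metrics reward topics whose top terms tend to appear in the same documents; NPMI additionally normalises the score into [-1, 1], which makes it easier to compare across topics and corpora.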

More generally, I have not had anyone review the calculations in the workflow, so I cannot guarantee that they are free from error.

The workflow also provides components that should assist with generating several topic models across different parameter settings. I have half a blog post written to explain the workflow in detail, but I never got around to finishing it, on account of becoming employed! One day in the near future, I hope to finish that post. In the meantime, please feel free to correspond with me if you want to test the workflow, so we can iron out the bugs that are sure to be in there. (You can email me directly via the contact details on my blog.) I’d love to see KNIME used more in the topic modelling and social science space, and this could be a chance to make that happen!

I haven’t had time to review the Python notebook yet, and I barely know any Python. But I’ll be interested to know how its coherence metric compares to the ones that I have used.

Best of luck, and stay in touch!
