topic extraction elbow method

Hello,

I have a question regarding the topic extraction with elbow method (LDA) workflow. If my assumption is correct, this workflow automatically determines the optimal number of clusters. However, I tried different datasets, and for each one I got 7 topics.

I attached the scatter plot I got, which is just a straight line, so it looks like something is wrong with it. I also attached a screenshot of the topic extractor node, which shows that the number of topics is 7. I just want the workflow to determine the appropriate number of topics.

I would be glad if you can help me with that.

Thanks,
Begum

Hi @begumkaplan -

Something definitely looks wrong with your plot. If I use the default data that comes with the workflow, I see something like this:

In a scree plot like this, we look for big jumps as a guide to the appropriate number of clusters. You can see there is a big jump between 6 and 7, and that’s why the value of 7 is being used as an input to the LDA node. It’s key to note here that the LDA node isn’t telling us there are seven clusters - we have to provide that information to it, based on the output of the loop.
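
As a rough illustration of the "look for the big jump" idea, here is a minimal Python sketch. It is not what the workflow actually runs: the function name `pick_k_by_largest_jump` and the sum-of-squares values are made up for the example, and the rule (take the k right after the largest change) is just one simple way to read such a plot.

```python
def pick_k_by_largest_jump(sse_by_k):
    """Return the k whose sum of squares differs most from the value at the previous k.

    sse_by_k maps a cluster count k to the sum of squares measured for that k.
    This is a hypothetical helper, not part of any KNIME node.
    """
    ks = sorted(sse_by_k)
    if len(ks) < 2:
        raise ValueError("need sum-of-squares values for at least two values of k")
    best_k, best_jump = None, -1.0
    for prev_k, k in zip(ks, ks[1:]):
        jump = abs(sse_by_k[prev_k] - sse_by_k[k])
        if jump > best_jump:
            best_k, best_jump = k, jump
    return best_k


# Made-up values with a large change between k=6 and k=7.
example_sse = {2: 950.0, 3: 900.0, 4: 860.0, 5: 830.0, 6: 800.0, 7: 520.0, 8: 500.0}
chosen_k = pick_k_by_largest_jump(example_sse)
print(chosen_k)
```

The chosen value would then be passed on as the number of topics, which matches the point above: the number comes from the loop, not from the LDA node.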

To figure out what’s going on, we should look at how the workflow runs with your data, since something’s not coming out right during the processing. What are you using as input? Can you upload your version of the workflow?

Hello Scott,

Thank you very much for your reply. I attached the current workflow I am using as well as a sample of my data. I would be really glad if you can let me know what is wrong with my workflow. Thanks again!

Non-Profit_Topic_All_Mar11-25.xlsx (322.2 KB) topic extraction.knar (677.5 KB)

It looks like the sum of squares wasn’t being calculated because you were excluding the PCA dimensions in your k-means node. If you include those, you’ll see your plot change.
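
To picture why the excluded columns matter, here is a small Python sketch of a within-cluster sum of squares computed over a chosen set of columns. It only illustrates the idea and is not how the k-means node is implemented; the column names `pca0` and `pca1`, the data, and the helper `within_cluster_sse` are all assumptions for the example. With no columns selected there is nothing to measure distances on, so the value stays the same for every number of clusters.

```python
def within_cluster_sse(rows, labels, columns):
    """Sum of squared distances from each row to its cluster centroid,
    using only the listed columns (hypothetical helper for illustration)."""
    clusters = {}
    for row, label in zip(rows, labels):
        clusters.setdefault(label, []).append([row[c] for c in columns])
    total = 0.0
    for members in clusters.values():
        n = len(members)
        # Per-column mean of the cluster; empty when no columns are selected.
        centroid = [sum(col) / n for col in zip(*members)]
        for point in members:
            total += sum((x - m) ** 2 for x, m in zip(point, centroid))
    return total


# Made-up rows with two hypothetical PCA dimensions.
rows = [
    {"pca0": 0.0, "pca1": 0.0},
    {"pca0": 0.0, "pca1": 2.0},
    {"pca0": 10.0, "pca1": 0.0},
    {"pca0": 10.0, "pca1": 2.0},
]
one_cluster = [0, 0, 0, 0]
two_clusters = [0, 0, 1, 1]

# With the PCA columns included, the value changes with the clustering.
sse_one = within_cluster_sse(rows, one_cluster, ["pca0", "pca1"])
sse_two = within_cluster_sse(rows, two_clusters, ["pca0", "pca1"])

# With every column excluded, the value is the same regardless of the clustering.
sse_none_one = within_cluster_sse(rows, one_cluster, [])
sse_none_two = within_cluster_sse(rows, two_clusters, [])
```

In this toy case the value drops sharply from one cluster to two once the columns are included, while the version with no columns gives a flat result.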

For the data you provided, 4 or 6 clusters might be a good place to start.