Topic detection? Clustering? Classification?

Hi, I want to use this program for my research, but I am unsure what kind of processing I should use. I would like to hear about the different options available in KNIME.

What I have is a list of paraphrased phrases for each of the original ideas. I want to preprocess the phrases to extract the meaningful words, and then compare these words to the original idea to see how accurately the paraphrases capture it.

  1. Which process should I follow?

  2. Is there a way to quantify the comparison between the meaningful words from the paraphrased phrases and the original idea? Like getting a count of overlapping words?

  3. If I were to have KNIME cluster the paraphrased ideas on its own and come up with keywords or a topic for each cluster, which process would that be? Document classification or document clustering?

I am not sure if I am getting my ideas across to you. I hope I am making sense. Your suggestions and guidance will be very much appreciated.

Hello Sophiahn,

Welcome to the community! Could you give a specific example of a paraphrased phrase, the original idea and the result you would like to obtain?
On the KNIME Hub as well as the KNIME Examples Server you will also find a lot of useful examples under Examples/08_Other_Analytics_Types/01_Text_Processing/. For example, for document clustering you could try this workflow: https://hub.knime.com/knime/workflows/Examples/08_Other_Analytics_Types/01_Text_Processing/01_Document_clustering and for classification this one: https://hub.knime.com/knime/workflows/Examples/08_Other_Analytics_Types/01_Text_Processing/02_Document_Classification

Best,
Jeany

Hi Jeany,

Thanks for the reply. Here is an example to make my point clearer:

The original idea: Avoid tasks with sustained mental effort
As for the paraphrased phrases, we searched the literature for psychological tests that measure this concept, such as …

Avoids, dislikes, or is reluctant to engage in tasks that require sustained mental effort (such as schoolwork or homework).
Dislikes doing things that require sustained mental effort
Avoids, dislikes, or is reluctant to take part in tasks that require sustained mental effort (such as schoolwork or homework)
When you have a task that requires a lot of thought, how often do you avoid or delay getting started?
My child avoids or dislikes tasks that require a lot of thinking.
My child avoids or dislikes tasks that require a lot of thinking.
Avoids activities that require sustained mental effort
Avoids, dislikes, or is reluctant to engage in tasks that require sustained mental effort
sustained mental effort
Often avoids tasks that require sustained mental effort
Dislikes doing things that require sustained mental effort

Sophia

Hello Sophia,

Our Examples Server / Workflow Hub offers a lot of examples for your use case. I would explore the ones under 08_Other_Analytics_Types/01_Text_Processing/ and pick what suits your needs. In addition to the ones I already mentioned, you will find there, for example, workflows that tag particular words in phrases (which you can then count) or extract topics from documents. You can mix and match according to your needs.
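
To make the counting idea from your second question concrete, here is a minimal sketch in plain Python rather than a KNIME workflow. It is only an illustration under some assumptions: the stop-word list is hand-picked for this example, the "stemming" is a crude rule that strips a trailing "s", and the names meaningful_words and overlap are hypothetical helpers, not KNIME nodes. In KNIME you would use the preprocessing nodes from the text processing examples instead.

```python
import re

# Assumption: a tiny hand-picked stop-word list, for illustration only.
STOP_WORDS = {
    "a", "an", "and", "as", "in", "is", "of", "or", "that", "the",
    "to", "with", "such", "my", "you", "have", "how", "when", "lot",
}


def meaningful_words(text):
    """Lowercase, tokenize, drop stop words, and crudely strip a trailing 's'."""
    words = set()
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in STOP_WORDS:
            continue
        # Crude stand-in for real stemming: "avoids" -> "avoid", "tasks" -> "task".
        if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
            token = token[:-1]
        words.add(token)
    return words


def overlap(original, paraphrase):
    """Compare the meaningful words of a paraphrase with those of the original idea."""
    orig_words = meaningful_words(original)
    para_words = meaningful_words(paraphrase)
    shared = orig_words & para_words
    union = orig_words | para_words
    return {
        "shared": sorted(shared),
        "count": len(shared),
        # Share of the original idea's words that the paraphrase also contains.
        "coverage": len(shared) / len(orig_words) if orig_words else 0.0,
        # Jaccard similarity: shared words divided by all distinct words.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


original = "Avoid tasks with sustained mental effort"
paraphrases = [
    "Avoids activities that require sustained mental effort",
    "My child avoids or dislikes tasks that require a lot of thinking.",
]

for phrase in paraphrases:
    print(phrase, overlap(original, phrase))
```

The count answers "how many words overlap", while coverage and the Jaccard score normalise that count so that long and short paraphrases can be compared. Note that pure word overlap misses synonyms (for example "thinking" versus "mental effort"), so treat it as a first rough measure.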

Have fun exploring,
Jeany