Hello! I’m completely new to KNIME. I’m a middle school principal and I’ve given a survey to 480 students. The survey includes a few open-response questions, such as “If you had one wish for next year’s schedule, what would you wish for?”
480 responses is a lot to digest, and I’m wondering if KNIME has a node that could somehow analyze the responses to the question to tease out the most frequent themes. I’ve successfully connected the Excel Reader node to the Column Filter node and selected the two applicable columns (Grade Level and Wish Question).
Is there a node that would help me digest these open response answers? Thanks!
@delawrence welcome to the KNIME forum. KNIME has a lot of text analytics capabilities. Here is a collection of Topic Detection workflows you might want to take a look at:
Also here is an overview of use cases and links to a webinar replay:
Thank you so much, @mlauber71! I so appreciate you taking the time to point me in the right direction.
I followed the link for Topic Detection and successfully (I think!) got through Document Preprocessing. That creates a table with the following columns:
Current Grade (examples: 6th grade, 7th grade, 8th grade)
Open Response 1 (“Wish” question) - students entered a paragraph of text responding to the prompt
Document (seems to be Open Response 1 with quotation marks around each block of text)
Automatically Preprocessed Document (seems to pull a distinctive phrase from Document?)
Term (A single string of characters that seems to be pulled from Automatically Preprocessed Document?)
I then connected that to Topic Extractor (Parallel LDA) node, and set Document as the Document column in the configuration menu. I set the number of topics to the default 10, and executed it. Here is what was output by the node:
Topic Extractor Output.xlsx (109.1 KB)
I see columns for ten “topics” but have no idea what each topic refers to. Can anyone guide me on next steps? I feel like I’m so close, but so far from a usable result!
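The model does not name the topics; each topic is only a weighted list of terms, and a person has to read those terms and decide what they mean. A minimal sketch of that "look at the top words" step, assuming the topic/term output has been exported as rows of (topic, term, weight). The column layout, the function name `top_terms_per_topic` and the sample rows are all hypothetical, made up for illustration, not taken from the attached file:

```python
from collections import defaultdict


def top_terms_per_topic(rows, n=5):
    """Group (topic, term, weight) rows and keep the n highest-weighted terms per topic."""
    by_topic = defaultdict(list)
    for topic, term, weight in rows:
        by_topic[topic].append((weight, term))
    result = {}
    for topic, pairs in by_topic.items():
        # Highest weight first; ties are broken alphabetically by term
        pairs.sort(key=lambda p: (-p[0], p[1]))
        result[topic] = [term for _, term in pairs[:n]]
    return result


# Made-up rows for illustration only (not real survey output)
rows = [
    ("topic_0", "lunch", 0.12),
    ("topic_0", "recess", 0.09),
    ("topic_0", "break", 0.07),
    ("topic_1", "phone", 0.15),
    ("topic_1", "class", 0.05),
]

for topic, terms in sorted(top_terms_per_topic(rows, n=2).items()):
    print(topic, ", ".join(terms))
```

Reading the printed term lists side by side is usually enough to give each topic a working label (e.g. "breaks and lunch"), or to notice that two topics overlap and that fewer topics might work better.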
@delawrence these things come to my mind, although I am not a text mining specialist.
- you should check the preprocessing, since the quality of the topics depends heavily on it
- a Document is a structured container that holds the text in a form the model can process
- check for words that carry no additional meaning (e.g. stop words such as "the" or "and") and remove them
- there is no guarantee that 10 is the right number of topics. You may have to do some testing to see whether you are satisfied with the result
- that is, check the words the model uses to describe each topic
- you then name each topic yourself, based on what you know about the context
- if you expect the answers to differ a lot between groups (grades in your case), consider building separate models per group and then one model with fewer topics for the "overall" wishes, i.e. what they all want. Perhaps just try a word cloud for that
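The stop-word removal and the "word cloud per grade" idea from the list above can be sketched outside KNIME as plain word counting. This is a minimal sketch, assuming the responses are available as (grade, text) pairs; the tiny stop-word list, the function names and the sample answers are hypothetical. KNIME's text processing extension has its own stop-word filtering, which would normally be used inside a workflow instead:

```python
import re
from collections import Counter, defaultdict

# A small, hand-picked stop-word list (an assumption; real lists are much longer)
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "i", "we", "more", "would",
              "wish", "for", "in", "be", "is", "it", "that", "my"}


def tokenize(text):
    """Lowercase the text, keep runs of letters/apostrophes, drop stop words and very short tokens."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOP_WORDS and len(w) > 2]


def word_counts_by_grade(responses):
    """Turn (grade, text) pairs into {grade: Counter of remaining words}."""
    counts = defaultdict(Counter)
    for grade, text in responses:
        counts[grade].update(tokenize(text))
    return counts


# Made-up answers for illustration only
responses = [
    ("6th grade", "I wish for a longer lunch and more recess"),
    ("6th grade", "Longer lunch please"),
    ("8th grade", "More time to use phones at lunch"),
]

counts = word_counts_by_grade(responses)
for grade, counter in sorted(counts.items()):
    print(grade, counter.most_common(3))

# Combining all grades gives the "overall" picture a word cloud would show
overall = sum(counts.values(), Counter())
print("overall", overall.most_common(3))
```

Words that still dominate every grade after filtering (like "lunch" here) are candidates for the "what they all want" summary, while words that only appear in one grade hint that separate per-grade models may be worthwhile.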
Are the answers in your file the ones from the survey? You might want to check if there is any information in the file that you are not comfortable sharing (like names).
Maybe someone with more experience can also weigh in. I will see if I can try something myself.
Topic Models from Review
Topic Detection Analysis Training
Thank you, @mlauber71! There aren’t any identifying details in the spreadsheet - thanks for asking. I think my misunderstanding was that I expected KNIME to come up with natural-language topic descriptions on its own. I guess it will require more work on my part than I anticipated.
@delawrence one idea could be to paste the answers into ChatGPT (or just the ones assigned to one topic) and ask it for a summary. Or ask it directly to summarize the comments.
KNIME has an example of connecting to ChatGPT via an API, so you could build some test runs with different numbers of clusters.
@delawrence one idea could indeed be to identify some overall topics and then let ChatGPT summarise them. The topics detected so far might not be distinct enough. You could also try to automate the process if you want to try several clusterings (Knime + ChatGPT - #4 by roberto_cadili).
This is what it says about Topic 3:
Based on the statements provided, it seems like the most common topics that emerge are:
- More breaks/socializing opportunities: Many students expressed a desire for more time to socialize and take breaks throughout the day, such as longer lunch breaks, snack breaks, recess, and flex blocks. Some students also mentioned the importance of having time to relax and run outside.
- Improved schedule structure: Students also mentioned various ways they would like the schedule to be structured differently, such as having a more balanced workload, having shorter classes or fewer classes per day, having a separate time for homework, or having a flexible block that never drops.
- Access to phones: Several students mentioned wanting more access to their phones during the day, either during designated break times or throughout the day.
- Other miscellaneous wishes: A few students mentioned more artistic opportunities, mixing up the teams more, and only having one English class.
Overall, it seems like students are hoping for a more flexible and balanced schedule that allows for breaks, socializing, and access to phones while still maintaining an academic focus.
Fantastic! Initially, I had tried dumping all the answers into ChatGPT, but it balked at the number. It hadn’t occurred to me to break them up by topic using KNIME. Genius! I think I may run KNIME again with a greater number of topics to see if that makes for more logical topic categories.
@delawrence you might want to try fewer topics and also check the data preparation again. Another option is to take a sample of the answers, feed it into ChatGPT and ask it to summarise. Then feed in the remaining answers (or tell it to remember the answers within one chat) and ask for a summary of all entries, or a summary of the summaries. Or, again, try a summary by grade.
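Since ChatGPT balked at all 480 answers at once, the "summary of summaries" approach amounts to splitting the answers into smaller batches, summarising each batch, and then summarising the batch summaries. A minimal sketch of the splitting step, assuming a rough character budget per request; `batch_responses`, `build_prompt`, the 6000-character limit and the prompt wording are hypothetical choices, and the actual call to ChatGPT is deliberately left out:

```python
def batch_responses(responses, max_chars=6000):
    """Split responses into batches whose combined length stays within max_chars.

    The budget is approximate: it ignores the prompt text and separators, and a
    single response longer than max_chars ends up alone in its own batch.
    """
    batches, current, size = [], [], 0
    for text in responses:
        # Start a new batch if adding this response would exceed the budget
        if current and size + len(text) > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(text)
        size += len(text)
    if current:
        batches.append(current)
    return batches


def build_prompt(batch):
    """Build one hypothetical prompt listing each response as a bullet point."""
    lines = "\n".join(f"- {t}" for t in batch)
    return "Summarise the most common themes in these student wishes:\n" + lines


# Made-up responses for illustration only
responses = ["a" * 10, "b" * 10, "c" * 10]
batches = batch_responses(responses, max_chars=25)
for batch in batches:
    print(build_prompt(batch))
```

The same function works on any subset, e.g. only the answers KNIME assigned to one topic, or only one grade. Each batch's summary can then be collected and passed to `build_prompt` once more to get the overall summary.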