A new Just KNIME It! challenge is out! This time on text processing to gather insights from user reviews!
Let’s learn more about n-gram extraction and use this information to understand how users feel about “sound quality”.
Here is the challenge. Let’s use this thread to post our solutions to it, which should be uploaded to your public KNIME Hub spaces with tag JKISeason3-16.
Need help with tags? To add tag JKISeason3-16 to your workflow, go to the description panel in KNIME Analytics Platform, click the pencil to edit it, and you will see the option for adding tags right there. Let us know if you have any problems!
Hi all,
Here is my solution.
After extracting text from verified reviews using 4-grams, I extracted the words contained in that text. By color-coding them according to the ratings, it becomes easier to understand the insights.
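For readers who want to see the idea outside of KNIME, here is a minimal Python sketch of extracting 4-grams that contain "sound quality" and ranking them by document frequency. It is not the workflow itself: the sample reviews, the function names, and the simple regex tokenizer are all assumptions made for illustration, and filtering to verified reviews is assumed to have happened beforehand.

```python
import re
from collections import Counter

def ngrams(tokens, n):
    """Return every contiguous window of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def top_ngrams_with_phrase(reviews, n=4, phrase=("sound", "quality"), top=10):
    """Rank n-grams containing `phrase` by the number of reviews they occur in."""
    doc_freq = Counter()
    for text in reviews:
        tokens = re.findall(r"[a-z']+", text.lower())
        seen = set()  # count each n-gram at most once per review
        for gram in ngrams(tokens, n):
            if any(gram[i:i + len(phrase)] == phrase
                   for i in range(n - len(phrase) + 1)):
                seen.add(gram)
        doc_freq.update(seen)
    return doc_freq.most_common(top)

# Hypothetical, made-up reviews (not the challenge dataset)
reviews = [
    "pleasantly surprised sound quality for the price",
    "i was pleasantly surprised sound quality is great",
    "the sound quality is poor",
]

for gram, freq in top_ngrams_with_phrase(reviews, top=3):
    print(" ".join(gram), freq)
```

Counting each 4-gram once per review gives a document frequency rather than a raw term frequency, which keeps one long, repetitive review from dominating the ranking.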
My solution is similar to @sryu’s solution, although it may be of lesser quality.
I believe that through meticulous data cleaning, I was able to successfully eliminate several words that were not relevant to the trend.
Hi everyone,
After tagging each word in the 4-grams with the sentiment dictionary, I calculated whether each 4-gram was positive or negative overall. Separately, I tagged each word in the documents using a sound quality dictionary, which was created by ChatGPT, and displayed the tagged words as single terms in a tag cloud. Thanks.
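A rough Python sketch of the dictionary-based scoring described here might look like the following. The two word lists are tiny, hypothetical stand-ins for a real sentiment dictionary, and the "sum of word scores" rule is only one possible way to decide the overall polarity of a 4-gram.

```python
# Hypothetical mini-lexicons; a real sentiment dictionary would be much larger.
POSITIVE = {"great", "pleasantly", "surprised", "excellent"}
NEGATIVE = {"poor", "bad", "terrible"}

def ngram_polarity(gram):
    """Label an n-gram by summing +1 per positive word and -1 per negative word."""
    score = sum((word in POSITIVE) - (word in NEGATIVE) for word in gram)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Made-up 4-grams for illustration
examples = [
    ("pleasantly", "surprised", "sound", "quality"),
    ("sound", "quality", "is", "poor"),
    ("the", "sound", "quality", "is"),
]

for gram in examples:
    print(" ".join(gram), "->", ngram_polarity(gram))
```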
Hello Everyone,
Sharing my submission for the challenge.
“pleasantly surprised sound quality” was the top n-gram output.
I also created a word cloud to understand the key words occurring alongside “sound quality”. This gave some interesting insights into the kinds of words being used across all reviews. The top 10 n-grams and the word cloud are available as a consolidated view in the “Word-Cloud Processing” component.
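As an illustration of how the word-cloud input could be built, here is a small Python sketch that counts words appearing within a few tokens of "sound quality". The window size, the stop-word list, and the sample reviews are assumptions for the example, not part of the submitted workflow.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept short for the example
STOP_WORDS = {"the", "is", "a", "and", "was", "by", "for", "i"}
PHRASE = ("sound", "quality")

def cooccurring_words(reviews, window=3):
    """Count non-stop words within `window` tokens on either side of the phrase."""
    counts = Counter()
    for text in reviews:
        tokens = re.findall(r"[a-z']+", text.lower())
        for i in range(len(tokens) - 1):
            if (tokens[i], tokens[i + 1]) == PHRASE:
                before = tokens[max(0, i - window):i]
                after = tokens[i + 2:i + 2 + window]
                counts.update(w for w in before + after if w not in STOP_WORDS)
    return counts

# Made-up reviews for illustration
reviews = [
    "pleasantly surprised by the sound quality",
    "the sound quality is amazing",
    "surprised by the amazing sound quality for calls",
]

counts = cooccurring_words(reviews)
print(counts.most_common(5))
```

The resulting counts can be used directly as word weights in any word-cloud tool.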
The phrase “sound quality” seems to have no special meaning, so it is just a simple answer to the question.
(1) What trends emerge from the 4-gram frequencies?
As the value of N increases from small to large, the document frequency of the obtained phrases becomes smaller and smaller.
(2) What is the top 4-gram?
Since the phrase being checked consists of two English words, the minimum useful value of N in the N-gram analysis is 3. You can try N = 3, 4, 5, and so on. However, when N is 5 or greater, every phrase has a document frequency of 1, meaning each one appears in only a single document, which is statistically insignificant. So the maximum useful value of N is 4.
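The effect of N on document frequency can be reproduced with a short Python sketch. The three reviews are invented for the example, and `max_doc_freq` is a hypothetical helper name; on the real data the exact numbers will differ, but the downward trend as N grows is the same idea.

```python
import re
from collections import Counter

def max_doc_freq(reviews, n, phrase=("sound", "quality")):
    """Highest document frequency among n-grams that contain `phrase`."""
    doc_freq = Counter()
    for text in reviews:
        tokens = re.findall(r"[a-z']+", text.lower())
        grams = {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
        # keep only n-grams in which the phrase words are adjacent
        doc_freq.update(
            g for g in grams
            if any(g[i:i + len(phrase)] == phrase
                   for i in range(n - len(phrase) + 1))
        )
    return max(doc_freq.values(), default=0)

# Made-up reviews for illustration
reviews = [
    "great sound quality overall",
    "really great sound quality overall",
    "great sound quality for calls",
]

for n in (3, 4, 5):
    print(n, max_doc_freq(reviews, n))
```

Longer n-grams are more specific, so fewer reviews share exactly the same wording, and once the best document frequency drops to 1 there is no ranking left to interpret.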
Interestingly, the 4-gram analysis reveals that the most frequently used four-word phrases are positive. However, the word frequency analysis shows that “echo” is the most commonly used word in the reviews, appearing over 60% more often than the second most used word, “music”. It would be worthwhile to investigate whether our product is causing echo and explore potential solutions to address this issue.
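A comparison like "the top word leads the runner-up by X%" can be computed as in the Python sketch below. The reviews, the stop-word list, and the helper names are hypothetical, so the percentage printed here has nothing to do with the figure from the actual dataset.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept short for the example
STOP_WORDS = {"the", "is", "a", "and", "my", "i", "for", "was"}

def word_counts(reviews):
    """Count every non-stop word across all reviews."""
    counts = Counter()
    for text in reviews:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts

def lead_over_runner_up(counts):
    """Return the top word, the second word, and the relative lead of the first."""
    (first, c1), (second, c2) = counts.most_common(2)
    return first, second, (c1 - c2) / c2

# Made-up reviews for illustration
reviews = [
    "the echo plays music well",
    "my echo is great for music",
    "i love my echo",
    "echo setup was easy",
    "the echo sounds good",
]

top_word, runner_up, lead = lead_over_runner_up(word_counts(reviews))
print(f"{top_word} leads {runner_up} by {lead:.0%}")
```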
For me, it was an incredibly exciting challenge. You should definitely tap into your data analyst side!
As always on Tuesdays, here’s our solution to last week’s challenge!
A rather simple solution, right? The key really is to use the NGram Creator node after turning the reviews into Document type. We also found “pleasantly surprised sound quality” as the most popular 4-gram in the reviews.