Solutions to "Just KNIME It!" Challenge 20 - Season 2

alinebessa · August 9, 2023, 2:14pm

New Wednesday, new Just KNIME It! challenge!

This week we’re going to analyze hotel reviews and understand what they’re addressing (in a summarized fashion!) using Topic Modeling.

Are reviews very different textually depending on their rating? What aspects of the guests’ experiences are uncovered in the reviews?

Here is the challenge. Let’s use this thread to post our solutions to it, which should be uploaded to your public KNIME Hub spaces with tag JKISeason2-20.

Need help with tags? To add tag JKISeason2-20 to your workflow, go to the description panel in KNIME Analytics Platform, click the pencil to edit it, and you will see the option for adding tags right there. Let us know if you have any problems!

corgikenhouse · August 10, 2023, 12:35pm

My solution !

sryu · August 11, 2023, 2:27pm

Hi all,
This is my solution.

Word clouds by topic can be viewed in one place.
To facilitate easy viewing, identical terms are consistently colored across all topics.

The reviews with key terms highlighted can be checked.

rfeigel · August 12, 2023, 5:23pm

Here’s my solution. Too large to upload in executed condition. Warning - workflow execution takes some time primarily due to perplexity calculations.

Artem · August 13, 2023, 12:53pm

Hello everyone,

Here is my solution (JKISeason2-20 – KNIME Community Hub). I found it very helpful to use this component (Topic Scorer (Labs) – KNIME Community Hub) to define the optimal number of topics, I believe it is k=6.

I also created the component that allows to see what are the key words per topic and estimate the connections between topics and the rating. In my case I can see that topic_3 is usually connected to the lower rating texts, while topic_4 is mostly connected to higher rank texts.

I also made some investigation to see if rating values can be anyhow helpful. To do this I used Spacy Vectorizer for the original texts, then I used both PCA and t-SNE to reduce the dimensionality and this is what I have got:

It seems the texts with different ratings slightly diverge from each other, it is better to compare them pairwise (e.g. 5 vs 1, 4 vs 2, etc).

tomljh · August 15, 2023, 4:57am

Hello everyone, this is my solution.

Overview of data viewing
One interesting thing is that high “Rating” comments account for almost 50%

I also searched for the k value
There is also an interesting thing, in the [2,10] interval, the evaluation score is monotonically decreasing. This means that from a numerical perspective, k=2 is the best.

021800×895 86.2 KB
I only tried topic detection and visualization with k=2

031169×740 187 KB

041143×821 157 KB

Supplementary explanation:

This solution may not be perfect because there are too many “* _1” terms in the category “topic_1”, which is a duplicate of the terms in “topic_0”, making it impossible to distinguish between the two types.

I think the solution is to add preprocessing and delete the same ‘tokens’ to increase the difference between the two. But it timed out, haha.

Best wishes

AnilKS · August 15, 2023, 5:22am

My try on quiz with version 1 upload…naive to the topic so the flow.
https://hub.knime.com/-/spaces/-/latest/~hSPXUpvj6

pA10FCs/

tark · August 15, 2023, 5:33am

Hello everyone,
Here’s my solution.

[single word]

[continuous two words]

I’ve learned @rfeigel’ｓwork flow and tried implementing N-grams creator to analyze continuous two words. I also tried using Elbow method to find the optimal number of topics, referring to the following.

(Learn the Elbow Method to Optimize Topic Extraction | KNIME)

I’m not sure if I’m effectively applying the two new things I’ve learned. Thanks.

MoLa_Data · August 15, 2023, 10:35am

Hello Hello, my crazy KNIMErs!!

I don’t want to be late, and although I don’t fully have the workflow since I want to do an article in my weekly newsletter (today I think I’ll leave it uploaded) I want to give you a preview:
gif20
In this visualization, you can see:

Rating (median)
Topic
6 words with more weight in the topic
The name of the topic was done automatically with AI with the 6 words before
The colour of the visualization indicates the rating, red - bad. light green is good and green is super good.
% of each topic.

I hope to be able to upload the workflow and the article today.

See you later !!!

alinebessa · August 15, 2023, 2:04pm

As always on Tuesdays, here’s our solution to last week’s Just KNIME It! challenge

We used the perplexity metric to determine the number of topics for (1) good reviews, (2) bad and neutral reviews, and (3) the reviews as a whole. After that, we used tag clouds to visualize the key terms per topic – it seems like positive reviews have a very “water oriented” topic, whereas negative reviews seem to focus a bit more on hotel facilities.

See you tomorrow for a new challenge!

MoLa_Data · August 15, 2023, 4:09pm

Crazy KNINErs what was promised is debt, here is my solution JKISeason2-20 Ángel Molina – KNIME Community Hub

It blew up my mind with the new nodes GPT4All – KNIME Community Hub

As always, I´m writing an article about it on my LinkedIn newsletter

alinebessa · August 15, 2023, 6:06pm

Loving your newsletter!

HaveF · August 16, 2023, 1:27am

Hi, KNIMEr!

Just wanted to let you know that I added some notes about the key concepts related to this challenge. You can find them here.

By the way, I noticed that the official component Topic Explorer View doesn’t work when the “No of topics” is set to 2. You can verify the error in this workflow.

WARN  GroupBy              4:1803:0:1800:0:1452 Group column 'topic_2' not in spec.
WARN  GroupBy              4:1803:0:1800:0:1452 No aggregation column defined
WARN  GroupBy              4:1803:0:1800:0:1452 No aggregation column defined
WARN  GroupBy              4:1803:0:1800:0:1452 Group column 'topic_2' not in spec.

I’m not sure if it’s related to my KNIME version (4.7).

Best wishes,
HaveF

tomljh · August 16, 2023, 1:47am

There is also this issue in 5.1, and I made the necessary modifications myself. Just unbind the component, find the “groupby” node inside the component where the error occurred, and delete the “topic_2” . I handled it this way on a temporary basis.

HaveF · August 16, 2023, 2:37am

Thank you, @tomljh , for providing the information. This is a verified official component that may require updates.

MoLa_Data · August 16, 2023, 5:35am

Thank you so much

and here is the new article about this challenge: Ángel Molina Laguna on LinkedIn: El Poder de la Inteligencia Artificial para Analizar las Opiniones en las…

See you KNIMErs

HeatherPikairos · August 16, 2023, 10:06am

Hi Everyone

I’m late to post again after a nice 4 day weekend

This week I was inspired by this workflow on the hub to determine the best k value for the -Topic Extractor (Parallel LDA)- node. From looking at the Semantic Coherence and Perplexity, I also chose a k value of 2 topics to implement in the rest of the workflow.

I then created a -Tag Cloud- for the reviews overall:

Additionally, I created tag clouds per rating and displayed them using the -Tile View- widget:

Looking at the different tag clouds, “location” appears to be a commonly used term for the reviews with a mid to high rating, where as the term “night” seems to be popular in reviews with low ratings.

You can find my workflow on the hub here:

Best wishes
Heather

tomljh · August 16, 2023, 10:48am

Thank you for reminding me.

Loknica07 · August 16, 2023, 6:13pm

Hi, can you provide data in excel, i can not use your workflow, because i can not use the data set. I m not sure why…

MoLa_Data · August 16, 2023, 7:19pm

Hello @Loknica07 , I don’t know what you mean exactly, the data is “table” type

The dataset is here alinebessa/Just KNIME It! Season 2 - Datasets – Challenge 20 - Dataset – KNIME Community Hub

Give feedback if you can´t run the workflow.