Solutions to "Just KNIME It!" Challenge 20 - Season 2

:boom: New Wednesday, new Just KNIME It! challenge! :boom:

:hotel: This week we’re going to analyze hotel reviews and understand what they’re addressing (in a summarized fashion!) using Topic Modeling.

:beach_umbrella: Are reviews very different textually depending on their rating? What aspects of the guests’ experiences are uncovered in the reviews? :sunrise_over_mountains:

Here is the challenge. Let’s use this thread to post our solutions to it, which should be uploaded to your public KNIME Hub spaces with tag JKISeason2-20.

:sos: Need help with tags? To add tag JKISeason2-20 to your workflow, go to the description panel in KNIME Analytics Platform, click the pencil to edit it, and you will see the option for adding tags right there. :slight_smile: Let us know if you have any problems!

My solution!

4 Likes

Hi all,
This is my solution.

Word clouds by topic can be viewed in one place.
To facilitate easy viewing, identical terms are consistently colored across all topics.

You can also inspect the reviews with their key terms highlighted.

10 Likes

Here’s my solution. It was too large to upload in an executed state. Warning: workflow execution takes some time, primarily due to the perplexity calculations.





8 Likes

Hello everyone,

Here is my solution (JKISeason2-20 – KNIME Community Hub). I found this component (Topic Scorer (Labs) – KNIME Community Hub) very helpful for determining the optimal number of topics, which I believe is k=6.


I also created a component that shows the key words per topic and helps estimate the connection between topics and ratings. In my case, topic_3 is usually associated with lower-rated texts, while topic_4 is mostly associated with higher-rated texts.

I also investigated whether the rating values are helpful in any way. To do this, I applied the Spacy Vectorizer to the original texts and then used both PCA and t-SNE to reduce the dimensionality. This is what I got:



It seems that texts with different ratings diverge slightly from each other, so it is better to compare them pairwise (e.g. 5 vs 1, 4 vs 2, etc.).

6 Likes

Hello everyone, this is my solution.

  1. Data overview
    One interesting thing is that comments with a high “Rating” account for almost 50% of the data.

  2. I also searched for the k value
    Another interesting thing: in the [2,10] interval, the evaluation score decreases monotonically. This means that, from a numerical perspective, k=2 is the best.

  3. I only tried topic detection and visualization with k=2


Supplementary explanation:

This solution may not be perfect because “topic_1” contains too many “* _1” terms that duplicate the terms in “topic_0”, which makes it impossible to distinguish between the two topics.

I think the solution is to add a preprocessing step that deletes the shared ‘tokens’ to increase the difference between the two. But it timed out, haha.
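For anyone who wants to try that preprocessing idea, here is a minimal sketch in plain Python, outside KNIME. It assumes you already have the term list for each topic and the reviews as lists of tokens. The function names `shared_terms` and `remove_terms` and the sample data are made up for illustration; they are not part of any workflow in this thread.

```python
from collections import Counter


def shared_terms(topic_terms):
    """Return the terms that appear in more than one topic's term list."""
    counts = Counter(
        term for terms in topic_terms.values() for term in set(terms)
    )
    return {term for term, n in counts.items() if n > 1}


def remove_terms(tokenized_docs, terms_to_drop):
    """Drop the given terms from every tokenized document."""
    return [
        [tok for tok in doc if tok not in terms_to_drop]
        for doc in tokenized_docs
    ]


# Hypothetical example data, not taken from the challenge dataset.
topic_terms = {
    "topic_0": ["hotel", "room", "staff", "beach"],
    "topic_1": ["hotel", "room", "night", "noise"],
}
docs = [["hotel", "staff", "friendly"], ["room", "night", "noisy"]]

overlap = shared_terms(topic_terms)
cleaned_docs = remove_terms(docs, overlap)
print(sorted(overlap))
print(cleaned_docs)
```

The cleaned documents could then be fed back into topic extraction to see whether the two topics separate better. Whether this actually helps on the hotel reviews is untested here.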

Best wishes

2 Likes

Here is my attempt at the challenge (version 1 upload). I’m new to the topic, so the workflow is fairly naive.
https://hub.knime.com/-/spaces/-/latest/~hSPXUpvj6



3 Likes

Hello everyone,
Here’s my solution.

[single word]

[continuous two words]

I studied @rfeigel’s workflow and tried the N-grams creator to analyze pairs of consecutive words. I also tried using the Elbow method to find the optimal number of topics, referring to the following.

(Learn the Elbow Method to Optimize Topic Extraction | KNIME)
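In case it helps others checking their own elbow plots, here is a rough Python sketch of one common way to pick the elbow automatically: take the point that lies farthest from the straight line joining the first and last points of the curve, after scaling both axes to [0, 1]. This is only one heuristic and is not necessarily what the linked article or any KNIME component does. The function name `elbow_k` and the scores are made up for illustration.

```python
import math


def elbow_k(ks, scores):
    """Pick the k farthest from the line joining the first and last points.

    Assumes ks is sorted in ascending order and scores[i] belongs to ks[i].
    """
    if len(ks) != len(scores) or len(ks) < 3:
        raise ValueError("need at least three (k, score) pairs of equal length")
    k_span = ks[-1] - ks[0]
    s_min, s_max = min(scores), max(scores)
    if k_span == 0 or s_max == s_min:
        raise ValueError("k values or scores are constant; no elbow to find")

    # Scale both axes to [0, 1] so the result does not depend on units.
    xs = [(k - ks[0]) / k_span for k in ks]
    ys = [(s - s_min) / (s_max - s_min) for s in scores]

    # Line through (0, a) and (1, b); distance of each point to that line.
    a, b = ys[0], ys[-1]
    norm = math.hypot(b - a, 1.0)
    distances = [abs((b - a) * x - y + a) / norm for x, y in zip(xs, ys)]
    return ks[distances.index(max(distances))]


# Hypothetical scores for k = 2..10 (lower is better), for illustration only.
ks = list(range(2, 11))
scores = [1000, 700, 480, 430, 400, 380, 365, 355, 350]
best_k = elbow_k(ks, scores)
print(best_k)
```

With these made-up scores the function returns 4, the point where the curve stops dropping steeply. On real data the elbow is often less clear-cut, so it is still worth looking at the plot.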

I’m not sure if I’m effectively applying the two new things I’ve learned. Thanks.

3 Likes

Hello Hello, my crazy KNIMErs!! :heart_hands:

I don’t want to be late. The workflow isn’t fully ready yet because I want to write an article about it for my weekly newsletter (I think I’ll upload it today), but I want to give you a preview:
In this visualization, you can see:

  • Rating (median)
  • Topic
  • The 6 words with the most weight in the topic
  • The topic name, generated automatically with AI from those 6 words
  • The colour of the visualization, which indicates the rating: red is bad, light green is good, and green is super good
  • The share (%) of each topic

I hope to be able to upload the workflow and the article today.

See you later :yellow_heart: :chart_with_upwards_trend:!!!

6 Likes

:sparkles: As always on Tuesdays, here’s our solution to last week’s Just KNIME It! challenge :sparkles:

:mag: We used the perplexity metric to determine the number of topics for (1) good reviews, (2) bad and neutral reviews, and (3) the reviews as a whole. :cloud: After that, we used tag clouds to visualize the key terms per topic – it seems like positive reviews have a very “water oriented” topic, whereas negative reviews seem to focus a bit more on hotel facilities.
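For readers new to the metric, here is a small Python sketch of the general textbook definition of perplexity: the exponential of the average negative log-probability that a model assigns to each token. How the KNIME workflow computes it internally is not shown in this thread, so treat the function `perplexity` and the sample probabilities as illustrative assumptions.

```python
import math


def perplexity(token_probs):
    """Perplexity = exp(-mean(log p)) over the per-token probabilities."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("probabilities must be in the interval (0, 1]")
    mean_log_prob = sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(-mean_log_prob)


# Hypothetical probabilities a model assigns to four held-out tokens.
uniform_guess = perplexity([0.25, 0.25, 0.25, 0.25])
print(uniform_guess)
```

A model that assigns every token a probability of 0.25 has a perplexity of about 4, meaning it is as uncertain as a uniform choice among four options. Lower values mean a better fit, which is why perplexity can be compared across different numbers of topics.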

:fire: See you tomorrow for a new challenge! :fire:

2 Likes

Crazy KNIMErs, a promise is a promise: here is my solution, JKISeason2-20 Ángel Molina – KNIME Community Hub

The new nodes blew my mind :exploding_head: GPT4All – KNIME Community Hub

As always, I’m writing an article about it for my LinkedIn newsletter.

4 Likes

Loving your newsletter!

1 Like

Hi, KNIMEr! :tada::tada::tada:

Just wanted to let you know that I added some notes about the key concepts related to this challenge. You can find them here.

By the way, I noticed that the official component Topic Explorer View doesn’t work when the “No of topics” is set to 2. You can verify the error in this workflow.

WARN  GroupBy              4:1803:0:1800:0:1452 Group column 'topic_2' not in spec.
WARN  GroupBy              4:1803:0:1800:0:1452 No aggregation column defined
WARN  GroupBy              4:1803:0:1800:0:1452 No aggregation column defined
WARN  GroupBy              4:1803:0:1800:0:1452 Group column 'topic_2' not in spec.

I’m not sure if it’s related to my KNIME version (4.7).

Best wishes,
HaveF

2 Likes

This issue also occurs in 5.1, and I made the necessary modification myself. Just unbind the component, find the “groupby” node inside the component where the error occurred, and delete “topic_2”. This is only a temporary workaround.

1 Like

Thank you, @tomljh , for providing the information. This is a verified official component that may require updates. :face_with_hand_over_mouth:

2 Likes

Thank you so much :smiling_face:

and here is the new article about this challenge: Ángel Molina Laguna on LinkedIn: El Poder de la Inteligencia Artificial para Analizar las Opiniones en las…

See you KNIMErs :heart_hands:

2 Likes

Hi Everyone :slight_smile:

I’m late to post again after a nice 4-day weekend :slight_smile:

This week I was inspired by this workflow on the hub to determine the best k value for the -Topic Extractor (Parallel LDA)- node. From looking at the Semantic Coherence and Perplexity, I also chose a k value of 2 topics to implement in the rest of the workflow.

I then created a -Tag Cloud- for the reviews overall:

Additionally, I created tag clouds per rating and displayed them using the -Tile View- widget:

Looking at the different tag clouds, “location” appears to be a commonly used term in reviews with a mid to high rating, whereas the term “night” seems to be popular in reviews with low ratings.

You can find my workflow on the hub here:

Best wishes
Heather

2 Likes

Thank you for reminding me.

Hi, can you provide the data in Excel? I cannot use your workflow because I cannot use the dataset. I’m not sure why… :frowning:

Hello @Loknica07 , I don’t know exactly what you mean; the data is of type “table”.

The dataset is here alinebessa/Just KNIME It! Season 2 - Datasets – Challenge 20 - Dataset – KNIME Community Hub

Let me know if you can’t run the workflow.

3 Likes