Solutions to "Just KNIME It!" Challenge 06 - Season 2

AnilKS · May 7, 2023, 4:39pm

My take on Challenge 6. The dataset itself made the problem hard. I tried framing the task as binary classification instead of multinomial, and XGBoost gives good results in the binary setting. It is certainly not the right way to go, though, since it discards the multinomial setup given in the challenge.
I didn't try much text preprocessing or enrichment.
xg multi
xg binary

Artem · May 7, 2023, 5:37pm

Here is my solution:

It was quite interesting to experiment with different methods (BERT, spaCy), and I trained a weaker model based on BERT embeddings. As a bonus, I decided to include conformal prediction to analyze the model's prediction certainty.
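
For readers new to conformal prediction, here is a minimal sketch of the split-conformal idea for classification. It is not taken from the workflow above: the function names, the calibration rows and the alpha value are all made up for illustration. The idea is to score each calibration row by 1 minus the probability the model gave to the true class, take a quantile of those scores as a threshold, and then return every class whose score stays under that threshold. A prediction set with more than one class signals an uncertain prediction.

```python
import math

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Threshold on the nonconformity score 1 - p(true class), from calibration data."""
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if rank > n:
        return 1.0  # too few calibration rows: every class ends up in the set
    return scores[rank - 1]

def prediction_set(probs, threshold):
    """All classes whose nonconformity score does not exceed the threshold."""
    return {label for label, p in probs.items() if 1.0 - p <= threshold}

# Hypothetical calibration rows (class probabilities plus the true label).
cal_probs = [
    {"positive": 0.8, "neutral": 0.1, "negative": 0.1},
    {"positive": 0.2, "neutral": 0.5, "negative": 0.3},
    {"positive": 0.1, "neutral": 0.2, "negative": 0.7},
    {"positive": 0.6, "neutral": 0.3, "negative": 0.1},
]
cal_labels = ["positive", "neutral", "negative", "positive"]

threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.25)
confident = prediction_set({"positive": 0.55, "neutral": 0.40, "negative": 0.05}, threshold)
uncertain = prediction_set({"positive": 0.50, "neutral": 0.50, "negative": 0.00}, threshold)
print(threshold, confident, uncertain)
```

With these invented numbers the first review gets a single-class set and the second one gets two classes, which is the kind of row worth a closer look.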

HaveF · May 8, 2023, 3:13am

Hi, KNIMEr

Here is mine:

Just use LLMs

I also did some error analysis if you are interested in reading

alinebessa · May 8, 2023, 2:40pm

Very interesting approaches to this problem!! The conformal prediction and error analysis parts (thanks @Artem and @HaveF) make me wonder if you folks would be interested in challenges touching on XAI. Let us know!

Artem · May 8, 2023, 2:44pm

@alinebessa Sure, XAI seems to be the hot topic nowadays.

gonhaddock · May 8, 2023, 8:00pm

Hello KNIMErs

Here is my take on JKI S02 CH06. I had no time to code it myself, but I found some interesting Python blogs about how to do it, and I learned many concepts about sentiment analysis along the way.

What I worked out from scratch was the preprocessing, which relies heavily on regex. I had no time to play with the parameters either, but I got considerable accuracy on my first attempt, so I'm happy with that.

The modelling part has been ‘inspired by’ Sentiment Analysis (Classification) of Documents – KNIME Community Hub

BR

HaveF · May 9, 2023, 1:27am

@alinebessa Of course! Btw, guided analytics is also an interesting topic!

alinebessa · May 9, 2023, 12:36pm

As always on Tuesdays, here’s the solution to last week’s challenge on sentiment analysis!

Naturally, our solution relies on the components that were created for it – but we use an additional one for AutoML purposes! We did not optimize the solution very much to increase the accuracy or F-measure, but this would be a great step in order to make this solution more relevant for negative reviews (which are the ones that “matter” the most).

You folks once again really impressed us with your solutions. Very creative! We hope to see you tomorrow for a new challenge.

chuvak56 · May 9, 2023, 3:35pm

Out of competition, but I’d like to share my method for automating model training with different parameters to choose the best ones, and for comparing the performance of different models using Box Plot and Loops
knime://My-KNIME-Hub/Users/chuvak56/Public/Challenge%2006

HeatherPikairos · May 9, 2023, 4:05pm

Hi Everyone

A little bit late with this one but here is my solution for challenge 6:

The workflow uses the -Enrichment and Preprocessing- component, followed by the -Document Vectorization- component, to prepare the data for training. An -X-Partitioner- node starts a cross-validation loop, and inside the loop are the -Logistic Regression Learner- and -Logistic Regression Predictor- nodes, using “Class” as the target column and “neutral” as the reference category. The -Loop End- node concatenates the results from each iteration.
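
For anyone who thinks better in code, the loop structure above can be sketched roughly as follows. This is only an illustration of the cross-validation pattern, not the KNIME nodes themselves: the fold assignment by row index, the toy rows and the majority-class stand-in for the learner and predictor are all assumptions made to keep the example self-contained.

```python
from collections import Counter

def k_fold_indices(n_rows, k):
    """Yield (train, test) index lists; every row is in exactly one test fold."""
    for fold in range(k):
        test = [i for i in range(n_rows) if i % k == fold]
        train = [i for i in range(n_rows) if i % k != fold]
        yield train, test

def cross_validated_predictions(rows, k, fit, predict):
    """Train on k-1 folds, predict the held-out fold, and collect all predictions."""
    predictions = [None] * len(rows)
    for train, test in k_fold_indices(len(rows), k):
        model = fit([rows[i] for i in train])        # learner step
        for i in test:
            predictions[i] = predict(model, rows[i][0])  # predictor step
    return predictions  # one out-of-fold prediction per row, like the loop end output

# Stand-in learner: always predicts the most frequent training class.
def fit_majority(train_rows):
    return Counter(label for _, label in train_rows).most_common(1)[0][0]

def predict_majority(model, features):
    return model

# Hypothetical (text, class) rows.
rows = [
    ("great flight", "positive"), ("loved it", "positive"), ("lost my bag", "negative"),
    ("nice crew", "positive"), ("flight at 5pm", "neutral"), ("on time", "positive"),
]
predictions = cross_validated_predictions(rows, 3, fit_majority, predict_majority)
print(predictions)
```

The point is that every row is scored by a model that never saw it during training, so the concatenated predictions can be passed to a scorer as one table.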

The -Scorer- calculates the accuracy at around 75%, which is acceptable for something that is not crucial and given the simplicity of the workflow used to create the model. However, it is also important to assess the model’s performance per class. In this solution there is high specificity for all 3 classes and reasonable to high sensitivity for the positive and negative classes, whereas the neutral class has a sensitivity of around 0.5, meaning only about half of the neutral reviews are identified correctly.
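
To make the sensitivity and specificity figures concrete, here is a small sketch of how they can be computed per class from a confusion matrix, treating each class in turn as "positive" against the other two. The counts below are invented for illustration and are not the numbers from the workflow.

```python
def per_class_rates(confusion, labels):
    """Sensitivity and specificity per class, one-vs-rest.

    confusion[i][j] = number of rows with true class labels[i] predicted as labels[j].
    Assumes every class has at least one true row and one row from another class.
    """
    total = sum(sum(row) for row in confusion)
    rates = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                  # true class k, predicted as something else
        fp = sum(row[k] for row in confusion) - tp   # other classes predicted as k
        tn = total - tp - fn - fp
        rates[label] = {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
        }
    return rates

# Made-up counts: rows are true classes, columns are predicted classes.
labels = ["positive", "neutral", "negative"]
confusion = [
    [80, 15, 5],
    [20, 50, 30],
    [5, 10, 85],
]
rates = per_class_rates(confusion, labels)
for label in labels:
    print(label, rates[label])
```

In this toy matrix the overall accuracy still looks decent, while the neutral row shows that half of the neutral reviews are assigned to another class.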