Solutions to “Just KNIME It!” Challenge 29 - Season 4

:sun_with_face: Happy Wednesday, folks! :sun_with_face:

:turkey: After our Thanksgiving break, we’re back with another Just KNIME It! challenge on data wrangling and cleaning. :broom: :gear: Get ready to dive deep into missing value imputation and data type reformatting, even counting on LLMs to help along the way. :brain: This challenge errs towards the hard side, and we hope you learn a lot by solving it!

Here is the challenge. Let’s use this thread to post our solutions to it, which should be uploaded to your public KNIME Hub spaces with the tag JKISeason4-29.

:sos: Need help with tags? To add tag JKISeason4-29 to your workflow, go to the description panel in KNIME Analytics Platform, click the pencil to edit it, and you will see the option for adding tags right there. :blush: Let us know if you have any problems!

Here is my submission: JKISeason4-29 – KNIME Community Hub

The challenge is very well elaborated, but the links and the data did not seem to be connected, so I got a bit puzzled. The link to the submission took me to 2/4. I hope the solution captures the brief as I read it.

The table for the top 3 (as chosen) underrepresented categories, after reclassification, is shown below.

The link for the solution is redirecting to Solutions to “Just KNIME It!” Challenge 2 - Season 4?

Hi, Anil! When I clicked on your solution link I went to the right place. What link is redirecting to this forum entry?

Thanks!

Huhh, this was quite a challenge. I used nodes that I hadn’t used in a long time. I really, really enjoyed it. I have said this a couple of times this season, but I think I loved this challenge the most :smiley:

My solution to the challenge:

My workflow (as can be seen, it is quite complex :smiley: ):

image

My approach:

  • Loaded the two Excel sheets and preprocessed them
    • In the details: converted the necessary columns to numbers, extracted the ASIN, dropped the rows with missing prices, and handled the two missing brands
    • In the reviews: converted the necessary columns to numbers, removed extremely lengthy comments (I wouldn’t do that if the task didn’t ask for it :smiley: ), split the date and location, extracted the year and month, and finally joined with the details
  • Visualized the price by category
  • Removed the price outliers in every category (Group Loop and Numeric Outliers nodes)
  • Created a composite metric that ranks the products
    • After normalization, I weighted the product rating and global rating count at 75% and the price at 25% to identify which product is the best by this composite metric
    • As there were too many, I filtered for the TOP50 products
  • Identified underrepresented categories
    • Based on the global rating count and the product count by category
    • I kept the categories with fewer than 200 products and fewer than 5000 ratings globally
    • This way I have just two “underrepresented” categories: Accessories, Gaming Mice
  • After that, I connected to my local Falcon LLM and asked which other category (from the remaining categories) the products in these categories should belong to
    • The LLM was not very accurate: for every product it responded “Gaming Keyboards”, which in my opinion is not correct for the gaming mice (it should be mice), but I just accepted it for now :smiley: (maybe it could be improved with more precise prompt engineering or with a non-local model)
    • After the response, I replaced the category for the products in the main table
  • Compared the two “categories”. There is not much of a difference, but the Gaming Keyboards category got bigger
  • And finally I wrote the Excel file out (for now to my desktop)
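
For readers who want to see the scoring and the category filter outside KNIME, here is a minimal Python sketch of the idea. It is not the workflow itself. The field names (`asin`, `rating`, `rating_count`, `price`, `category`) and the sample rows are hypothetical. Two further assumptions: the 75% is split evenly between rating and rating count, and a lower price counts as better, so the normalized price is inverted.

```python
from collections import defaultdict


def min_max(values):
    """Scale numbers to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def composite_scores(products):
    """Return {asin: score}: 75% rating signals, 25% price."""
    ratings = min_max([p["rating"] for p in products])
    counts = min_max([p["rating_count"] for p in products])
    prices = min_max([p["price"] for p in products])
    scores = {}
    for p, r, c, price in zip(products, ratings, counts, prices):
        # Assumption: even split of the 75%, and cheaper is better (price inverted).
        scores[p["asin"]] = 0.375 * r + 0.375 * c + 0.25 * (1.0 - price)
    return scores


def underrepresented(products, max_products=200, max_ratings=5000):
    """Categories with fewer products AND fewer total ratings than the limits."""
    n_products = defaultdict(int)
    n_ratings = defaultdict(int)
    for p in products:
        n_products[p["category"]] += 1
        n_ratings[p["category"]] += p["rating_count"]
    return sorted(
        cat for cat in n_products
        if n_products[cat] < max_products and n_ratings[cat] < max_ratings
    )


# Hypothetical sample rows, not taken from the challenge data.
products = [
    {"asin": "A1", "category": "Gaming Keyboards", "rating": 4.8, "rating_count": 1200, "price": 30.0},
    {"asin": "B2", "category": "Gaming Keyboards", "rating": 4.0, "rating_count": 300, "price": 15.0},
    {"asin": "C3", "category": "Gaming Mice", "rating": 3.5, "rating_count": 50, "price": 60.0},
]

scores = composite_scores(products)
top = sorted(scores, key=scores.get, reverse=True)[:50]  # keep the TOP50
```

On a sample this small every category falls under the default limits, so the thresholds only become meaningful on the full dataset.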

This was one of the biggest challenges of the season and, as I said, I think it is my favorite (TOP3 that’s for sure :smiley: ). I loved how it forced me to use long forgotten nodes :slight_smile:

Love (sadly soon: Loved) this season! :slight_smile:

I think Anil is referring to this button:

For me it goes here as well:

Huh, this is awkward – it’s coming to this page for me! Can you folks try to clear your cookies, or open it in an incognito tab? If you still get this error, please let me know and I’ll talk to our website team!

Ohh?? I tried… it must be an issue at my end. Anyway, good if it’s working well.

Hi all,

My solution is available here: JKISeason 4-29 - Preparing Your Office Equipment Data – KNIME Community Hub

Thanks @berti093 for the LLM Prompt! :slight_smile:

Interesting as always to get to use many different nodes.

Cheers all

Jerome

Hi @berti093 ,

Brilliant workflow, as always.

Did you know that there is a “Group Settings” tab in the Numeric Outliers node?
With this setting you can avoid the “group loop” construct in your workflow.
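
To illustrate what per-group outlier handling does, here is a minimal Python sketch outside KNIME. It assumes an interquartile-range rule with a multiplier of 1.5 and Python's "inclusive" quartile method, which may not match the node's exact settings. The field names and prices are hypothetical.

```python
from collections import defaultdict
from statistics import quantiles


def remove_price_outliers(rows, k=1.5):
    """Drop rows whose price lies outside [Q1 - k*IQR, Q3 + k*IQR] of their own category."""
    by_category = defaultdict(list)
    for row in rows:
        by_category[row["category"]].append(row["price"])

    bounds = {}
    for category, prices in by_category.items():
        if len(prices) < 2:
            continue  # too few values for quartiles: keep the group untouched
        q1, _, q3 = quantiles(prices, n=4, method="inclusive")
        iqr = q3 - q1
        bounds[category] = (q1 - k * iqr, q3 + k * iqr)

    kept = []
    for row in rows:
        lo, hi = bounds.get(row["category"], (float("-inf"), float("inf")))
        if lo <= row["price"] <= hi:
            kept.append(row)
    return kept


# Hypothetical prices, not taken from the challenge data.
rows = (
    [{"category": "Keyboards", "price": p} for p in (10, 11, 12, 13, 100)]
    + [{"category": "Monitors", "price": p} for p in (90, 95, 100, 105, 110)]
)
cleaned = remove_price_outliers(rows)
```

In this sample a price of 100 is dropped for Keyboards but kept for Monitors, which is the point of computing the bounds per group instead of once for the whole table.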

Best,
Andreas

This one turned into a bit of a Rube Goldberg machine but got to a solution in the end…

Thank you Andreas!

And thank you for highlighting that! I didn’t know about it :open_mouth: Really good to learn new (and more efficient) approaches! :slight_smile:

I’ve completed the entire workflow and wanted to share my experience.
For the final categorization step, I used the OpenAI API integrated directly into KNIME, which worked remarkably well after some careful prompt tuning.

I’d like to share a couple of screenshots and my KNIME Hub flow so others can review, test, or improve on it.

This exercise was incredibly insightful — especially in understanding how LLMs behave inside a structured data pipeline, and how important strict prompting and validation become when automating classification tasks.
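
As a rough illustration of strict prompting plus validation, here is a minimal Python sketch. It does not call any LLM: `reply` stands for whatever text the model returned. The function names, the prompt wording, and the category list (apart from “Gaming Keyboards”, “Gaming Mice” and “Accessories”, which appear in this thread) are hypothetical.

```python
def build_prompt(product_title, allowed_categories):
    """Build a prompt that restricts the model to a closed list of categories."""
    options = "\n".join(f"- {c}" for c in allowed_categories)
    return (
        "Assign the product below to exactly one of these categories.\n"
        f"{options}\n"
        "Answer with the category name only.\n"
        f"Product: {product_title}"
    )


def validate_category(reply, allowed_categories, fallback):
    """Accept the reply only if it matches an allowed category, else keep the fallback."""
    cleaned = reply.strip().strip(".\"'").casefold()
    for category in allowed_categories:
        if category.casefold() == cleaned:
            return category
    return fallback


# Hypothetical target categories and model replies.
allowed = ["Gaming Keyboards", "Keyboards", "Mice"]
prompt = build_prompt("Wireless RGB gaming mouse", allowed)

good = validate_category(' "Gaming Keyboards". ', allowed, fallback="Gaming Mice")
bad = validate_category("I think it is a mouse", allowed, fallback="Gaming Mice")
```

Here `good` resolves to an allowed category, while `bad` falls back to the original one, so a free-text answer never ends up as a new category in the table.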

Thanks again to everyone involved. Happy to iterate further or discuss improvements!

Best,

Alpay Zeybek

Here’s my solution. I made no attempt to integrate an LLM. I joined some of the categories and created a ranking based on Mean Review Score, Price and Rating. Their order can be changed as the user sees fit.
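
A ranking with a user-defined key order can be sketched in Python like this. It is a minimal illustration, not the workflow itself. The field names and sample values are hypothetical, and the sort directions (higher scores first, lower price first) are an assumption.

```python
def rank_products(products, sort_keys):
    """Sort by several keys; sort_keys is a list of (field, descending) in priority order."""
    ranked = list(products)
    # Apply the least important key first: Python's sort is stable,
    # so the keys applied later (the more important ones) decide the final order.
    for field, descending in reversed(sort_keys):
        ranked.sort(key=lambda p: p[field], reverse=descending)
    return ranked


# Hypothetical sample rows, not taken from the challenge data.
products = [
    {"id": "A", "mean_review_score": 4.5, "price": 20, "rating": 4.2},
    {"id": "B", "mean_review_score": 4.5, "price": 15, "rating": 4.0},
    {"id": "C", "mean_review_score": 3.9, "price": 10, "rating": 4.8},
]

by_score = rank_products(
    products, [("mean_review_score", True), ("price", False), ("rating", True)]
)
by_price = rank_products(
    products, [("price", False), ("mean_review_score", True), ("rating", True)]
)
```

Changing the order of the entries in `sort_keys` changes which criterion dominates, without touching the ranking function.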

Greetings KNIMERs

Here is my attempted solution to this week’s challenge:

This challenge takes the integration with LLMs to another level, and it was great to build up my knowledge in this area. I don’t know when I’ll need this knowledge, but I’m sure the opportunity will eventually present itself.

Here is the output file with the LLM integration for the new/updated categories.

LLM Output-Full Dataset.xlsx (1.2 MB)

I tried to address the other queries in the challenge as well.

LLM Visualizations:

Under-represented categories

Price & Rating by Titles

Price by Category

Cheers

Hi @armingrudd

I realised I didn’t use the right tag…

Could you check that the workflow will be taken into account?

Thanks

Cheers

Hello @trj,

Of course it will! You just need to wait for the next DB update run and then the leaderboard update. The leaderboard should get updated by tomorrow noon at the latest. If not, please let me know.

Hello everyone,

Interesting KNIME challenge! I discovered the Numeric Outliers node and practiced LLM prompting. Thanks for the challenge. Here is my solution.

Hi @armingrudd

Just a heads-up, the leaderboard still seems out of date. The latest refresh was yesterday, so I’m just wondering whether the solution will be taken into account?

Thanks

Oh, sorry for the confusion, I meant the CET time zone, which gives it about 2 hours to update!
I just checked and can confirm that it has been updated in the DB, and the leaderboard updates at noon CET every day.
