Solutions to “Just KNIME It!” Challenge 7 - Season 4

:sun_with_face: We just published a new Just KNIME It! challenge! :sun_with_face:

:soccer: Let’s steer our focus this week to sports analytics with a puzzle on soccer. With the help of AI, find undervalued players, rising stars, and other patterns in soccer data, generating an insightful report for scouts as output. :memo:

Here is the challenge. Let’s use this thread to post our solutions to it, which should be uploaded to your public KNIME Hub spaces with the tag JKISeason4-7.

:sos: Need help with tags? To add tag JKISeason4-7 to your workflow, go to the description panel in KNIME Analytics Platform, click the pencil to edit it, and you will see the option for adding tags right there. :blush: Let us know if you have any problems!


For those of us who aren’t soccer insiders, the data is practically meaningless without a full description of each data point.

Hi @rfeigel! I’ll talk to the author and ask for some metadata now. Thanks for this feedback!


We updated the challenge’s dataset folder so that it now also contains a metadata file: Just KNIME It/Solutions – Challenge 7 - Dataset – KNIME Community Hub

Thanks again for the feedback, @rfeigel!


Hi @alinebessa,

Sadly, I can’t find the file players_data_light-2024_2025.csv at the provided link. Will it still be uploaded?

Thank you :slight_smile: !

I can see it here: Just KNIME It/Solutions – Challenge 7 - Dataset – KNIME Community Hub. Please let me know what kind of message you’re getting when clicking this link! Thanks!

Hi @alinebessa

Thank you for the speedy reply :wink:

The metadata.txt mentions a streamlined version of the data, players_data_light-2024_2025.csv.

I wonder whether that file is also available and can be used as input data. Otherwise, I’ll use the existing file players_data-2024_2025.csv and filter out the non-relevant columns.

Thank you very much

Hello @hanantoprabowo,
For this challenge, please continue using players_data-2024_2025.csv.

Please let me know if you have any further questions.

Thanks,
Sanket


This is becoming extraordinarily frustrating. The metadata file is a text file with lots of extraneous junk. I cleaned it manually in Notepad++, but the hyphen/dash is (for me) indecipherable. I haven’t figured out a way to split the abbreviation and full description in the txt file. I’ve tried regex splitting on a hyphen and several types of dashes. Nothing works. Consequently, I haven’t been able to join the CSV file with abbreviations to the txt file with the full descriptions. I suppose I could proceed to the AI, but it would be nice to know what the data means without having to refer manually to the metadata.
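For anyone hitting the same wall, here is a minimal Python sketch of one way to split such lines. It assumes each metadata line looks like `ABBR <dash> description` with whitespace on both sides of the separator; that layout is a guess, and the sample strings below are hypothetical rather than copied from metadata.txt. The pattern covers the common Unicode hyphen and dash code points, and requiring surrounding whitespace keeps hyphens inside an abbreviation intact.

```python
import re

# Hyphen-minus plus common Unicode look-alikes: hyphen, non-breaking hyphen,
# figure dash, en dash, em dash, horizontal bar, minus sign.
DASHES = "-\u2010\u2011\u2012\u2013\u2014\u2015\u2212"

# Assumption: the separator has whitespace on both sides (\s also matches
# non-breaking spaces), so "G-PK" inside an abbreviation is not split.
SPLIT_RE = re.compile(rf"\s+[{re.escape(DASHES)}]\s+")


def split_metadata_line(line):
    """Return (abbreviation, description), or None if no separator is found."""
    parts = SPLIT_RE.split(line.strip(), maxsplit=1)
    if len(parts) != 2 or not parts[0] or not parts[1]:
        return None
    return parts[0], parts[1]


def odd_characters(line):
    """List the code points of non-alphanumeric, non-space characters.

    Handy for finding out which dash the file actually contains.
    """
    return [hex(ord(c)) for c in line if not c.isalnum() and not c.isspace()]
```

If nothing matches, running `odd_characters` on one raw line shows which separator character is really in the file, and it can then be added to `DASHES`.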



Hello @rfeigel,
Thanks for bringing up the concerns regarding the dataset. As the author of this challenge, I should have provided a more detailed description of the columns in the dataset; I apologize for this oversight.

Regarding the metadata.txt file, the challenge does not require its use; it is provided for understanding the meaning of the columns in the dataset.

The goal of this challenge is to create various prompts and observe the results obtained from the LLMs for those prompts, and then compile a report.

The challenge mentions that you want to recruit new, talented footballers, and you are in search of them. This means you can search for players in a specific country or league, and since you are looking for young footballers, the players’ ages are a consideration. These are the two things that you might want to focus on initially.

To simplify further, you can work only with the columns that have a description in the metadata file and look for young players using just those attributes. Many columns in the dataset contain missing data, and these can also be filtered out.
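Outside KNIME, the same idea can be sketched in a few lines of Python. This is only an illustration: the column name `Age`, the age cutoff of 23, and the 50% missing-value threshold are assumptions, not values given in the challenge.

```python
def young_players(rows, max_age=23, age_key="Age"):
    """Keep rows whose age parses as a number and is at most max_age."""
    kept = []
    for row in rows:
        try:
            age = float(row.get(age_key, ""))
        except (TypeError, ValueError):
            continue  # skip rows whose age is missing or unparseable
        if age <= max_age:
            kept.append(row)
    return kept


def drop_sparse_columns(rows, max_missing_share=0.5):
    """Drop columns where more than max_missing_share of values are empty."""
    if not rows:
        return []
    keep = []
    for column in rows[0]:
        missing = sum(1 for r in rows if r.get(column) in (None, ""))
        if missing / len(rows) <= max_missing_share:
            keep.append(column)
    return [{c: r.get(c) for c in keep} for r in rows]
```

The rows are plain dictionaries, such as those produced by `csv.DictReader`, so the sketch needs nothing beyond the standard library.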

I hope this explanation helps a bit.

Thanks,
Sanket


Hi team,

It’s always interesting to use the LLM Prompter and create a chat app.
My solution here: JKISeason 4-7 - AI-Generated Football Scouting Report – KNIME Community Hub

I decided to build my Data App with 3 pages that let you:

  1. Filter the main dataset on some criteria
  2. Use the chat app to get insights
  3. Do the reporting on the results of the chat

Some screenshots below.
Enjoy!

Cheers
Jerome



I’m not really a football fan… but this KNIME challenge had me learning more about football than I ever expected! :sweat_smile::soccer:

Just uploaded my solution for this week’s challenge on my hub. Since I had zero clue who the players were, I decided to add their photos to the report. Gotta put a face to the stats, right? :smile::bar_chart::camera_flash:


Augmented Scouting Report with LLM Analysis.pdf (31.3 KB)


Amazing as always!!! Bravo​:clap::clap::clap:


Huhh this was harder than I originally thought :smiley:

My solution: JKISeason4-7_berti093 – KNIME Community Hub

So my steps were:

  1. I filtered the data so ~50% of the rows remained
  2. Some position combinations appeared in different orders, so I normalized them (so MF,FW = FW,MF)
  3. I connected KNIME to my local Llama LLM
  4. I used a group loop so that I get a suggestion for each of the different positions
  5. I tried to structure the LLM’s unstructured answer
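Step 2 can be sketched in plain Python as follows. This is not the workflow's actual implementation, just one way to make `MF,FW` and `FW,MF` compare equal; the row key `Pos` is an assumed column name.

```python
from collections import defaultdict


def normalize_positions(pos):
    """Sort the comma-separated position codes so order no longer matters."""
    tokens = [t.strip().upper() for t in pos.split(",") if t.strip()]
    return ",".join(sorted(set(tokens)))


def group_by_position(rows, pos_key="Pos"):
    """Group rows by normalized position, one group per loop iteration."""
    groups = defaultdict(list)
    for row in rows:
        groups[normalize_positions(row[pos_key])].append(row)
    return dict(groups)
```

Sorting throws away any primary/secondary meaning the original order may have carried, which is the same trade-off as treating MF,FW and FW,MF as one group.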

So this is the methodology in a nutshell. The real challenges were:

  • The local LLM is really slow (with 200+ players and 200+ columns converted to JSON), so the whole workflow runs for 4 hours on my laptop
  • The LLM hallucinated for two positions (I highlighted it in the final table). Maybe it could be resolved by refining the prompt, but with a 4-hour runtime, that’s a little too much time for testing :smiley: (also I had to rerun the whole workflow one time, I pulled out my hair :smiley: )

My final table:

And the real calculation heavy part was structuring the final answer:

I’m not into football, so I’m really not sure how “good” the answers are. I would really love feedback from football analytics experts on the positional recommendations! :smiley:


Hi everyone,

This is my solution for this week’s challenge: JKISeason4-7 – KNIME Community Hub

I’m using the free version of HuggingFace (with limited inference usage :blush:).

To stay within the limit, the following steps were taken:

  • Only some relevant columns, as specified in metadata.txt, were retained
  • Duplicate rows were removed based on player names
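The two reduction steps above can be sketched in Python like this. The key `Player` and the column list passed in are assumptions for illustration; the workflow itself does this with KNIME nodes.

```python
def select_and_dedupe(rows, columns, name_key="Player"):
    """Keep only the chosen columns and the first row seen per player name."""
    seen = set()
    result = []
    for row in rows:
        name = (row.get(name_key) or "").strip()
        if not name or name in seen:
            continue  # skip unnamed rows and repeated player names
        seen.add(name)
        result.append({c: row.get(c) for c in columns})
    return result
```

Deduplicating on the name alone keeps the first occurrence and would also merge two different players who happen to share a name, so a stricter key may be worth considering if that matters for the report.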

I’ve tested various Hugging Face inference models, including bigscience/bloom, HuggingFaceH4/zephyr-7b-beta, and mistralai/Mistral-7B-Instruct-v0.3. Among these, the last one appears to run reliably without any error messages, unlike the first two, which occasionally return issues such as ‘model overloaded’ or HTTP 404 errors.

Visualization and reporting workflow

Data app



Example PDF report:
report.pdf (18.2 KB)

Any feedback is greatly appreciated :star_struck:

Have a great day everyone!


Thanks to all who posted before me for some great ideas on how to handle the challenge.

I wanted to see how the model would perform in picking the best young players for the different positions.
The tables I originally sent to GPT were rejected for having too many tokens, so I used an intermediate Pareto ranking for the different positions to narrow the data down to the best candidates for the model to work on.
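The exact ranking used in the workflow isn't described here, so the following is only a generic sketch of a Pareto front in Python. It assumes every metric is "higher is better"; the metric names in the usage lines are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(players, metrics):
    """Return names of players that no other player dominates.

    players maps a name to a dict of numeric stats (higher is better).
    """
    scores = {name: tuple(stats[m] for m in metrics) for name, stats in players.items()}
    return [
        name
        for name, score in scores.items()
        if not any(dominates(other, score) for other_name, other in scores.items() if other_name != name)
    ]


# Hypothetical usage: only the non-dominated players are passed on to the LLM.
candidates = {
    "A": {"goals": 10, "assists": 2},
    "B": {"goals": 5, "assists": 8},
    "C": {"goals": 4, "assists": 1},
}
shortlist = pareto_front(candidates, ["goals", "assists"])
```

Running this per position group keeps each prompt small, since only the shortlisted players need to be serialized for the model.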
