Solutions to “Just KNIME It!” Challenge 7 - Season 4

forum · June 25, 2025, 1:48pm

We just published a new Just KNIME It! challenge!

Let’s steer our focus this week to sports analytics with a puzzle on soccer. With the help of AI, find undervalued players, rising stars, and other patterns in soccer data, generating an insightful report for scouts as output.

Here is the challenge. Let’s use this thread to post our solutions to it, which should be uploaded to your public KNIME Hub spaces with the tag JKISeason4-7.

Need help with tags? To add tag JKISeason4-7 to your workflow, go to the description panel in KNIME Analytics Platform, click the pencil to edit it, and you will see the option for adding tags right there. Let us know if you have any problems!

rfeigel · June 25, 2025, 2:25pm

For those of us who aren’t soccer insiders, the data is practically meaningless without a full description of each data point.

alinebessa · June 25, 2025, 2:29pm

Hi @rfeigel! I’ll talk to the author and ask for some metadata now. Thanks for this feedback!

alinebessa · June 25, 2025, 3:06pm

We updated the challenge’s dataset folder so that it also contains a metadata file: Just KNIME It/Solutions – Challenge 7 - Dataset – KNIME Community Hub

Thanks again for the feedback, @rfeigel!

hanantoprabowo · June 25, 2025, 3:51pm

Hi @alinebessa ,

Sadly I’m not able to find the file: players_data_light-2024_2025.csv in the provided link. Will it still be uploaded?

Thank you !

alinebessa · June 25, 2025, 4:50pm

I can see it here: Just KNIME It/Solutions – Challenge 7 - Dataset – KNIME Community Hub. Please let me know what kind of message you’re getting when clicking this link! Thanks!

hanantoprabowo · June 25, 2025, 6:17pm

Hi @alinebessa

Thank you for the speedy reply

In the metadata.txt, it is mentioned that there is a streamlined version players_data_light-2024_2025.csv

I wonder whether the file is also available and can be used as input data. Otherwise, I’ll use the existing file players_data-2024_2025.csv and filter out the non-relevant columns.

thank you very much

sanket_2012 · June 25, 2025, 10:26pm

Hello @hanantoprabowo ,
For this challenge, please continue using the players_data-2024_2025.csv.

Please let me know if you have any further questions.

Thanks,
Sanket

rfeigel · June 26, 2025, 1:34am

This is becoming extraordinarily frustrating. The metadata file is a text file with lots of extraneous junk. I cleaned it manually in Notepad++, but the hyphen/dash is (for me) indecipherable. I haven’t figured out a way to split the abbreviation and full description in the txt file. I’ve tried regex splitting on a hyphen and several types of dashes. Nothing works. Consequently, I haven’t been able to join the CSV file with abbreviations to the txt file with the full descriptions. I suppose I could proceed to the AI, but it would be nice to know what the data means without having to refer manually to the metadata.
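One possible cause (an assumption, since the file contents are not shown in this thread) is that the separator is a Unicode dash such as an en dash or em dash rather than the ASCII hyphen, possibly surrounded by non-breaking spaces. A minimal Python sketch of a split that tolerates all of these, assuming each useful line looks like "abbreviation, dash, description"; the sample lines are made up for illustration:

```python
import re

# ASCII hyphen-minus plus common Unicode dashes:
# U+2010 hyphen, U+2011 non-breaking hyphen, U+2012 figure dash,
# U+2013 en dash, U+2014 em dash, U+2015 horizontal bar, U+2212 minus sign.
DASHES = "-\u2010\u2011\u2012\u2013\u2014\u2015\u2212"

# Require whitespace on both sides so abbreviations such as "G-PK" stay intact.
# In Python 3, \s also matches non-breaking spaces (U+00A0).
SEPARATOR = re.compile(r"\s+[" + DASHES + r"]\s+")


def split_metadata_line(line):
    """Return (abbreviation, description), or None if no separator is found."""
    parts = SEPARATOR.split(line.strip(), maxsplit=1)
    if len(parts) != 2:
        return None
    abbreviation, description = parts
    return abbreviation.strip(), description.strip()


# Made-up sample lines, not copied from metadata.txt.
sample_lines = [
    "Gls \u2013 Goals scored",
    "G-PK \u2014 Non-penalty goals",
    "Min - Minutes played",
    "just some extraneous text",
]

lookup = {}
for line in sample_lines:
    pair = split_metadata_line(line)
    if pair is not None:
        lookup[pair[0]] = pair[1]
```

The resulting `lookup` dictionary can then be joined to the CSV column names. If nothing matches, printing `hex(ord(ch))` for the characters around the separator shows which dash the file really uses.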

sanket_2012 · June 26, 2025, 6:15am

Hello @rfeigel ,
Thanks for bringing up the concerns regarding the dataset. As the author of this challenge, I should have provided a more detailed description of the columns in the dataset; I apologize for this oversight.

Regarding the metadata.txt file, the challenge does not require its use; it is provided for understanding the meaning of the columns in the dataset.

The goal of this challenge is to create various prompts and observe the results obtained from the LLMs for those prompts, and then compile a report.

The challenge mentions that you want to recruit new, talented footballers, and you are in search of them. This means you can search for players in a specific country or league, and since you are looking for young footballers, the players’ ages are a consideration. These are the two things that you might want to focus on initially.

To simplify further, you can work with only the columns that have a description in the metadata file and look for young players using just those attributes. Many columns in the dataset contain missing data, which can also be filtered out.
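As a rough illustration of that simplification outside KNIME, here is a minimal Python sketch. The column names ("Player", "Age", "Comp", "Gls", "xG"), the age cut-off, and the sample rows are assumptions for the example, not values taken from the dataset:

```python
def select_young_players(rows, described_columns, max_age):
    """Keep only described columns, for players at or below max_age with no gaps.

    described_columns must include "Age" (a hypothetical column name).
    """
    selected = []
    for row in rows:
        trimmed = {col: row.get(col, "") for col in described_columns}
        # Skip rows with a missing value in any retained column.
        if any(v is None or str(v).strip() == "" for v in trimmed.values()):
            continue
        try:
            age = float(trimmed["Age"])
        except ValueError:
            continue  # unparseable age, treated like a missing value
        if age <= max_age:
            selected.append(trimmed)
    return selected


# Made-up rows shaped like csv.DictReader output (all values are strings).
rows = [
    {"Player": "Player A", "Age": "19", "Comp": "League X", "Gls": "4", "xG": "3.1"},
    {"Player": "Player B", "Age": "31", "Comp": "League X", "Gls": "9", "xG": "7.5"},
    {"Player": "Player C", "Age": "20", "Comp": "League Y", "Gls": "", "xG": "1.0"},
    {"Player": "Player D", "Age": "21", "Comp": "League Y", "Gls": "2", "xG": "2.2"},
]

young = select_young_players(rows, ["Player", "Age", "Comp", "Gls"], max_age=21)
```

Here Player B is dropped for age and Player C for the missing "Gls" value, and the undescribed "xG" column is left out of the result.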

I hope this explanation helps a bit.

Thanks,
Sanket

trj · June 26, 2025, 8:09am

Hi team,

It’s always interesting to use the LLM Prompter and create a chat app.
My solution here: JKISeason 4-7 - AI-Generated Football Scouting Report – KNIME Community Hub

I split my Data App into 3 pages that let you:

Filter the main dataset on some criteria
Use the chat app to get insights
Report on the results of the chat

Some screenshots below.
Enjoy!

Cheers
Jerome

arief_rama · June 26, 2025, 8:23am

I’m not really a football fan… but this KNIME challenge had me learning more about football than I ever expected!

Just uploaded my solution for this week’s challenge to my Hub space. Since I had zero clue who the players were, I decided to add their photos to the report. Gotta put a face to the stats, right?

Augmented Scouting Report with LLM Analysis.pdf (31.3 KB)

PVergati · June 26, 2025, 5:38pm

Amazing as always!!! Bravo!

berti093 · June 26, 2025, 7:37pm

Huh, this was harder than I originally thought.

My solution: JKISeason4-7_berti093 – KNIME Community Hub

So my steps were:

I filtered the data so that ~50% of the rows remained
Some positions were the same but listed in a different order, so I normalized them (so MF,FW = FW,MF)
I connected KNIME to my local Llama LLM
I used a group loop so that I would get a suggestion for each of the different positions
I tried to structure the LLM’s unstructured answer
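The position clean-up and grouping steps above can be sketched in Python as follows. This is only an illustration of the idea, not the workflow itself, and the "Pos" column name and sample players are assumptions:

```python
from collections import defaultdict


def normalize_position(raw):
    """Sort comma-separated position codes so 'MF,FW' and 'FW,MF' match."""
    codes = [code.strip() for code in raw.split(",") if code.strip()]
    return ",".join(sorted(codes))


# Made-up sample players.
players = [
    {"Player": "Player A", "Pos": "MF,FW"},
    {"Player": "Player B", "Pos": "FW,MF"},
    {"Player": "Player C", "Pos": "DF"},
    {"Player": "Player D", "Pos": "FW, MF"},
]

# One group per normalized position, mirroring a group loop:
# each group would become the input for one LLM prompt.
groups = defaultdict(list)
for player in players:
    groups[normalize_position(player["Pos"])].append(player["Player"])
```

Sorting the codes alphabetically is an arbitrary choice; any fixed order works as long as both spellings end up identical.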

So this is the methodology in a nutshell. The real challenges were:

The local LLM is really slow (with 200+ players and 200+ columns converted to JSON), so the whole workflow runs for 4 hours (on my laptop)
The LLM hallucinated for two positions (I highlighted this in the final table). Maybe it could be resolved by refining the prompt, but with a 4-hour runtime, testing takes a little too much time (I also had to rerun the whole workflow once, and I pulled out my hair)

My final table:

And the real calculation-heavy part was structuring the final answer:

I’m not into football, so I’m really not sure how “good” the answers are. I would really love feedback from football analytics experts on the positional recommendations!

hanantoprabowo · June 26, 2025, 11:57pm

Hi everyone,

This is my solution for this week’s challenge: JKISeason4-7 – KNIME Community Hub

I’m using the free version of HuggingFace (with limited inference usage).

To stay within the limit, the following steps were taken:

Only some relevant columns, as specified in metadata.txt, were retained
Duplicate rows were removed based on player names
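For readers who want to try the de-duplication step outside KNIME, a minimal Python sketch follows. It keeps the first row seen for each player name; that rule, the "Player" and "Squad" column names, and the sample rows are assumptions for illustration:

```python
def deduplicate_by_name(rows, key="Player"):
    """Keep the first row for each player name (case- and space-insensitive)."""
    seen = set()
    unique = []
    for row in rows:
        name = row[key].strip().casefold()
        if name in seen:
            continue
        seen.add(name)
        unique.append(row)
    return unique


# Made-up sample rows.
rows = [
    {"Player": "Player A", "Squad": "Club 1"},
    {"Player": "Player B", "Squad": "Club 2"},
    {"Player": "player a ", "Squad": "Club 3"},
]

unique = deduplicate_by_name(rows)
```

Keeping the first occurrence silently discards the later row, so if a name can appear once per club, the statistics from the second club are lost. Aggregating the rows instead would be an alternative.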

I’ve tested various Hugging Face inference models, including bigscience/bloom, HuggingFaceH4/zephyr-7b-beta, and mistralai/Mistral-7B-Instruct-v0.3. Among these, the last one appears to run reliably without any error messages, unlike the first two, which occasionally return issues such as ‘model overloaded’ or HTTP 404 errors.

Visualization and reporting workflow

Data app

Example PDF report:
report.pdf (18.2 KB)

Any feedback is greatly appreciated

Have a great day everyone!

jproudfoot111 · June 27, 2025, 2:57am

Thanks to all who posted before me for some great ideas on how to handle the challenge.

Wanted to see how the model would perform in picking the best young players for the different positions.
The tables I originally sent to GPT were rejected (too many tokens), so I used an intermediate Pareto ranking for the different positions to narrow the data down to the best candidates for the model to work on.
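A Pareto ranking of this kind can be sketched in Python as below. It is a generic non-dominated sorting, not necessarily the exact procedure used in the workflow, and the two metrics (treated as "higher is better") and the sample values are made up:

```python
def dominates(a, b):
    """True if a is at least as good as b on every metric and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_ranks(scores):
    """Map each name to its Pareto front number (1 = not dominated by anyone)."""
    remaining = dict(scores)
    ranks = {}
    front = 1
    while remaining:
        # Players in the current front are those no remaining player dominates.
        current = [
            name
            for name, vec in remaining.items()
            if not any(
                dominates(other, vec)
                for other_name, other in remaining.items()
                if other_name != name
            )
        ]
        for name in current:
            ranks[name] = front
            del remaining[name]
        front += 1
    return ranks


# Made-up (goals per 90, assists per 90) values for one position.
scores = {
    "Player A": (0.6, 0.1),
    "Player B": (0.3, 0.4),
    "Player C": (0.2, 0.2),
    "Player D": (0.5, 0.1),
    "Player E": (0.1, 0.1),
}

ranks = pareto_ranks(scores)
shortlist = [name for name, rank in ranks.items() if rank == 1]
```

Only the shortlist (or the first few fronts) would then be passed to the model, which keeps the prompt well below the token limit. This pairwise version is quadratic per front, which is fine for a few hundred players.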

PVergati · June 28, 2025, 5:51am

Your solution has completely floored me—it’s as if I’m consulting the sporting director of a Champions League club!

PVergati · June 28, 2025, 6:58am

Scouting Report – Season 2024/25 Insights

I’m excited to share my two-phase, AI-powered approach to uncovering talent in our dataset:

My solution to JKI 4-7

Data Prep & Feature Engineering

Clean-up: Dropped columns with identical values or > 95 % missing data.
Minimum playtime filter: Excluded anyone under 270 min (fewer than 3 full matches).

Derived metrics:
- Discipline Index (“Paolo Montero” style, for those who remember him) to spotlight the most tenacious tacklers (and their occasional over-zealous fouls).
- Global Performance KPI + percentiles on both KPI and Minutes Played to normalize across positions and workloads.
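The clean-up, playtime filter, and percentile steps above can be approximated in Python as follows. The KPI formula is not given in the post, so the "KPI" column here is a placeholder, and all column names and sample rows are assumptions:

```python
def drop_uninformative_columns(rows, max_missing_share=0.95):
    """Drop columns that are mostly missing or whose present values are all equal."""
    keep = []
    for col in rows[0].keys():
        values = [row[col] for row in rows]
        present = [v for v in values if v not in (None, "")]
        missing_share = 1 - len(present) / len(values)
        if missing_share > max_missing_share:
            continue  # more than 95 % missing
        if len(set(present)) <= 1:
            continue  # identical values carry no information
        keep.append(col)
    return [{col: row[col] for col in keep} for row in rows]


def percentile_ranks(values):
    """Percentage of values less than or equal to each value."""
    n = len(values)
    return [100.0 * sum(1 for other in values if other <= v) / n for v in values]


# Made-up sample rows; "KPI" stands in for the author's performance KPI.
rows = [
    {"Player": "A", "Season": "2024-25", "Min": 900, "KPI": 0.8, "Empty": None},
    {"Player": "B", "Season": "2024-25", "Min": 180, "KPI": 0.9, "Empty": None},
    {"Player": "C", "Season": "2024-25", "Min": 2700, "KPI": 0.4, "Empty": None},
    {"Player": "D", "Season": "2024-25", "Min": 450, "KPI": 0.6, "Empty": None},
]

cleaned = drop_uninformative_columns(rows)
# 270 minutes = 3 full matches of 90 minutes.
eligible = [row for row in cleaned if row["Min"] >= 270]
kpi_pct = percentile_ranks([row["KPI"] for row in eligible])
minutes_pct = percentile_ranks([row["Min"] for row in eligible])
```

Player B has the highest placeholder KPI but is excluded for playing only 180 minutes, which is the kind of small-sample outlier the playtime filter is meant to remove.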

Clustering – Who’s Who?
Using the combined KPI-vs-Minutes delta, I discovered 4 player archetypes:

“Still-in-the-Wings” – Young talents yet to soar (low minutes, low output)
“Hidden Gems” – Impact subs delivering big returns on limited minutes
“Engine-Room Grinders” – Workhorses logging heavy minutes with solid, steady contributions
“Headline Stars” – The marquee names: high minutes and top-tier performance

(Admittedly, with market-value data this picture would be even richer – imagine combining metrics with price tags!)
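To make the four archetypes concrete, here is a deliberately simplified Python sketch. It swaps the clustering for a plain threshold rule on the two percentiles, so it is a stand-in for illustration rather than the method used in the workflow; the 50.0 threshold is an assumption:

```python
def archetype(kpi_pct, minutes_pct, threshold=50.0):
    """Label a player from KPI and minutes percentiles using a quadrant rule."""
    high_kpi = kpi_pct >= threshold
    high_minutes = minutes_pct >= threshold
    if high_kpi and high_minutes:
        return "Headline Stars"
    if high_kpi:
        return "Hidden Gems"
    if high_minutes:
        return "Engine-Room Grinders"
    return "Still-in-the-Wings"


# Made-up (KPI percentile, minutes percentile) pairs.
labels = [archetype(k, m) for k, m in [(90, 85), (80, 20), (30, 90), (10, 10)]]
```

A real clustering would place the boundaries according to the data instead of at a fixed cut-off, so the labels from this rule will not match the clusters exactly.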

AI-Powered Reporting Workflow

Step 1: Global Analysis
Feed the entire dataset into our LLM to highlight standout profiles and rising stars.

Step 2: Player Deep-Dive
Once a name is chosen, switch to player-centric prompts for detailed strengths, weaknesses, and peer comparisons.
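A player-centric prompt for Step 2 could be assembled along these lines. The wording of the prompt, the function name, and the sample rows are all hypothetical; the actual prompts used in the workflow are not shown in this thread:

```python
import json


def build_player_prompt(player, peers):
    """Compose a player-centric prompt from one stats row and a few peer rows."""
    return (
        "You are a football scout. Using only the data below, describe the "
        "player's strengths and weaknesses and compare them with the peers.\n\n"
        f"Player:\n{json.dumps(player, indent=2)}\n\n"
        f"Peers:\n{json.dumps(peers, indent=2)}\n"
    )


# Made-up sample rows.
prompt = build_player_prompt(
    {"Player": "Player A", "Age": 19, "Gls": 4},
    [{"Player": "Player B", "Age": 20, "Gls": 6}],
)
```

Sending only the chosen player and a handful of peers keeps the prompt short, and asking the model to rely on the supplied data alone may reduce hallucinated statistics.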

A big thank-you to @hanantoprabowo for inspiring me!

Happy to hear your thoughts (or counter-suggestions from my fellow scouts)! Let’s keep pushing the boundaries of data + AI in football

hanantoprabowo · June 28, 2025, 7:04am

Behold… the power of a Gen AI-powered sporting director

hanantoprabowo · June 28, 2025, 7:09am

Thank you for the kind words. Your solution looks waaaaaaaaaaay more sophisticated. Big kudos to @PVergati!