Football match predictor

bruss · September 9, 2022, 11:19am

Hi everybody, I’m a newbie in data science, so here is my idea. I think the result of a football match can be predicted from the statistics available before the match, so I’ve developed a multinomial logistic regression model. To explain better: if there is a match A vs B on 12/12/2022, I want to predict the result using all the statistics about A and B from before that match. The main statistics come from understat.com; for example mean goals, mean goals against, mean expected goals (xG), mean expected goals against, and others that I’ve explained in the black box in the workflow.
toto2021.knwf (424.3 KB)
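For reference, a multinomial logistic regression gives each outcome (home win H, draw D, away win A) a linear score computed from the pre-match features and turns those scores into probabilities with a softmax. The plain-Python sketch below only shows the prediction step. The two features (difference in mean xG and difference in mean goals against between A and B) and all weight values are hypothetical; a real model, such as the one in the KNIME workflow, learns its weights from past matches.

```python
import math

# HYPOTHETICAL weights for illustration only; a real model learns them from data.
# Each outcome has an intercept followed by one weight per feature, where
# features = [home mean xG - away mean xG, home mean goals against - away mean goals against]
WEIGHTS = {
    "H": [0.2, 1.1, -0.8],   # home win
    "D": [0.0, 0.0, 0.0],    # draw, used as the reference class (all weights zero)
    "A": [-0.3, -1.0, 0.7],  # away win
}


def predict_proba(features, weights=WEIGHTS):
    """Return P(outcome | features) for every outcome via a softmax over linear scores."""
    scores = {
        outcome: w[0] + sum(wi * xi for wi, xi in zip(w[1:], features))
        for outcome, w in weights.items()
    }
    # Subtract the largest score before exponentiating, for numerical stability.
    top = max(scores.values())
    exp_scores = {o: math.exp(s - top) for o, s in scores.items()}
    total = sum(exp_scores.values())
    return {o: e / total for o, e in exp_scores.items()}


def predict(features, weights=WEIGHTS):
    """Return the outcome with the highest predicted probability."""
    probs = predict_proba(features, weights)
    return max(probs, key=probs.get)


if __name__ == "__main__":
    # Home side creates 0.6 more xG per match and concedes 0.4 fewer goals than the away side.
    print(predict_proba([0.6, -0.4]))
    print(predict([0.6, -0.4]))
```

Accuracy only counts whether the top outcome was right; the probabilities themselves carry more information (a 40/30/30 prediction and a 90/5/5 prediction count the same when both are correct).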
The problem is that the accuracy is only around 50%, so my question is: how can I improve the accuracy of the model?
I’ve found these suggestions, but I don’t know whether they are possible in KNIME, and if so, how:

Feature Scaling and/or Normalization
Class Imbalance
Optimize other scores
Hyperparameter Tuning - Grid Search
Explore more classifiers
Error Analysis
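Most of the items on that list are available in KNIME as nodes (for example Normalizer, SMOTE or Equal Size Sampling, Parameter Optimization Loop Start/End, and Scorer). To show what a few of them actually compute, here is a plain-Python sketch, assuming features are lists of numbers and outcomes are the labels "H", "D", "A": z-score scaling fitted on the training data only, balanced class weights for class imbalance (draws are often the least frequent outcome), and a confusion matrix with per-class recall and a majority-class baseline for error analysis.

```python
from collections import Counter
from statistics import mean, pstdev


def fit_standardizer(rows):
    """Learn per-column mean and standard deviation from the TRAINING rows only."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    # A constant column has std 0; fall back to 1.0 so scaling does not divide by zero.
    stds = [pstdev(c) or 1.0 for c in columns]
    return means, stds


def standardize(rows, means, stds):
    """Z-score scaling, (x - mean) / std, reusing the training statistics for any rows."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def balanced_class_weights(labels):
    """Weight = n_samples / (n_classes * class_count), so rarer outcomes weigh more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def confusion_matrix(actual, predicted, classes=("H", "D", "A")):
    """Rows are actual outcomes, columns are predicted outcomes."""
    matrix = {a: {p: 0 for p in classes} for a in classes}
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


def per_class_recall(matrix):
    """Share of each actual outcome that was predicted correctly."""
    recall = {}
    for outcome, row in matrix.items():
        total = sum(row.values())
        recall[outcome] = row[outcome] / total if total else 0.0
    return recall


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most frequent outcome in the training data."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return sum(y == most_common for y in test_labels) / len(test_labels)
```

Comparing the model's accuracy with the majority baseline shows how much of the 50% comes from the features: if always predicting the most common outcome already scores close to that, the features add little. Per-class recall often reveals that a model almost never predicts draws.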

Can you help me please?

thanks

HansS · September 11, 2022, 6:15pm

Hi @bruss, and welcome to the KNIME Forum!

It’s cool that you’re building a model that can predict the outcome of a football match.
Your workflow is not easy to understand, but I can give you some suggestions to improve the accuracy of your model.

I think you need more data (matches) and fewer features.
Your features will make the difference in whether you are able to create a model that meets your expectations.
Do some analysis of the feature importance of your current feature set, see which features contribute most to the prediction, and use that as inspiration for more or different features.
Some ideas for new features:
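Permutation importance is one model-agnostic way to do this analysis: shuffle one feature column, re-score the model, and see how much the accuracy drops. A minimal sketch, assuming `model` is any function that maps a row (a list of feature values) to an outcome label; ideally run it on held-out matches rather than on the training set.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model predicts the actual label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(model, rows, labels, n_repeats=10, seed=0):
    """Mean drop in accuracy when one feature column is shuffled.

    Shuffling breaks the link between that feature and the outcome; a drop near
    zero means the model barely relies on the feature.
    """
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild each row with only column j replaced by the shuffled values.
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances
```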

ranking position home team vs away team
goal difference home team vs away team
historical head-to-head results
probability of a win, loss or draw given the result of the previous match(es) for the home and away teams
average number of goals scored and conceded in the last x matches
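The last idea (averages over the last x matches) can be computed so that each match only sees earlier matches. A sketch, assuming the matches are already sorted by date and stored as dicts with the hypothetical keys `home`, `away`, `home_goals` and `away_goals`:

```python
from collections import defaultdict, deque


def add_form_features(matches, n=5):
    """Attach each team's mean goals scored/conceded over its previous n matches.

    `matches` must be sorted by date. Features use only matches played BEFORE the
    current one, so the result being predicted never leaks into its own features.
    """
    history = defaultdict(lambda: deque(maxlen=n))  # team -> recent (scored, conceded)
    rows = []
    for m in matches:
        row = dict(m)
        for side, team in (("home", m["home"]), ("away", m["away"])):
            past = history[team]
            if past:
                row[f"{side}_avg_scored"] = sum(s for s, _ in past) / len(past)
                row[f"{side}_avg_conceded"] = sum(c for _, c in past) / len(past)
            else:
                # No earlier matches for this team yet.
                row[f"{side}_avg_scored"] = row[f"{side}_avg_conceded"] = None
        rows.append(row)
        # Update the history only AFTER this match's features have been computed.
        history[m["home"]].append((m["home_goals"], m["away_goals"]))
        history[m["away"]].append((m["away_goals"], m["home_goals"]))
    return rows
```

The first matches of each team get `None`, which you would either drop or fill, for example with the previous season's averages.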

But always make sure that no information from the match being predicted (or from later matches) finds its way into its features.
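One simple guard against that kind of leakage is to split training and test data by date instead of randomly. A minimal sketch, assuming each match dict has a hypothetical `date` key holding a `datetime.date`; in KNIME this roughly means sorting by date and using the earliest rows for training rather than sampling rows at random.

```python
from datetime import date


def chronological_split(matches, cutoff):
    """Train on matches played before `cutoff`, test on matches from `cutoff` onwards.

    A random split lets the model learn from matches played after the ones it is
    tested on; splitting by date mimics predicting genuinely future games.
    """
    ordered = sorted(matches, key=lambda m: m["date"])
    train = [m for m in ordered if m["date"] < cutoff]
    test = [m for m in ordered if m["date"] >= cutoff]
    return train, test
```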
gr. Hans

bruss · September 21, 2022, 2:49pm

Thank you, Hans, for your suggestions. I have already started from a model with more data (all matches of 2021, look here:
toto2021.knwf (435.2 KB)
) and I started my logistic learner with only 1 or 2 features and then added more, but the accuracy is always around 50% (quite low). I also tried a Naive Bayes learner, but I don’t understand how it works. Do you know?
Could you also help me, for example, to find the outliers in the logistic regression model?
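In short, Naive Bayes applies Bayes' rule: it combines how common each outcome is (the prior) with how likely the observed feature values are under that outcome, and it assumes ("naively") that the features are independent given the outcome. Below is a plain-Python sketch of the Gaussian variant, where each numeric feature is assumed to be normally distributed within each outcome; it illustrates the idea and is not KNIME's implementation.

```python
import math
from collections import defaultdict
from statistics import mean, pvariance


def fit_gaussian_nb(rows, labels, var_floor=1e-9):
    """Learn a prior P(class) and a per-feature mean and variance for every class."""
    by_class = defaultdict(list)
    for row, y in zip(rows, labels):
        by_class[y].append(row)
    model = {}
    for y, class_rows in by_class.items():
        columns = list(zip(*class_rows))
        model[y] = {
            "prior": len(class_rows) / len(rows),
            "means": [mean(c) for c in columns],
            # The small floor avoids division by zero for constant features.
            "vars": [pvariance(c) + var_floor for c in columns],
        }
    return model


def log_gaussian(x, mu, var):
    """Log of the normal density N(x; mu, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def predict_nb(model, row):
    """Pick the class maximising log P(class) + sum of log P(feature | class).

    Summing per-feature log-likelihoods is the 'naive' independence assumption.
    """
    def score(y):
        p = model[y]
        return math.log(p["prior"]) + sum(
            log_gaussian(x, mu, var) for x, mu, var in zip(row, p["means"], p["vars"])
        )
    return max(model, key=score)
```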
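For a classifier, "outliers" usually means matches the model fits badly. One simple check, assuming you can get the predicted class probabilities for each match (KNIME's predictor nodes can usually append these as columns), is the per-match log loss, -log P(actual outcome); the largest values are the biggest surprises. A minimal sketch with a hypothetical threshold:

```python
import math


def flag_surprising_matches(probabilities, actual, threshold=2.0):
    """Return (index, loss) pairs, largest loss first, for badly predicted matches.

    `probabilities` holds one dict per match, e.g. {"H": 0.5, "D": 0.3, "A": 0.2};
    loss is -log P(actual outcome). A threshold of 2.0 flags matches whose actual
    outcome received less than about 13.5% probability (exp(-2)).
    """
    flagged = []
    for i, (probs, outcome) in enumerate(zip(probabilities, actual)):
        loss = -math.log(max(probs[outcome], 1e-15))  # clip to avoid log(0)
        if loss > threshold:
            flagged.append((i, loss))
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

Looking at the flagged matches (red cards, cup rotations, promoted teams with little history) is a form of error analysis that can suggest new features.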

Thanks,
Bruss

system · December 20, 2022, 2:50pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.