Hello KNIME family! I’m a fairly new user, a cartoon animator on a deep dive into basketball (NCAAM) analytics, predictions, forecasts, formulas, modeling, and so on. I’ve been using KNIME for 2-3 weeks as a no-code/low-code user for a few hours each day.
I’ve found some much older posts about sports predictions, like an 8-year-old post about basketball, and another about predicting against bookmaker odds, namely Hans Samson’s awesome work covering football/soccer matches, which has been my main source of inspiration. I spent the 3-4 weeks before this finding basketball metrics data, schedules and the like, and researching Bill James’ log5 formula and ideas from other famous basketball statisticians like Nate Silver, Bart Torvik, John Hollinger, Jeff Sagarin and Ken Pomeroy for win-rate expectation and probabilities. I’ve exhausted what little I can find on YouTube in the way of directly relevant KNIME tutorials (mostly by Dhaval Maheta), and I’m just looking for tips/advice on this workflow, and on how to improve it, from more data/finance/statistics-minded people. I will try to show my current workflow below, along with the Excel spreadsheet of some metrics I have. I’ve been able to get some simple linear regressions down for my metrics against wins and/or win rate, but have been unable to get naive Bayes and other ML models to work for me. Humbly looking for some tutelage.
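For readers who have not met log5 before, here is a minimal sketch of the formula as it is commonly stated: it estimates the chance that team A beats team B from each team's overall win rate. The function name and the example win rates are made up for illustration and are not taken from the attached workflow or spreadsheet.

```python
def log5(p_a: float, p_b: float) -> float:
    """Estimate P(A beats B) from each team's overall win rate (0..1)."""
    a_wins = p_a * (1.0 - p_b)  # A plays to form, B does not
    b_wins = p_b * (1.0 - p_a)  # B plays to form, A does not
    total = a_wins + b_wins
    if total == 0:
        # Happens when both rates are 0 or both are 1: the formula is undefined.
        raise ValueError("log5 is undefined for these win rates")
    return a_wins / total


# Hypothetical example: a 0.80 team against a 0.60 team.
print(round(log5(0.80, 0.60), 4))
```

Against an exactly average opponent (win rate 0.5) the formula simply returns team A's own win rate, which is a quick sanity check when wiring it into a workflow.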
College team data 2025 (1).xlsx (85.3 KB)
Basketball regression and prediction.knwf (82.5 KB)
Hi, and welcome to the KNIME Forum. First of all, thank you for your kind words about my blog post.
It is good to see that you have started working with KNIME and a real dataset. Curiosity is always a good driver for developing yourself and gaining insights. Now that you have a nice dataset, the question is: what do you want to do with it? Do you want to explain which variables make an important contribution to the number of matches won in a season? Or do you want to predict whether a team will win or lose? Or how many matches a team will win in a season? Or something else?
The goal you want to achieve is not clear to me, and that makes it difficult for me and others to help you further. So try to formulate for yourself what you want to do next. And actually, it does not matter which you pick: every path you choose leads to new discoveries.
Regards, Hans
In addition to @HansS:
Thanks so much for sharing this dataset—it’s been a great resource for learning and experimentation.
I’ve been using KNIME to explore new ideas as well. Welcome.
However, I realized that several of the features, such as Adj OE, Adj DE, PPP Off., PPP Def., and even Elite SOS, are highly influenced by full-season results. These metrics are often derived from total points scored or allowed throughout the season, or adjusted based on opponent strength—which itself depends on end-of-season data. Because of this, including them introduces data leakage: the model is essentially learning the outcome from variables that already reflect that outcome.
So, while the model appears accurate, it isn’t truly predictive—it’s descriptive. To build a forward-looking prediction model, we’d need game-by-game stats available before each match, focusing on variables like shooting efficiency, turnovers, rebounds, and recent form—independent of final outcomes.
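As a rough illustration of what "available before each match" means in practice, the sketch below walks through games in date order and, for each one, records the team's scoring average from its earlier games only. The input layout (a list of `(team, points)` tuples) and the function name are assumptions for this example, not the structure of the shared spreadsheet.

```python
from collections import defaultdict


def pregame_avg_points(games):
    """For each game, return the team's average points in its PRIOR games.

    `games` is a chronologically ordered list of (team, points) tuples
    (a hypothetical layout). A team's first game gets None, because
    nothing is known about that team yet.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    features = []
    for team, points in games:
        # Read the feature BEFORE adding this game's result to the totals.
        prior = totals[team] / counts[team] if counts[team] else None
        features.append(prior)
        totals[team] += points
        counts[team] += 1
    return features


# Hypothetical scores, in date order.
season = [("A", 70), ("B", 60), ("A", 80), ("A", 90)]
print(pregame_avg_points(season))
```

The important detail is the order of operations: the feature is read first and the current game's score is added afterwards, so the result of a game never feeds its own prediction.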
This dataset has been a fantastic foundation. My next step would be to rework the model using only pre-outcome metrics for cleaner forecasting.
Thanks again for the post.
Hello @mwiegand
Hope all is going well with your efforts. I have a few tips you might find useful.
- K.I.S.S.
- Read about Jules Regnault (just an overview)
- Have a basic understanding of regression to the mean
- Understand basic prospect theory and how it may be applied to betting patterns and, as a direct consequence, odds
- Predicting probabilities of outcomes in the same way everyone else does will mean you get the same results as everyone else
- Enjoy the ride (the process of what you are doing)
- Long term, 5% ROI is very good! You won't find 25% or 35% long-term ROI (if you do, PM me)
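To make the regression-to-the-mean tip concrete, one common way to account for it is to shrink an observed win rate toward the league average, with small samples pulled the hardest. This is only a sketch: the function name is made up, and the `prior_games=10` weight is an arbitrary assumption that would need tuning on real data.

```python
def shrunk_win_rate(wins, games, league_rate=0.5, prior_games=10):
    """Blend an observed win rate with the league average.

    Acts as if the team had also played `prior_games` extra games at
    exactly `league_rate`; both defaults are assumptions, not fitted values.
    """
    return (wins + league_rate * prior_games) / (games + prior_games)


# A 5-0 start is not evidence of a 100% team.
print(round(shrunk_win_rate(5, 5), 3))
# The same 100% record over 30 games is pulled back far less.
print(round(shrunk_win_rate(30, 30), 3))
```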
Even if you can't find ‘an edge’, you will have picked up and learnt so many transferable skills!
Good luck
Alex
Interesting post! But is it really related to this thread?
Hi
Yes. If you are comparing your assessment of probability to bookmaker odds (which are an assessment of probability minus the bookmaker's margin, overall book risk, etc.), you need to know and understand many other things.
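As a small sketch of the margin point: the inverse of each decimal price gives a raw implied probability, and across all outcomes those usually sum to more than 1. The excess is the bookmaker's margin (overround). Rescaling proportionally, as done here, is only one simple assumption about how the margin is spread over the outcomes, and the odds in the example are invented.

```python
def implied_probabilities(decimal_odds):
    """Return (margin-free probabilities, margin) for one market.

    `decimal_odds` holds one decimal price per outcome. The margin is
    removed by proportional rescaling, which is a simplifying assumption.
    """
    raw = [1.0 / price for price in decimal_odds]
    book_sum = sum(raw)  # exceeds 1.0 when the book carries a margin
    fair = [p / book_sum for p in raw]
    return fair, book_sum - 1.0


# Hypothetical home / draw / away prices.
fair, margin = implied_probabilities([1.8, 3.6, 4.5])
print([round(p, 3) for p in fair], round(margin, 3))
```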
For example (a free tip, to be used at your own risk! This is a generalisation that may need refining, AND it won't work with a bookmaker; you need to use an exchange):
Take a top Premier League match between two very popular big teams. Lots of bettors will put money on either Team A or Team B. The result of this behaviour is that the odds of Team A and Team B go down and the odds of the draw go up, making the draw a ‘value’ bet and Team A and Team B ‘not value’ compared with the true probability. If you don't understand this dynamic, you don't understand part of what you are looking at.
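In code, "value" boils down to comparing your own probability with the price on offer. A minimal sketch follows, with made-up numbers and ignoring exchange commission (an assumption that flatters the result):

```python
def expected_value(prob, decimal_odds):
    """Expected profit per 1 unit staked at the given decimal odds."""
    return prob * decimal_odds - 1.0


def is_value(prob, decimal_odds):
    """True when your estimated probability beats the price on offer."""
    return expected_value(prob, decimal_odds) > 0


# Hypothetical: you rate the draw at 30% and the exchange offers 3.6.
print(round(expected_value(0.30, 3.6), 3), is_value(0.30, 3.6))
# If your estimate is only 25%, the same price is not value.
print(round(expected_value(0.25, 3.6), 3), is_value(0.25, 3.6))
```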
And no, I am not suggesting any hints or tips on workflows.
I may be wrong, but I recognise where @Matthew_Ellwood is and where he wants to go. I was just sharing some advice on how he might get there. But (English saying:) there are many ways to skin a cat.
Frank
@FrankColumbo you might have mentioned / linked the wrong guy. Did you mean Matthew_Ellwood by any chance?
lol I did
sorry for the confusion!
No worries [post must be at least 20 characters long] xD