Monte Carlo forecasting in football betting

Following the success of last season, my thoughts obviously turned to whether this could be repeated. Or, more accurately, what can be expected from the algorithm betting model in future seasons, both in terms of return and the variability of those returns.

“Monte Carlo” is the name given to simulations that use computer-generated random numbers to identify the range of possible outputs a model may generate in the ‘real world’. On each run, a random number is used to draw an input from a user-defined probability distribution, and that input is passed through the user’s model to produce a simulated outcome. Run the simulation thousands of times and you build up a probability distribution for the output of your model.
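The loop described above can be sketched in a few lines of Python. The function name `monte_carlo`, the toy input distribution and the toy model are all hypothetical. They only illustrate the cycle of drawing an input, running the model and collecting the output.

```python
import random
import statistics
from typing import Callable


def monte_carlo(sample_input: Callable[[random.Random], float],
                model: Callable[[float], float],
                runs: int = 10_000,
                seed: int = 42) -> list[float]:
    """Draw one input per run, pass it through the model, and collect the outputs."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [model(sample_input(rng)) for _ in range(runs)]


# Toy usage (hypothetical): input ~ Normal(0, 1), and the "model" simply doubles it.
outputs = monte_carlo(lambda rng: rng.gauss(0.0, 1.0), lambda x: 2 * x)
print(statistics.mean(outputs), statistics.stdev(outputs))  # roughly 0 and 2
```

The list of outputs is the simulated distribution. Its mean, spread and percentiles are what the rest of this post reports.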

Assigning a probability distribution to football match results

The input variable for a football betting model is the match result. It is variable because the result could have taken any one of three values: home win, draw or away win.

The major risk of using Monte Carlo lies in the probability distribution you ascribe to this input; i.e. it may not accurately describe the “true” range of possibilities. For example, suppose the true mean chance of Man Utd beating Arsenal at home in 08/09 is 65% (we never know this, of course), with the mean chance of a loss being 20%. Then this must be reflected in the distribution we use in our Monte Carlo model. For Chelsea v Derby the same figures may be 95% and 2%. If, for convenience, we use the same distribution for all Premiership games, it is unlikely to fit either Man Utd v Arsenal or Chelsea v Derby.

Additionally, the distribution must be formulated independently of both your model’s odds and the market’s odds. If you use your model’s odds, you are generating the match results from your own predictions. The more you limit the variance of the distribution, the closer your generated results will be to your predictions. With this approach you can hardly fail to show a profit. The same argument applies to the market’s odds: the closer your generated match results are to the market’s prediction, the closer your maximum available profit will be to zero; i.e. it will be harder to earn a profit.
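A small simulation shows why drawing results from your own model’s odds is circular. In this minimal sketch, the bets, their probabilities and their odds are invented for illustration, and `mean_profit_circular` is a hypothetical name. Every bet is one the model itself rates as value (p × odds > 1). Because the simulated results are drawn from those same probabilities, the average profit simply reproduces the edge the model already assumed.

```python
import random


def mean_profit_circular(bets: list[tuple[float, float]],
                         runs: int = 10_000,
                         seed: int = 1) -> float:
    """Average season profit (1-unit stakes) when results come from the model's OWN probabilities.

    bets: (model_probability, decimal_odds) for each back bet the model flags as value.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        for p, odds in bets:
            # Circular step: the simulated result uses the same p that selected the bet.
            total += (odds - 1) if rng.random() < p else -1.0
    return total / runs


# Hypothetical value bets: each has p * odds > 1 by the model's own reckoning.
bets = [(0.5, 2.2), (0.3, 3.6), (0.6, 1.8)]
print(mean_profit_circular(bets))  # close to the built-in edge, sum(p * odds - 1) = 0.26
```

The simulation can only confirm the model’s edge, never test it. That is why the distribution needs to be built from something other than the model’s (or the market’s) odds.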

I have therefore defined a distribution based on the actual match result. There are many choices as to what shape the distribution should take. For example, a normal distribution centred on the draw gives an actual draw an equal chance of having been a home win or an away win. An actual home win could have been a draw, with less chance of ending as an away win. This could use a normal distribution centred on the home result or one skewed toward it. The same logic applies to an away win. I’ve applied a normal distribution centred on the actual result to represent the distribution of likely outcomes. How accurate this is will always remain unknown.
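One way to put a normal distribution “over” a categorical result is sketched below. The approach codes the outcomes on an ordinal scale, draws from a normal centred on the actual result, and snaps the draw back to the nearest outcome. The coding (away = -1, draw = 0, home = +1), the spread `sigma` and the function names are assumptions for this sketch, not a description of the author’s exact setup.

```python
import random

# Ordinal coding assumed for this sketch: away = -1, draw = 0, home = +1.
CODES = {"A": -1, "D": 0, "H": 1}
LABELS = {-1: "A", 0: "D", 1: "H"}


def perturb(result: str, sigma: float, rng: random.Random) -> str:
    """Draw from Normal(actual result, sigma) and snap to the nearest outcome."""
    x = rng.gauss(CODES[result], sigma)
    code = max(-1, min(1, round(x)))  # anything beyond +/-1 is clamped to home/away
    return LABELS[code]


def change_rate(results: list[str], sigma: float, runs: int = 1000, seed: int = 7) -> float:
    """Fraction of simulated results that differ from the actual ones."""
    rng = random.Random(seed)
    changed = sum(perturb(r, sigma, rng) != r for _ in range(runs) for r in results)
    return changed / (runs * len(results))
```

With `sigma = 0.5`, an actual draw changes whenever the draw lands more than one standard deviation from 0, about 32% of the time. A home or away win changes only when it falls more than one standard deviation toward the draw, about 16% of the time. `sigma` is a free parameter. In practice one would tune it until the overall change rate, across the real mix of results, hits the chosen target, such as the roughly 35% stated under Assumptions.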

Visualising the future

Once I had run the simulation and looked at the results, the next stage, of course, was to feed them back into the model and start again, searching for the best results: keep testing and adapting the model based on the last set of results. This, I have found, is an extremely time-intensive and ongoing job. Just running the simulation takes hours, let alone adapting and improving the model. When I think about rolling this out to other leagues, I am glad there is a 3-month off-season!

The risk of data mining does raise its head at this point. By searching for the best result (the top of the hill), one cannot expect such good results the next season (the only way to go is down the hill). But I do believe this is dealt with by running a large simulation of at least 1000 runs. Surely, after all, this is the advantage of, and the theory behind, Monte Carlo. In the hill analogy, we do not expect to be at the top of the hill next season, but at least we know how high and wide the hill is; i.e. the likely distribution of the outcome.

Theory

The whole theory behind value betting is that, in the long term, profits can be earned by taking advantage of mispriced market odds. If this is correct and my assumptions are realistic, then any simulated ‘run’ of a season should result in a profit. A season includes enough games for ‘extreme’ results to average out.
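The value-betting idea reduces to one inequality. Decimal odds `o` imply a probability of 1/o. If the true chance `p` is higher than that, then p × o > 1 and the bet has positive expected profit. Here is a minimal sketch, ignoring commission. The function names are hypothetical, and `p` stands for a true probability that, in reality, can only be estimated.

```python
def expected_profit(p: float, decimal_odds: float, stake: float = 1.0) -> float:
    """Expected profit of a back bet if p is the true win probability (no commission)."""
    # Win: +stake * (odds - 1) with probability p; lose: -stake with probability 1 - p.
    return stake * (p * decimal_odds - 1)


def is_value(p: float, decimal_odds: float) -> bool:
    """A bet is 'value' when the odds pay more than the fair price 1 / p."""
    return p * decimal_odds > 1
```

For example, odds of 2.2 imply about a 45% chance. If the true chance is 50%, `expected_profit(0.5, 2.2)` is about 0.10 per unit staked. A single bet can still lose, but the average over many such bets converges toward that figure.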

Assumptions

Due to the time involved, I concentrated on 2 seasons of results, 06/07 and 07/08, for both of which I have Betfair historical prices. I also assume a 5% commission payable on profits. My probability distribution of results changes the result of roughly 35% of games on each run. With an average of 3-4 of the 10 games played in any week ending in a different result, I think this is a sensible assumption.
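The commission assumption can be made concrete with a sketch that settles back bets, charging 5% only on winnings. Betfair levies commission on net winnings per market. This sketch assumes one bet per market, so the two coincide. The function names and the bet tuples are hypothetical.

```python
def settle_back_bet(stake: float, decimal_odds: float, won: bool,
                    commission: float = 0.05) -> float:
    """Net profit of one back bet; commission is taken only from winnings."""
    if won:
        gross = stake * (decimal_odds - 1)
        return gross * (1 - commission)
    return -stake  # losing stakes attract no commission


def season_profit(bets: list[tuple[float, float, bool]], commission: float = 0.05) -> float:
    """Sum of settled bets, each given as (stake, decimal_odds, won)."""
    return sum(settle_back_bet(s, o, w, commission) for s, o, w in bets)
```

A winning 10-unit bet at 3.0 returns 20 gross and 19 after commission, so a season of that win plus a 10-unit loser nets 9. In a Monte Carlo run, `won` would come from the simulated result rather than the actual one.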

Results

The following results pertain to the latest Balanced Fund model, which will continue next season.

Below are descriptive statistics for the 2 seasons’ distributions after performing over 1500 runs. Where I have them, I also add statistics for the S&P 500 to show the more favourable risk/return profile of the betting model.

                                            06/07        07/08     S&P 500*
Probability distribution              LogLogistic      Weibull       Normal
Mean                                          47%         224%          12%
Standard deviation                            38%          82%          19%
5% lower limit                                 5%           9%         -26%
95% upper limit                              113%         362%          50%
Median                                        43%         223%          n/a
Skewness                                     1.27         0.11          n/a
Kurtosis                                     8.72         2.72          n/a
Sharpe ratio (risk-free rate 3.5%)            1.1          2.7          0.4

*S&P 500 figures are based on 100 years of data.

Firstly, the mean return of each season is positive. The 5% lower limits, i.e. the returns below which the worst 5% of outcomes fall, are 5% and 9%. For example, in the 07/08 season there was a 5% chance of the return falling below 9%. Suffice to say, we would expect nearly all outcomes to give positive returns in both seasons.
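Figures like those in the table can be computed from a list of simulated season returns with Python’s standard `statistics` module, as in this minimal sketch. The function name is hypothetical. The percentiles come from `statistics.quantiles`, whose interpolation may differ slightly from whatever simulation software produced the table. I have also assumed the table’s kurtosis is the plain (non-excess) kind, for which a normal distribution scores 3.

```python
import statistics


def describe(returns: list[float], risk_free: float = 0.035) -> dict[str, float]:
    """Summary statistics for simulated season returns given as fractions (0.47 = 47%)."""
    n = len(returns)
    mean = statistics.mean(returns)
    sd = statistics.stdev(returns)
    # quantiles(n=20) returns the 5th, 10th, ..., 95th percentiles (19 cut points).
    cuts = statistics.quantiles(returns, n=20)
    # Moment-based skewness and non-excess kurtosis (normal distribution: 0 and 3).
    m2 = sum((r - mean) ** 2 for r in returns) / n
    m3 = sum((r - mean) ** 3 for r in returns) / n
    m4 = sum((r - mean) ** 4 for r in returns) / n
    return {
        "mean": mean,
        "sd": sd,
        "p5": cuts[0],    # 5% lower limit
        "p95": cuts[-1],  # 95% upper limit
        "median": statistics.median(returns),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
        "sharpe": (mean - risk_free) / sd,
    }
```

Fed with the 1500-plus simulated returns for a season, `s["p5"]` is the return that the worst 5% of runs fall below.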

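Compare this with the S&P 500, which has a mean total return of 12% and a standard deviation of 19%. It’s common to assume S&P returns are normally distributed. The limits of -26% and 50% shown are the mean plus or minus two standard deviations, which bound roughly 95% of outcomes (about 2.3% in each tail). Strictly, the 5th and 95th percentiles of this normal distribution are about -19% and 43%.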
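Both pairs of limits can be checked with `statistics.NormalDist`, using only the mean and standard deviation quoted above.

```python
from statistics import NormalDist

sp500 = NormalDist(mu=12, sigma=19)  # annual mean and SD in %, as quoted in the text

# True 5th and 95th percentiles of the normal distribution.
p5, p95 = sp500.inv_cdf(0.05), sp500.inv_cdf(0.95)
# Mean +/- 2 standard deviations, which covers about 95.4% of outcomes.
two_sd = (sp500.mean - 2 * sp500.stdev, sp500.mean + 2 * sp500.stdev)

print(f"5th/95th percentiles: {p5:.1f}% / {p95:.1f}%")     # -19.3% / 43.3%
print(f"mean +/- 2 SD:        {two_sd[0]:.0f}% / {two_sd[1]:.0f}%")  # -26% / 50%
```

The narrower percentile band is the like-for-like comparison with the betting model’s 5% and 95% limits.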

A measure of ‘value for money’ in investing is the Sharpe ratio. It measures the excess return of an investment over the risk-free return (often US T-Bills) as a ratio of the investment’s standard deviation (a measure of risk). For the S&P 500 the Sharpe ratio is 0.4. Both the 06/07 and 07/08 seasons beat this figure comfortably, with 1.1 and 2.7 respectively. For the 07/08 season, the betting model earns about six times the S&P 500’s excess return (above the risk-free rate) for an equivalent level of risk (2.69 / 0.45 ≈ 6).
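The Sharpe ratios in the table follow directly from the means and standard deviations, with the 3.5% risk-free rate stated there. This short check recomputes them; the helper name is hypothetical.

```python
def sharpe(mean_return: float, sd: float, risk_free: float = 3.5) -> float:
    """Excess return over the risk-free rate per unit of standard deviation (inputs in %)."""
    return (mean_return - risk_free) / sd


ratios = {
    "06/07": sharpe(47, 38),
    "07/08": sharpe(224, 82),
    "S&P 500": sharpe(12, 19),
}
for name, s in ratios.items():
    print(f"{name}: {s:.2f}")  # 06/07: 1.14, 07/08: 2.69, S&P 500: 0.45

# Excess return per unit of risk, 07/08 model relative to the S&P 500.
print(f"07/08 vs S&P 500: {ratios['07/08'] / ratios['S&P 500']:.1f}x")  # 6.0x
```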

History is no predictor of the future, etc…

We are, of course, only looking at 2 seasons. The Monte Carlo simulations show the expected return and its spread for each season, but it cannot escape notice that the 2 distributions differ. I did hope they would look the same, though it is comforting to note that they overlap. Why 06/07 is so much more compact than 07/08 can only be guessed at. One possible explanation lies in why the 07/08 distribution is so wide: it could be caused by just a few badly mispriced games or big upsets. If the model won those bets, this would have a large positive effect on the return.
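Long-odds bets widen a distribution because of how the variance of a single back bet grows with the odds. The profit is +stake × (odds - 1) with probability p and -stake otherwise. The two outcomes are stake × odds apart, so the variance is p(1 - p)(stake × odds)². This sketch uses a hypothetical function name.

```python
def bet_variance(p: float, decimal_odds: float, stake: float = 1.0) -> float:
    """Variance of a back bet's profit: +stake*(odds-1) with prob p, -stake otherwise."""
    # Two-point distribution: variance = p * (1 - p) * (gap between outcomes) ** 2.
    return p * (1 - p) * (stake * decimal_odds) ** 2
```

At fair odds (p = 1/odds) this simplifies to odds - 1 per unit staked. A 1.5 shot contributes a variance of 0.5, while an 11.0 upset contributes 10, as much as twenty independent short-priced bets combined. A handful of long-odds results, won or lost, can therefore dominate the spread of a season’s returns.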

I find it difficult to believe that the play of Premier League teams, or betting patterns, vary significantly from season to season, which might otherwise help explain the differing results. The model does not appear to have a consistent edge over the market’s odds, so I can only assume that the expected return from any one season will vary using this model. Where our 2 Monte Carlo distributions fall within the distribution for all seasons is another unknown.

Conclusion

Using seasons 06/07 and 07/08 as a test case indicates there are significant profits to be earned from betting on the Premier League using the Balanced Fund algorithm model.  The distribution of likely returns in any one season remains an unknown and therefore there is a risk of loss. 

Determining how typical these 2 seasons are of future seasons would require testing more seasons.


