How it works



As of the 2019-2020 season, the information written below about the Elo model is out-of-date. For full details of the current implementation, which uses a genetic algorithm to optimise the rating system parameters for prediction accuracy, please see this post on my website.

For posterity I've kept the old methodology below.

In short, an Elo-like system is used which incorporates both home advantage and margin of victory (MoV). A large amount of inspiration is taken from FiveThirtyEight's NBA and NFL rating systems, and I can't thank them enough for making the details of their systems publicly available.

Elo-like systems are defined by two equations: one governing the expected outcome of a game based on the two teams' strengths, and one governing how a team's rating is updated following a game.

Expected outcome

The expected outcome of a match, E, is given by the following equation:

E = \frac{1}{1 + 10^{\frac{-dr}{400}}}

This encodes the expected result (a categorical outcome with 3 possible values) as a continuous score in the range [0, 1]. This is because Elo systems were initially described for games with only 2 outcomes. In football modelling, we assume a value of 0.5 indicates a draw. dr is the difference between the two teams' pre-match ratings, taking home advantage into account.

dr = \text{elo}_{home} - \text{elo}_{away} + HA

The home advantage value (HA) is calculated separately for each league and is updated each season.
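The two equations above can be sketched in a few lines of code. Note that the HA default of 60 rating points is a placeholder for illustration; the real per-league values aren't given here.

```python
# Sketch of the expected-outcome calculation. The home-advantage
# default (ha=60) is a placeholder, not a fitted per-league value.
def expected_outcome(elo_home: float, elo_away: float, ha: float = 60.0) -> float:
    """Expected score for the home team, in [0, 1]; 0.5 ~ draw."""
    dr = elo_home - elo_away + ha  # rating difference, home-adjusted
    return 1.0 / (1.0 + 10.0 ** (-dr / 400.0))

# Evenly matched teams: the home side still gets a slight edge from HA
print(round(expected_outcome(1500, 1500), 3))  # ≈ 0.586 with ha=60
```

With equal ratings and no home advantage the expected score is exactly 0.5, as the draw-midpoint interpretation requires.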

Rating update

After a game, a team's rating is updated according to the following equation:

\text{elo}_{new} = \text{elo}_{pregame} + KG(O-E)

The observed outcome O is defined as 0, 0.5, or 1 for an away win, draw, or home win respectively.

K acts as the system gain knob, reflecting the impact of a single match on a team's rating. A low value places greater emphasis on long-term form, resulting in a steady rating value, but it will be slow to reflect recent form. A large value of K places greater importance on recent results with long term form being less important. However, it is susceptible to noisy (i.e. unexpected) results. Predictaball uses K = 20.

G is an additional multiplier that allows for other factors to influence the change in rating, such as the goal differential. For draws or when MOV = 1, G = 1, otherwise:

G = \log_{2}(1.7\,\text{MOV}) \frac{2}{2+0.001(\text{elo}_{win} - \text{elo}_{lose})}

Adding MOV into the equation is relatively self-explanatory, but the second set of terms is less obvious. It is there to handle the auto-correlation problem identified by FiveThirtyEight, whereby stronger teams tend to have their ratings inflated because they are more likely to win by large amounts. This term acts as a penalty function, reducing the multiplier when there is a large skill gap.
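The update rule and multiplier can be sketched as below, with K = 20 as stated; the example ratings and expected score are hypothetical.

```python
import math

K = 20.0  # system gain, as used by Predictaball

def mov_multiplier(mov: int, elo_win: float, elo_lose: float) -> float:
    """Margin-of-victory multiplier G, with the auto-correlation penalty."""
    if mov <= 1:  # draws and one-goal wins: G = 1
        return 1.0
    return math.log2(1.7 * mov) * 2.0 / (2.0 + 0.001 * (elo_win - elo_lose))

def update_rating(elo_pre: float, observed: float, expected: float, g: float) -> float:
    """elo_new = elo_pre + K * G * (O - E)."""
    return elo_pre + K * g * (observed - expected)

# Hypothetical example: a 1550-rated home side beats a 1450-rated
# away side by 3 goals, having had an expected score of 0.7
g = mov_multiplier(3, elo_win=1550, elo_lose=1450)
new_home = update_rating(1550, observed=1.0, expected=0.7, g=g)
```

The penalty is visible in `mov_multiplier`: the larger the winner's rating advantage, the smaller the denominator term's boost, so routine big wins by strong teams move their rating less.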


Predicting goals scored

As of the 2019-2020 season, the information written below about the hierarchical Bayesian model is out-of-date. For full details of the current implementation that uses an ordinal neural network optimised by a genetic algorithm, please see this post on my website.

For posterity I've kept the old methodology below.

New for the 2018-2019 season, I've taken the same hierarchical structure as the ordinal multinomial regression summarised below and adapted it to use a Poisson outcome to predict the number of goals scored by each team. In the following notation, h and a superscripts denote the home and away teams respectively, i.e. for a given game i:

\text{goals}^{\text{h}}_{i} \sim \text{Poisson}(\mu^{\text{h}}_{i})
\text{goals}^{\text{a}}_{i} \sim \text{Poisson}(\mu^{\text{a}}_{i})
\log(\mu^{\text{h}}_{i}) = \alpha^{\text{h}}_{\text{league}_{i}} + \beta_{h} \Delta \text{elo}_{i}
\log(\mu^{\text{a}}_{i}) = \alpha^{\text{a}}_{\text{league}_{i}} + \beta_{a} \Delta \text{elo}_{i}

This corresponds to a hierarchical intercept that varies by league and a single match-level predictor in the form of the team rating difference:

\Delta \text{elo}_{i} = \text{elo}^{\text{h}}_{i} - \text{elo}^{\text{a}}_{i}

By drawing a large number of samples from the posterior of both home and away goals scored, you can simply obtain estimates for the probability of each match outcome. The main motivation for modelling goals scored rather than the outcome directly is that it enables me to simulate a full season at a time, as the rating system requires goal difference.
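The sampling step above can be sketched with fixed point estimates in place of full posterior draws. The intercepts and slopes below are illustrative placeholders, not the fitted values.

```python
import numpy as np

# Monte Carlo sketch of turning the Poisson goal model into outcome
# probabilities. All coefficients here are hypothetical placeholders.
rng = np.random.default_rng(0)
n_draws = 100_000

alpha_h, beta_h = 0.35, 0.0015   # hypothetical home intercept/slope
alpha_a, beta_a = 0.10, -0.0015  # hypothetical away intercept/slope
delta_elo = 80.0                 # home rating minus away rating

# Log link: expected goals for each side
mu_h = np.exp(alpha_h + beta_h * delta_elo)
mu_a = np.exp(alpha_a + beta_a * delta_elo)

# Simulate scorelines, then count how each outcome falls out
home_goals = rng.poisson(mu_h, n_draws)
away_goals = rng.poisson(mu_a, n_draws)

p_home = np.mean(home_goals > away_goals)
p_draw = np.mean(home_goals == away_goals)
p_away = np.mean(home_goals < away_goals)
print(f"H {p_home:.2f} | D {p_draw:.2f} | A {p_away:.2f}")
```

In the full model each simulated scoreline would use a different posterior draw of the coefficients, propagating parameter uncertainty into the outcome probabilities.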

Forecasting outcome

The following model was used for the 2017-2018 season, but I've left the details up for posterity. It was a hierarchical Bayesian ordinal multinomial regression, only taking the league and elo difference between the two teams as inputs. In particular, it was modelled using JAGS as the following:

\text{Outcome}_{i} \sim \text{Multinomial}(1, \phi_{i})

where ϕ_i holds the three probability parameters, one for each outcome:

\phi_{i, \text{away}} = T_{i, 1}
\phi_{i, \text{draw}} = T_{i, 2} - T_{i, 1}
\phi_{i, \text{home}} = 1 - T_{i, 2}

An ordinal model works by identifying two appropriate thresholds of the linear predictor to obtain the 3 probabilities, ensuring that the ordering of the outcomes matters (unlike in a standard multinomial model).

\text{logit}(T_{i, 1}) = \alpha_{\text{league}_{i}, 1} - \mu_{i}
\text{logit}(T_{i, 2}) = \alpha_{\text{league}_{i}, 2} - \mu_{i}

The linear predictor is based simply on a single match-level covariate: the rating difference

\mu_{i} = \beta \Delta \text{elo}_{i}

This is a hierarchical model with varying intercepts, which depend on the league the match was held in, but fixed slopes. I found that using varying slopes (i.e. beta also depends on the league) didn't improve the model accuracy but had a significant effect on the time taken to fit the model, and I generally prefer parsimonious models anyway.
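The ordinal mapping from rating difference to the three probabilities can be sketched as below. The two thresholds and the slope are illustrative values for one league, not the fitted JAGS estimates.

```python
import math

def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# Sketch of the ordinal model's probability mapping. alpha1 < alpha2
# and beta are hypothetical values, not fitted posterior estimates.
def outcome_probs(delta_elo: float, alpha1: float = -0.6,
                  alpha2: float = 0.4, beta: float = 0.003) -> dict:
    mu = beta * delta_elo
    t1 = inv_logit(alpha1 - mu)  # cumulative P(away win)
    t2 = inv_logit(alpha2 - mu)  # cumulative P(away win or draw)
    return {"away": t1, "draw": t2 - t1, "home": 1.0 - t2}
```

Because the two cumulative thresholds are ordered (alpha1 < alpha2), the three probabilities are guaranteed non-negative and sum to one, which is exactly the constraint a standard multinomial model would not enforce.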

Previously, I spent more time optimising the model, using more complex techniques such as Evolutionary Algorithms and hierarchical Bayesian models to form team strength measures. However, I now prefer to place a greater emphasis on accurately modelling a team's skill level, and using a relatively simple model to obtain match outcome probabilities.


Predictive accuracy across all leagues

Accuracy: 54%
RPS: 0.194

Biggest Upset

Barcelona 0 - 1 Leganes (2024-12-15)
Pre-match probabilities (H | D | A)
86% | 10% | 5%