The Bundesliga is Germany’s primary football league. It is one of the most important football leagues in the world, broadcast on television in over 200 countries.
If you want to get your hands on a tool to forecast the result of any game (and perform some more statistical analyses), read on!
The basis of our forecasting tool was laid in the blog post “Euro 2020: Will Switzerland kick out Spain too?”, where we also explained the methodology. For this post, we adapted the parameters for the Bundesliga (the sources are given in the code below) and, as an example, forecast the result of the upcoming game of Hertha BSC (Berlin) against the international top team Bayern Munich on August 28. The tool can also easily be adapted to other football leagues, e.g. the English Premier League.
On top of that, we made the model even more accurate by adding a home advantage. This effect is surprisingly stable across the main European football leagues at about 0.4 goals extra for the home team. By the way: in times of Corona, when no spectators were allowed in the stadiums, the home advantage disappeared!
Another thing we added is a probability calculation for all possible outcomes. We do this by assuming that the goals scored by each team are independent of each other (whether this is a reasonable assumption is open to debate), so that the marginal probabilities can simply be multiplied. This is easily done in R with the outer() (product) function (= %o%). The most probable outcome can then be extracted:
mean_total_score <- 3.03 # https://de.statista.com/statistik/daten/studie/1622/umfrage/bundesliga-entwicklung-der-durchschnittlich-erzielten-tore-pro-spiel/

# market values: https://www.transfermarkt.de/bundesliga/marktwerteverein/wettbewerb/L1
team1 <- "Bayern Munich"; colour1 <- "red" ; value1 <- 818.5  # rows
team2 <- "Hertha BSC"   ; colour2 <- "blue"; value2 <- 176.75 # columns

# https://www.saechsische.de/mehr-auswaerts-tore-bei-geisterspielen-5219318.html
ratio <- value1 / (value1 + value2)
# 0.4 goals = home advantage
mean_goals1 <- ratio * mean_total_score + 0.4 # has to be + 0.2, see update below!
mean_goals2 <- (1 - ratio) * mean_total_score - 0.4 # has to be - 0.2, see update below!

# Poisson probabilities for 0 to 7 goals per team
goals <- 0:7
prob_goals1 <- dpois(goals, mean_goals1)
prob_goals2 <- dpois(goals, mean_goals2)
probs <- round((prob_goals1 %o% prob_goals2) * 100, 1) # outer product
colnames(probs) <- rownames(probs) <- goals

# plot both goal distributions side by side
parbkp <- par(mfrow=c(1, 2))
max_ylim <- max(prob_goals1, prob_goals2)
plot(goals, prob_goals1, type = "h", ylim = c(0, max_ylim), xlab = team1, ylab = "Probability", col = colour1, lwd = 10)
plot(goals, prob_goals2, type = "h", ylim = c(0, max_ylim), xlab = team2, ylab = "", col = colour2, lwd = 10)
# the title shows the most probable score
title(paste(team1, paste(goals[which(probs == max(probs), arr.ind = TRUE)], collapse = ":"), team2), line = -2, outer = TRUE)
par(parbkp)
So, the most probable outcome will be Bayern Munich 2:0 Hertha BSC. Let us have a look at the probabilities in more detail:
probs
##      0   1   2 3 4 5 6 7
## 0  4.8 0.7 0.0 0 0 0 0 0
## 1 14.0 1.9 0.1 0 0 0 0 0
## 2 20.2 2.8 0.2 0 0 0 0 0
## 3 19.5 2.7 0.2 0 0 0 0 0
## 4 14.1 1.9 0.1 0 0 0 0 0
## 5  8.1 1.1 0.1 0 0 0 0 0
## 6  3.9 0.5 0.0 0 0 0 0 0
## 7  1.6 0.2 0.0 0 0 0 0 0
The number of goals of Bayern Munich is in the rows, Hertha BSC is in the columns. The 2:0 result has a probability of over twenty percent, which is quite high. But even a result of 3:0 still has a probability of nearly 20 percent!
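To make the independence assumption and the cut-off at seven goals more concrete, here is a minimal sketch that rebuilds the joint table without rounding. It reuses the parameters from the post (original version with + 0.4 and - 0.4); the names joint, covered, and best_score are introduced here for illustration only.

```r
# Parameters copied from the post (original version with +/- 0.4)
mean_total_score <- 3.03
value1 <- 818.5   # Bayern Munich
value2 <- 176.75  # Hertha BSC
ratio <- value1 / (value1 + value2)
mean_goals1 <- ratio * mean_total_score + 0.4
mean_goals2 <- (1 - ratio) * mean_total_score - 0.4

goals <- 0:7
p1 <- dpois(goals, mean_goals1)  # marginal probabilities, team 1
p2 <- dpois(goals, mean_goals2)  # marginal probabilities, team 2

# Under independence: P(team1 = i, team2 = j) = P(team1 = i) * P(team2 = j)
joint <- outer(p1, p2)           # same as p1 %o% p2
dimnames(joint) <- list(goals, goals)

# One cell computed by hand for comparison (score 2:0)
cell_2_0   <- joint["2", "0"]
manual_2_0 <- dpois(2, mean_goals1) * dpois(0, mean_goals2)

# Share of the total probability mass that the 0..7 grid covers
covered <- sum(joint)

# Most probable score as c(goals team 1, goals team 2)
best       <- which(joint == max(joint), arr.ind = TRUE)
best_score <- goals[as.vector(best)]

print(round(covered, 3))
print(best_score)
```

If covered comes out noticeably below 1, the goal range is too narrow for the chosen means and some probability mass is cut off. Together with rounding, this is one reason why sums taken over the rounded table need not add up to exactly 100 percent.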
To calculate the overall probabilities of a win for each team and of a draw, we can conveniently use the lower.tri(), upper.tri(), and diag() functions:
sum(probs[lower.tri(probs)]) # probability team 1 wins
## [1] 91
sum(diag(probs)) # probability for a draw
## [1] 6.9
sum(probs[upper.tri(probs)]) # probability team 2 wins
## [1] 0.8
So, to answer the original question, Hertha BSC’s chance to beat Bayern Munich is below one percent: they need nothing less than a miracle to win in Munich!
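As a hedged sketch of how this calculation could be wrapped up for reuse with other fixtures or leagues, the hypothetical helper outcome_probs() below (not part of the original post) takes the two Poisson means and returns the three outcome probabilities as fractions. It works on the unrounded joint matrix and uses a wider goal range (max_goals = 10, an arbitrary choice) so that less probability mass is truncated.

```r
# Hypothetical helper: win/draw/win probabilities from two Poisson means
outcome_probs <- function(mean_goals1, mean_goals2, max_goals = 10) {
  goals <- 0:max_goals
  # rows = goals of team 1, columns = goals of team 2
  joint <- outer(dpois(goals, mean_goals1), dpois(goals, mean_goals2))
  c(team1_win = sum(joint[lower.tri(joint)]),  # row index > column index
    draw      = sum(diag(joint)),
    team2_win = sum(joint[upper.tri(joint)]))
}

# Same inputs as in the post (original version with +/- 0.4)
mean_total_score <- 3.03
ratio <- 818.5 / (818.5 + 176.75)
res <- outcome_probs(ratio * mean_total_score + 0.4,
                     (1 - ratio) * mean_total_score - 0.4)
print(round(100 * res, 1))
```

For another league, only the mean total score, the two market values, and possibly the home advantage would need to be replaced; the values shown here are the ones from this post.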
DISCLAIMER
This post is written on an “as is” basis for educational purposes only and comes without any warranty. The findings and interpretations are exclusively those of the author and are not endorsed by or affiliated with any third party.
In particular, this post provides no sports betting advice! No responsibility is taken whatsoever if you lose money.
(If you make any money though I would be happy if you would buy me a coffee… that is not too much to ask, is it? 😉 )
UPDATE August 28, 2021
Bayern Munich won 5:0! As shown in the post, a Bayern win was to be expected, but this result is a real demonstration of power: the probability of this exact score was only 8.1%… but this is football!
UPDATE August 9, 2022
I made a mistake by adding the full home advantage to the home team and subtracting it from the away team. It must be halved because otherwise, the home advantage would be double what it really is! Sorry for the mistake (and as I said: no guarantee that I don’t make any mistakes in those predictions!)
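A minimal sketch of the corrected calculation could look like this. It assumes, as described in this update, that the 0.4 goals are split evenly between the two teams; the names home_advantage, base1, base2, and gap_shift are introduced here for illustration and do not appear in the original code.

```r
mean_total_score <- 3.03
value1 <- 818.5   # Bayern Munich (home)
value2 <- 176.75  # Hertha BSC (away)
ratio <- value1 / (value1 + value2)
home_advantage <- 0.4

# Expected goals without any home advantage
base1 <- ratio * mean_total_score
base2 <- (1 - ratio) * mean_total_score

# Half of the advantage is added to the home team, half is taken from the away team
mean_goals1 <- base1 + home_advantage / 2
mean_goals2 <- base2 - home_advantage / 2

# The gap between the two teams now widens by 0.4 goals in total, not by 0.8
gap_shift <- (mean_goals1 - mean_goals2) - (base1 - base2)

goals <- 0:7
joint <- outer(dpois(goals, mean_goals1), dpois(goals, mean_goals2))
dimnames(joint) <- list(goals, goals)
most_likely <- goals[as.vector(which(joint == max(joint), arr.ind = TRUE))]

print(round(joint * 100, 1))
print(most_likely)
```

Splitting the advantage this way keeps the expected total number of goals at 3.03 and only shifts their distribution between the teams.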
UPDATE August 2022
I created a video series for this post (in German).