Setup

The dataset epl_20_21 included in the package contains the scores of the games played in the English Premier League (EPL) during the 2020-2021 season. The data can be accessed as:

data("epl_20_21")
head(epl_20_21)
#> # A tibble: 6 × 4
#>      HG    AG HomeTeam       AwayTeam   
#>   <dbl> <dbl> <chr>          <chr>      
#> 1     0     3 Fulham         Arsenal    
#> 2     1     0 Crystal Palace Southampton
#> 3     4     3 Liverpool      Leeds      
#> 4     0     2 West Ham       Newcastle  
#> 5     0     3 West Brom      Leicester  
#> 6     0     1 Tottenham      Everton

Fitting

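We can fit a bivariate regression model for the goals scored by the home and away teams, as described in the paper (in revision).

We first arrange the data in the format expected by the mcmc_cmp function: a two-column response matrix and a list with one design matrix per response.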

y <- cbind(epl_20_21$HG, epl_20_21$AG)

X1 <- model.matrix(HG ~ 1 + HomeTeam + AwayTeam, data = epl_20_21,
                   contrasts.arg = list(HomeTeam = "contr.sum", AwayTeam = "contr.sum"))

X2 <- model.matrix(AG ~ 1 + AwayTeam + HomeTeam, data = epl_20_21,
                   contrasts.arg = list(AwayTeam = "contr.sum", HomeTeam = "contr.sum"))

X <- list(X1, X2)
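
To see what the sum-to-zero contrasts do, here is a minimal sketch on a toy factor with three hypothetical teams (A, B and C, which are not part of the dataset). With "contr.sum", a factor with k levels contributes k - 1 columns, and the last level is coded as -1 in each of them, so its effect is implied by the others.

```r
# Toy data: three hypothetical teams, used only for illustration
toy <- data.frame(team = factor(c("A", "B", "C", "A")))

# Sum-to-zero coding: k = 3 levels produce k - 1 = 2 contrast columns
X_toy <- model.matrix(~ 1 + team, data = toy,
                      contrasts.arg = list(team = "contr.sum"))
print(X_toy)

# The row for the last level ("C") is coded -1 in both contrast columns
c_row <- X_toy[toy$team == "C", c("team1", "team2")]
print(c_row)
```

With the 20 EPL teams, the same coding presumably yields 19 columns per team factor, which matches the index ranges 2:20 and 21:39 used later to extract the coefficients.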

Then, we fit the model as:

fit <- mcmc_cmp(y = y, X = X, S = 100000, nburn = 10000, scale_cov_b = 0.6, 
                scale_cov_beta = 0.01, scale_cov_gamma = 0.02, progress = "bar",
                v0 = 50)

Convergence Diagnostic

After fitting the model, we can visually check the convergence of the parameters by inspecting their trace plots.

list_colors <- c("#D1D1D1", "#BDBDBD", "#333333", "#424242", "#575757",
                 "#6B6B6B", "#000000", "#A8A8A8", "#949494", "#808080")
list_colors <- rep(list_colors, 10)[1:ncol(fit$posterior_beta[[1]])]

plot(NULL, ylim = c(min(fit$posterior_beta[[1]]), max(fit$posterior_beta[[1]])), xlim = c(0, nrow(fit$posterior_beta[[1]])),
     ylab = expression(beta[ij]), xlab = "Iteration",
     main = "Traceplot of the parameters associated to the Home Goals")

for(beta in 1:ncol(fit$posterior_beta[[1]])){
  lines(fit$posterior_beta[[1]][,beta], col = list_colors[beta])
}
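
As a rough numerical complement to the trace plots, the sketch below compares the mean of the first and second half of each chain. It is a crude heuristic and not a formal diagnostic (packages such as coda provide those). The matrix `draws` is simulated here as a stand-in for fit$posterior_beta[[1]], and the helper `half_shift` is a hypothetical name introduced only for this example.

```r
# Simulated stand-in for fit$posterior_beta[[1]]:
# rows are iterations, columns are parameters
set.seed(1)
draws <- matrix(rnorm(2000 * 3), ncol = 3)

# Hypothetical helper: difference between second-half and first-half means,
# scaled by the overall standard deviation of each parameter
half_shift <- function(draws) {
  n <- nrow(draws)
  first  <- draws[seq_len(n %/% 2), , drop = FALSE]
  second <- draws[(n %/% 2 + 1):n, , drop = FALSE]
  (colMeans(second) - colMeans(first)) / apply(draws, 2, sd)
}

shift <- half_shift(draws)
print(round(shift, 3))
```

Values close to zero are consistent with a chain that has stopped drifting; large values suggest a longer burn-in may be needed.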

We can check how well the model fits the observed data using the function fitting_plot, which plots a rootogram for each response.

Parameter Estimation

We can also plot the estimated parameters to make them easier to interpret.

beta1_est <- fit$estimation_beta[[1]]
beta2_est <- fit$estimation_beta[[2]]

off_home <- beta1_est[2:20]
off_home <- c(off_home, 0 - sum(off_home)) #Sum-to-zero constraint
def_away <- beta1_est[21:39]
def_away <- c(def_away, 0 - sum(def_away)) #Sum-to-zero constraint

def_home <- beta2_est[21:39]
def_home <- c(def_home, 0 - sum(def_home)) #Sum-to-zero constraint
off_away <- beta2_est[2:20]
off_away <- c(off_away, 0 - sum(off_away)) #Sum-to-zero constraint
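
The plots below use a data frame called table_teams whose construction is not shown. A minimal sketch of how it could be assembled follows, assuming the columns match the axis labels of the plots (attack as exp(effect), defense as exp(-effect)) and that teams are listed in the alphabetical order of the factor levels. The helper `build_table_teams` and the toy values are hypothetical and only illustrate the idea.

```r
# Hypothetical helper: one row per team, strengths on the multiplicative scale
build_table_teams <- function(teams, off_home, def_home, off_away, def_away) {
  data.frame(
    Home         = teams,           # team name, used as the label in both plots
    Attack.Home  = exp(off_home),
    Defense.Home = exp(-def_home),  # sign flipped: larger means a better defense
    Attack.Away  = exp(off_away),
    Defense.Away = exp(-def_away)
  )
}

# Toy usage with three made-up teams and sum-to-zero effects
toy_effects <- c(0.2, -0.5)
toy_effects <- c(toy_effects, 0 - sum(toy_effects))
toy_table <- build_table_teams(c("A", "B", "C"),
                               toy_effects, toy_effects,
                               toy_effects, toy_effects)
print(toy_table)
```

With the objects defined above, the call would presumably be build_table_teams(sort(unique(epl_20_21$HomeTeam)), off_home, def_home, off_away, def_away).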

The plots below show the strengths of the teams when playing at home and away. They allow us to identify the teams that had a good season and those with the worst performance. The point (1,1) in each plot represents the center of mass. Teams to the right of the vertical line (x = 1) attacked better than average, and teams above the horizontal line (y = 1) defended better than average.
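
One way to read the reference value of 1: because the effects sum to zero on the log scale, their average is 0 and exp(0) = 1, so the geometric mean of the plotted strengths equals 1. The short check below uses made-up effects, not the fitted ones.

```r
# Made-up log-scale effects for four hypothetical teams
effects <- c(0.3, -0.1, 0.4)
effects <- c(effects, 0 - sum(effects))  # sum-to-zero constraint

# Strengths on the multiplicative scale, as plotted on the axes
strengths <- exp(effects)

# Geometric mean of the strengths
geo_mean <- exp(mean(log(strengths)))
print(geo_mean)
```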

library(ggplot2)   # ggplot, geom_*, labs
library(ggrepel)   # geom_label_repel
library(latex2exp) # TeX

# personal_theme is a ggplot2 theme object defined elsewhere (not shown here)
ggplot(table_teams, aes(x = Attack.Home, y = Defense.Home)) +
  geom_hline(yintercept = 1, linetype = "dashed", color = "grey50") +
  geom_vline(xintercept = 1, linetype = "dashed", color = "grey50") +
  geom_point() +
  geom_label_repel(aes(label = Home), box.padding = 0.25, point.padding = 0.5, 
                   segment.color = "grey50") +
  labs(title = "Strengths playing Home",
       x = TeX("Attack $(\\exp(\\beta^{\\omega_{H_i}}))$"), 
       y = TeX("Defense $(\\exp(-\\beta^{\\delta_{H_i}}))$")) + 
  personal_theme


ggplot(table_teams, aes(x = Attack.Away, y = Defense.Away)) +
  geom_hline(yintercept = 1, linetype = "dashed", color = "grey50") +
  geom_vline(xintercept = 1, linetype = "dashed", color = "grey50") +
  geom_point() +
  geom_label_repel(aes(label = Home), box.padding = 0.25, point.padding = 0.5,
                   segment.color = "grey50") +
  labs(title = "Strengths playing Away",
       x = TeX("Attack $(\\exp(\\beta^{\\omega_{A_i}}))$"), 
       y = TeX("Defense $(\\exp(-\\beta^{\\delta_{A_i}}))$")) + 
  personal_theme

We observe that Manchester City, the champion of the 2020-2021 season, was the best team both at home and away, followed by Manchester United and Liverpool. Manchester United, however, had a poor home defense compared to the average team, conceding 28 goals at their stadium, Old Trafford. Liverpool stood out for their strong attack when playing away but had only an acceptable performance at home. Sheffield United, Fulham, and West Bromwich Albion were relegated that season.

Data available at https://www.football-data.co.uk/