How to do that animated ‘race’ bar chart

In this article, we are going to analyze the all time performance of English football teams since their leagues began in 1888, and use that data to create an animated bar chart that cycles through each year and shows the total points accrued of the top ten teams throughout history. We will also use the title bar to provide a narrative about how the sport changed in England over 130 years. Here’s the finished product, and all code is on Github:

Lets be clear on our aims before we start. Here’s what we want to do:

  1. Generate an animation that cycles through each football season from 1888 to 2017 and show the cumulative all time league points for each team as of that year, displaying the top 10 teams. We will only consider points obtained in the top tier (top division) of English football (now known as the Premier League).
  2. Sprinkle occasional factoids into the title of the chart that helps tell a story about changes in football over time.

I’m not going to focus on styling and prettiness in this article, just the basic functionality that creates an animated graphic.

Preparing the data

Ideally what we need is a data set which tell us about league points obtained through history, but I was unable to find that set. Instead I found something much more detailed and awesome, which is a record of every single game played since 1888 and the result and score. I found it here in the form of an R dataframe object — so we will be doing this in R folks. Here is a quick snapshot of what the data looks like.

This dataset is fairly large, with almost 200,000 matches played in every tier since 1888. First we need to convert each match to a points allocation to a given team. Points are shared between the home team and the visitor (away) team as follows:

  • 2 points to the winner before the 1981 season. From the 1981 season, 3 points were awarded to the winner in a change of rules to encourage more attacking play.
  • 0 points to the defeated team.
  • 1 point each if the game finished in a draw.

So lets load our data in and load up our tidyverse packages for standard data manipulation. We will use the result column to assign home points and away points, by creating two new columns using dplyr::mutate() as follows:

library(tidyverse)
load("data/england.rda")
# assign points to results (2 pts for a win up to 1980-81 season then 3 pts for a win afterwards)
england <- england %>% 
dplyr::mutate(
homepts = dplyr::case_when(
Season <= 1980 & result == "H" ~ 2,
Season > 1980 & result == "H" ~ 3,
result == "D" ~ 1,
result == "A" ~ 0
),
awaypts = dplyr::case_when(
Season <= 1980 & result == "A" ~ 2,
Season > 1980 & result == "A" ~ 3,
result == "D" ~ 1,
result == "H" ~ 0
)
)

Now we have our points allocated for each game, we will need to get the total home and away points for each team and each Season, remembering at this point that we are only interested in games played in the top tier:

# restrict to Tier 1 and assemble into total points per season
home_pts <- england %>%
dplyr::filter(tier == 1) %>%
dplyr::group_by(Season, home) %>%
dplyr::summarize(pts = sum(homepts))
away_pts <- england %>%
dplyr::filter(tier == 1) %>%
dplyr::group_by(Season, visitor) %>%
dplyr::summarize(pts = sum(awaypts))

Now we can bind these two dataframes together and add up home and away points to get total points for each team by season:

total_pts <- home_pts %>% 
dplyr::rename(Team = home) %>%
dplyr::bind_rows(
away_pts %>%
dplyr::rename(Team = visitor)
) %>%
dplyr::group_by(Season, Team) %>%
dplyr::summarise(pts = sum(pts))

Now we have a dataframe which shows the points earned by each team in each season. We want to manipulate this so that each season shows us the sum of all points earned up to and including that season. We can achieve this with a quick for loop:

# create rolling sums
table <- total_pts %>% 
dplyr::filter(Season == 1888) %>%
dplyr::select(Season, Team, Points = pts)
for (i in 1889:2017) {
table <- total_pts %>%
dplyr::filter(Season <= i) %>%
dplyr::group_by(Team) %>%
dplyr::summarise(Points = sum(pts, na.rm = TRUE)) %>%
dplyr::mutate(Season = i) %>%
dplyr::bind_rows(table)
}

We have done enough to have the data we need for objective 1. For objective 2, I will edit the Season column to sprinkle in some important historical facts about the English football league. I will keep each fact displayed for around three seasons to allow it to appear long enough in the animation for viewers to read it. Let’s call this new edited column SeasonLabel.

# add some historic facts to seasons
table <- table %>% 
dplyr::mutate(
SeasonLabel = dplyr::case_when(
Season <= 1891 ~ paste(Season, "Football League is formed with 12 teams in 1888", sep = " - "),
dplyr::between(Season, 1892, 1895) ~ paste(Season, "Second Division introduced in 1892", sep = " - "),
dplyr::between(Season, 1914, 1918) ~ paste(Season, "League suspended during World War I", sep = " - "),
dplyr::between(Season, 1920, 1924) ~ paste(Season, "Third Division North/South introduced in 1920/21", sep = " - "),
dplyr::between(Season, 1925, 1928) ~ paste(Season, "New Offside Law introduced in 1925", sep = " - "),
dplyr::between(Season, 1939, 1945) ~ paste(Season, "League suspended during World War II", sep = " - "),
dplyr::between(Season, 1958, 1961) ~ paste(Season, "Regional Third Divisions amalgamated in 1958 to form Nationwide Third and Fourth Divisions", sep = " - "),
dplyr::between(Season, 1965, 1968) ~ paste(Season, "Substitutes first allowed in 1965", sep = " - "),
dplyr::between(Season, 1974, 1977) ~ paste(Season, "First match played on a Sunday in 1974", sep = " - "),
dplyr::between(Season, 1981, 1984) ~ paste(Season, "Three points for a win introduced in 1981", sep = " - "),
dplyr::between(Season, 1986, 1989) ~ paste(Season, "Play-offs introduced to decide some promotions", sep = " - "),
dplyr::between(Season, 1992, 1995) ~ paste(Season, "Premier League formed in 1992, reducing Football League to three divisions", sep = " - "),
dplyr::between(Season, 2004, 2007) ~ paste(Season, "Football League renames divisions in 2004 to Championship, League One and League Two", sep = " - "),
dplyr::between(Season, 2013, 2016) ~ paste(Season, "Goal Line Technology introduced in Premier League in 2013", sep = " - "),
1L == 1L ~ as.character(Season)
)
)

Lets take a look at our data now:

Now we have everything we need to code our animation. Let just save our dataset using save(table, 'data/table.RData') so we can open it up in a new file that we will use to create the animation.

Creating the animation

With the help of this StackOverflow discussion, we will use theggplot2 package to design a static bar chart for this data, and then we will use the awesome package gganimate to generate a rolling animation that transitions through each SeasonLabel and updates the bar chart.

Let’s load the data and some packages we will need. Then, first, we need to rank each team in each season, as the rank will determine the order that the bars appear in the ‘race’ chart. We also create a relative value for each team compared to the team at the top of the ranking, as this will help with scaling the bars. Finally we create a label to draw the value from the Points column. We then restrict to the top 10 teams by rank.

library(tidyverse)
library(ggplot2)
library(gganimate)
library(gifski)
ggplot2::theme_set(theme_classic())
load("data/table.Rdata")
# generate top n ranking by year group
anim_table <- table %>%
dplyr::group_by(Season) %>%
dplyr::mutate(
rank = min_rank(-Points) * 1,
Value_rel = Points / Points[rank == 1],
Value_lbl = paste0(" ", Points)
) %>%
dplyr::filter(rank <= 10) %>%
dplyr::ungroup()

Now we have everything we need to plot the static bar chart— this only needs a string of fairly basic ggplot2 commands. I won’t go through these in detail but if you need to refamiliarize yourself with these then I recommend tidyverse.org as a starting point. The main points here are that we use rank as the x aesthetic, Points as the y aesthetic, we then assign Team and Points and then we flip the chart to make it horizontal rather than vertical.

# create static bar chart
p <- ggplot2::ggplot(anim_table, aes(rank)) +
ggplot2::geom_tile(aes(
y = Points / 2,
height = Points,
width = 0.9,
fill = "blue"
), alpha = 0.8, color = NA) +
ggplot2::geom_text(aes(y = 0, label = paste(Team, " ")), size = 12, vjust = 0.2, hjust = 1) +
ggplot2::geom_text(aes(y = Points, label = Value_lbl, hjust = 0)) +
ggplot2::coord_flip(clip = "off", expand = FALSE) +
ggplot2::scale_y_continuous(labels = scales::comma) +
ggplot2::scale_x_reverse() +
ggplot2::guides(color = FALSE, fill = FALSE)

Now comes the main part of the animation. We give labels to the static plot for the title and the axes, but one of these labels is the one that we will use to animate the plot — ie SeasonLabel. This is known as the transition variable. So we tell ggplot2 that we want the title to print the current transition state — ie SeasonLabel as the title as it rotates through the animation. Finally we use ease_aes() to define the way values change as we move through the transitions — there are many easing functions which you can experiment with here, just consult the help file for details.

# set SeasonLabel as transition state and set to animate
p <- ggplot2::labs(p,
title = "{closest_state}", x = "", y = "Total Points",
caption = "Source: Github(jalapic/engsoccerdata) | Top tier points only, does not include mandatory points deductions | Plot generated by @dr_keithmcnulty"
) +
ggplot2::theme(
plot.title = element_text(color = "darkblue", face = "bold", hjust = 0, size = 30),
axis.ticks.y = element_blank(),
axis.text.y = element_blank(),
plot.margin = margin(2, 2, 1, 16, "cm")
) +
gganimate::transition_states(SeasonLabel, transition_length = 4, state_length = 1) +
gganimate::ease_aes("cubic-in-out")

So our animated plot is now saved in the environment as the object p. All we need to do now is render it as a output file. There are many devices that can be used to render the output in various formats, but the most common are:

  • gif output — which will produce an animated image file. This requires the gifski package installed.
  • mp4 video output — this requires ffmpeg to be installed on your machine.

In my case I am going to create a gif as per the image displayed earlier. You can adjust the speed of the gif by using the duration argument, and you can tailor the size and resolution of the image too.

# save as preferred rendered format
gganimate::animate(p, nframes = 200, fps = 5, duration = 100, width = 2000, height = 1200, renderer = gifski_renderer("anim.gif"))

Depending on how you set the arguments in the animate() function, the file may take longer to render, but you can view progress in your console.

Further study

This was a short practical case study. It only covers a very small amount of the animation possibilities in R. I would encourage you to read up on gganimate more and then potentially try to use it to animate some of your own data. Time series data in particular lends itself well to good animation. The plotly package in R also has increasing animation capabilities which are worth a look.

Leave a Reply

%d bloggers like this: