Create Hans Rosling’s famous animated bubble chart in a single piped R command

A few weeks ago I sat down with an hour or so to spare and decided that I’d like to try to create Hans Rosling’s Gapminder bubble chart, made famous by his hugely entertaining lectures and TED talks, from scratch in R.

I had a few criteria in mind as I proceeded:

  1. Can I do it using economic data straight from the World Bank source, without needing to use local data files or package data which may not be up to date?
  2. Can I make it look reasonably close to how Rosling’s chart looked?
  3. Is it possible to do it in a single piped command in R?

I was able to fulfil all my criteria. Having done so, I realized that this is a great little learning exercise for anyone who wants to work more with animated graphics. It also introduced me to the wbstats package in R, which I had never known about before. So I’m sharing it here.

If you can’t be bothered reading this, you can go straight to the code on GitHub.

As a precautionary note, this isn’t the shortest way of doing it. You can do it in fewer characters, but this will mean not namespacing functions (which I always try to do for the benefit of code readers), or pulling data from intermediate sources. But I think this is a nice, independent and future-proof approach which is clean and efficient.

Setting up and grabbing data

You’ll need the following R packages for this: tidyverse, ggplot2, gganimate, viridis to help with color, and wbstats to get the data from the World Bank. If you want it to look exactly like mine, you may also need to install the Oswald font family from Google fonts, but this is not critical.

It’s always good to have a plan for what you want your final product to look like. Here was my plan:

  1. X axis: Log GDP per capita (using the log helps spread out what is otherwise heavily skewed data)
  2. Y axis: Life expectancy at birth
  3. Bubble size proportional to population
  4. Bubble colour coded by economic region as per the World Bank regions
  5. Transition year by year from 1960 to the latest available data.
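
To see why the log scale in item 1 helps, here is a minimal base R sketch. The GDP per capita values are invented for illustration and are not World Bank figures.

```r
# Invented GDP per capita values in US$ (illustration only, not real data)
gdp <- c(200, 1000, 5000, 25000, 100000)

# Natural log, the same transformation applied on the x axis
log_gdp <- log(gdp)

# On the raw scale the richest value is hundreds of times the poorest,
# so most points would be squashed against the left edge of the chart
raw_ratio <- max(gdp) / min(gdp)

# On the log scale the same values occupy a span of only a few units
log_span <- diff(range(log_gdp))

print(raw_ratio)
print(round(log_gdp, 2))
print(round(log_span, 2))
```

Each fivefold jump in GDP per capita becomes an equal step on the log scale, which is what lets poor and rich countries share one readable axis.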

This means I needed to obtain three types of data from the World Bank: GDP per capita, Life expectancy at birth and Population.

The wbstats package is an awesome utility that allows you to plug directly into the World Bank database using its API and download data directly to your R session. You can visit the World Bank Open Data site here and browse for the indicators you want. Once you find one, you just need to make a note of its indicator ID code. For example, if you search for “GDP per Capita (current US$)”, you should be taken to this page — then click on the “Details” icon and you will see the ID. In this case it is NY.GDP.PCAP.CD.

Using these indicator ID codes you can use the wbstats package to instantly grab the data for the three indicators you need using our first command:

# Download the three indicators for individual countries, 1960 to 2018
wbstats::wb(
  indicator = c("SP.DYN.LE00.IN", "NY.GDP.PCAP.CD", "SP.POP.TOTL"),
  country = "countries_only",
  startdate = 1960,
  enddate = 2018
)

The country = "countries_only" argument is important — the World Bank data also includes regional and worldwide averages, which you don’t want for this analysis.

This is almost all we need in terms of data. But for our color coding we need to assign our countries to World Bank regions. wbstats has a handy function called wbcountries() where you can select the iso3c country code and its region and join to the previous table to assign countries to regions, as follows:

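# Download the three indicators for individual countries, 1960 to 2018
wbstats::wb(
  indicator = c("SP.DYN.LE00.IN", "NY.GDP.PCAP.CD", "SP.POP.TOTL"),
  country = "countries_only",
  startdate = 1960,
  enddate = 2018
) %>%
  # Attach each country's World Bank region, matching on iso3c
  dplyr::left_join(wbstats::wbcountries() %>%
                     dplyr::select(iso3c, region))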

This should give you a dataframe that looks like this:
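
As a rough, small-scale picture of that join, the sketch below uses invented rows (the country codes, region names and values are all made up, not real World Bank output) and spells out the `by = "iso3c"` key that the piped command leaves implicit.

```r
# Toy stand-in for the indicator download: one row per country/indicator.
# All codes and values here are invented for illustration.
indicators <- data.frame(
  iso3c = c("AAA", "AAA", "BBB"),
  date = c("2018", "2018", "2018"),
  indicatorID = c("SP.POP.TOTL", "NY.GDP.PCAP.CD", "SP.POP.TOTL"),
  value = c(1000, 500, 2000),
  stringsAsFactors = FALSE
)

# Toy stand-in for the country lookup, reduced to iso3c and region
regions <- data.frame(
  iso3c = c("AAA", "BBB"),
  region = c("Region A", "Region B"),
  stringsAsFactors = FALSE
)

# Every indicator row keeps its place and gains a region column
joined <- dplyr::left_join(indicators, regions, by = "iso3c")
print(joined)
```

A left join keeps every indicator row even when a country has no match in the lookup; such rows would simply get `NA` as their region.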

Our final data prep step is to spread the three indicators across the row for each country and year, as right now they are all in long form down the indicatorID column. We will use the newly-minted pivot_wider() function in tidyr for this (make sure you have the latest version!). So keeping date, country and region fixed, we want to widen according to the indicator column, filling with values from the value column:

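# Download the three indicators for individual countries, 1960 to 2018
wbstats::wb(
  indicator = c("SP.DYN.LE00.IN", "NY.GDP.PCAP.CD", "SP.POP.TOTL"),
  country = "countries_only",
  startdate = 1960,
  enddate = 2018
) %>%
  # Attach each country's World Bank region, matching on iso3c
  dplyr::left_join(wbstats::wbcountries() %>%
                     dplyr::select(iso3c, region)) %>%
  # One row per country and year, one column per indicator
  tidyr::pivot_wider(
    id_cols = c("date", "country", "region"),
    names_from = indicator,
    values_from = value
  )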

This should produce the data in this form, which is now ready for graphing:
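
As a rough picture of that shape, here is a toy version of the widening step on a single invented country-year. The country, region and numbers are made up; only the three indicator names are taken from the chart code later in this post.

```r
# Long form: one row per indicator for one (invented) country and year
long <- data.frame(
  date = c("2018", "2018", "2018"),
  country = "Country A",
  region = "Region A",
  indicator = c("Life expectancy at birth, total (years)",
                "GDP per capita (current US$)",
                "Population, total"),
  value = c(70, 5000, 1e6),
  stringsAsFactors = FALSE
)

# Wide form: the three indicator rows collapse into one row
# with one column per indicator
wide <- tidyr::pivot_wider(
  long,
  id_cols = c("date", "country", "region"),
  names_from = indicator,
  values_from = value
)
print(wide)
```

Because the new column names contain spaces and punctuation, they have to be wrapped in backticks whenever they are referred to in later code.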

Creating a static chart for a single year

Since animation is simple movement between static charts, the majority of our graphic work will be to create the static styled chart for a single year using ggplot2.

Graphing in ggplot2 is highly intuitive: here are the things I want to do:

  1. Define my aesthetics: x is Log GDP per capita, y is life expectancy, size is population and color is region using ggplot()
  2. Set it up as a scatter with an alpha of 0.5 to give the bubbles some translucency using geom_point()
  3. Scale the bubble sizes to ensure that they are reasonable for the look and feel of the chart using scale_size()
  4. Set boundaries for the axes using scale_x_continuous() and scale_y_continuous()
  5. Set a nice bubble colour scheme using viridis::scale_color_viridis()
  6. Set the axis labels using labs()
  7. Give the chart a nice clean theme using theme_classic()
  8. Finally, as per the Rosling original, we will want the date in a light grey in the center background so as not to interfere with the visual too much — you can set this with geom_text()

So here is the code to do all this. You just need to pipe the data into this code to create the static versions of the chart:

ggplot2::ggplot(ggplot2::aes(x = log(`GDP per capita (current US$)`),
                             y = `Life expectancy at birth, total (years)`,
                             size = `Population, total`,
                             color = region)) +
  # Translucent bubbles, sized by population
  ggplot2::geom_point(alpha = 0.5) +
  ggplot2::scale_size(range = c(.1, 16), guide = FALSE) +
  # Fixed axis limits so the frame does not shift between years
  ggplot2::scale_x_continuous(limits = c(2.5, 12.5)) +
  ggplot2::scale_y_continuous(limits = c(30, 90)) +
  viridis::scale_color_viridis(
    discrete = TRUE, name = "Region", option = "viridis") +
  ggplot2::labs(x = "Log GDP per capita",
                y = "Life expectancy at birth") +
  ggplot2::theme_classic() +
  # Year label in light grey in the center background
  ggplot2::geom_text(ggplot2::aes(x = 7.5, y = 60, label = date),
                     size = 14, color = 'lightgrey',
                     family = 'Oswald')
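
To look at one year on its own before animating, one option is to filter the data to a single date first. This is a minimal sketch under assumptions: the `toy` data frame and all its values are invented stand-ins for the prepared World Bank data, and only the first two layers of the chart are included to keep it short.

```r
# Invented stand-in for the prepared data (two regions, two years)
toy <- data.frame(
  date = c(1960, 1960, 2018, 2018),
  region = c("Region A", "Region B", "Region A", "Region B"),
  `GDP per capita (current US$)` = c(300, 2000, 1500, 40000),
  `Life expectancy at birth, total (years)` = c(45, 68, 62, 81),
  `Population, total` = c(5e6, 2e7, 1.2e7, 2.5e7),
  check.names = FALSE  # keep the spaces and symbols in the column names
)

# Keep a single year so the chart is one static frame
one_year <- dplyr::filter(toy, date == 2018)

# Same aesthetics as the full chart, with only the bubble layer added
p <- ggplot2::ggplot(one_year,
                     ggplot2::aes(x = log(`GDP per capita (current US$)`),
                                  y = `Life expectancy at birth, total (years)`,
                                  size = `Population, total`,
                                  color = region)) +
  ggplot2::geom_point(alpha = 0.5)
```

Printing `p` draws the frame. Without the filter, the bubbles for every year are drawn on top of each other, which is the state gganimate later splits into frames.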

Animating the chart

And now for the last and easiest part of the work. Now that you have the static chart set up, you just need to use the package gganimate to animate it. All gganimate needs to know is what the transition variable is (in this case it is the date column), and a few details on timing and style of transition. You can achieve this by adding this simple animation command to your code above:

gganimate::transition_states(date,
                             transition_length = 1,
                             state_length = 1)

Et voila (here I have taken the color legend out for better effect — you can do this by adding the argument guide = FALSE to the scale_color_viridis() command above):

You can find the complete code here (it’s hard to keep things well-formatted here on Medium). If you run the code you should see the animation directly in your viewer. If you want to save it, you can use gganimate::anim_save() to save it as a gif.
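
For the saving step, here is one possible sketch. The helper name `save_bubble_gif`, the default file name and the frame settings are my own illustrative choices rather than part of the original code, and rendering a GIF assumes a GIF renderer such as the gifski package is installed.

```r
# Hypothetical helper: render an animated ggplot object and write it to disk.
# `anim` is the full chart with the gganimate transition already added.
save_bubble_gif <- function(anim, file = "bubble_chart.gif",
                            nframes = 100, fps = 10) {
  # animate() renders the frames; nframes and fps control length and speed
  rendered <- gganimate::animate(anim, nframes = nframes, fps = fps)
  # anim_save() writes the rendered animation to the given file
  gganimate::anim_save(file, animation = rendered)
  invisible(file)
}

# Usage, assuming `chart` holds the animated plot object:
# save_bubble_gif(chart, "rosling.gif")
```

If you call gganimate::anim_save() with only a file name, it saves the most recently rendered animation, which is handy right after viewing the result.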

This is just a start. Feel free to grab the code and add additional features, like naming countries of interest.
