I find Sankey diagrams super useful for illustrating flows of people or preferences. The networkD3 package in R offers a straightforward way to generate these diagrams without needing to know the ins and outs of the actual D3 code.
To show you what I mean, I generated a Sankey diagram to show how the twelve regions of the UK contributed to the overall result of the 2016 Brexit referendum, where voters chose to leave the European Union by 17,410,742 votes to 16,141,241.
If you want to see the fully interactive Sankey diagram for this, you can view the code via an RMarkdown document on RPubs here. Unfortunately only static images can be displayed on Medium.
Getting the data in shape
Very detailed data on the Brexit referendum can be obtained from the UK’s Electoral Commission website. The first step is to get our libraries loaded and to get the data into R. Since the data is very detailed down to the most localized voting centers, we need to aggregate all the Leave and Remain votes to get a total for each region.
## load libraries
library(dplyr)
library(tidyr)
library(networkD3)
# read in EU referendum results dataset
refresults <- read.csv("EU-referendum-result-data.csv")
# aggregate by region
results <- refresults %>%
  dplyr::group_by(Region) %>%
  dplyr::summarise(Remain = sum(Remain), Leave = sum(Leave))
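To make the aggregation step concrete, here is a minimal sketch on a made-up data frame. The Region, Remain and Leave column names follow the code above; the Area column, the region names and all of the vote counts are invented purely for illustration and are not real referendum figures.

```r
library(dplyr)

# Toy stand-in for the referendum file: one row per counting area.
# All names and numbers here are made up for illustration.
toy <- data.frame(
  Region = c("Region A", "Region A", "Region B"),
  Area   = c("Area 1", "Area 2", "Area 3"),
  Remain = c(100, 150, 300),
  Leave  = c(120, 130, 250)
)

# Collapse the area-level rows into one row per region
toy_results <- toy %>%
  dplyr::group_by(Region) %>%
  dplyr::summarise(Remain = sum(Remain), Leave = sum(Leave))

print(toy_results)
```

Each region ends up as a single row with its Remain and Leave totals, which is the shape the next steps work from.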
We then need to create two dataframes for use by networkD3 in its sankeyNetwork() function:
A nodes dataframe, which numbers the source nodes (i.e. the 12 UK regions) and the destination nodes (i.e. Leave and Remain), starting at zero.
A links dataframe, which itemizes each flow using source, target and value columns. For example, the West Midlands region cast 1,755,687 votes for Leave, so in this case the source would be the node for West Midlands, the target would be the node for Leave and the value would be 1,755,687.
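As a rough illustration of the two structures, the West Midlands example above could be written out by hand like this. Only the one flow quoted in the text is included, and the node numbering (West Midlands as 0, Leave as 1, Remain as 2) is an arbitrary choice for this sketch rather than the numbering the full dataset will produce.

```r
# Nodes: one row per node, numbered from zero
toy_nodes <- data.frame(
  node = 0:2,
  name = c("West Midlands", "Leave", "Remain")
)

# Links: one row per flow, referring to nodes by their zero-based number
toy_links <- data.frame(
  source = 0,        # West Midlands
  target = 1,        # Leave
  value  = 1755687   # Leave votes cast in the West Midlands
)

# Look up the names behind the link (+ 1 because R indexes from one)
as.character(toy_nodes$name[toy_links$source + 1])
as.character(toy_nodes$name[toy_links$target + 1])
```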
Here is some simple code to build the data in this way:
# format in prep for sankey diagram
results <- tidyr::gather(results, result, vote, -Region)
# create nodes dataframe
regions <- unique(as.character(results$Region))
nodes <- data.frame(node = c(0:13),
                    name = c(regions, "Leave", "Remain"))
# create links dataframe
results <- merge(results, nodes, by.x = "Region", by.y = "name")
results <- merge(results, nodes, by.x = "result", by.y = "name")
links <- results[ , c("node.x", "node.y", "vote")]
colnames(links) <- c("source", "target", "value")
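Before drawing anything, it can be worth checking that the links really do point at valid, zero-based node numbers, since a mismatch here is an easy mistake to make. The check_links() helper below is hypothetical (it is not part of networkD3), and the toy data it runs on is made up; it is only a sketch of the kind of sanity check you might add.

```r
# Hypothetical helper, not part of networkD3: stops with an error if the
# node numbers do not run 0, 1, 2, ... in row order, or if a link refers
# to a node number that does not exist
check_links <- function(links, nodes) {
  stopifnot(
    all(nodes$node == seq_len(nrow(nodes)) - 1),  # zero-based, in row order
    all(links$source %in% nodes$node),            # every source is a known node
    all(links$target %in% nodes$node),            # every target is a known node
    all(links$value >= 0)                         # flows are non-negative
  )
  invisible(TRUE)
}

# Made-up toy data in the same shape as nodes and links above
toy_nodes <- data.frame(node = 0:3,
                        name = c("Region A", "Region B", "Leave", "Remain"))
toy_links <- data.frame(source = c(0, 0, 1, 1),
                        target = c(2, 3, 2, 3),
                        value  = c(250, 250, 250, 300))

check_links(toy_links, toy_nodes)
```

If the check passes silently, the two dataframes are at least structurally consistent with each other.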
Now that we have our data constructed the right way, we can simply use the networkD3::sankeyNetwork() function to create the diagram. This produces a simple, effective diagram, with rollover interactivity displaying the details of each voting flow. The static version is presented here.
# draw sankey network
networkD3::sankeyNetwork(Links = links, Nodes = nodes,
                         Source = 'source',
                         Target = 'target',
                         Value = 'value',
                         NodeID = 'name',
                         units = 'votes')
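If the default labels come out too small, sankeyNetwork() also accepts arguments such as fontSize and nodeWidth. The sketch below uses made-up toy data and arbitrary values for those two arguments, so treat the numbers as starting points rather than recommendations.

```r
library(networkD3)

# Made-up toy data, only to keep this example self-contained
toy_nodes <- data.frame(node = 0:3,
                        name = c("Region A", "Region B", "Leave", "Remain"))
toy_links <- data.frame(source = c(0, 0, 1, 1),
                        target = c(2, 3, 2, 3),
                        value  = c(250, 250, 250, 300))

p <- sankeyNetwork(Links = toy_links, Nodes = toy_nodes,
                   Source = "source", Target = "target",
                   Value = "value", NodeID = "name",
                   units = "votes",
                   fontSize = 12,   # label size in pixels (arbitrary choice)
                   nodeWidth = 30)  # width of the node bars in pixels (arbitrary choice)

# Printing the object renders the diagram in the viewer or browser
p
```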