The king that graph theory discovered

This is the fourth in a series of articles explaining the principles of networks for those who may use them in a data science context. The first article, which focuses on the origins of graph theory and the basic properties of graphs, can be found here.

One morning in August 2012 in a car park in Leicester in the English midlands, a mechanical digger was starting to cut into the concrete surface. A number of interested spectators were present, hoping against hope that something amazing but highly improbable might occur.

Some rugged detective work had led a group of amateur and professional historians to believe that this may be the area where Richard III — the stooped, crooked-backed English king who met a brutal end at the Battle of Bosworth Field in 1485 — had been unceremoniously dumped over 500 years ago. Bolstered by donations and crowdfunding, they raised enough money and gained approval for a limited archaeological dig in the car park of the local council building. They were excited beyond measure. Everyone else, including the academic and professional archaeologists present, were sceptical.

The first cut was made in an area of the car park marked with a mysterious-looking letter R. Presumably this indicated that the parking space was reserved for someone important, but in truth, nobody could explain why that R was there and what it meant.

Within hours, immediately below that letter R, they found a skeleton with a crooked back.

It was to be the beginning of one of the most amazing archaeological discoveries of modern times. But how could they prove that this skeleton, which clearly showed a lifetime suffering from the debilitating spinal condition scoliosis — something long associated with historical descriptions of King Richard III — was without a doubt King Richard III?

Subsequent analysis of the skeleton showed it to date from the late 1400s and to have enjoyed a rich diet of meat and fish, all of which increased the chance that this was the King. But they needed one more thing to complete the proof: DNA.

To use DNA to prove that this was Richard III was going to be a major challenge. Only mitochondrial DNA or mtDNA, which is passed down through maternal lines in families, remains unchanged from generation to generation. Therefore in order to find proof, a living descendent of the sister of Richard III who could only be traced through a 500 year old line of females needed to be found. Talk about a needle in a haystack!

That needle turned out to be Michael Ibsen, a Canadian who was working as a carpenter and furniture restorer in North London. A swab was taken from Michael’s mouth and the mtDNA was extracted and compared to that extracted from the skeleton. It was a perfect match.

On March 26th 2015 the exhumed body of King Richard III was taken to Leicester Cathedral where where it was re-interred with full Church of England rites and with great ceremony…over 500 years after he died!

Graph databases focus on the properties of nodes and the relationships between them

Graph databases

Michael Ibsen’s royal ancestry, and countless other mysteries of family lineage, are now greatly more discoverable because of the emergence of graph databases.

Although they have existed in highly specialized circles since the 1960s, graph databases started entering the enterprise technology space around 2005. Unlike more common relational databases, which store information in linked tables, the principles behind graph databases are exactly the same as in mathematical graph theory (see previous articles in this series for a briefing on graph theory). In fields related to the study of people, graph databases can be particularly useful.

Each ‘vertex’ or ‘node’ can store information about a person or object. Each ‘edge’ stores information about the relationships between the nodes it connects. This allows much more flexible query. By querying nodes and edges you can answer questions like ‘show me the all people who have published papers with this person’. On the Mathematics Genealogy Project website, which queries against a graph database, I can trace my mathematics lineage to such giants as Dirichlet, Poisson and Euler using edges. I can also find out about mine or their PhD thesis titles by querying the nodes.

Obviously social media engines like Facebook, Twitter and LinkedIn are massively enabled by graph databases. Edges connect you to your followers or connections, and to your likes. When Facebook recently introduced a wider range of reactions (eg anger, love), they basically just had to add these to the edge properties of their graph database.

Genealogy is also great example of a field that has been massively enabled by graph databases. stores biographical information about an individual in its nodes (birth date, marriage date, photos, documents) and it stores relationship data in its edges (mother, father, brother, sister). When you view the family tree of someone, you are querying the graph database of that individual (albeit a particular type of graph called a tree — see here for a briefing on graph types).

Building a graph database

Graph databases are relatively straightforward to set up and configure, and there are some pretty established graph database products on the market right now. Neo4j is the most popular graph database currently, and works for most use cases, but there are plenty of others too. For any use case where relationships are the major focus of the analytics, a graph database will prove superior to traditional relational databases. It will process queries quicker and will be more easy to configure and adjust. You can ask more complex questions with more efficient querying.

In most graph databases the query language is highly intuitive and easy to learn if you are familiar with graph theory. Neo4j’s query language, Cypher, uses parentheses to surround nodes, eg (p) and arrows to represent edges/relationships, eg -->. So let’s say you might want to use Neo4j to find your closest friends who can teach you about Neo4j — you might use something like this:

MATCH (you {name:"You"})
MATCH (expert)-[:WORKED_WITH]->(db:Database {name:"Neo4j"})
MATCH path = shortestPath( (you)-[:FRIEND*..5]-(expert) )
RETURN db,expert,path

It makes particular sense to set up a graph database engine when you have existing data which can be reconfigured to represent connections between people or objects. Examples of this may include:

  • Hours worked together on projects from timesheet or finance data
  • Email or calendar connections from email metadata
  • Joint participation in events from logs or attendance records
  • Document collaborations from publishing records
  • Declared connections such as mentorship or coaching relationships
  • Contractual relationships between entities

If you haven’t played with graph databases, try to find an opportunity to test them out. Hands on experience will give you a real sense of their power. If graph databases can find all your 5th cousins in a few seconds, imagine what they could do in a large organization.

In the next and final post of this series on network analytics, I will look at how networks offer incredible opportunities for studying phenomena, and offer up ideas for future research enabled by networks.

One thought on “The king that graph theory discovered

Leave a Reply

%d bloggers like this: