What smoking did for people analytics

Anyone who, like me, is a fan of the US TV show Mad Men will recall that a substantial thread of the plot and historical context of that outstanding period drama revolves around smoking. In the 1960s, the US medical establishment was pivoting toward the view that smoking was a cancer-causing habit: a hard thing for doctors to face up to, since pretty much all of them smoked themselves.

Today, far fewer people smoke, whether habitually or at all. We are certainly not out of the woods, but 50 years has made a substantial difference. Still, 50 years is a long time, and it’s natural to ask why it takes so long to create change.

One answer is, of course, that smoking is a terribly addictive behavior, and addictive behaviors are the hardest things to change. Another reason, though, is that it takes a long time to build up the evidence that smoking correlates with negative survival outcomes, and even longer to show that smoking causes cancer.

Those of us who work in the analysis of people, however, had something to gain from the research around smoking and its health outcomes. It was this research that brought to the fore methods of epidemiological analysis that offer incredible value to us today in the study of people and organizations. During the 1960s to 1980s, when the medical establishment took on the tobacco giants in a long, drawn-out slugfest, survival analysis stepped up to land a knockout punch.

Survival analysis

Survival is the most important outcome in medical science, so it is not surprising that a whole field of statistics grew up around understanding the drivers of survival. Before the mid-20th century, however, much survival research focused on acute disease: bacterial or viral infections that killed some people within days or weeks while others recovered quickly or escaped unscathed. The deadly Spanish Flu pandemic of 1918–20, which killed somewhere between 50 and 100 million people worldwide, was a big driver of early epidemiological research.

By the 1950s, antibiotics had come on the scene and acute bacterial infection was suddenly much less threatening. Attention turned to a different type of epidemic, one that was widespread but not acute. It killed some people but not others, at different stages of life, and usually many years or decades down the line. Cancer created a new challenge for epidemiologists in their study of survival: longitudinal survival tracking became necessary.

So the medical research establishment started to gear up to track people over decades, not weeks and months. And not just the sick: the healthy had to be tracked too, to understand which lifestyle factors led to a greater incidence of survival-threatening diseases like cancer. This drove breakthroughs in methods, systems and processes, and gave rise to some of the huge longitudinal studies that we see reported in the media today.

But it also resulted in new ways to analyze and represent survival — all of which are supremely useful today in the study of people more broadly.

Survival Curves and Hazard Ratios

Imagine that you have a hypothesis that a certain element of an individual’s experience in a group or organization is an indicator of their likelihood of continued membership of that group or organization over time. For example, you might believe that people who work in a certain department have such a positive experience that they develop a long-term attachment to the company. Or, conversely, that the experience is so poor that they start looking at the job market again.

The hypothesized experience could be regarded as a ‘lifestyle factor’, and you could analyze the likelihood of attrition over time in the same way as you would analyze survival in the study of diseases like cancer. One way to do this is to take point-in-time samples of people, record whether or not each has been exposed to the experience of interest, and then track them over the following months or years to see whether the data suggest a causal relationship between the experience and attrition.
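To make that design concrete, here is a minimal Python sketch of how such tracking data might be reduced to the two quantities survival analysis works with: how long each person was observed, and whether the event (here, departure) was actually seen. The `Person` record layout, the `to_survival_record` helper, the 30.4375-day average month and the toy cohort are all assumptions for illustration, not a standard schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

DAYS_PER_MONTH = 30.4375  # assumed average month length (365.25 / 12)


@dataclass
class Person:
    """Hypothetical one-row-per-person record."""
    exposed: bool          # exposed to the experience of interest?
    start: date            # this person's own time zero (entry to the study)
    left: Optional[date]   # departure date, or None if still with the organization


def to_survival_record(p: Person, study_end: date) -> tuple[float, bool]:
    """Return (months observed, whether departure was observed).

    Anyone still present at study_end is right-censored: we only know they
    stayed at least this long, not when (or whether) they will leave.
    """
    event = p.left is not None and p.left <= study_end
    end = p.left if event else study_end
    return (end - p.start).days / DAYS_PER_MONTH, event


# Toy cohort (made-up data): one leaver, one person still employed.
cohort = [
    Person(exposed=True, start=date(2020, 1, 1), left=date(2021, 1, 1)),
    Person(exposed=False, start=date(2020, 6, 1), left=None),
]
for person in cohort:
    months, event = to_survival_record(person, study_end=date(2022, 1, 1))
    print(person.exposed, round(months, 1), event)
```

People still employed when the study ends come out with `event` set to `False`. They are kept as censored observations rather than discarded, which is what lets survival methods such as Kaplan-Meier make use of everyone’s data.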

Kaplan-Meier survival curves are a really intuitive way of representing this graphically. Going back to our smoking example, the graph above shows survival curves for smokers vs non-smokers in a particular medical study. The x axis shows the months elapsed since the starting point of measurement, when individuals were classified according to their smoking status, and the y axis shows the proportion of individuals still alive at each time point. Note that the starting point does not have to be the same for each individual. Provided the difference in entry times introduces no bias, people can join a study at any time t and the curve tracks them to t + 120. People whose follow-up ends before the event occurs (for example, because the study closes) are treated as censored: they count as at risk for as long as they were observed and then drop out of the calculation, rather than being recorded as either survivors or deaths.
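The curve itself is simple to compute. Below is a plain-Python sketch of the standard Kaplan-Meier product-limit estimator, assuming the data have already been reduced to one (duration, event) pair per person; the function name and the toy data are hypothetical. At each time an event occurs, the running survival estimate is multiplied by the fraction of people still at risk who did not experience it.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Kaplan-Meier (product-limit) estimate of the survival function S(t).

    durations: time each person was observed, measured from their own start
    events:    True if the event (death, departure) was observed at that time,
               False if the person was censored when observation ended
    Returns [(t, S(t)), ...] at each distinct time an event was observed.
    """
    event_counts = Counter(t for t, e in zip(durations, events) if e)
    exit_counts = Counter(durations)  # events plus censorings at each time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d:
            # Fraction of those still at risk at t who did not have the event.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Everyone whose observation ends at t leaves the risk set afterwards
        # (usual convention: someone censored at t still counts as at risk at t).
        at_risk -= exit_counts[t]
    return curve


# Toy data (made up): months observed, and whether the person died or left.
months = [3, 5, 5, 8, 10, 12]
left = [True, True, False, True, False, False]
for t, s in kaplan_meier(months, left):
    print(f"month {t}: {s:.3f} still present")
```

To compare smokers with non-smokers, or exposed with unexposed employees, you would run this once per group and plot each result as a step function. For real analyses, an established library such as lifelines (its `KaplanMeierFitter` class) also provides confidence intervals.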

A similarly useful measure, particularly for executive summaries or abstracts, is the Hazard Ratio, which compares the rate at which the event of interest (death, or in our case departure) occurs in a particular group with the rate in a baseline population over a specified time period. For example, you can compare the hazard of death for women over a 2-year period with that of the general population. Or in the workplace, you can compare the attrition hazard of high performers with that of the general employee base. Accurate calculation of hazard ratios gives you the power to validly state conclusions like ‘high performers are leaving us at a 20% higher rate than other employees over a two-year period’, which corresponds to a hazard ratio of 1.2.
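As a rough illustration, the sketch below computes a crude hazard ratio under the simplifying assumption that each group’s hazard is constant over time, so that events divided by total person-time estimates that group’s hazard rate. This is a swapped-in shortcut, not the standard method: hazard ratios are more usually estimated with a Cox proportional hazards model (for example lifelines’ `CoxPHFitter`), which also lets you adjust for other factors. The function name and example data are hypothetical.

```python
def crude_hazard_ratio(group, baseline):
    """Crude hazard ratio, assuming a constant hazard within each group.

    Each argument is a list of (duration, event_observed) pairs. Under the
    constant-hazard assumption, events / total person-time estimates a
    group's hazard rate; the ratio of the two rates is the hazard ratio.
    """
    def hazard_rate(records):
        events = sum(1 for _, observed in records if observed)
        person_time = sum(duration for duration, _ in records)
        return events / person_time

    return hazard_rate(group) / hazard_rate(baseline)


# Made-up data: (months observed, left the company?) for two hypothetical groups.
high_performers = [(6, True), (12, True), (24, False), (18, True)]
everyone_else = [(24, False), (12, True), (24, False), (20, True), (24, False)]
print(round(crude_hazard_ratio(high_performers, everyone_else), 2))
```

In this toy data the ratio is 2.6, meaning the first group leaves at about 2.6 times the rate of the second; a value of 1.2 would correspond to a 20% higher rate, and a value below 1 to a lower one. The constant-hazard assumption may not hold for attrition, which is one reason a Cox model is the usual choice.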

Applications to People Analytics

I expect that many of you are seeing the parallels here, but here are a few ways in which survival analysis can be applied in Human Capital contexts:

  • Survey validation: Survival analysis can be used to show that survey responses should be taken seriously. For example, if people who give non-neutral ratings to certain survey items can be shown to have a higher or lower likelihood of attrition, this can help management sit up and take notice of these survey responses in the future.
  • Predictive analytics: Survival analysis can establish the validity of a specific measure in predicting attrition or other outcomes of interest, either for use on its own or as a feature in a broader predictive model. For example, research at Stanford GSB used the (inverted) survival curve above to show that language in emails was a valid indicator of an employee’s cultural fit with an organization. At McKinsey, we used survival curves to show that, at any given point in time, the number of meaningful connections someone has in an organization can predict their likelihood of staying.
  • Promoting diversity or diverse experiences: Survival analysis doesn’t just apply to attrition; it can be applied to any outcome of interest. For example, if you want to illustrate an organization’s increased propensity to use certain ‘types’ of individuals for certain tasks or types of work, Kaplan-Meier curves or hazard ratios can be a great way to illustrate this and to determine whether the hypothesized effect is statistically defensible.
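Whether a gap between two Kaplan-Meier curves is statistically defensible is commonly checked with a log-rank test. Here is a minimal plain-Python sketch of the two-group version, assuming (duration, event) data for each group; the function name and the toy data are hypothetical. At each event time it compares the events observed in group A with the number expected if both groups shared the same hazard, then turns the summed discrepancy into a chi-square statistic with one degree of freedom.

```python
import math


def log_rank_test(durations_a, events_a, durations_b, events_b):
    """Two-group log-rank test. Returns (chi-square statistic, p-value), 1 d.f."""
    # Pool both groups, tagging each person with a group label (0 = A, 1 = B).
    people = [(t, e, 0) for t, e in zip(durations_a, events_a)]
    people += [(t, e, 1) for t, e in zip(durations_b, events_b)]
    event_times = sorted({t for t, e, _ in people if e})

    obs_minus_exp = 0.0  # observed minus expected events in group A
    variance = 0.0
    for t in event_times:
        risk_set = [g for u, _, g in people if u >= t]  # still under observation at t
        n = len(risk_set)
        n_a = risk_set.count(0)
        d = sum(1 for u, e, _ in people if u == t and e)
        d_a = sum(1 for u, e, g in people if u == t and e and g == 0)
        # Under the null hypothesis, group A's share of the d events is
        # hypergeometric with mean d * n_a / n.
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("no information to compare the groups")
    chi2 = obs_minus_exp ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 d.f.
    return chi2, p_value


# Made-up data: months until leaving (True) or censoring (False), two groups.
chi2, p = log_rank_test([2, 4, 6], [True, True, True], [5, 7, 9], [True, True, False])
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
```

With only six people, this toy example is not significant at conventional levels (p is roughly 0.11), a useful reminder that visibly separated curves can still be noise in small samples.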

Survival analysis is a very powerful tool in the study of people outcomes, and the data it needs is usually quite simple: often no more than some survey responses or participation records and some departure dates. More organizations should be using survival analysis to keep themselves honest about what is really driving their people outcomes.
