Five behaviors of great Data Science coders

In the new data science era, everyone is jumping onto the bandwagon. Recruiters are receiving resumes and CVs with all the lingo in them: R, Python, JavaScript, whatever. In many cases, people are putting a skill on their resume on the basis of a one-week session at college or a couple of DataCamp courses.

The best way to determine someone’s skills in a particular programming language is to set them a task and see how they do. You can’t just trust word of mouth on these things, and you really don’t want to have someone in your team or organization who is not up to scratch on the critical skills you need.

Setting practical coding exercises offers two advantages. First, you can test whether the individual knows how to approach and solve the problem at hand and can act somewhat independently in their work. Second, you can identify coding behaviors which are indicative of highly skilled coders.

Here are five such behaviors for R coders in particular, many of which I believe apply across the various other programming languages:

Commenting

No matter what the language, great coders comment well and frequently. It shows a concern for reproducibility, and it likely indicates that they know from experience how important it is to comment.

Where and how often to comment is a matter of judgment, and depends on the complexity of the task, but great coders comment at a reasonably detailed level. For example:

# load libraries
library(dplyr)

#' Function to search starwars names by first letter of name
#'
#' @param x Character value to search as first letter
#'
#' @return a character vector of names
first_letter_char_search <- function(x) {
  dplyr::starwars %>%
    dplyr::mutate(first_letter = substr(name, 1, 1)) %>%
    dplyr::filter(first_letter == x) %>%
    dplyr::pull(name)
}

Formatting

Code formatting is critical to reading and understanding work easily. Properly formatted code shows that the coder has taken the time and effort to ensure that his or her code is as readable as possible, which again affects reproducibility and future collaboration.

Good formatting also shows that the coder is aware of the formatting conventions for their language, which is usually a sign that they have a decent amount of experience in it. You’ll see some typical R formatting conventions in the code block above (although depending on the device you are viewing this article on this may not render as I’d hope). Spacing and indenting are among the most fundamental aspects of strong formatting.

Namespacing

This depends on the language being used, but in languages like R, namespacing shows concern for the environment that the coder is operating in. To be clear, namespacing means calling both the function and its package in your code, rather than just the function. For example, calling lubridate::ymd() instead of just ymd(). Again, in the code block above, you will see appropriate namespacing of functions.

Namespacing has two advantages. First, it prevents issues where two functions from different packages have the same name. If you load two packages and they both have a function name in common, R will by default use the function from the most recently loaded package unless you namespace your function. Second, it helps another user work out which packages they need to install if they are looking at a snippet of your code.
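To make the name clash concrete, here is a minimal sketch using a well-known pair: dplyr::filter() and stats::filter(). It assumes dplyr is installed; the object names x, moving_avg and heavy_characters are purely illustrative and not part of any exercise I have set.

```r
# Loading dplyr masks stats::filter(), so a bare filter() now means dplyr's
library(dplyr)

x <- c(1, 2, 3, 4, 5)  # illustrative data

# Namespaced calls pick the intended function regardless of load order
moving_avg <- stats::filter(x, rep(1 / 3, 3))  # three-point moving average
heavy_characters <- dplyr::filter(dplyr::starwars, mass > 1000)  # row filter
```

Without the stats:: prefix, the first call would be dispatched to dplyr's filter() and fail, which is the kind of surprise that namespacing avoids.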

Code efficiency

Great coders will have a concern for the efficiency of their code and their processes. This can take a number of forms:

  • Code that is repetitive can be abstracted so that a single function can perform the repetitive work. See my article here for more information.
  • Code that has a high level of computational intensity can be reconstructed to reduce unwanted duplication or inefficient manipulations.
  • Code that works on database objects can manipulate the data on server as much as possible to avoid bringing large amounts of data onto local machines — see my other article here.
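As a rough illustration of the first point, here is a sketch of abstracting a repeated calculation into one function. The data frame scores and the helper rescale01() are hypothetical names made up for this example, and rescaling to the 0 to 1 range is just one stand-in for whatever logic is being repeated.

```r
# Illustrative data (hypothetical)
scores <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))

# Repetitive version, copied once per column (shown as comments only):
# scores$a <- (scores$a - min(scores$a)) / (max(scores$a) - min(scores$a))
# scores$b <- (scores$b - min(scores$b)) / (max(scores$b) - min(scores$b))

# Abstracted version: write the logic once
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Apply the same function to every column
scaled <- as.data.frame(lapply(scores, rescale01))
```

Adding a third column now costs nothing extra, and a fix to the calculation only has to be made in one place.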

This can be a massive differentiator in identifying great coders. I have had code submitted to me that is ridiculously and pointlessly inefficient, where it has taken hundreds of lines to do what was possible in a handful of lines. Equally, I have seen code that is so elegant in its abstraction and efficiency that you just have to sit back and admire a master of their craft in action.

Clean up

Great coders tidy up after themselves. They don’t leave stuff hanging in their environment that could conflict with their next task or result in security vulnerabilities. The most common issue I have seen is where people neglect to disconnect from databases after they have fetched what they need. I love to see nice tidy code blocks where the coder has said hello and said goodbye. It’s great to be polite, even with databases.
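One way to say hello and goodbye to a database might look like the sketch below. It assumes the DBI and RSQLite packages are installed and uses an in-memory SQLite database with the built-in mtcars data as a stand-in for a real server; fetch_heavy_cars() is a hypothetical function name.

```r
fetch_heavy_cars <- function(min_wt) {
  # Say hello: open the connection (in-memory SQLite as a stand-in)
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  # Say goodbye: disconnect when the function exits, even after an error
  on.exit(DBI::dbDisconnect(con), add = TRUE)

  # Load example data so the query has something to work on
  DBI::dbWriteTable(con, "mtcars", mtcars)

  # Fetch only what is needed, using a parameterised query
  DBI::dbGetQuery(
    con,
    "SELECT mpg, wt FROM mtcars WHERE wt > ?",
    params = list(min_wt)
  )
}

heavy_cars <- fetch_heavy_cars(5)
```

Registering the disconnect with on.exit() straight after connecting means the goodbye cannot be forgotten further down the function.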


If you are sourcing or hiring programmers, consider how you can set practical exercises to test their skills and see if you can identify some of these positive behaviors. I really recommend it.
