How well do you know R?

As a data science tool, R is experiencing massive user growth, on its own and relative to many other prevalent languages. Here’s a chart from an impressive StackOverflow blog piece by David Robinson which illustrates this.

R is experiencing massive growth recently

One of the key reasons for this is that R has made the leap into the enterprise sector, and many companies and organizations outside of academia now have R in the toolkits of at least some of their analytics professionals. Another reason is the increased accessibility of user-friendly training in R, via DataCamp for example. Finally, adoption has been helped by a large community of R programmers who offer help and advice on StackOverflow (though newbies often say they are afraid to post questions for fear of being intellectually ‘smacked down’).

Perhaps unsurprisingly, this massive growth in adoption has led to a wider range of genuine R skills. The R user base now ranges from one end of the spectrum where individuals have a deep understanding of base R and the underlying intricacies and logic of the language, to the other end where individuals copy and paste code, piecing it together using trial and error without properly understanding what is going on under the hood.

So, for fun more than anything else, I have pieced together ten simple questions that test how well someone knows what they are doing in R. I have chosen these to be as simple as possible, but also to involve some trivia that only a passionate, deep R user might know the answer to. I am making no claim that this is comprehensive, and indeed I would love to have others contribute to this over time (just remember, keep it simple!).

Please do not use these as interview questions. They are not intended to test working R knowledge, but you may find it interesting to get your colleagues to test themselves.

Finally, don’t access your R session to try to answer these. That would be cheating, and we R users are an honest bunch, right?

Ten fun questions to test how well you know R

Question 1: x <- vector(). What is the data type of x?

Question 2: y <- 2147483640L:2147483648L. What is the data type of y?

Question 3: z <- 0/0. What is the class of z?

Question 4: If v <- complex(1,1), what is the output of c(v, TRUE)?

Question 5: A homogeneous 1-D and 2-D data structure in R is called an atomic vector and matrix respectively. What is the name for a) a 1-D heterogeneous data structure, b) a 2-D heterogeneous data structure and c) an n-dimensional data structure where n > 2?

Question 6: What is the significance of the terms Trick or Treat and Warm Puppy to R? What is the origin of these terms?

Question 7: What will happen in each of the following cases if the package dplyr is not installed?

Case 1:

library(dplyr)
mtcars %>% group_by(cyl) %>% summarize(mean_mpg = mean(mpg))

Case 2:

require(dplyr)
mtcars %>% group_by(cyl) %>% summarize(mean_mpg = mean(mpg))

Question 8: a <- c(2, "NA", 3). What is the output of sum(is.na(a))?

Question 9: What is the output of data()?

Question 10: What is the output of round(0.5)?

Ever wondered about those R nicknames?

Answers

  1. x is logical. This is the default type for an atomic vector.
  2. y is a double, despite the use of the integer suffix L in y. This is because the maximum value for an integer in R is 2147483647, so 2147483648L cannot be stored as an integer: R issues a warning and treats it as a double. Consequently, since atomic vectors are homogeneous, the entire vector is a double.
  3. z is of the class numeric.
  4. The output is a vector with two elements, both 1 + 0i. Note that the first argument of complex() is length.out, indicating the length of the complex vector. So complex(1,1) evaluates to 1 + 0i but complex(1,1,1) evaluates to 1 + 1i. Note that TRUE will be coerced to a complex type equivalent of 1 + 0i.
  5. a) List; b) Data frame; c) Array
  6. They are the nicknames of R version releases. They are taken from old Peanuts comic strips.
  7. In Case 1, the first line will generate an error indicating that there is no such package installed, and execution will stop. In Case 2, the first line will generate a warning, but the second line will still be executed, and will generate an error because it cannot find %>% (assuming magrittr is not attached). This is a good illustration of the difference between library() and require(). library() attaches a package and throws an error if it cannot, whereas require() tries to attach the package and returns TRUE if that succeeded and FALSE (with a warning) otherwise. Because the failure does not stop execution, using require() can make it more difficult to debug your code.
  8. This evaluates to zero. Note that "NA" is a character string and not a missing value.
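  9. The output is a list of the data sets available in the packages currently on the search path, which in a fresh session means mostly R's inbuilt data sets.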
  10. The output is zero. R follows the IEC 60559 standard, where .5’s round to the nearest even number.
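
Once you have marked your own answers, the checks below are a minimal sketch of how several of them could be confirmed in an R session. The variable names q4, q8, q10, lib_result and req_result are mine and purely illustrative, and "notARealPackage" is a hypothetical package name that I am assuming is not installed. In the Question 2 line I have left the L off the upper bound, since R would discard it anyway with a warning.

```r
# Q1: vector() defaults to mode = "logical" and length 0
x <- vector()
print(typeof(x))                       # "logical"

# Q2: 2147483648 is above .Machine$integer.max (2147483647),
# so the sequence cannot be stored as integer and becomes double
y <- 2147483640L:2147483648
print(typeof(y))                       # "double"

# Q3: 0/0 is NaN, which still has class "numeric"
z <- 0/0
print(class(z))                        # "numeric"

# Q4: the first argument of complex() is length.out, the second is real
v <- complex(1, 1)
q4 <- c(v, TRUE)                       # TRUE is coerced to 1+0i
print(q4)                              # 1+0i 1+0i

# Q8: the quoted "NA" is an ordinary string, so nothing is missing
a <- c(2, "NA", 3)
q8 <- sum(is.na(a))
print(q8)                              # 0

# Q10: halves are rounded to the even digit
q10 <- round(c(0.5, 1.5, 2.5))
print(q10)                             # 0 2 2

# Q7: library() signals an error for a missing package,
# while require() carries on and returns FALSE
missing_pkg <- "notARealPackage"       # hypothetical, assumed not installed
lib_result <- tryCatch(
  library(missing_pkg, character.only = TRUE),
  error = function(e) "error"
)
req_result <- suppressWarnings(
  require(missing_pkg, character.only = TRUE, quietly = TRUE)
)
print(lib_result)                      # "error"
print(req_result)                      # FALSE
```

The last two results show why a failed require() is easy to miss: unless you check its return value, the script simply carries on to the next line.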

How did you do?

If you scored 2 or less, you urgently need a tutorial in base R to avoid spending too much time resolving unnecessary errors in your code.

If you scored 3–5, you likely have a similar level of knowledge to most R users.

6–8 is a very good score. You clearly know a lot of the underlying principles and structures of the R programming language.

If you scored 9 or 10, it’s quite possible you are Hadley Wickham’s twin. You probably know a lot of needless R trivia, and you might well be an R pedant. It’s most likely that you are doing a lot of the smacking down on StackOverflow.
