As a data science tool, R is experiencing massive user growth, on its own and relative to many other prevalent languages. Here’s a chart from an impressive StackOverflow blog piece by David Robinson which illustrates this.
One of the key reasons for this is that R has made the leap into the enterprise sector, and many companies and organizations outside of academia now have R in the toolkits of at least some of their analytics professionals. Another reason is the increased accessibility of user-friendly training in R, via DataCamp for example. Finally, adoption has been helped by the availability of a large community of R programmers for help and advice via StackOverflow (though newbies often profess fear of posting questions for risk of being intellectually ‘smacked down’).
Perhaps unsurprisingly, this massive growth in adoption has led to a wider range of genuine R skills. The R user base now ranges from one end of the spectrum where individuals have a deep understanding of base R and the underlying intricacies and logic of the language, to the other end where individuals copy and paste code, piecing it together using trial and error without properly understanding what is going on under the hood.
So, for fun more than anything else, I have pieced together ten simple questions that test how well someone knows what they are doing in R. I have chosen these to be as simple as possible, but also to involve some trivia that only a passionate, deep R user might know the answer to. I am making no claim that this is comprehensive, and indeed I would love to have others contribute to this over time (just remember, keep it simple!).
Please do not use these as interview questions. They are not intended to test working R knowledge — but you may find it interesting to get your colleagues or co-workers to test themselves.
Finally, don’t access your R session to try to answer these. That would be cheating, and we R users are an honest bunch, right?
Ten fun questions to test how well you know R
Question 1: x <- vector(). What is the data type of x?
Question 2: y <- 2147483640L:2147483648L. What is the data type of y?
Question 3: z <- 0/0. What is the class of z?
Question 4: If v <- complex(1,1), what is the output of c(v, TRUE)?
Question 5: A homogeneous 1-D and 2-D data structure in R is called an atomic vector and matrix respectively. What is the name for a) a 1-D heterogeneous data structure, b) a 2-D heterogeneous data structure and c) an n-dimensional data structure where n > 2?
Question 6: What is the significance of the terms Trick or Treat and Warm Puppy to R? What is the origin of these terms?
Question 7: What will happen in each of the following cases if the package dplyr is not installed?
Case 1:
library(dplyr)
mtcars %>% group_by(cyl) %>% summarize(mean_mpg = mean(mpg))
Case 2:
require(dplyr)
mtcars %>% group_by(cyl) %>% summarize(mean_mpg = mean(mpg))
Question 8: a <- c(2, "NA", 3). What is the output of sum(is.na(a))?
Question 9: What is the output of data()?
Question 10: What is the output of round(0.5)?
Answers
- x is logical. This is the default type for an atomic vector.
- y is a double, despite the use of the integer notation L in y. This is because the maximum value for an integer in R is 2147483647. R cannot store 2147483648L as an integer, so it warns and treats that last value as a double. Consequently, since atomic vectors are homogeneous, the entire vector is a double.
- z is of the class numeric. (Its value is NaN, which R stores as a double.)
- The output is a vector with two elements, both 1+0i. Note that the first argument of complex() is length.out, indicating the length of the complex vector. So complex(1,1) evaluates to 1+0i but complex(1,1,1) evaluates to 1+1i. Note that TRUE will be coerced to the complex equivalent, 1+0i.
- a) List; b) Data frame; c) Array
- They are the nicknames of R version releases. They are taken from old Peanuts comic strips.
- In Case 1, the first line will generate an error indicating that there is no such package installed, and execution will stop. In Case 2, the first line will generate a warning, but the second line will still be executed, and will generate an error because it cannot find %>% (assuming magrittr is not attached). This is a good illustration of the difference between library() and require(). library() attaches a package and throws an error if it cannot. require() also tries to attach the package, but instead of failing it reports the outcome, evaluating to TRUE if the package has been attached and to FALSE (with a warning) otherwise. Using require() can make it more difficult to debug your code, because the failure only surfaces later.
- This evaluates to zero. Note that "NA" is a character string and not a missing value.
- The output is a list of the data sets available in the currently attached packages. In a fresh session these are the inbuilt data sets that ship with R.
- The output is zero. R follows the IEC 60559 standard, where .5’s round to the nearest even number.
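Once you have marked your own answers, the following minimal sketch reproduces several of them in a fresh R session. It uses base R only. The package name notARealPackage123 is hypothetical, chosen on the assumption that nothing by that name is installed, and the variable names has_pkg and lib_result are likewise just for illustration.

```r
# Answer 1: an empty vector() defaults to logical
x <- vector()
typeof(x)                       # "logical"

# Answer 2: 2147483648L exceeds the integer maximum, so R warns and
# treats the literal as a double; the whole sequence is then double
.Machine$integer.max            # 2147483647
y <- 2147483640L:2147483648L
typeof(y)                       # "double"

# Answer 3: 0/0 is NaN, which has class numeric
z <- 0/0
class(z)                        # "numeric"

# Answer 4: the first argument of complex() is length.out
v <- complex(1, 1)
c(v, TRUE)                      # 1+0i 1+0i
complex(1, 1, 1)                # 1+1i

# Answer 7: require() returns FALSE where library() throws an error
# ("notARealPackage123" is a made-up name, assumed not to be installed)
has_pkg <- suppressWarnings(
  require("notARealPackage123", character.only = TRUE, quietly = TRUE)
)
has_pkg                         # FALSE
lib_result <- tryCatch(
  library("notARealPackage123", character.only = TRUE),
  error = function(e) conditionMessage(e)  # capture the error message
)

# Answer 8: "NA" in quotes is a string, so nothing is missing
a <- c(2, "NA", 3)
typeof(a)                       # "character"
sum(is.na(a))                   # 0

# Answer 10: halves round to the nearest even number
round(0.5)                      # 0
round(1.5)                      # 2
round(2.5)                      # 2
```

Because require() only returns FALSE, a script that ignores its return value carries on and fails at the first call that needs the package, which is why library() is usually the safer choice at the top of a script.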
How did you do?
If you scored 2 or fewer, you urgently need a tutorial in base R to avoid spending too much time resolving unnecessary errors in your code.
If you scored 3–5, you likely have a similar level of knowledge to most R users.
If you scored 6–8, that is a very good score. You clearly know a lot of the underlying principles and structures of the R programming language.
If you scored 9 or 10, it’s quite possible you are Hadley Wickham’s twin. You probably know a lot of needless R trivia, and you might well be an R pedant. It’s most likely that you are doing a lot of the smacking down on StackOverflow.