The more coding I do, the more sensitive I become to inefficiency. For me, Nirvana is where you can code super quickly without having to do stuff outside your favorite code editor.
So I get pretty frustrated when I can’t do what I want using lean, efficient code, or when I have to go into my filesystem or another program to configure something, or anything else that I view as taking up unnecessary time and effort.
Here are ten hacks I use regularly to try to minimize distractions and keep up my production pace. When I tell some people about these, I often get a few reactions like ‘Why didn’t I know about this?’. So I hope at least some of these will be new and useful to you.
1. Downloading and reading files straight from source
This tip should help you minimize time spent administering local data files and make your entire project more replicable by others. If you have a data file sitting somewhere on the web, such as in Google Drive or at some other URL, the readr package allows you to read it directly from the URL into a dataframe, using functions like read_csv() or read_rds(). For example:
my_df <- readr::read_csv(url("https://www.website.com/data.csv"))
If the data is in a Github repo, you can download the raw file by adding ?raw=true to the URL, so here’s how you would get some data on speed dating from one of my Github repos:
speed_dating_data <- readr::read_rds(url("https://github.com/keithmcnulty/speed_dating/blob/master/speed_data_data.RDS?raw=true"))
If you have an unusual file type that readr can’t process, you can simply use base R’s download.file() function to save it to your working directory, where you can then read it using whatever package is appropriate. You don’t need the url() function for this. Here’s how to use it to download my speed-dating data:
# mode = "wb" keeps binary files such as .RDS intact on Windows
download.file("https://github.com/keithmcnulty/speed_dating/blob/master/speed_data_data.RDS?raw=true", destfile = "speed_dating_data.RDS", mode = "wb")
speed_dating_data <- readRDS("speed_dating_data.RDS")
The advantage of this is that others can now run your project without needing to worry about having the data available locally.
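If you find yourself doing this for several non-standard files, the download-then-read step can be wrapped in a small helper. The sketch below uses a hypothetical function, read_remote(), which is not part of any package: it downloads to a temporary file with download.file() and hands the path to whatever reader function you pass in (readRDS() by default), deleting the temporary copy afterwards.

```r
# Hypothetical helper (not from any package): download a remote file to a
# temporary location, read it with the supplied reader, then delete the copy.
read_remote <- function(url, reader = readRDS, ...) {
  # Strip any query string such as "?raw=true" before taking the extension
  ext <- tools::file_ext(sub("\\?.*$", "", url))
  dest <- tempfile(fileext = if (nzchar(ext)) paste0(".", ext) else "")
  on.exit(unlink(dest), add = TRUE)
  # mode = "wb" avoids corrupting binary files such as .RDS on Windows
  download.file(url, destfile = dest, mode = "wb", quiet = TRUE)
  reader(dest, ...)
}

# Usage with the speed-dating file from above:
# speed_dating_data <- read_remote(
#   "https://github.com/keithmcnulty/speed_dating/blob/master/speed_data_data.RDS?raw=true"
# )
# For a CSV you could pass a different reader, e.g. reader = read.csv
```

Passing the reader as an argument keeps the download logic in one place while letting each file type use whichever package reads it best.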
For hosting services that require authentication, like Box for example, you’ll find an increasing number of R packages that handle the authentication and the file downloads, so if you are doing a lot of manual downloading from these services you are probably wasting time.
boxr is a great package if you work in Box. It stores your credentials so that you don’t have to authenticate for every transaction, and like readr it has built-in read functions so that you can download your data and read it into a dataframe in a single command.
2. Storing your credentials for regular use
If you are frequently typing in passwords to databases or other services, then you are wasting time, and if those passwords are exposed in your code, that’s just bad practice. R has a hidden file called .Renviron where you can store credentials as environment variables so that you don’t have to keep typing them in or exposing them.
The best way to work with .Renviron is to use the edit_r_environ() function from the usethis package. This will immediately open your .Renviron file in your session so you can add environment variables. As an example, let’s say you have a database password you use frequently. In .Renviron you would enter it on a new line in the form NAME=value. Then in your code you just need to call Sys.getenv("NAME") in place of your password.
Remember that whenever you save a new environment variable into .Renviron you need to restart R for it to take effect. You can use this for all sorts of stuff like database logins, API credentials, blog credentials, whatever.
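Here is a minimal sketch of the reading side. It uses a hypothetical variable name, DB_PASSWORD, and sets it with Sys.setenv() only so that the example runs on its own; in practice the value would come from a DB_PASSWORD=... line in .Renviron. The get_secret() helper is also hypothetical. It simply fails loudly when the variable is missing, because Sys.getenv() otherwise returns an empty string without complaint.

```r
# For this example only: simulate what .Renviron would load at startup.
# In real use, .Renviron would contain a line like  DB_PASSWORD=s3cret
Sys.setenv(DB_PASSWORD = "s3cret")

# Hypothetical helper: read a credential, failing loudly if it is missing
get_secret <- function(name) {
  value <- Sys.getenv(name)  # returns "" if the variable is unset
  if (identical(value, "")) {
    stop(name, " is not set: add it to .Renviron and restart R")
  }
  value
}

db_password <- get_secret("DB_PASSWORD")

# The value is then passed to whatever needs it, for example
# (hypothetical connection details):
# con <- DBI::dbConnect(RPostgres::Postgres(), user = "me", password = db_password)
```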
3. RStudio’s shortcut keys
So many people do not know about the shortcut keys in RStudio. They can save so much coding time.
One biggie is Ctrl-Shift-M (Cmd-Shift-M on a Mac) for the pipe operator %>%. I always get at least one ‘how did I not know about this?’ when I tell people about this. Another is Option-minus (or Alt-minus) for the assignment operator <-. (I do hope you are not using = for this!)
If you are working with a lot of code in different files, try Ctrl-. (Ctrl-period) to open a search window, type in your file or function name, and it will take you straight to where that is in your project. You have no idea how much time this saves me.
You can find a full list of RStudio keyboard shortcuts here. It’s like a bag of treats for programmers.
4. Global chunk options in RMarkdown
If I am writing in RMarkdown, I often find that whatever chunk options I decide on for my document are the same for all chunks. For example, in one document I might want to echo my code, or I might not want warnings or messages displayed.
Instead of having to type those options into every chunk in the document, I can set them as global chunk options in my first code chunk. Here’s an example of how to do this:
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
5. Easy pasting of ggplots with the patchwork package
If you are putting multiple ggplots together, the patchwork package uses an intuitive and simple grammar so that you don’t have to use more complicated functions like grid.arrange(). It also has more capabilities than cowplot to handle complex layouts.
With each plot assigned to an object, you can use operators like | and / to specify which plots you want side by side in a row and which you want stacked in a column, and the package will do the alignment for you. Here’s an example from the package’s Github repo using the built-in mtcars data:
library(ggplot2)
library(patchwork)

p1 <- ggplot(mtcars) + geom_point(aes(mpg, disp))
p2 <- ggplot(mtcars) + geom_boxplot(aes(gear, disp, group = gear))
p3 <- ggplot(mtcars) + geom_smooth(aes(disp, qsec))
p4 <- ggplot(mtcars) + geom_bar(aes(carb))

# p1, p2 and p3 side by side in the top row, p4 underneath
(p1 | p2 | p3) / p4
6. Smoother dependency management using renv
It’s a pretty common experience for an R programmer (and programmers in many other languages) to pick up a project from someone else and, when they try to get it working, realize they have to spend ages working out which packages are needed, resolving package version conflicts and doing other forms of dependency management.
The renv package provides a much simpler and cleaner way of solving this problem than packrat, an earlier package dependency management solution.
When you’ve developed your project to a certain point and you think you or someone else might have to come back to it in the future, running renv::init() will discover all your package and version dependencies and store them in a file called renv.lock, which you should keep in the project repo (running renv::snapshot() updates it as the project evolves). Then at a later point, running renv::restore() will reinstall all the dependencies recorded in renv.lock, setting your project up as close as possible to where you left it and massively reducing the chances of running into problems associated with package versions. So simple, so important and so efficient.
7. Multitask with RStudio’s Jobs
You’ve probably heard that R is single-threaded and that this can cause problems. It’s true that if you execute a computationally intensive command in your console, you can end up locking up your R session until the command has completed.
But for most everyday R programmers, probably the most useful development on this front was the introduction of Jobs in RStudio 1.2.
If you have a lengthy R script that you need to execute completely but you still want to do other stuff in your console, you’ll see that from V1.2 of RStudio, the Source button now has a pulldown menu where you can source a script as a ‘Local Job’. This will basically run your script in a different R session, allowing you to work away in your existing session.
Bear in mind that, by default, once the job has finished all the objects it created are gone, so if you haven’t written what you need to a database or file system, you’ve lost it. If you want to explore the objects created during a local job, you’ll have to save an image of the session’s workspace to an .RData file at the end of your script using the save.image() function.
While the script is running you can monitor its progress using the green bars displayed in the Jobs window, and if you comment your script with code sections (comment lines ending in at least four dashes), the Jobs window will display those section names as it moves through the script, so you can see exactly where it is.
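As a sketch of what a script sourced as a Local Job might look like, the example below marks its stages with code-section comments and ends by saving its workspace with save.image(). The analysis steps and the file name job_results.RData are hypothetical placeholders. The last few lines are not part of the job script: they show one way to load those results back into your main session without overwriting your own objects.

```r
# Load data ----
car_data <- mtcars

# Fit model ----
fit <- lm(mpg ~ wt + hp, data = car_data)

# Summarise ----
coefs <- coef(fit)

# Save results ----
# save.image() writes every object in the global environment to one file;
# "job_results.RData" is a hypothetical name in the working directory
save.image(file = "job_results.RData")

# (Not part of the job script) Back in your main session, load the results
# into a separate environment so they don't overwrite your own objects
job_env <- new.env()
load("job_results.RData", envir = job_env)
ls(job_env)
```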
(If you work on RStudio Server, you can also use Launcher Jobs to run your script on remote compute infrastructure, if you have access to it. Very handy!)
More on Jobs here.
8. Rename all variables in scope
This probably belongs in the earlier section on RStudio keyboard shortcuts, but I think it deserves a section of its own. Say you decide to rename one of your objects, and then you realize that this object appears about twenty other times in your code, so you have to go and edit those twenty instances too. Sound familiar?
Try highlighting the object name and then pressing Ctrl-Alt-Shift-M (or Ctrl-Option-Shift-M on a Mac). You’ll immediately see that any change you now make to the name is applied everywhere that object appears in your current scope. You’re welcome 😊
9. Using . to keep piping
With the extensive use of piping in R code now, and the fact that most tidyverse functions take the object as their first argument, people seem to have forgotten how to use . in their R code. For example, I often see coders stop piping as soon as they come to a function that doesn’t take the object as its first argument: they move to a new line of code and break the pipe.
For those who have forgotten about ., or never learned about it in the first place, you can use . to mark the place where you want the previous output to be piped into your function. This allows you to pipe into functions that do not take the object as their first argument. Here’s an example:
library(magrittr)  # provides %>% (also available once dplyr or the tidyverse is loaded)
vec <- c("hello", "jello", "is", "great")
vec %>% grepl("lo", .)
## [1]  TRUE  TRUE FALSE FALSE
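Another place this comes up constantly is modelling: lm() takes a formula as its first argument and the data second, so a pipe that ends in a model fit needs the placeholder. A minimal sketch using the built-in mtcars data (the object name fit_4cyl is just illustrative):

```r
library(magrittr)  # for %>%

# lm() expects a formula first and the data second, so the piped data frame
# has to be placed explicitly with `.`
fit_4cyl <- mtcars %>%
  subset(cyl == 4) %>%        # subset() takes the data first, so no `.` needed
  lm(mpg ~ wt, data = .)      # `.` stands for the filtered data frame

coef(fit_4cyl)
```

If you use R’s native |> pipe instead (R 4.2 or later), the equivalent placeholder is _, which must be passed to a named argument, as in mtcars |> lm(mpg ~ wt, data = _).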
10. Immediately invoked display
I’ve seen quite a number of R programmers assign something to a named object and then type the name of that object to view it immediately. You can avoid that extra step. If you want to both assign something to an object and view it at the same time, just wrap your code in brackets. This can be handy when debugging, when you want to step through your code to spot exactly where it didn’t do what you expected. I also find it saves me from pointlessly typing object names to have them displayed in my RMarkdown documents. Here’s what I mean:
library(magrittr)  # provides %>% (dplyr also re-exports it)

## What I usually see
mt_count <- mtcars %>% dplyr::count(cyl)
mt_count
# A tibble: 3 x 2
    cyl     n
  <dbl> <int>
1     4    11
2     6     7
3     8    14

## Immediately invoked display
(mt_count <- mtcars %>% dplyr::count(cyl))
# A tibble: 3 x 2
    cyl     n
  <dbl> <int>
1     4    11
2     6     7
3     8    14
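The reason this works is that an assignment returns its value invisibly, and wrapping it in brackets makes the value visible again, so R prints it. One way to see the difference, using only base R and the built-in mtcars data, is to capture what each form prints (the object names here are just illustrative):

```r
# A bare assignment returns its value invisibly, so nothing is printed
silent <- capture.output(cyl_counts <- table(mtcars$cyl))

# The same assignment wrapped in brackets returns a visible value,
# so R both assigns it and prints it
shown <- capture.output((cyl_counts <- table(mtcars$cyl)))

length(silent)  # 0: the bare assignment printed nothing
shown           # the lines R printed for the table
```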
Do you have any time saving R hacks? Feel free to add them in the comments.