What you need to know about dplyr 1.0.0 – Part 3: Working row-wise

dplyr previously had limited friendliness to working across rows. It previously behaved somewhat counter-intuitively when you wanted to sum or average across values in the same row. Here’s an example, which some of you might recognize as being a source of a previous headache:

WorldPhones_df <- WorldPhones %>% 
  as.data.frame() 

# mutate an average column

WorldPhones_df %>% 
  dplyr::mutate(avg = mean(N.Amer:Mid.Amer))

     N.Amer Europe Asia S.Amer Oceania Africa Mid.Amer   avg
1951  45939  21574 2876   1815    1646     89      555 23247
1956  60423  29990 4708   2568    2366   1411      733 23247
1957  64721  32510 5230   2695    2526   1546      773 23247
1958  68484  35218 6662   2845    2691   1663      836 23247
1959  71799  37598 6856   3000    2868   1769      911 23247
1960  76036  40341 8220   3145    3054   1905     1008 23247
1961  79831  43173 9053   3338    3224   2005     1076 23247

This has returned the average of everything in every column in your dataframe, which is of course not what was intended.

Previously the only solution to this was to use manual calculations and to avoid using functions in this way, so you wuld write (N.Amer + Europe + Asia + S.Amer + Oceania + Africa + Mid.Amer)/7 which was pretty darn tedious.

rowwise() creates a different structure called a rowwise_df which prepares your data to perform operations across the rows – it basically groups your data by row. It can be used in combination with the new c_across() adverb to allow you to work in a similar way to how you would work colwise. Now you can write:

WorldPhones_df %>% 
  rowwise() %>% 
  dplyr::mutate(avg = mean(c_across(N.Amer:Mid.Amer)))

# A tibble: 7 x 8
# Rowwise: 
  N.Amer Europe  Asia S.Amer Oceania Africa Mid.Amer    avg
   <dbl>  <dbl> <dbl>  <dbl>   <dbl>  <dbl>    <dbl>  <dbl>
1  45939  21574  2876   1815    1646     89      555 10642 
2  60423  29990  4708   2568    2366   1411      733 14600.
3  64721  32510  5230   2695    2526   1546      773 15714.
4  68484  35218  6662   2845    2691   1663      836 16914.
5  71799  37598  6856   3000    2868   1769      911 17829.
6  76036  40341  8220   3145    3054   1905     1008 19101.
7  79831  43173  9053   3338    3224   2005     1076 20243.

Working row-wise with list columns

The new rowwise_df object is designed to work with list-columns, which allow the storage of any type of data you want inside a column in a dataframe. Where I find this particularly valuable is where you want to run different models on subsets of your data according to the value of certain variables. Here’s an example of how you can store different subsets of mtcars in a rowwise dataframe and then run a model on them.

model_coefs <- function(formula, data) {
  coefs <- lm(formula, data)$coefficients
  data.frame(coef = names(coefs), value = coefs)
}

mtcars %>% 
  dplyr::group_by(cyl) %>% 
  tidyr::nest() %>% 
  dplyr::rowwise() %>% 
  dplyr::summarise(model_coefs(mpg ~ wt + disp + hp, data = data)) %>% 
  tidyr::pivot_wider(names_from = coef, values_from = value)

# A tibble: 3 x 5
    cyl `(Intercept)`    wt        disp      hp
  <dbl>         <dbl> <dbl>       <dbl>   <dbl>
1     6          30.3 -3.90  0.0161     -0.0110
2     4          44.9 -2.59 -0.0663     -0.0645
3     8          26.7 -2.18  0.00000110 -0.0137

The nest_by function

Of course, the developers behind dplyr 1.0.0 noticed the power of this row-wise modelling capability and so created the nest_by() function as a shortcut for the code above. nest_by(x) is equivalent of:

dplyr::group_by(x) %>% 
  tidy::nest() %>% 
  dplyr::rowwise()

So now you can do the modeling above using:

mtcars %>% 
  nest_by(cyl) %>% 
  dplyr::summarise(model_coefs(mpg ~ wt + disp + hp, data = data)) %>% 
  tidyr::pivot_wider(names_from = coef, values_from = value)

# A tibble: 3 x 5
    cyl `(Intercept)`    wt        disp      hp
  <dbl>         <dbl> <dbl>       <dbl>   <dbl>
1     4          44.9 -2.59 -0.0663     -0.0645
2     6          30.3 -3.90  0.0161     -0.0110
3     8          26.7 -2.18  0.00000110 -0.0137

One thought on “What you need to know about dplyr 1.0.0 – Part 3: Working row-wise

  1. I’ve found the group_map and group_modify verbs more succinct – they avoid any obvious nesting. e.g.

    mtcars %>%
    group_by(cyl) %>%
    group_modify(~ broom::tidy(lm(mpg ~ wt + disp + hp, data = .x)))

Leave a Reply

%d bloggers like this: