My R code snippets

I keep a list of frequently used code snippets that I find valuable and, in the spirit of the R community, I thought it best to share them.

Robin Penfold
2019-04-24

In no particular order, here they are …

Begin an R notebook

I use the following YAML header, followed by a placeholder introduction, to start an R notebook.


---
title: ""
author: "Robin Penfold"
date: ""
output: 
  html_notebook:
    theme: sandstone
    highlight: pygments
    code_folding: "show"
    toc: true
    toc_float: true
    toc_depth: 4
---

***

### Introduction

Add ...

I also reduce the output by loading the tidyverse in the first code chunk with the following line (and setting that chunk's message option to FALSE):


suppressMessages(library(tidyverse))

Count outcomes

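Simple but frequently used. The second example counts the rows of the first result, so n is the number of distinct cyl values that occur with each gear.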


mtcars %>% 
  count(cyl, gear)

# A tibble: 8 x 3
    cyl  gear     n
  <dbl> <dbl> <int>
1     4     3     1
2     4     4     8
3     4     5     2
4     6     3     2
5     6     4     4
6     6     5     1
7     8     3    12
8     8     5     2

mtcars %>%
  count(cyl, gear) %>%
  count(gear)

# A tibble: 3 x 2
   gear     n
  <dbl> <int>
1     3     3
2     4     2
3     5     3
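To make explicit what count() is doing, here is a sketch that rebuilds both results with group_by() and summarise(). It assumes dplyr 1.0 or later (for the .groups argument); the variable names are illustrative only.

```r
library(dplyr)

# count() groups by the listed columns and tallies the rows in each group
via_count <- mtcars %>% count(cyl, gear)

# The same result spelled out with group_by() + summarise()
via_summarise <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(n = n(), .groups = "drop")

# Counting the first result by gear is the same as counting
# the distinct cyl values within each gear
cyl_per_gear <- mtcars %>%
  group_by(gear) %>%
  summarise(n = n_distinct(cyl))

cyl_per_gear
```

The group_by() version is longer, but it is the form to reach for when you need more than one summary per group.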

Mutate in bulk


iris %>% 
  mutate_at(
    .vars = vars(-Species),  # every column except Species
    .funs = ~ . * 100        # multiply each selected column by 100
    ) %>% 
  head()

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          510         350          140          20  setosa
2          490         300          140          20  setosa
3          470         320          130          20  setosa
4          460         310          150          20  setosa
5          500         360          140          20  setosa
6          540         390          170          40  setosa

I often need to mutate all but a few columns. The code above multiplies every column in iris by 100 except Species. As with many of these snippets, the example shows a principle that can be applied in many ways.
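Since dplyr 1.0, the scoped verbs such as mutate_at() are superseded by across() inside mutate(). A sketch of the same transformation with across(), assuming dplyr 1.0 or later:

```r
library(dplyr)

# across() applies the formula to every column except Species;
# .x stands for the column currently being transformed
scaled <- iris %>%
  mutate(across(-Species, ~ .x * 100))

head(scaled)
```

The column selection uses the same tidyselect syntax as select(), so helpers such as ends_with() work here too.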

Reduce ifelse calls

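Taken from Advanced R by Hadley Wickham, this infix function returns a default value when the output of another function is NULL. The rlang package exports it (and base R has also provided it since version 4.4.0), so there is no need to define it yourself.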


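library(rlang)  # rlang exports %||%

# Equivalent definition, from Advanced R:
# `%||%` <- function(a, b) if (!is.null(a)) a else b

# Returns the left-hand side unless it is NULL, otherwise the right-hand side
NULL %||% "default"     # "default"
"value" %||% "default"  # "value"
list(a = 1)$b %||% 0    # 0, because a missing list element is NULL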
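A typical use is giving an optional argument a fallback without writing an if/else. In this minimal sketch, greet() is a hypothetical helper, and the operator is defined locally so the example does not depend on rlang.

```r
# Definition from Advanced R (rlang also exports this operator)
`%||%` <- function(a, b) if (!is.null(a)) a else b

# Hypothetical helper: fall back to "world" when no name is supplied
greet <- function(name = NULL) {
  # Without %||% this would be: if (is.null(name)) "world" else name
  paste("Hello,", name %||% "world")
}

greet()         # "Hello, world"
greet("Robin")  # "Hello, Robin"
```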

Scrape web data

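To collect the links on an index page, I tend to use the pattern read_html(index_page) %>% html_nodes("a") %>% html_attr("href"). For tables, html_table() does the parsing, as in this example: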


library(rvest)
library(xml2)
library(dplyr)  # for select()

read_html("http://www.theguardian.com/football/premierleague/table") %>%
  html_nodes(".table--striped") %>%
  .[[1]] %>%
  html_table() %>% 
  select(Team:Pts) %>% 
  head()

       Team GP W D L  F  A GD Pts
1 Liverpool  8 8 0 0 20  6 14  24
2  Man City  8 5 1 2 27  9 18  16
3   Arsenal  8 4 3 1 13 11  2  15
4 Leicester  8 4 2 2 14  7  7  14
5   Chelsea  8 4 2 2 18 14  4  14
6  C Palace  8 4 2 2  8  8  0  14
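The link-extraction pattern can be tried without a network connection, because read_html() also accepts a literal HTML string. In this sketch the HTML, the link paths and the base URL https://example.com/ are all made up for illustration.

```r
library(rvest)
library(xml2)

# A literal HTML string stands in for a real index page (hypothetical links)
index_page <- read_html(
  '<ul>
     <li><a href="/reports/2019.html">2019</a></li>
     <li><a href="/reports/2018.html">2018</a></li>
   </ul>'
)

# Pull every <a> node, then its href attribute
links <- index_page %>%
  html_nodes("a") %>%
  html_attr("href")

links

# Resolve the relative links against a (hypothetical) base URL
abs_links <- url_absolute(links, "https://example.com/")
abs_links
```

Resolving relative links with url_absolute() is usually the next step before passing each one back to read_html().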

Select some columns

I often need to select every column except those whose names end in a certain way.


iris %>%
  select(-ends_with("Width")) %>%  # drop Sepal.Width and Petal.Width
  head()

  Sepal.Length Petal.Length Species
1          5.1          1.4  setosa
2          4.9          1.4  setosa
3          4.7          1.3  setosa
4          4.6          1.5  setosa
5          5.0          1.4  setosa
6          5.4          1.7  setosa
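The same idea extends to the other tidyselect helpers. A sketch of a few combinations; where() assumes dplyr 1.0 or later, and the variable names are illustrative.

```r
library(dplyr)

# Drop columns by suffix
no_width <- iris %>% select(-ends_with("Width"))

# Keep columns by prefix, plus Species
petals <- iris %>% select(starts_with("Petal"), Species)

# Combine a predicate with an exclusion
petal_numbers <- iris %>% select(where(is.numeric), -contains("Sepal"))

names(no_width)       # "Sepal.Length" "Petal.Length" "Species"
names(petals)         # "Petal.Length" "Petal.Width" "Species"
names(petal_numbers)  # "Petal.Length" "Petal.Width"
```

Selections are evaluated left to right, so a negative term removes columns from whatever the earlier terms selected.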

Show ggplot2 in Rmd

These chunk options, placed in the header of the chunk that draws a ggplot2 object, make the plot appear nicely in R Markdown.

{r, eval=TRUE, echo=TRUE, warning=FALSE, message=FALSE, fig.show='hold', fig.width = 6, fig.asp=0.618, out.width="70%", fig.align="center"}
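For illustration, a minimal plot that could sit in a chunk with those options. knitr sets the figure height to fig.width times fig.asp, so 6 times 0.618 gives roughly 3.7 inches (close to the golden ratio). The particular plot, weight against fuel economy in mtcars, is just an example.

```r
library(ggplot2)

# A simple scatter plot; the chunk options control its size and placement
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

p
```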

Summarise tidily

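Do you ever need to summarise every column in a data frame with the same set of functions? I find this shortcut really helpful for doing so in a tidy way. Because a data frame is a list of columns, map_df() applies the function to each column in turn, and .id stores the column names in var_name.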


mtcars %>% 
  map_df(
    .x = .,
    .f = ~tibble(
      sum = sum(.x),
      iqr = IQR(.x),
      min = min(.x)
      ),
    .id = "var_name"
    )

# A tibble: 11 x 4
   var_name   sum     iqr   min
   <chr>    <dbl>   <dbl> <dbl>
 1 mpg       643.   7.38  10.4 
 2 cyl       198    4      4   
 3 disp     7383. 205.    71.1 
 4 hp       4694   83.5   52   
 5 drat      115.   0.840  2.76
 6 wt        103.   1.03   1.51
 7 qsec      571.   2.01  14.5 
 8 vs         14    1      0   
 9 am         13    1      0   
10 gear      118    1      3   
11 carb       90    2      1   
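With dplyr 1.0 or later, a similar long-format summary can be sketched with across() and tidyr's pivot_longer(). This assumes the column names contain no underscores (true for mtcars), because the wide names such as mpg_sum are split on "_".

```r
library(dplyr)
library(tidyr)

# Each function is applied to each column, giving wide names like mpg_sum;
# pivot_longer() then splits those names back into var_name and statistic
summary_long <- mtcars %>%
  summarise(across(everything(), list(sum = sum, iqr = IQR, min = min))) %>%
  pivot_longer(
    everything(),
    names_to = c("var_name", ".value"),
    names_sep = "_"
  )

summary_long
```

The map_df() version avoids the underscore assumption, so it remains the safer choice for data frames with arbitrary column names.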

Translate dplyr to SQL

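The dbplyr package translates dplyr code into SQL, and you can even choose the SQL dialect. Here, simulate_mssql() produces Microsoft SQL Server syntax without needing a live database connection.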


library(dplyr)
library(dbplyr)

df <- tibble(
  y = c('a', 'b', 'c'), 
  z = c(2, 3, 4)
  )

x <- tbl_lazy(
  df, 
  con = simulate_mssql()
  )

x %>% 
  filter(y != 'a') %>% 
  summarise(x = sd(z, na.rm = TRUE)) %>% 
  show_query()

<SQL>
SELECT STDEV(`z`) AS `x`
FROM `df`
WHERE (`y` != 'a')
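To compare dialects, the same pipeline can be rendered against more than one simulated backend. In this sketch, translate_for() is a hypothetical helper, and sql_render() returns the generated SQL instead of printing it; the exact quoting and function names in the output depend on your dbplyr version.

```r
library(dplyr)
library(dbplyr)

df <- tibble(y = c("a", "b", "c"), z = c(2, 3, 4))

# Hypothetical helper: build the same lazy query for a given simulated connection
translate_for <- function(con) {
  tbl_lazy(df, con = con) %>%
    filter(y != "a") %>%
    summarise(x = sd(z, na.rm = TRUE)) %>%
    sql_render()
}

ms_sql <- translate_for(simulate_mssql())
pg_sql <- translate_for(simulate_postgres())

ms_sql
pg_sql
```

Comparing the two strings shows how sd() maps to a different aggregate function in each dialect.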

Miscellanea