My R code snippets

Granted, FUSORC (Frequently Used Snippets Of R Code) isn’t as catchy as FAQ. However, I keep a list of FUSORC that I find valuable and – in the spirit of the R community – thought it best to share.

Here goes …


Begin an R notebook

I use the following code to start an R notebook.

---
title: ""
author: "Robin Penfold"
date: ""
output: 
  html_notebook:
    theme: sandstone
    highlight: pygments
    code_folding: "show"
    toc: true
    toc_float: true
    toc_depth: 4
---

***

### Introduction

Add ...

I also reduce the output by adding the following to the first code chunk (and setting the message chunk option to FALSE):

suppressMessages(library(tidyverse))


Count outcomes

Simple but frequently used.

mtcars %>% 
  count(cyl, gear)
## # A tibble: 8 x 3
##     cyl  gear     n
##   <dbl> <dbl> <int>
## 1     4     3     1
## 2     4     4     8
## 3     4     5     2
## 4     6     3     2
## 5     6     4     4
## 6     6     5     1
## 7     8     3    12
## 8     8     5     2
mtcars %>%
  count(cyl, gear) %>%
  count(gear)
## # A tibble: 3 x 2
##    gear     n
##   <dbl> <int>
## 1     3     3
## 2     4     2
## 3     5     3
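
The second pipeline counts the rows of the first result, which gives the number of distinct cyl values for each gear. As a cross-check, here is a minimal sketch of the same tally using group_by() and n_distinct(). The object names via_count and via_distinct are mine, for illustration only.

```r
suppressMessages(library(dplyr))

# Counting twice: one row per (cyl, gear) pair, then rows per gear
via_count <- mtcars %>%
  count(cyl, gear) %>%
  count(gear)

# The same tally stated directly: distinct cyl values within each gear
via_distinct <- mtcars %>%
  group_by(gear) %>%
  summarise(n = n_distinct(cyl))

# Both approaches should agree on the counts
all.equal(via_count$n, via_distinct$n)
```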


Mutate in bulk

iris %>% 
  mutate_at(
    .vars = vars(-Species),  # every column except Species
    .funs = ~ . * 100        # formula shorthand; funs() is deprecated
    ) %>% 
  head()
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          510         350          140          20  setosa
## 2          490         300          140          20  setosa
## 3          470         320          130          20  setosa
## 4          460         310          150          20  setosa
## 5          500         360          140          20  setosa
## 6          540         390          170          40  setosa

I often need to mutate all but a few columns. The code directly above multiplies every column in iris except Species by 100. As with many of these snippets, the example shows the principle, which can be applied in many ways.
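
In dplyr 1.0.0 and later, scoped verbs such as mutate_at() are superseded by across(). Below is a minimal sketch of the same transformation, assuming a recent dplyr. The name scaled is mine.

```r
suppressMessages(library(dplyr))

# across() picks columns with tidyselect syntax; -Species excludes one column
scaled <- iris %>%
  mutate(across(-Species, ~ .x * 100))

head(scaled)
```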


Reduce ifelse calls

Taken from Advanced R by Hadley Wickham, this infix function provides a default value in the case where the output of another function is NULL.

library(rlang)

# `%||%` <- function(a, b) if (!is.null(a)) a else b

function_that_might_return_NULL() %||% default_value
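
To make the placeholder line concrete, here is a small runnable sketch. The helper get_colour() and the settings lists are hypothetical, invented only to produce a NULL in one case and a value in the other.

```r
library(rlang)

# Hypothetical helper: returns the element called "colour", or NULL if absent
get_colour <- function(settings) settings[["colour"]]

# The left-hand side is used when it is not NULL, the right-hand side otherwise
with_value    <- get_colour(list(colour = "red")) %||% "black"
without_value <- get_colour(list(size = 2)) %||% "black"

with_value     # "red"
without_value  # "black"
```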


Scrape web data

I tend to use the pattern read_html(index_page) %>% html_nodes("a") %>% html_attr("href"). (You need to be outside a corporate firewall for the request to get through.)

library(rvest)
library(xml2)

read_html("http://www.theguardian.com/football/premierleague/table") %>%
  html_nodes(".table--striped") %>%
  .[[1]] %>%
  html_table() %>% 
  select(Team:Pts) %>% 
  head()
##        Team GP  W D  L  F  A GD Pts
## 1  Man City 36 30 2  4 90 22 68  92
## 2 Liverpool 36 28 7  1 84 20 64  91
## 3     Spurs 36 23 1 12 65 36 29  70
## 4   Chelsea 36 20 8  8 60 39 21  68
## 5   Arsenal 36 20 6 10 69 49 20  66
## 6   Man Utd 36 19 8  9 64 51 13  65
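
The link-extraction pattern mentioned above can be tried without a network connection by parsing HTML from a string. This is only a sketch: the page content and the link targets are made up for illustration.

```r
library(rvest)

# Hypothetical page content, parsed from a string instead of a live URL
page <- read_html('<html><body>
  <a href="/fixtures">Fixtures</a>
  <a href="/results">Results</a>
</body></html>')

# Find every <a> element, then pull out its href attribute
links <- page %>%
  html_nodes("a") %>%
  html_attr("href")

links
```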


Select some columns

I often need to select every column except those whose names end in a certain way.

iris %>%
  select(
    everything(), 
    -ends_with("Width")
    ) %>%
  head()  
##   Sepal.Length Petal.Length Species
## 1          5.1          1.4  setosa
## 2          4.9          1.4  setosa
## 3          4.7          1.3  setosa
## 4          4.6          1.5  setosa
## 5          5.0          1.4  setosa
## 6          5.4          1.7  setosa


Show ggplot2 in Rmd

These chunk options make a ggplot2 plot display nicely in R Markdown.

{r, eval=TRUE, echo=TRUE, warning=FALSE, message=FALSE, fig.show='hold', fig.width = 6, fig.asp=0.618, out.width="70%", fig.align="center"}
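
If every chunk in a document should share these settings, one option is to set them once with knitr::opts_chunk$set() in an early chunk. This is a sketch of that approach using the same values as the chunk header above, not a required setup.

```r
# Set the same options once, for every chunk that follows in the document
knitr::opts_chunk$set(
  eval = TRUE,
  echo = TRUE,
  warning = FALSE,
  message = FALSE,
  fig.show = "hold",
  fig.width = 6,
  fig.asp = 0.618,      # height = width * 0.618
  out.width = "70%",
  fig.align = "center"
)
```

Options given in an individual chunk header still override these defaults for that chunk.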


Summarise tidily

Do you ever need to summarise all columns in a dataframe using certain functions? I find this shortcut really helpful for doing so in a tidy way.

mtcars %>% 
  map_df(
    .x = .,
    .f = ~tibble(
      sum = sum(.x),
      iqr = IQR(.x),
      min = min(.x)
      ),
    .id = "var_name"
    )
## # A tibble: 11 x 4
##    var_name   sum     iqr   min
##    <chr>    <dbl>   <dbl> <dbl>
##  1 mpg       643.   7.38  10.4 
##  2 cyl       198    4      4   
##  3 disp     7383. 205.    71.1 
##  4 hp       4694   83.5   52   
##  5 drat      115.   0.840  2.76
##  6 wt        103.   1.03   1.51
##  7 qsec      571.   2.01  14.5 
##  8 vs         14    1      0   
##  9 am         13    1      0   
## 10 gear      118    1      3   
## 11 carb       90    2      1
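
The snippet works on mtcars because every column is numeric. For a data frame with other column types, one option is to keep only the numeric columns first. Here is a sketch using iris, assuming a dplyr version that supports where() inside select(). The name numeric_summary is mine.

```r
suppressMessages(library(dplyr))
library(purrr)

numeric_summary <- iris %>%
  select(where(is.numeric)) %>%  # drop Species, a factor that sum() cannot handle
  map_df(
    .f = ~tibble(
      sum = sum(.x),
      iqr = IQR(.x),
      min = min(.x)
      ),
    .id = "var_name"
    )

numeric_summary
```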


Translate dplyr to SQL

dbplyr can show the SQL that a dplyr pipeline translates to. Even better, you can choose the dialect of SQL!

library(dplyr)
library(dbplyr)

df <- tibble(
  y = c('a', 'b', 'c'), 
  z = c(2, 3, 4)
  )

x <- tbl_lazy(
  df, 
  con = simulate_mssql()
  )

x %>% 
  filter(y != 'a') %>% 
  summarise(x = sd(z, na.rm = TRUE)) %>% 
  show_query()
## <SQL>
## SELECT STDEV(`z`) AS `x`
## FROM `df`
## WHERE (`y` != 'a')
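
To switch dialect, swap the simulated connection. The sketch below repeats the query against simulate_postgres() and builds the table with lazy_frame(), which takes the columns directly. The names pg and pg_query are mine, and I have left the output out because the exact SQL text depends on the dbplyr version.

```r
suppressMessages(library(dplyr))
library(dbplyr)

# lazy_frame() builds a lazy table with no real database behind it;
# the simulate_*() connection decides which SQL dialect is generated
pg <- lazy_frame(
  y = c('a', 'b', 'c'),
  z = c(2, 3, 4),
  con = simulate_postgres()
  )

pg_query <- pg %>%
  filter(y != 'a') %>%
  summarise(x = sd(z, na.rm = TRUE))

show_query(pg_query)
```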


Miscellanea

  • Use recode and recode_factor
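
As a quick illustration of the two functions named above, here is a minimal sketch. The vector x and the replacement labels are made up for the example.

```r
suppressMessages(library(dplyr))

x <- c("a", "b", "c", "a")

# recode() replaces values by name; unmatched values ("c") are kept as they are
recoded <- recode(x, a = "apple", b = "banana")
recoded

# recode_factor() does the same but returns a factor whose levels follow
# the order of the replacements, with unmatched values last
recoded_factor <- recode_factor(x, a = "apple", b = "banana")
levels(recoded_factor)
```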