I keep a list of frequently-used code snippets that I find valuable and, in the spirit of the R community, I thought it best to share them.
In no particular order, here they are …
I use the following code to start an R notebook.
---
title: ""
author: "Robin Penfold"
date: ""
output:
  html_notebook:
    theme: sandstone
    highlight: pygments
    code_folding: "show"
    toc: true
    toc_float: true
    toc_depth: 4
---
***
### Introduction
Add ...
I also reduce the output by adding the following to the first code chunk (and setting message to FALSE):
suppressMessages(library(tidyverse))
Simple but frequently used.
mtcars %>%
  count(cyl, gear)
# A tibble: 8 x 3
    cyl  gear     n
  <dbl> <dbl> <int>
1     4     3     1
2     4     4     8
3     4     5     2
4     6     3     2
5     6     4     4
6     6     5     1
7     8     3    12
8     8     5     2
mtcars %>%
  count(cyl, gear) %>%
  count(gear)
# A tibble: 3 x 2
   gear     n
  <dbl> <int>
1     3     3
2     4     2
3     5     3
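The second snippet counts the rows of the first result, so its n is the number of distinct cyl values seen for each gear. As a cross-check, here is a small sketch (assuming a reasonably recent dplyr; the column name n_cyl is my own choice) that gets the same numbers with n_distinct():

```r
library(dplyr)

# Counting the rows of a count() result gives the number of
# distinct cyl values per gear ...
via_count <- mtcars %>%
  count(cyl, gear) %>%
  count(gear, name = "n_cyl")

# ... which n_distinct() computes directly.
via_distinct <- mtcars %>%
  group_by(gear) %>%
  summarise(n_cyl = n_distinct(cyl))

# TRUE when both routes agree
identical(via_count$n_cyl, via_distinct$n_cyl)
```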
iris %>%
  # multiply every column except Species by 100
  mutate_at(
    .vars = vars(-Species),
    .funs = ~ . * 100
  ) %>%
  head()
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          510         350          140          20  setosa
2          490         300          140          20  setosa
3          470         320          130          20  setosa
4          460         310          150          20  setosa
5          500         360          140          20  setosa
6          540         390          170          40  setosa
As with the previous shortcut, I often need to mutate all but a few columns. Running the code directly above multiplies all columns in iris by 100 except for Species. As with many of these snippets, the example shows the principle, which can be applied in many ways.
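In dplyr 1.0.0 and later, across() covers the same ground as mutate_at(). A minimal sketch of the equivalent call, assuming a recent dplyr is installed:

```r
library(dplyr)

# Multiply every column except Species by 100.
scaled <- iris %>%
  mutate(across(-Species, ~ .x * 100))

head(scaled)
```

Any tidyselect helper can go in the first argument of across(), so the same pattern works for "all but these columns" selections in general.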
Taken from Advanced R by Hadley Wickham, this infix function provides a default value in the case where the output of another function is NULL.
library(rlang)
# `%||%` <- function(a, b) if (!is.null(a)) a else b
function_that_might_return_NULL() %||% default_value
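To make the placeholder names above concrete, here is a small self-contained sketch. The settings list and its fields are hypothetical, and the operator is defined locally so the example runs without rlang:

```r
# Same definition as in the comment above; recent versions of rlang
# (and base R from 4.4.0) already provide this operator.
`%||%` <- function(a, b) if (!is.null(a)) a else b

# Hypothetical settings list: "size" is deliberately missing.
settings <- list(colour = "red")

colour <- settings[["colour"]] %||% "blue"  # value present, so it is kept
size   <- settings[["size"]]   %||% 12      # NULL, so the default is used
```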
I tend to use the format read_html(index_page) %>% html_nodes("a") %>% html_attr("href").
library(rvest)
library(xml2)
library(dplyr)  # for select()
read_html("http://www.theguardian.com/football/premierleague/table") %>%
  html_nodes(".table--striped") %>%
  .[[1]] %>%
  html_table() %>%
  select(Team:Pts) %>%
  head()
       Team GP W D L  F  A GD Pts
1 Liverpool  8 8 0 0 20  6 14  24
2  Man City  8 5 1 2 27  9 18  16
3   Arsenal  8 4 3 1 13 11  2  15
4 Leicester  8 4 2 2 14  7  7  14
5   Chelsea  8 4 2 2 18 14  4  14
6  C Palace  8 4 2 2  8  8  0  14
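The read_html() %>% html_nodes("a") %>% html_attr("href") format mentioned earlier can be tried without a network connection. In this sketch the page is a made-up HTML string (index_page and its links are hypothetical), which read_html() parses directly:

```r
library(rvest)

# Hypothetical page, supplied as a string so no network access is needed.
index_page <- '<html><body>
  <a href="/posts/one">One</a>
  <a href="/posts/two">Two</a>
  <a>No link here</a>
</body></html>'

# Collect the href attribute of every <a> element.
links <- read_html(index_page) %>%
  html_nodes("a") %>%
  html_attr("href")

links
```

Anchors without an href come back as NA, so it is usually worth filtering those out before following the links.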
I often need to select all but a few columns that end in a certain way.
iris %>%
  select(
    everything(),
    -ends_with("Width")
  ) %>%
  head()
  Sepal.Length Petal.Length Species
1          5.1          1.4  setosa
2          4.9          1.4  setosa
3          4.7          1.3  setosa
4          4.6          1.5  setosa
5          5.0          1.4  setosa
6          5.4          1.7  setosa
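The same idea extends to several exclusions at once. A short sketch, assuming dplyr is loaded, that drops columns by both suffix and prefix:

```r
library(dplyr)

# Drop everything ending in "Width" and everything starting with "Petal".
trimmed <- iris %>%
  select(
    -ends_with("Width"),
    -starts_with("Petal")
  )

names(trimmed)
```

When a selection starts with a negative term, select() begins from all columns, so the leading everything() is optional.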
This code chunk metadata enables a ggplot2 object to appear nicely in R Markdown.
{r, eval=TRUE, echo=TRUE, warning=FALSE, message=FALSE, fig.show='hold', fig.width = 6, fig.asp=0.618, out.width="70%", fig.align="center"}
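If most chunks in a document need the same options, one alternative is to set them once as defaults. This is a sketch using knitr::opts_chunk$set() with the same values as the chunk header above; it assumes it is run in the first chunk of the document:

```r
# Set once in the first chunk; later chunks inherit these defaults
# unless they override them in their own header.
knitr::opts_chunk$set(
  eval = TRUE, echo = TRUE, warning = FALSE, message = FALSE,
  fig.show = "hold", fig.width = 6, fig.asp = 0.618,
  out.width = "70%", fig.align = "center"
)
```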
Do you ever need to summarise all columns in a data frame using certain functions? I find this shortcut really helpful for doing so in a tidy way.
mtcars %>%
  map_df(
    .x = .,
    .f = ~tibble(
      sum = sum(.x),
      iqr = IQR(.x),
      min = min(.x)
    ),
    .id = "var_name"
  )
# A tibble: 11 x 4
   var_name   sum     iqr   min
   <chr>    <dbl>   <dbl> <dbl>
 1 mpg       643.   7.38  10.4
 2 cyl       198    4      4
 3 disp     7383. 205.    71.1
 4 hp       4694   83.5   52
 5 drat      115.   0.840  2.76
 6 wt        103.   1.03   1.51
 7 qsec      571.   2.01  14.5
 8 vs         14    1      0
 9 am         13    1      0
10 gear      118    1      3
11 carb       90    2      1
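In purrr 1.0.0 and later, map_df() is superseded, and the same table can be built with map() followed by list_rbind(). A sketch under that assumption (the name col_summary is my own):

```r
library(purrr)
library(tibble)

# One single-row tibble per column, then bind them into one table,
# storing each column's name in var_name.
col_summary <- mtcars %>%
  map(~ tibble(
    sum = sum(.x),
    iqr = IQR(.x),
    min = min(.x)
  )) %>%
  list_rbind(names_to = "var_name")

col_summary
```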
dbplyr translates dplyr code into SQL. Even better, you can choose the dialect of SQL!
library(dplyr)
library(dbplyr)
df <- tibble(
  y = c('a', 'b', 'c'),
  z = c(2, 3, 4)
)
x <- tbl_lazy(
  df,
  con = simulate_mssql()
)
x %>%
  filter(y != 'a') %>%
  summarise(x = sd(z, na.rm = TRUE)) %>%
  show_query()
<SQL>
SELECT STDEV(`z`) AS `x`
FROM `df`
WHERE (`y` != 'a')
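To see the effect of the dialect choice, the same pipeline can be rendered against two simulated backends. This sketch assumes a recent dbplyr; render_for() is a hypothetical helper written for this example, and the exact SQL text depends on the dbplyr version:

```r
library(dplyr)
library(dbplyr)

df <- tibble(
  y = c("a", "b", "c"),
  z = c(2, 3, 4)
)

# Hypothetical helper: render the same pipeline for a given simulated backend.
render_for <- function(con) {
  tbl_lazy(df, con = con) %>%
    filter(y != "a") %>%
    summarise(x = sd(z, na.rm = TRUE)) %>%
    sql_render()
}

mssql_sql    <- render_for(simulate_mssql())
postgres_sql <- render_for(simulate_postgres())

mssql_sql
postgres_sql
```

Comparing the two strings shows how the standard deviation function is translated differently for each database.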
Use recode and recode_factor to replace values in a vector or factor
To replace NAs in the variable style with Other: replace_na(list(style = "Other"))
complete(date, locality, fill = list(collisions = 0)) fills in missing combinations of date and locality, setting collisions to zero for them
coord_flip() swaps the axes
readr::parse_number() extracts the first number from a string, dropping the non-numeric characters around it
extract(title, "year", "([12]\\d\\d\\d)", convert = TRUE, remove = FALSE) adds a numeric year column if there's a four-digit year in the title string that begins with a 1 or a 2
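A few of these one-liners are easier to judge on a toy example. The sketch below assumes tidyr, readr and tibble are installed; the films and crashes data are invented purely for illustration:

```r
library(tidyr)
library(readr)
library(tibble)

# Hypothetical data, invented purely for illustration.
films <- tibble(
  title = c("Alien (1979)", "Heat (1995)", "Untitled"),
  style = c("horror", NA, "drama")
)
crashes <- tibble(
  date       = c("2020-01-01", "2020-01-01", "2020-01-02"),
  locality   = c("North", "South", "North"),
  collisions = c(3, 1, 2)
)

# NA in style becomes "Other".
styled <- replace_na(films, list(style = "Other"))

# Adds the missing South / 2020-01-02 row with collisions = 0.
filled <- complete(crashes, date, locality, fill = list(collisions = 0))

# Adds an integer year column; titles without a match get NA.
with_year <- extract(films, title, "year", "([12]\\d\\d\\d)",
                     convert = TRUE, remove = FALSE)

# Keeps just the number in each string.
amounts <- parse_number(c("$1,250", "45%", "3.5 kg"))
```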