Tracking Covid-19

Although Covid-19 is all over the news, I can’t find any analysis that uses per-capita data on a log scale. For that reason, I’ve done this analysis myself.

Robin Penfold
2020-03-21

First, I’ll start with a simple log plot over time.



Whilst this chart has a log scale, there is no sense of the country’s population.

At some level, of course, cases are cases: they sadly represent people suffering. In other ways, however, a thousand cases in China (with a population of over a billion) suggests a proportionately smaller problem than a thousand cases in Norway (with a population of just over 5 million).

Furthermore, as Justin Wolfers rightly says below, both types of log analysis will show the same growth rates.

The vertical axis shows total cases on a log scale, so it's like graphing log(cases) on a regular scale.

A per capita graph shows log(cases/population) which = log(cases)-log(population).

Total population is barely moving day to day, so this is like log(cases)-constant.

— Justin Wolfers (@JustinWolfers) March 20, 2020


In the following chart, I show the cases per million of population, but only in instances where there have already been many cases. (That way, the constant term noted above becomes less relevant.)

To my mind, once the virus becomes (sadly) more widespread, this version of the analysis becomes ever more useful, particularly when comparing the effectiveness of action across different countries.



The pessimist in me sees the chart above as sobering. Essentially, most of these countries seem to experience similar rates of per-capita infection. Perhaps this is just for those countries with the most per-capita cases. Or maybe this is a very unfortunate feature of the outbreak. Anyhow, I will leave the reasoning to domain experts (rather than curious data types).

Either way, I have a suspicion that population-weighted density (rather than population density) will become a familiar term soon. The chart below shows more, and seems worrying for Spain, with further details available in the paper that is referenced in the Appendix.


For completeness (and reproducibility), here’s the code that I used to calculate what’s above.

Data and simple log chart


# Initialise ----
library(countrycode)
library(glue)
library(lubridate)
library(rtweet)
library(scales)
library(tidyverse)
library(wbstats)

country_num <- 12
pop_min <- 10000000
cases_PM_init <- 10

# Get data ----
data <- read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv') %>% 
  select(-Lat, -Long, -`Province/State`) %>% 
  group_by(`Country/Region`) %>% 
  summarise_all(.funs = sum) %>% 
  gather(
    key = "Date", 
    value = "Cases", 
    -`Country/Region`
  ) %>% 
  rename("Country" = `Country/Region`) %>% 
  mutate(
    Date = as.Date(Date, "%m/%d/%y"),
    DateNumber = as.integer(
      Date - as.Date("01/21/20", "%m/%d/%y")
    ),
    Country = str_to_upper(Country)
    )

# countries <- countrycode::codelist
# write_rds(x = countries, path = 'countries.rds')
countries <- read_rds('countries.rds')

# population <- wbstats::wb(country = "all", indicator = "SP.POP.TOTL") %>% 
#   group_by(iso3c) %>% 
#   arrange(desc(date)) %>% 
#   slice(1) %>% 
#   ungroup() %>% 
#   distinct()
# write_rds(x = population, path = 'population.rds')
population <- read_rds('population.rds')


# Tidy data ----
data_tidy <- data %>% 
  left_join(
    countries, 
    by = c('Country' = 'genc.name')
  ) %>% 
  left_join(
    population, 
    by = 'iso3c'
  ) %>% 
  select("Country" = country.name.en, Date, Cases, DateNumber, continent, iso3c, value) %>% 
  filter(
    !is.na(Country),
    value > pop_min
  ) %>% 
  mutate(
    Cases_PM = 1000000 * Cases / value,
    Country = as_factor(Country),
    Country = fct_reorder(
      .f = Country, 
      .x = Cases, 
      .fun = max, 
      .desc = TRUE
      ),
    Country_PM = fct_reorder(
      .f = Country, 
      .x = Cases_PM, 
      .fun = max, 
      .desc = TRUE
      )
    )

data_now <- data_tidy %>%
  filter(
    Date == max(Date)
    ) %>%
  select(-Date) %>%
  arrange(desc(Cases)) %>%
  slice(1:country_num)

data_now_PM <- data_tidy %>%
  filter(
    Date == max(Date),
    !is.na(Country_PM)
  ) %>%
  select(-Date) %>%
  arrange(desc(Cases_PM)) %>%
  slice(1:country_num)


# Rebase per capita cases ----
data_base_value <- data_tidy %>% 
  arrange(Date) %>% 
  group_by(Country) %>% 
  filter(Cases_PM > cases_PM_init) %>% 
  slice(1) %>% 
  ungroup() %>% 
  select(Country, "Date_base" = Date, "DateNumber_base" = DateNumber, "Cases_PM_base" = Cases_PM)
  
data_rebased <- data_tidy %>% 
  left_join(
    data_base_value, 
    by = 'Country'
    ) %>% 
  mutate(
    Cases_PM_rebased = Cases_PM / Cases_PM_base,
    DateNumber_rebased = DateNumber - DateNumber_base
    ) %>% 
  filter(DateNumber_rebased >= 0)

data_tidy %>% 
  filter(Country %in% data_now$Country) %>%
  ggplot(
    aes(
      x = Date,
      y = Cases,
      group = Country,
      colour = Country
    )
  ) + 
  scale_y_continuous(
    trans = "log", 
    labels = label_number(
      big.mark = ','
      )
    ) + 
  geom_line() + 
  theme_minimal() + 
  theme(plot.title.position = "plot") +
  labs(
    x = "",
    y = "",
    colour = "",
    title = "Daily cumulative Covid-19 infections",
    subtitle = "For the twelve countries (of over 10m people) with the most cases \n \n"
  )

Rebased chart


data_rebased %>% 
  filter(Country %in% data_now_PM$Country_PM) %>% 
  mutate(
    Country_PMR = fct_reorder(
      .f = Country, 
      .x = Cases_PM_rebased, 
      .fun = max, 
      .desc = TRUE
      )
  ) %>% 
  ggplot(
    aes(
      x = DateNumber_rebased,
      y = Cases_PM_rebased,
      group = Country_PMR,
      colour = Country_PMR
    )
  ) + 
  scale_y_continuous(
    trans = "log", 
    labels = label_number(
      big.mark = ','
    )
  ) + 
  geom_line() + 
  theme_minimal() +
  theme(plot.title.position = "plot") +
  labs(
    x = "",
    y = "",
    colour = "",
    title = "Cumulative Covid-19 infections per million inhabitants",
    subtitle = "For the twelve countries (of over 10m people) with the most per-capita cases \nShown against days since cumulative infections first exceeded 10 per million \n \n"
  )

References

Kompil M, Aurambout J, Ribeiro Barranco R, Barbosa A, Jacobs-Crisioni C, Pisoni E, Zulian G, Vandecasteele I, Trombetti M, Vizcaino M, Vallecillo Rodriguez S, Batista e Silva F, Baranzelli C, Mari Rivero I, Perpiña Castillo C, Polce C, Maes J, Lavalle C., 2015, European cities: territorial analysis of characteristics and trends - An application of the LUISA Modelling Platform (EU Reference Scenario 2013 - Updated Configuration 2014), EUR 27709 EN, doi:10.2788/737963

System settings


R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.15.3

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
 [1] wbstats_0.2        forcats_0.5.0      stringr_1.4.0     
 [4] dplyr_0.8.5        purrr_0.3.3        readr_1.3.1       
 [7] tidyr_1.0.2        tibble_2.1.3       ggplot2_3.3.0.9000
[10] tidyverse_1.3.0    scales_1.1.0       rtweet_0.7.0      
[13] lubridate_1.7.4    glue_1.3.2         countrycode_1.1.1 

loaded via a namespace (and not attached):
 [1] tidyselect_1.0.0     xfun_0.12            haven_2.2.0         
 [4] lattice_0.20-38      colorspace_1.4-1     vctrs_0.2.4         
 [7] generics_0.0.2       htmltools_0.4.0.9003 yaml_2.2.1          
[10] rlang_0.4.5          pillar_1.4.3         withr_2.1.2         
[13] DBI_1.1.0            dbplyr_1.4.2         modelr_0.1.5        
[16] readxl_1.3.1         lifecycle_0.2.0      munsell_0.5.0       
[19] gtable_0.3.0         cellranger_1.1.0     rvest_0.3.5         
[22] evaluate_0.14        labeling_0.3         knitr_1.28          
[25] curl_4.3             fansi_0.4.1          broom_0.5.4         
[28] Rcpp_1.0.4           backports_1.1.5      jsonlite_1.6.1      
[31] farver_2.0.3         fs_1.3.2             distill_0.7         
[34] hms_0.5.3            digest_0.6.25        stringi_1.4.6       
[37] grid_3.6.0           cli_2.0.2            tools_3.6.0         
[40] magrittr_1.5         crayon_1.3.4         pkgconfig_2.0.3     
[43] ellipsis_0.3.0       xml2_1.2.5           reprex_0.3.0        
[46] assertthat_0.2.1     rmarkdown_2.1        httr_1.4.1          
[49] rstudioapi_0.11      R6_2.4.1             nlme_3.1-139        
[52] compiler_3.6.0