Who bats best?

Cricket commentators often bang on about how batting quality has changed through the ages. Or they say that batting order affects averages … or vice versa. But is there anything in these claims? In this post, I try to find out.

Robin Penfold
2016-10-09

Let’s start with the data.

Data

A couple of years ago, I downloaded the top 200 averages for each position in the batting order from the wonderful stats engine at espnCricinfo. I then excluded players with fewer than twenty innings, and I considered only results against Australia, England, India, New Zealand, Pakistan, South Africa, Sri Lanka and the West Indies.
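The code at the end of this post starts from the already-downloaded file, so the twenty-innings cut-off is not visible there. If you start from an unfiltered export, that step might be sketched in base R as below. This assumes a data frame with an `Innings` column, as in the tidied table that follows; `keep_qualifying()` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: drop averages built on fewer than `min_innings` innings.
# "Fewer than twenty excluded" means rows with exactly 20 innings are kept.
keep_qualifying <- function(df, min_innings = 20) {
  stopifnot("Innings" %in% names(df))
  df[df$Innings >= min_innings, , drop = FALSE]
}

# Demo on three rows taken from the table of averages in this post
demo <- data.frame(
  Name = c("B Mitchell", "V Sehwag", "H Sutcliffe"),
  Ave = c(65.04, 55.19, 54.38),
  Innings = c(26, 21, 20)
)
keep_qualifying(demo)                    # all three rows kept (Sutcliffe has exactly 20)
keep_qualifying(demo, min_innings = 25)  # only B Mitchell remains
```

Using `>=` rather than `>` is the design choice that matters here: it keeps players sitting exactly on the threshold, such as Sutcliffe.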

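I then tidy this data and show its first six rows below for reference. Start is the first year of the span listed by espnCricinfo, Decade rounds that year down to its decade, and Bat is the position in the batting order.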

Name         Country  Start  Decade    Ave  Innings  Runs  Bat
B Mitchell   SA        1931    1930  65.04       26  1431    1
JB Hobbs     Eng       1908    1900  58.14       87  4768    1
RB Simpson   Aus       1961    1960  58.03       28  1567    1
L Hutton     Eng       1938    1930  57.74      120  6236    1
V Sehwag     India     2002    2000  55.19       21  1159    1
H Sutcliffe  Eng       1924    1920  54.38       20   979    1
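The tidying code at the end of the post uses stringr's `word()` and `str_sub()`. The same steps can be sketched in base R with regular expressions, assuming `Player` strings of the form "Name (Country)" and `Span` strings that begin with a four-digit year. `parse_players()` is a hypothetical helper; note that `Decade` comes from the first year of the span, not the last.

```r
# Hypothetical helper: split "Name (Country)" strings and derive Start and Decade
# from a span such as "1908-1930" (only the first four characters are used).
parse_players <- function(player, span) {
  name    <- sub("\\s*\\([^)]*\\)$", "", player)      # drop the trailing "(Country)"
  country <- sub("^.*\\(([^)]*)\\)$", "\\1", player)  # keep the text inside the brackets
  start   <- as.integer(substr(span, 1, 4))           # first year of the span
  data.frame(
    Name    = gsub("'", " ", name),                   # apostrophes become spaces, as in the post
    Country = country,
    Start   = start,
    Decade  = 10 * trunc(start / 10),                 # decade in which the span began
    stringsAsFactors = FALSE
  )
}

# The end years below are illustrative placeholders; only the start years come from the table.
parse_players(c("JB Hobbs (Eng)", "B Mitchell (SA)"), c("1908-1930", "1931-1935"))
# Name: "JB Hobbs", "B Mitchell"; Country: "Eng", "SA"; Start: 1908, 1931; Decade: 1900, 1930
```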


Charts

I then create an animation with the splendid gganimate package, stepping through each position in the batting order in turn.



Finally, I show the same data statically, with one panel per batting position and pop-up tooltips giving the name of each batsman.


In the chart above, note that:


Results

Taken together, the charts tell a story. Averages do not seem to have changed much over time, although the number of lower-order players with at least twenty innings has increased over the decades.
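"Averages do not seem to have changed much" is a judgement read off the charts. One rough way to put a number on it, sketched here in base R, is to fit a straight line of average against decade separately for each batting position and look at the slope; a change close to zero per decade is consistent with little movement. This assumes a data frame shaped like `batOrder` (columns `Decade`, `Ave`, `Bat`). `decade_trend()` and `players_per_decade()` are hypothetical helpers, and a straight line is a cruder summary than the loess curve that `geom_smooth(method = 'auto')` fits for groups of this size.

```r
# Hypothetical helper: for each batting position, regress Ave on Decade and
# report the fitted change in average per decade (Decade is in years, hence x10).
decade_trend <- function(df) {
  by_position <- split(df, df$Bat)
  slopes <- vapply(by_position, function(d) {
    if (length(unique(d$Decade)) < 2) return(NA_real_)  # a slope needs at least two decades
    coef(lm(Ave ~ Decade, data = d))[["Decade"]]
  }, numeric(1))
  data.frame(
    Bat = as.integer(names(by_position)),
    change_per_decade = 10 * unname(slopes),
    n_players = unname(vapply(by_position, nrow, integer(1)))
  )
}

# Hypothetical helper: count qualifying players per decade and position,
# which is one way to check the growth in lower-order numbers.
players_per_decade <- function(df) table(Decade = df$Decade, Bat = df$Bat)

# Usage, assuming batOrder from the code at the end of the post:
# decade_trend(batOrder)
# players_per_decade(batOrder)
```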

Batting order, on the other hand, matters. Whether that is a self-fulfilling prophecy is harder to measure, but it matters, particularly once you get down to seventh or eighth.
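A hedged way to check whether averages differ by position is a Kruskal-Wallis rank test, `kruskal.test()` in base R's stats package, followed by pairwise Wilcoxon tests (`pairwise.wilcox.test()`, also in stats) to see which positions stand apart. Two caveats apply: the data are the top 200 averages per position rather than every batsman, and the same player can appear at several positions, so the rows are not independent and the p-values are descriptive at best. Neither test can say anything about the self-fulfilling-prophecy question. `compare_positions()` is a hypothetical helper.

```r
# Hypothetical helper: median average per batting position, a Kruskal-Wallis
# rank test of "all positions share one distribution", and Holm-adjusted
# pairwise Wilcoxon tests between positions.
compare_positions <- function(df) {
  medians  <- aggregate(Ave ~ Bat, data = df, FUN = median)
  kw       <- kruskal.test(Ave ~ factor(Bat), data = df)
  pairwise <- pairwise.wilcox.test(df$Ave, df$Bat, p.adjust.method = "holm")
  list(medians = medians, p_value = kw$p.value, pairwise = pairwise)
}

# Usage, assuming batOrder from the code at the end of the post:
# compare_positions(batOrder)
```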

For completeness (and reproducibility), here’s the code that I used to calculate what’s above.

Data


suppressMessages(library(tidyverse))
library(gganimate)
library(ggiraph)
library(glue)
library(htmlwidgets)
theme_set(theme_bw())
  
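# Player holds the name followed by the country in brackets, e.g. "JB Hobbs (Eng)"
batOrder <- read_csv("BattingOrder.csv") %>% 
  mutate(
    Name = word(Player, start = 1L, end = -2L),  # every word except the last
    fullCountry = word(Player, -1),              # last word, e.g. "(Eng)"
    Country = str_sub(fullCountry, 2, -2)) %>%   # strip the brackets
  filter(
    Country %in% c("Aus", "Ban", "Eng", "India", "NZ", "Pak", "SA", "SL", "WI")
    ) %>% 
  mutate(
    Start = as.integer(str_sub(Span, 1, 4)),     # first year of the span
    Decade = 10*trunc(Start/10),                 # round that year down to its decade
    Name = str_replace_all(Name, "'", " ")       # replace apostrophes with spaces
    ) %>% 
  select(Name, Country, Start, Decade, Ave, Innings = Inns, Runs, Bat)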

knitr::kable(head(batOrder))

Charts


batOrder %>% 
  ggplot(
    aes(
      x = Decade,
      y = Ave,
      color = Country, 
      size = Innings
      )
    ) +
  geom_point(alpha = 1) +
  labs(
    x = "Decade in which the batsman began batting at this position",
    y = ""
    ) +
  ggtitle(
    'The 200 best players who have ever batted at {closest_state} in the order',
    subtitle = 'Average when batting at that position'
    ) + 
  transition_states(
    states = Bat,
    transition_length = 2,
    state_length = 1
    ) + 
  ease_aes('cubic-in-out') 

p1 <- batOrder %>% 
  ggplot(
    aes(
      x = Decade, 
      y = Ave
      )
    ) +
  geom_point_interactive(
    aes(
      tooltip = Name,
      color = Country,
      size = Innings
      )
    ) +
  labs(
    x = "\nDecade in which the batsman began batting at this position",
    y = ""
    ) + 
  geom_smooth(method = 'auto') +
  ggtitle(
    'Best players batting at this position',
    subtitle = 'Average when batting at this position'
    ) + 
  facet_wrap(~ Bat, ncol = 3) +
  theme(
    strip.text = element_text(
      size = 6, 
      face = "bold"
      )
    )

girafe(
  code = print(p1), 
  width_svg = 9, 
  height_svg = 5
  )

System settings


R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.15

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
 [1] gdtools_0.1.8     htmlwidgets_1.5.1 glue_1.3.1       
 [4] ggiraph_0.6.1     gganimate_1.0.3   forcats_0.4.0    
 [7] stringr_1.4.0     dplyr_0.8.3       purrr_0.3.2      
[10] readr_1.3.1       tidyr_1.0.0       tibble_2.1.3     
[13] ggplot2_3.2.1     tidyverse_1.2.1  

loaded via a namespace (and not attached):
 [1] progress_1.2.2    tidyselect_0.2.5  xfun_0.9         
 [4] haven_2.1.1       lattice_0.20-38   colorspace_1.4-1 
 [7] vctrs_0.2.0       generics_0.0.2    htmltools_0.4.0  
[10] yaml_2.2.0        rlang_0.4.0       pillar_1.4.2     
[13] withr_2.1.2       tweenr_1.0.1      modelr_0.1.5     
[16] readxl_1.3.1      lifecycle_0.1.0   munsell_0.5.0    
[19] gtable_0.3.0      cellranger_1.1.0  rvest_0.3.4      
[22] evaluate_0.14     labeling_0.3      knitr_1.25       
[25] highr_0.8         broom_0.5.2       Rcpp_1.0.2       
[28] scales_1.0.0      backports_1.1.4   jsonlite_1.6     
[31] farver_1.1.0      distill_0.7       hms_0.5.0        
[34] digest_0.6.20     stringi_1.4.3     grid_3.6.0       
[37] cli_1.1.0         tools_3.6.0       magrittr_1.5     
[40] lazyeval_0.2.2    crayon_1.3.4      pkgconfig_2.0.2  
[43] zeallot_0.1.0     xml2_1.2.2        prettyunits_1.0.2
[46] lubridate_1.7.4   assertthat_0.2.1  rmarkdown_1.16   
[49] httr_1.4.1        rstudioapi_0.10   R6_2.4.0         
[52] nlme_3.1-139      compiler_3.6.0