Who bats best?

Cricket commentators often bang on about changes in batting quality through the ages. Or they say that batting order matters to averages … or vice-versa. But is there anything in these claims? In this post, I try to find out.

The data

A couple of years ago, I downloaded the top 200 averages for each batting order from the wonderful stats engine at espnCricinfo. I then excluded averages from players with fewer than twenty innings. I also only considered results against Australia, England, India, New Zealand, Pakistan, South Africa, Sri Lanka and the West Indies.

Having done so, I now tidy this data below and show its first six rows, for reference.

suppressMessages(library(tidyverse))
library(gganimate)
library(ggiraph)
library(glue)
library(htmlwidgets)
theme_set(theme_bw())
  
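# Read the raw download and tidy it
batOrder <- read_csv("data/BattingOrder.csv") %>% 
  mutate(
    # Everything except the last word of Player is the name
    Name = word(Player, start = 1L, end = -2L),
    # The last word holds the country code; drop its first and last characters
    fullCountry = word(Player, -1),
    Country = str_sub(fullCountry, 2,-2)) %>% 
  filter(
    Country %in% c("Aus", "Ban", "Eng", "India", "NZ", "Pak", "SA", "SL", "WI")
    ) %>% 
  mutate(
    # The first four characters of Span give the starting year
    Start = as.integer(str_sub(Span, 1, 4)),
    # Round the starting year down to its decade
    Decade = 10*trunc(Start/10),
    # Swap apostrophes in names for spaces
    Name = str_replace_all(Name, "'", " ") 
    ) %>% 
  select(Name, Country, Start, Decade, Ave, Innings = Inns, Runs, Bat)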

head(batOrder)
## # A tibble: 6 x 8
##   Name        Country Start Decade   Ave Innings  Runs   Bat
##   <chr>       <chr>   <int>  <dbl> <dbl>   <dbl> <dbl> <dbl>
## 1 B Mitchell  SA       1931   1930  65.0      26  1431     1
## 2 JB Hobbs    Eng      1908   1900  58.1      87  4768     1
## 3 RB Simpson  Aus      1961   1960  58.0      28  1567     1
## 4 L Hutton    Eng      1938   1930  57.7     120  6236     1
## 5 V Sehwag    India    2002   2000  55.2      21  1159     1
## 6 H Sutcliffe Eng      1924   1920  54.4      20   979     1
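
To see what the tidying step does to a single row, here is a minimal sketch on made-up values. It assumes the raw Player strings look like "Name (Country)" and the Span strings like "start-end", which is what the code above implies; the two players and their spans are hypothetical and not taken from the data.

```r
library(stringr)

# Hypothetical raw values, assumed to follow the "Name (Country)" and "start-end" patterns
player <- c("AB Example (Eng)", "C de Sample (SA)")
span   <- c("1908-1930", "1961-1978")

# Everything except the last word is the player's name
name <- word(player, start = 1L, end = -2L)

# The last word is the bracketed country code; strip the first and last characters
country <- str_sub(word(player, -1), 2, -2)

# The first four characters of the span give the starting year,
# which is then rounded down to its decade
start  <- as.integer(str_sub(span, 1, 4))
decade <- 10 * trunc(start / 10)

print(data.frame(name, country, start, decade))
```

Note that the decade comes from the first year of the span, so it marks when a player started rather than when he finished.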


Charts

I then create an animation that steps through each batting position in turn, using the splendid gganimate.

batOrder %>% 
  ggplot(
    aes(
      x = Decade,
      y = Ave,
      color = Country, 
      size = Innings
      )
    ) +
  geom_point(alpha = 1) +
  labs(
    x = "First decade of the batsman's career",
    y = ""
    ) +
  ggtitle(
    'The 200 best players who have ever batted at {closest_state} in the order',
    subtitle = 'Average when batting at that position'
    ) + 
  transition_states(
    states = Bat,
    transition_length = 2,
    state_length = 1
    ) + 
  ease_aes('cubic-in-out') 
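
The code above only builds the animation object. As a rough sketch of how it could be rendered and written to disk, the example below uses a tiny made-up data set, `animate()` and `anim_save()`. The toy values, the frame settings and the output file name are all assumptions for illustration, and rendering to GIF assumes the gifski package is installed.

```r
library(ggplot2)
library(gganimate)

# Hypothetical toy data: three dots at each of two batting positions
toy <- data.frame(
  Decade = c(1900, 1950, 2000, 1900, 1950, 2000),
  Ave    = c(58, 52, 55, 40, 44, 47),
  Bat    = rep(c(1, 5), each = 3)
)

anim <- ggplot(toy, aes(x = Decade, y = Ave)) +
  geom_point(size = 3) +
  ggtitle("Batting at {closest_state} in the order") +
  transition_states(states = Bat, transition_length = 2, state_length = 1) +
  ease_aes("cubic-in-out")

# Render a short GIF (assumed settings: 20 frames at 10 frames per second)
rendered <- animate(anim, nframes = 20, fps = 10, renderer = gifski_renderer())

# Write it to a temporary file; the file name is a placeholder
out_file <- file.path(tempdir(), "batting_order.gif")
anim_save(out_file, animation = rendered)
```

More frames give a smoother transition between positions at the cost of a larger file and a longer render.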



Finally, I show the same data statically, with one panel per batting position, and use ggiraph to enable pop-ups that reveal the names of the batsmen in question.

p1 <- batOrder %>% 
  ggplot(
    aes(
      x = Decade, 
      y = Ave
      )
    ) +
  geom_point_interactive(
    aes(
      tooltip = Name,
      color = Country,
      size = Innings
      )
    ) +
  labs(
    x = "\n First decade of the batsman's career",
    y = ""
    ) + 
  geom_smooth(method = 'auto') +
  ggtitle(
    'Best players batting at this position',
    subtitle = 'Average when batting at this position'
    ) + 
  facet_wrap(~ Bat, ncol = 3) +
  theme(
    strip.text = element_text(
      size = 6, 
      face = "bold"
      )
    )

girafe(
  code = print(p1), 
  width_svg = 9, 
  height_svg = 5
  )


In the chart above, note that:

  • There are actually many charts, one for each position in the batting order (from 1 to 11)

  • Each dot in a chart represents a player. Some players appear in more than one chart, as they batted often in several positions in the order

  • The colour of a dot shows the country that the player represented

  • The vertical position of a dot shows the player’s average when batting at that spot in the order

  • The horizontal position of a dot shows the decade in which the player’s career began
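
To share an interactive chart like this outside an R session, one option is to write the widget to an HTML file with `saveWidget()` from htmlwidgets. The sketch below is only an illustration on made-up data: the two players, the output file name and the choice of `selfcontained = FALSE` (which avoids needing pandoc) are assumptions, not necessarily how the chart in this post was published.

```r
library(ggplot2)
library(ggiraph)
library(htmlwidgets)

# Hypothetical toy data: two dots with names for the tooltips
toy <- data.frame(
  Name   = c("Player A", "Player B"),
  Decade = c(1930, 2000),
  Ave    = c(65, 55)
)

p <- ggplot(toy, aes(x = Decade, y = Ave)) +
  geom_point_interactive(aes(tooltip = Name), size = 3)

# Build the widget from the plot object
widget <- girafe(ggobj = p, width_svg = 9, height_svg = 5)

# Write the widget to disk; the file name is a placeholder
out_file <- file.path(tempdir(), "batting_order.html")
saveWidget(widget, out_file, selfcontained = FALSE)
```

With `selfcontained = FALSE` the JavaScript dependencies are written to a folder next to the HTML file, so both need to be copied together.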


Results

Taken together, the charts tell a story. Averages do not seem to have changed much over time (although the number of lower-order players with at least twenty innings has increased over the decades).

That said, batting order matters. Whether that’s a self-fulfilling prophecy or not is harder to measure, but it matters – particularly when you get to seventh or eighth.
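
One way to put rough numbers on these impressions would be to summarise averages by batting position and by decade. The sketch below does that on a small made-up table with the same column names as `batOrder`; the values are hypothetical, and the innings-weighted mean is only a crude summary, since it is an average of averages rather than runs per dismissal.

```r
library(dplyr)

# Hypothetical toy data with the same columns as batOrder
toy <- tibble(
  Bat     = c(1, 1, 7, 7),
  Decade  = c(1930, 2000, 1930, 2000),
  Ave     = c(60, 50, 30, 40),
  Innings = c(20, 80, 30, 30)
)

# Compare positions: plain and innings-weighted mean of the averages
by_position <- toy %>%
  group_by(Bat) %>%
  summarise(
    MeanAve     = mean(Ave),
    WeightedAve = weighted.mean(Ave, Innings),
    Players     = n()
  )

# Compare decades: the same idea, grouped by starting decade
by_decade <- toy %>%
  group_by(Decade) %>%
  summarise(MeanAve = mean(Ave), Players = n())

print(by_position)
print(by_decade)
```

Running the same two summaries on `batOrder` itself would show how far the real gaps between positions and between decades go.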