Analysis of IPL T20 matches with yorkr templates


Introduction

In this post I create RMarkdown templates, based on my R package yorkr, for end-to-end analysis of the IPL T20 matches available on Cricsheet. With these templates you can convert all the IPL data, which is in YAML format, to R dataframes. I also create the data and the templates needed for analyzing IPL matches, teams and players. All of these can be accessed at yorkrIPLTemplate.

Check out my 2 books on cricket, a) Cricket analytics with cricketr b) Beaten by sheer pace – Cricket analytics with yorkr, now available in both paperback & kindle versions on Amazon!!! Pick up your copies today!

The templates are

  1. Template for conversion and setup – IPLT20Template.Rmd
  2. Any IPL match – IPLMatchtemplate.Rmd
  3. IPL matches between 2 teams – IPLMatches2TeamTemplate.Rmd
  4. An IPL team's performance against all other IPL teams – IPLAllMatchesAllOppnTemplate.Rmd
  5. Analysis of IPL batsmen and bowlers of all IPL teams – IPLBatsmanBowlerTemplate.Rmd

Besides the templates, the repository also includes the converted data for all IPL matches that I downloaded from Cricsheet in Dec 2016, so this data is complete up to the 2016 IPL season. You can recreate the files as more matches are added to the Cricsheet site in IPL 2017 and future seasons. This post contains all the steps needed for detailed analysis of IPL matches, teams and players. It will also be my reference if I decide to analyze the IPL in future!

There will be 5 folders at the root

  1. IPLdata – Match files as yaml from Cricsheet
  2. IPLMatches – Yaml match files converted to dataframes
  3. IPLMatchesBetween2Teams – All Matches between any 2 IPL teams
  4. allMatchesAllOpposition – An IPL team’s performance against all other teams
  5. BattingBowlingDetails – Batting and bowling details of all IPL teams

library(yorkr)
library(dplyr)

The first few steps take care of the data setup. This needs to be done before any analysis of IPL batsmen, bowlers, any IPL match, matches between any 2 IPL teams, or a team's performance against all other teams.

The source YAML files go in the first of the 5 folders listed above (IPLdata). Note that the conversion call in step 1 passes this folder as "data"; pass whichever name you actually gave it.

1. Create directory of IPLMatches

Some files may give conversion errors. You could try to debug the problem or just remove the file from the IPLdata folder. At most 2-4 files will have conversion problems, and I usually remove them from the files to be converted.

Also take a look at my GooglyPlus shiny app, which was created after performing the same conversion on the Dec 2016 data.

convertAllYaml2RDataframesT20("data","IPLMatches")
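
Since a handful of files may fail to convert, it can help to find out which ones before removing anything. The sketch below is one way to do that in base R. Note that `convert_one` is a hypothetical stand-in for whatever single-file converter you use (it is not a yorkr function), and the demo converter simply fails on purpose for one file name.

```r
# Try a converter on each file and collect the names of the ones that fail.
# `convert_one` is a hypothetical function(file) supplied by the caller.
find_bad_files <- function(files, convert_one) {
    failed <- character(0)
    for (f in files) {
        ok <- tryCatch({
            convert_one(f)
            TRUE
        }, error = function(e) {
            message("Could not convert ", f, ": ", conditionMessage(e))
            FALSE
        })
        if (!ok) failed <- c(failed, f)
    }
    failed
}

# Demo converter (an assumption for illustration): rejects one file name
demo_convert <- function(f) {
    if (f == "bad.yaml") stop("malformed yaml")
    invisible(f)
}

bad <- find_bad_files(c("a.yaml", "bad.yaml", "c.yaml"), demo_convert)
print(bad)
```

The returned vector can then be inspected, and the offending files moved out of the source folder before the bulk conversion is run.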

2. Save all matches between all combinations of IPL teams

This function creates the set of all matches between each IPL team and every other IPL team. It uses the data created in IPLMatches by the convertAllYaml2RDataframesT20() function.

setwd("./IPLMatchesBetween2Teams")
saveAllMatchesBetween2IPLTeams("../IPLMatches")
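
To get a feel for how many team pairings this step has to cover, the pairs can be enumerated with base R. This is only an illustration: whether saveAllMatchesBetween2IPLTeams() writes one file per unordered pair or one per ordering is not something this post states, and only four of the teams are used to keep the output short.

```r
# Enumerate every unordered pair of teams with base R's combn()
teams <- c("Chennai Super Kings", "Deccan Chargers",
           "Delhi Daredevils", "Kings XI Punjab")
pairs <- t(combn(teams, 2))          # one row per pair
colnames(pairs) <- c("team1", "team2")
print(pairs)

# Number of unordered pairs for the 13 teams used in this post
n_pairs_all <- choose(13, 2)
print(n_pairs_all)
```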

3. Save all matches against all opposition

This creates a consolidated dataframe of all matches played by every IPL team against all other teams. It also uses the data created in IPLMatches by the convertAllYaml2RDataframesT20() function.

setwd("../allMatchesAllOpposition")
saveAllMatchesAllOppositionIPLT20("../IPLMatches")

4. Create batting and bowling details for each IPL team

These are the current IPL teams. You can add to this list as newer teams start playing in the IPL. You can also find all the IPL teams by looking at the allMatchesAllOpposition directory created above. This step also uses the data created in IPLMatches by the convertAllYaml2RDataframesT20() function.

setwd("../BattingBowlingDetails")
ipl_teams <- list("Chennai Super Kings","Deccan Chargers", "Delhi Daredevils","Kings XI Punjab", 
              "Kochi Tuskers Kerala","Kolkata Knight Riders","Mumbai Indians","Pune Warriors",
              "Rajasthan Royals","Royal Challengers Bangalore","Sunrisers Hyderabad","Gujarat Lions",
                 "Rising Pune Supergiants")

for(i in seq_along(ipl_teams)){
    # [[i]] extracts the team name as a string (ipl_teams is a list)
    print(ipl_teams[[i]])
    # save=TRUE saves the details as <team>-BattingDetails.RData
    val <- getTeamBattingDetails(ipl_teams[[i]],dir="../IPLMatches", save=TRUE)
}

for(i in seq_along(ipl_teams)){
    print(ipl_teams[[i]])
    # save=TRUE saves the details as <team>-BowlingDetails.RData
    val <- getTeamBowlingDetails(ipl_teams[[i]],dir="../IPLMatches", save=TRUE)
}
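
Before moving on it is worth checking that a details file exists for every team. The sketch below builds the expected file names from the naming pattern used by the load() calls later in this post (for example "Chennai Super Kings-BattingDetails.RData") and reports any that are missing. The helper names `expected_files` and `missing_files` are my own for this illustration, and the demo runs in a temporary directory with empty placeholder files.

```r
# Build the expected details file names for a set of teams
expected_files <- function(teams, kind = c("Batting", "Bowling")) {
    kind <- match.arg(kind)
    paste0(teams, "-", kind, "Details.RData")
}

# Return the expected files that are not present in `dir`
missing_files <- function(teams, kind, dir = ".") {
    files <- expected_files(teams, kind)
    files[!file.exists(file.path(dir, files))]
}

# Demo in a temporary directory: only one of the two files exists
demo_dir <- tempfile("details")
dir.create(demo_dir)
file.create(file.path(demo_dir, "Mumbai Indians-BattingDetails.RData"))
still_missing <- missing_files(c("Mumbai Indians", "Gujarat Lions"),
                               "Batting", demo_dir)
print(still_missing)
```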

5. Get the list of batsmen for a particular IPL team

The following code is needed for analyzing individual IPL batsmen. In IPL a player could have played for multiple IPL teams.

# Return the sorted, unique batsman names in a batting details data frame
getBatsmen <- function(df){
    bmen <- df %>% distinct(batsman) 
    bmen <- as.character(bmen$batsman)
    sort(bmen)
}

load("Chennai Super Kings-BattingDetails.RData")
csk_details <- battingDetails
load("Deccan Chargers-BattingDetails.RData")
dc_details <- battingDetails
load("Delhi Daredevils-BattingDetails.RData")
dd_details <- battingDetails
load("Kings XI Punjab-BattingDetails.RData")
kxip_details <- battingDetails
load("Kochi Tuskers Kerala-BattingDetails.RData")
ktk_details <- battingDetails
load("Kolkata Knight Riders-BattingDetails.RData")
kkr_details <- battingDetails
load("Mumbai Indians-BattingDetails.RData")
mi_details <- battingDetails
load("Pune Warriors-BattingDetails.RData")
pw_details <- battingDetails
load("Rajasthan Royals-BattingDetails.RData")
rr_details <- battingDetails
load("Royal Challengers Bangalore-BattingDetails.RData")
rcb_details <- battingDetails
load("Sunrisers Hyderabad-BattingDetails.RData")
sh_details <- battingDetails
load("Gujarat Lions-BattingDetails.RData")
gl_details <- battingDetails
load("Rising Pune Supergiants-BattingDetails.RData")
rps_details <- battingDetails

#Get the batsmen for each IPL team
csk_batsmen <- getBatsmen(csk_details)
dc_batsmen <- getBatsmen(dc_details)
dd_batsmen <- getBatsmen(dd_details)
kxip_batsmen <- getBatsmen(kxip_details)
ktk_batsmen <- getBatsmen(ktk_details)
kkr_batsmen <- getBatsmen(kkr_details)
mi_batsmen <- getBatsmen(mi_details)
pw_batsmen <- getBatsmen(pw_details)
rr_batsmen <- getBatsmen(rr_details)
rcb_batsmen <- getBatsmen(rcb_details)
sh_batsmen <- getBatsmen(sh_details)
gl_batsmen <- getBatsmen(gl_details)
rps_batsmen <- getBatsmen(rps_details)

# Save the dataframes
save(csk_batsmen,file="csk.RData")
save(dc_batsmen, file="dc.RData")
save(dd_batsmen, file="dd.RData")
save(kxip_batsmen, file="kxip.RData")
save(ktk_batsmen, file="ktk.RData")
save(kkr_batsmen, file="kkr.RData")
save(mi_batsmen , file="mi.RData")
save(pw_batsmen, file="pw.RData")
save(rr_batsmen, file="rr.RData")
save(rcb_batsmen, file="rcb.RData")
save(sh_batsmen, file="sh.RData")
save(gl_batsmen, file="gl.RData")
save(rps_batsmen, file="rps.RData")
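
The load/extract/save sequence above is the same for every team, so it could also be written as a loop over a named vector of abbreviations. The following is a minimal sketch in base R under a few assumptions: each details file contains a data frame called battingDetails with a batsman column (as in the code above), `save_batsmen_lists` is a helper name of my own, and the demo uses toy data in a temporary directory rather than real match data.

```r
# Named vector: abbreviation -> full team name (two teams shown for brevity)
team_abbr <- c(csk = "Chennai Super Kings", mi = "Mumbai Indians")

save_batsmen_lists <- function(team_abbr, dir = ".") {
    for (abbr in names(team_abbr)) {
        env <- new.env()
        # Each details file is expected to hold a data frame `battingDetails`
        load(file.path(dir, paste0(team_abbr[[abbr]], "-BattingDetails.RData")),
             envir = env)
        batsmen <- sort(unique(as.character(env$battingDetails$batsman)))
        # Save under the variable name used by the long-hand code, e.g. csk_batsmen
        var_name <- paste0(abbr, "_batsmen")
        assign(var_name, batsmen, envir = env)
        save(list = var_name, envir = env,
             file = file.path(dir, paste0(abbr, ".RData")))
    }
    invisible(names(team_abbr))
}

# Demo with toy data written to a temporary directory
demo_dir <- tempfile("batsmen")
dir.create(demo_dir)
battingDetails <- data.frame(batsman = c("MS Dhoni", "SK Raina", "MS Dhoni"),
                             runs = c(30, 45, 12))
save(battingDetails,
     file = file.path(demo_dir, "Chennai Super Kings-BattingDetails.RData"))
battingDetails <- data.frame(batsman = c("RG Sharma", "KA Pollard"),
                             runs = c(50, 20))
save(battingDetails,
     file = file.path(demo_dir, "Mumbai Indians-BattingDetails.RData"))

save_batsmen_lists(team_abbr, demo_dir)
check <- new.env()
load(file.path(demo_dir, "csk.RData"), envir = check)
print(check$csk_batsmen)
```

Loading into a private environment keeps each team's battingDetails from overwriting the previous one in the global workspace.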

6. Get the list of bowlers for a particular IPL team

The method below gets the list of bowler names for any IPL team. The following code is needed for analyzing individual IPL bowlers. In IPL a player could have played for multiple IPL teams.

# Return the sorted, unique bowler names in a bowling details data frame
getBowlers <- function(df){
    bwlr <- df %>% distinct(bowler) 
    bwlr <- as.character(bwlr$bowler)
    sort(bwlr)
}

load("Chennai Super Kings-BowlingDetails.RData")
csk_details <- bowlingDetails
load("Deccan Chargers-BowlingDetails.RData")
dc_details <- bowlingDetails
load("Delhi Daredevils-BowlingDetails.RData")
dd_details <- bowlingDetails
load("Kings XI Punjab-BowlingDetails.RData")
kxip_details <- bowlingDetails
load("Kochi Tuskers Kerala-BowlingDetails.RData")
ktk_details <- bowlingDetails
load("Kolkata Knight Riders-BowlingDetails.RData")
kkr_details <- bowlingDetails
load("Mumbai Indians-BowlingDetails.RData")
mi_details <- bowlingDetails
load("Pune Warriors-BowlingDetails.RData")
pw_details <- bowlingDetails
load("Rajasthan Royals-BowlingDetails.RData")
rr_details <- bowlingDetails
load("Royal Challengers Bangalore-BowlingDetails.RData")
rcb_details <- bowlingDetails
load("Sunrisers Hyderabad-BowlingDetails.RData")
sh_details <- bowlingDetails
load("Gujarat Lions-BowlingDetails.RData")
gl_details <- bowlingDetails
load("Rising Pune Supergiants-BowlingDetails.RData")
rps_details <- bowlingDetails

# Get the bowlers for each team
csk_bowlers <- getBowlers(csk_details)
dc_bowlers <- getBowlers(dc_details)
dd_bowlers <- getBowlers(dd_details)
kxip_bowlers <- getBowlers(kxip_details)
ktk_bowlers <- getBowlers(ktk_details)
kkr_bowlers <- getBowlers(kkr_details)
mi_bowlers <- getBowlers(mi_details)
pw_bowlers <- getBowlers(pw_details)
rr_bowlers <- getBowlers(rr_details)
rcb_bowlers <- getBowlers(rcb_details)
sh_bowlers <- getBowlers(sh_details)
gl_bowlers <- getBowlers(gl_details)
rps_bowlers <- getBowlers(rps_details)

#Save the dataframes
save(csk_bowlers,file="csk1.RData")
save(dc_bowlers, file="dc1.RData")
save(dd_bowlers, file="dd1.RData")
save(kxip_bowlers, file="kxip1.RData")
save(ktk_bowlers, file="ktk1.RData")
save(kkr_bowlers, file="kkr1.RData")
save(mi_bowlers , file="mi1.RData")
save(pw_bowlers, file="pw1.RData")
save(rr_bowlers, file="rr1.RData")
save(rcb_bowlers, file="rcb1.RData")
save(sh_bowlers, file="sh1.RData")
save(gl_bowlers, file="gl1.RData")
save(rps_bowlers, file="rps1.RData")

Now we are all set

A)  IPL T20 Match Analysis

1. IPL Match Analysis

Load any match data from the ./IPLMatches folder, e.g. Chennai Super Kings-Deccan Chargers-2008-05-06.RData

setwd("./IPLMatches")
# Concrete example: each match file holds a data frame named overs
load("Chennai Super Kings-Deccan Chargers-2008-05-06.RData")
csk_dc <- overs
# In general the steps are (IPLTeam1, IPLTeam2 and Date are placeholders)
load("IPLTeam1-IPLTeam2-Date.RData")
IPLTeam1_IPLTeam2 <- overs
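
Because every match file loads a data frame with the same name (overs), loading a second match overwrites the first. One way around this is a small helper that builds the file name and loads the file into a private environment. This is a sketch only: `match_file` and `load_match` are names I have made up for illustration, and the demo saves a toy overs data frame with made-up columns to a temporary directory.

```r
# Build the file name used for a converted match
match_file <- function(team1, team2, date) {
    paste0(team1, "-", team2, "-", date, ".RData")
}

# Load the `overs` data frame of a match without touching the global workspace
load_match <- function(team1, team2, date, dir = ".") {
    env <- new.env()
    load(file.path(dir, match_file(team1, team2, date)), envir = env)
    env$overs
}

# Demo with a toy `overs` data frame saved to a temporary directory
demo_dir <- tempfile("matches")
dir.create(demo_dir)
overs <- data.frame(ball = c("1st.0.1", "1st.0.2"), runs = c(1, 4))
save(overs, file = file.path(
    demo_dir,
    match_file("Chennai Super Kings", "Deccan Chargers", "2008-05-06")))

csk_dc <- load_match("Chennai Super Kings", "Deccan Chargers",
                     "2008-05-06", demo_dir)
print(match_file("Chennai Super Kings", "Deccan Chargers", "2008-05-06"))
```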

All analysis for this match can be done now

2. Scorecard

teamBattingScorecardMatch(IPLTeam1_IPLTeam2,"IPLTeam1")
teamBattingScorecardMatch(IPLTeam1_IPLTeam2,"IPLTeam2")

3. Batting Partnerships

teamBatsmenPartnershipMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2")
teamBatsmenPartnershipMatch(IPLTeam1_IPLTeam2,"IPLTeam2","IPLTeam1")

4. Batsmen vs Bowler Plot

teamBatsmenVsBowlersMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2",plot=TRUE)
teamBatsmenVsBowlersMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2",plot=FALSE)

5. Team bowling scorecard

teamBowlingScorecardMatch(IPLTeam1_IPLTeam2,"IPLTeam1")
teamBowlingScorecardMatch(IPLTeam1_IPLTeam2,"IPLTeam2")

6. Team bowling Wicket kind match

teamBowlingWicketKindMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2")
m <-teamBowlingWicketKindMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2",plot=FALSE)
m

7. Team Bowling Wicket Runs Match

teamBowlingWicketRunsMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2")
m <-teamBowlingWicketRunsMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2",plot=FALSE)
m

8. Team Bowling Wicket Match

m <-teamBowlingWicketMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2",plot=FALSE)
m
teamBowlingWicketMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2")

9. Team Bowler vs Batsmen

teamBowlersVsBatsmenMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2")
m <- teamBowlersVsBatsmenMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2",plot=FALSE)
m

10. Match Worm chart

matchWormGraph(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2")

B)  IPL  Matches between 2  IPL teams

The consolidated data for any pair of teams is in the ./IPLMatchesBetween2Teams folder created in setup step 2 above. The template for this analysis is IPLMatches2TeamTemplate.Rmd, available at yorkrIPLTemplate.

C)  IPL Matches for a team against all other teams

1. IPL Matches for a team against all other teams

Load the data for an IPL team against all other teams from ./allMatchesAllOpposition, e.g. all matches of Kolkata Knight Riders

# Concrete example
load("allMatchesAllOpposition-Kolkata Knight Riders.RData")
kkr_matches <- matches
# In general, with IPLTeam1 as a placeholder for the team name
IPLTeam="IPLTeam1"
allMatches <- paste("allMatchesAllOpposition-",IPLTeam,".RData",sep="")
load(allMatches)
IPLTeam1AllMatches <- matches

2. Team’s batting scorecard all Matches

m <-teamBattingScorecardAllOppnAllMatches(IPLTeam1AllMatches,theTeam="IPLTeam1")
m

3. Batting scorecard of opposing team

m <-teamBattingScorecardAllOppnAllMatches(matches=IPLTeam1AllMatches,theTeam="IPLTeam2")

4. Team batting partnerships

m <- teamBatsmenPartnershipAllOppnAllMatches(IPLTeam1AllMatches,theTeam="IPLTeam1")
m
m <- teamBatsmenPartnershipAllOppnAllMatches(IPLTeam1AllMatches,theTeam='IPLTeam1',report="detailed")
head(m,30)
m <- teamBatsmenPartnershipAllOppnAllMatches(IPLTeam1AllMatches,theTeam='IPLTeam1',report="summary")
m

5. Team batting partnerships plot

teamBatsmenPartnershipAllOppnAllMatchesPlot(IPLTeam1AllMatches,"IPLTeam1",main="IPLTeam1")
teamBatsmenPartnershipAllOppnAllMatchesPlot(IPLTeam1AllMatches,"IPLTeam1",main="IPLTeam2")

6. Team batsmen vs bowlers report

m <-teamBatsmenVsBowlersAllOppnAllMatchesRept(IPLTeam1AllMatches,"IPLTeam1",rank=0)
m
m <-teamBatsmenVsBowlersAllOppnAllMatchesRept(IPLTeam1AllMatches,"IPLTeam1",rank=1,dispRows=30)
m
m <-teamBatsmenVsBowlersAllOppnAllMatchesRept(matches=IPLTeam1AllMatches,theTeam="IPLTeam2",rank=1,dispRows=25)
m

7. Team batsmen vs bowler plot

d <- teamBatsmenVsBowlersAllOppnAllMatchesRept(IPLTeam1AllMatches,"IPLTeam1",rank=1,dispRows=50)
d
teamBatsmenVsBowlersAllOppnAllMatchesPlot(d)
d <- teamBatsmenVsBowlersAllOppnAllMatchesRept(IPLTeam1AllMatches,"IPLTeam1",rank=2,dispRows=50)
teamBatsmenVsBowlersAllOppnAllMatchesPlot(d)

8. Team bowling scorecard

teamBowlingScorecardAllOppnAllMatchesMain(matches=IPLTeam1AllMatches,theTeam="IPLTeam1")
teamBowlingScorecardAllOppnAllMatches(IPLTeam1AllMatches,'IPLTeam2')

9. Team bowler vs batsmen

teamBowlersVsBatsmenAllOppnAllMatchesMain(IPLTeam1AllMatches,theTeam="IPLTeam1",rank=0)
teamBowlersVsBatsmenAllOppnAllMatchesMain(IPLTeam1AllMatches,theTeam="IPLTeam1",rank=2)
teamBowlersVsBatsmenAllOppnAllMatchesRept(matches=IPLTeam1AllMatches,theTeam="IPLTeam1",rank=0)

10. Team bowler vs batsmen plot

df <- teamBowlersVsBatsmenAllOppnAllMatchesRept(IPLTeam1AllMatches,theTeam="IPLTeam1",rank=1)
teamBowlersVsBatsmenAllOppnAllMatchesPlot(df,"IPLTeam1","IPLTeam1")

11. Team bowler wicket kind

teamBowlingWicketKindAllOppnAllMatches(IPLTeam1AllMatches,t1="IPLTeam1",t2="All")
teamBowlingWicketKindAllOppnAllMatches(IPLTeam1AllMatches,t1="IPLTeam1",t2="IPLTeam2")

12. Team bowler wicket runs

teamBowlingWicketRunsAllOppnAllMatches(IPLTeam1AllMatches,t1="IPLTeam1",t2="All",plot=TRUE)
teamBowlingWicketRunsAllOppnAllMatches(IPLTeam1AllMatches,t1="IPLTeam1",t2="IPLTeam2",plot=TRUE)

D)  Analysis of IPL batsmen and bowlers

1. IPL Batsman setup functions

Get the batting details for an individual batsman

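# Run the code in this section from the root folder (the one containing BattingBowlingDetails);
# the functions below use paths relative to it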
# IPL Team names
IPLTeamNames <- list("Chennai Super Kings","Deccan Chargers", "Delhi Daredevils","Kings XI Punjab", 
                  "Kochi Tuskers Kerala","Kolkata Knight Riders","Mumbai Indians","Pune Warriors",
                  "Rajasthan Royals","Royal Challengers Bangalore","Sunrisers Hyderabad","Gujarat Lions",
                  "Rising Pune Supergiants")           


# Check and get the team indices of IPL teams in which the batsman has played
getTeamIndex <- function(batsman){
    setwd("./BattingBowlingDetails")
    load("csk.RData")
    load("dc.RData")
    load("dd.RData")
    load("kxip.RData")
    load("ktk.RData")
    load("kkr.RData")
    load("mi.RData")
    load("pw.RData")
    load("rr.RData")
    load("rcb.RData")
    load("sh.RData")
    load("gl.RData")
    load("rps.RData")
    setwd("..")
    teams_batsmen = list(csk_batsmen,dc_batsmen,dd_batsmen,kxip_batsmen,ktk_batsmen,kkr_batsmen,mi_batsmen,
                         pw_batsmen,rr_batsmen,rcb_batsmen,sh_batsmen,gl_batsmen,rps_batsmen)
    b <- NULL
    for (i in seq_along(teams_batsmen)){
        # Record the index of every team whose list contains the batsman
        if(batsman %in% teams_batsmen[[i]])
            b <- c(b,i)
    }
    b
}

# Get the list of the IPL team names from the indices passed
getTeams <- function(x){

    l <- NULL
    # Get the teams passed in as indexes
    for (i in seq_along(x)){

        l <- c(l, IPLTeamNames[[x[i]]]) 

    }
    l
}

# Create a consolidated data frame with all teams the IPL batsman has played for
# Note: this reads the global variable IPLBatsman, which must be set before the call
getIPLBatsmanDF <- function(teamNames){
    batsmanDF <- NULL
    # Create a consolidated data frame of the batsman for all IPL teams played
    for (i in seq_along(teamNames)){
        df <- getBatsmanDetails(team=teamNames[i],name=IPLBatsman,dir="./BattingBowlingDetails")
        batsmanDF <- rbind(batsmanDF,df)
    }
    batsmanDF
}
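
The index-based lookup above (getTeamIndex followed by getTeams) can also be expressed directly on a named list, which avoids keeping the team names and the per-team lists in the same order by hand. A minimal sketch, using a toy named list in place of the saved RData files; `teams_for_player` is a name I have introduced for illustration and is not part of yorkr.

```r
# Toy lists of batsmen per team (a small illustrative subset)
team_batsmen <- list(
    "Chennai Super Kings"     = c("MS Dhoni", "SK Raina"),
    "Mumbai Indians"          = c("RG Sharma"),
    "Rising Pune Supergiants" = c("MS Dhoni", "AM Rahane")
)

# Return the names of the teams whose batsman list contains the player
teams_for_player <- function(player, team_lists) {
    played <- vapply(team_lists,
                     function(players) player %in% players,
                     logical(1))
    names(team_lists)[played]
}

dhoni_teams <- teams_for_player("MS Dhoni", team_batsmen)
print(dhoni_teams)
```

The resulting character vector could be passed straight to getIPLBatsmanDF() in place of the output of getTeams().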

2. Create a consolidated IPL batsman data frame

# Since an IPL batsman could have played in multiple teams we need to determine these teams and
# create a consolidated data frame for the analysis
# For example to check MS Dhoni we need to do the following

IPLBatsman = "MS Dhoni"
# Check and get the team indices of IPL teams in which the batsman has played
i <- getTeamIndex(IPLBatsman)

# Get the team names in which the IPL batsman has played
teamNames <- getTeams(i)

# Create a consolidated IPL batsman dataframe for analysis
batsmanDF <- getIPLBatsmanDF(teamNames)

3. Runs vs deliveries

# For example, with batsmanName = "MS Dhoni":
#batsmanRunsVsDeliveries(batsmanDF, "MS Dhoni")
batsmanRunsVsDeliveries(batsmanDF,"batsmanName")

4. Batsman 4s & 6s

batsman46 <- select(batsmanDF,batsman,ballsPlayed,fours,sixes,runs)
p1 <- batsmanFoursSixes(batsman46,"batsmanName")

5. Batsman dismissals

batsmanDismissals(batsmanDF,"batsmanName")

6. Runs vs Strike rate

batsmanRunsVsStrikeRate(batsmanDF,"batsmanName")

7. Batsman Moving Average

batsmanMovingAverage(batsmanDF,"batsmanName")

8. Batsman cumulative average

batsmanCumulativeAverageRuns(batsmanDF,"batsmanName")

9. Batsman cumulative strike rate

batsmanCumulativeStrikeRate(batsmanDF,"batsmanName")

10. Batsman runs against oppositions

batsmanRunsAgainstOpposition(batsmanDF,"batsmanName")

11. Batsman runs vs venue

batsmanRunsVenue(batsmanDF,"batsmanName")

12. Batsman runs predict

batsmanRunsPredict(batsmanDF,"batsmanName")

13. Bowler set up functions

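# Run the code in this section from the root folder (the one containing BattingBowlingDetails);
# the functions below use paths relative to it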
# IPL Team names
IPLTeamNames <- list("Chennai Super Kings","Deccan Chargers", "Delhi Daredevils","Kings XI Punjab", 
                  "Kochi Tuskers Kerala","Kolkata Knight Riders","Mumbai Indians","Pune Warriors",
                  "Rajasthan Royals","Royal Challengers Bangalore","Sunrisers Hyderabad","Gujarat Lions",
                  "Rising Pune Supergiants")    



# Get the team indices of IPL teams for which the bowler has played
getTeamIndex_bowler <- function(bowler){
    # Load the per-team bowler lists saved in setup step 6 (they were saved in BattingBowlingDetails)
    setwd("./BattingBowlingDetails")
    load("csk1.RData")
    load("dc1.RData")
    load("dd1.RData")
    load("kxip1.RData")
    load("ktk1.RData")
    load("kkr1.RData")
    load("mi1.RData")
    load("pw1.RData")
    load("rr1.RData")
    load("rcb1.RData")
    load("sh1.RData")
    load("gl1.RData")
    load("rps1.RData")
    setwd("..")
    teams_bowlers = list(csk_bowlers,dc_bowlers,dd_bowlers,kxip_bowlers,ktk_bowlers,kkr_bowlers,mi_bowlers,
                         pw_bowlers,rr_bowlers,rcb_bowlers,sh_bowlers,gl_bowlers,rps_bowlers)
    b <- NULL
    for (i in 1:length(teams_bowlers)){
        a <- which(teams_bowlers[[i]] == bowler)
        if(length(a) != 0){
            b <- c(b,i)
        }
    }
    b
}


# Get the list of the IPL team names from the indices passed
getTeams <- function(x){

    l <- NULL
    # Get the teams passed in as indexes
    for (i in seq_along(x)){

        l <- c(l, IPLTeamNames[[x[i]]]) 

    }
    l
}

# Create a consolidated data frame with all teams the IPL bowler has played for
# Note: this reads the global variable IPLBowler, which must be set before the call
getIPLBowlerDF <- function(teamNames){
    bowlerDF <- NULL
    # Create a consolidated data frame of the bowler for all IPL teams played
    for (i in seq_along(teamNames)){
        df <- getBowlerWicketDetails(team=teamNames[i],name=IPLBowler,dir="./BattingBowlingDetails")
        bowlerDF <- rbind(bowlerDF,df)
    }
    bowlerDF
}

14. Get the consolidated data frame for an IPL bowler

# Since an IPL bowler could have played in multiple teams we need to determine these teams and
# create a consolidated data frame for the analysis
# For example to check R Ashwin we need to do the following

IPLBowler = "R Ashwin"
# Check and get the team indices of IPL teams in which the bowler has played
i <- getTeamIndex_bowler(IPLBowler)

# Get the team names in which the IPL bowler has played
teamNames <- getTeams(i)

# Create a consolidated IPL bowler dataframe for analysis
bowlerDF <- getIPLBowlerDF(teamNames)

15. Bowler Mean Economy rate

# For e.g. to get the details of R Ashwin do
#bowlerMeanEconomyRate(bowlerDF,"R Ashwin")
bowlerMeanEconomyRate(bowlerDF,"bowlerName")

16. Bowler mean runs conceded

bowlerMeanRunsConceded(bowlerDF,"bowlerName")

17. Bowler Moving Average

bowlerMovingAverage(bowlerDF,"bowlerName")

18. Bowler cumulative average wickets

bowlerCumulativeAvgWickets(bowlerDF,"bowlerName")

19. Bowler cumulative Economy Rate (ER)

bowlerCumulativeAvgEconRate(bowlerDF,"bowlerName")

20. Bowler wicket plot

bowlerWicketPlot(bowlerDF,"bowlerName")

21. Bowler wicket against opposition

bowlerWicketsAgainstOpposition(bowlerDF,"bowlerName")

22. Bowler wicket at cricket grounds

bowlerWicketsVenue(bowlerDF,"bowlerName")

23. Predict number of deliveries to wickets

setwd("./IPLMatches")
bowlerDF1 <- getDeliveryWickets(team="IPLTeam1",dir=".",name="bowlerName",save=FALSE)
bowlerWktsPredict(bowlerDF1,"bowlerName")

Beaten by sheer pace – Cricket analytics with yorkr


My ebook “Beaten by sheer pace – Cricket analytics with yorkr” has been published on Leanpub. You can now download the book (hot off the press!) in all formats to your favorite device (mobile, iPad, tablet, Kindle) from Leanpub at “Beaten by sheer pace!”. The book has been published in the following formats:

  • PDF (for your computer)
  • EPUB (for iPad or tablets. Save the file cricketAnalyticsWithYorkr.epub to Google Drive/Dropbox and choose “Open in” iBooks for iPad)
  • MOBI (for Kindle. For this format, I suggest that you download & install SendToKindle for PC/Mac. You can then right click the downloaded cricketAnalyticsWithYorkr.mobi and choose SendToKindle. You will need to login to your Kindle account)

From Leanpub
Leanpub uses a variable pricing model. I have priced the book attractively (I think!). You can choose a price between FREE (limited time offer!) and $4.99. The link is “Beaten by sheer pace!”.

This format works with all types of Kindle device, the Kindle app, Android tablets and the iPad.

Checkout my interactive Shiny apps GooglyPlus (plots & tables) and Googly (only plots) which can be used to analyze IPL players, teams and matches.

Introducing cricket package yorkr:Part 4-In the block hole!


Introduction

“The nitrogen in our DNA, the calcium in our teeth, the iron in our blood, the carbon in our apple pies were made in the interiors of collapsing stars. We are made of starstuff.”

“If you wish to make an apple pie from scratch, you must first invent the universe.”

“We are like butterflies who flutter for a day and think it is forever.”

“The absence of evidence is not the evidence of absence.”

“We are star stuff which has taken its destiny into its own hands.”

                              Cosmos - Carl Sagan

This post is the 4th, and possibly the last, part of my introduction to my latest cricket package yorkr. The 3 earlier parts were

  1. Introducing cricket package yorkr-Part1:Beaten by sheer pace!.
  2. Introducing cricket package yorkr: Part 2-Trapped leg before wicket!
  3. Introducing cricket package yorkr: Part 3-Foxed by flight!

The 1st part included functions dealing with a specific match, the 2nd part dealt with functions between 2 opposing teams. The 3rd part dealt with functions between a team and all matches with all oppositions. This 4th part includes individual batting and bowling performances in ODI matches and deals with Class 4 functions.

This post has also been published at RPubs yorkr-Part4 and can also be downloaded as a PDF document from yorkr-Part4.pdf.

You can clone/fork the code for the package yorkr from Github at yorkr-package

Batsman functions

  1. batsmanRunsVsDeliveries
  2. batsmanFoursSixes
  3. batsmanDismissals
  4. batsmanRunsVsStrikeRate
  5. batsmanMovingAverage
  6. batsmanCumulativeAverageRuns
  7. batsmanCumulativeStrikeRate
  8. batsmanRunsAgainstOpposition
  9. batsmanRunsVenue
  10. batsmanRunsPredict

Bowler functions

  1. bowlerMeanEconomyRate
  2. bowlerMeanRunsConceded
  3. bowlerMovingAverage
  4. bowlerCumulativeAvgWickets
  5. bowlerCumulativeAvgEconRate
  6. bowlerWicketPlot
  7. bowlerWicketsAgainstOpposition
  8. bowlerWicketsVenue
  9. bowlerWktsPredict

Note: The yorkr package in its current avatar only supports ODI, T20 and IPL T20 matches.

library(yorkr)
library(gridExtra)
library(rpart.plot)
library(dplyr)
library(ggplot2)
rm(list=ls())

A. Batsman functions

1. Get Team Batting details

The function below gets the overall batting details of a team from the RData files of ODI matches. These files are currently also available on Github at (https://github.com/tvganesh/yorkrData/tree/master/ODI/ODI-matches). However, you may have to regenerate them as future matches are added! The batting details of the team in each match are computed, and the individual dataframes are combined with rbind into one large data frame. This can be saved as an RData file.

setwd("C:/software/cricket-package/york-test/yorkrData/ODI/ODI-matches")
india_details <- getTeamBattingDetails("India",dir=".", save=TRUE)
dim(india_details)
## [1] 11085    15
sa_details <- getTeamBattingDetails("South Africa",dir=".",save=TRUE)
dim(sa_details)
## [1] 6375   15
nz_details <- getTeamBattingDetails("New Zealand",dir=".",save=TRUE)
dim(nz_details)
## [1] 6262   15
eng_details <- getTeamBattingDetails("England",dir=".",save=TRUE)
dim(eng_details)
## [1] 9001   15
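
The "rbinding" of per-match data frames mentioned above can be illustrated with base R. The sketch below is not the package's internal code; it simply shows three toy per-match data frames (with made-up columns and values) being combined in a single call, which is usually preferable to growing the result inside a loop.

```r
# Three toy per-match batting data frames (columns and values are illustrative)
match1 <- data.frame(batsman = c("A", "B"), runs = c(10, 25))
match2 <- data.frame(batsman = c("A", "C"), runs = c(40, 5))
match3 <- data.frame(batsman = c("B"), runs = c(60))

# Bind them in one call instead of growing the result inside a loop
all_matches <- do.call(rbind, list(match1, match2, match3))
print(dim(all_matches))
```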

2. Get batsman details

This function gets the individual batting record of the specified batsman of a country, as in the calls below. For analyzing batting performances the following cricketers have been chosen

  1. Virat Kohli (Ind)
  2. M S Dhoni (Ind)
  3. AB De Villiers (SA)
  4. Q De Kock (SA)
  5. J Root (Eng)
  6. M J Guptill (NZ)

setwd("C:/software/cricket-package/york-test/yorkrData/ODI/ODI-matches")
kohli <- getBatsmanDetails(team="India",name="Kohli",dir=".")
## [1] "./India-BattingDetails.RData"
dhoni <- getBatsmanDetails(team="India",name="Dhoni")
## [1] "./India-BattingDetails.RData"
devilliers <-  getBatsmanDetails(team="South Africa",name="Villiers",dir=".")
## [1] "./South Africa-BattingDetails.RData"
deKock <-  getBatsmanDetails(team="South Africa",name="Kock",dir=".")
## [1] "./South Africa-BattingDetails.RData"
root <-  getBatsmanDetails(team="England",name="Root",dir=".")
## [1] "./England-BattingDetails.RData"
guptill <-  getBatsmanDetails(team="New Zealand",name="Guptill",dir=".")
## [1] "./New Zealand-BattingDetails.RData"
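
The calls above pass only a surname, which suggests that the name is matched as part of the full batsman name rather than exactly. How yorkr does this internally is not shown here, so the following is just a sketch of the idea on toy data, using grepl() with a literal substring; `find_batsman` is a hypothetical helper and the runs are made up.

```r
# Toy batting details (names as they might appear in the data; runs are made up)
details <- data.frame(
    batsman = c("V Kohli", "MS Dhoni", "V Kohli", "RG Sharma"),
    runs    = c(54, 23, 102, 8)
)

# fixed = TRUE treats `name` as a literal substring, not a regular expression
find_batsman <- function(details, name) {
    details[grepl(name, details$batsman, fixed = TRUE), , drop = FALSE]
}

kohli_rows <- find_batsman(details, "Kohli")
print(kohli_rows)
```

A short surname that is shared by two players in the same team would match both, so it is worth checking the distinct batsman names in the result.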

3. Runs versus deliveries

Kohli, De Villiers and Guptill have a good cluster of points that head towards 150 runs at 150 deliveries.

p1 <-batsmanRunsVsDeliveries(kohli,"Kohli")
p2 <- batsmanRunsVsDeliveries(dhoni, "Dhoni")
p3 <- batsmanRunsVsDeliveries(devilliers,"De Villiers")
p4 <- batsmanRunsVsDeliveries(deKock,"Q de Kock")
p5 <- batsmanRunsVsDeliveries(root,"JE Root")
p6 <- batsmanRunsVsDeliveries(guptill,"MJ Guptill")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)

4. Batsman Total runs, Fours and Sixes

The plots below show the total runs, fours and sixes by the batsmen

kohli46 <- select(kohli,batsman,ballsPlayed,fours,sixes,runs)
p1 <- batsmanFoursSixes(kohli46,"Kohli")
dhoni46 <- select(dhoni,batsman,ballsPlayed,fours,sixes,runs)
p2 <- batsmanFoursSixes(dhoni46,"Dhoni")
devilliers46 <- select(devilliers,batsman,ballsPlayed,fours,sixes,runs)
p3 <- batsmanFoursSixes(devilliers46, "De Villiers")
deKock46 <- select(deKock,batsman,ballsPlayed,fours,sixes,runs)
p4 <- batsmanFoursSixes(deKock46,"Q de Kock")
root46 <- select(root,batsman,ballsPlayed,fours,sixes,runs)
p5 <- batsmanFoursSixes(root46,"JE Root")
guptill46 <- select(guptill,batsman,ballsPlayed,fours,sixes,runs)
p6 <- batsmanFoursSixes(guptill46,"MJ Guptill")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)

5. Batsman dismissals

The type of dismissal for each batsman is shown below

p1 <-batsmanDismissals(kohli,"Kohli")
p2 <- batsmanDismissals(dhoni, "Dhoni")
p3 <- batsmanDismissals(devilliers, "De Villiers")
p4 <- batsmanDismissals(deKock,"Q de Kock")
p5 <- batsmanDismissals(root,"JE Root")
p6 <- batsmanDismissals(guptill,"MJ Guptill")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)

6. Runs versus Strike Rate

De Villiers has the best strike rate of all, as there are more points on the right side of the plot for the same runs. Kohli and Dhoni do well too. Q De Kock and Joe Root also have a very good spread of points, though they have fewer innings.

p1 <-batsmanRunsVsStrikeRate(kohli,"Kohli")
p2 <- batsmanRunsVsStrikeRate(dhoni, "Dhoni")
p3 <- batsmanRunsVsStrikeRate(devilliers, "De Villiers")
p4 <- batsmanRunsVsStrikeRate(deKock,"Q de Kock")
p5 <- batsmanRunsVsStrikeRate(root,"JE Root")
p6 <- batsmanRunsVsStrikeRate(guptill,"MJ Guptill")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)

7. Batsman moving average

Kohli’s average is on a gentle increase from below 50 to around the 60s. Joe Root’s performance is impressive, with his moving average of late tending towards the 70s. Q De Kock seemed to have a slump around 2015 but his performance is on the increase. De Villiers consistently averages around 50. Dhoni has also had a stable run in the last several years.

p1 <-batsmanMovingAverage(kohli,"Kohli")
p2 <- batsmanMovingAverage(dhoni, "Dhoni")
p3 <- batsmanMovingAverage(devilliers, "De Villiers")
p4 <- batsmanMovingAverage(deKock,"Q de Kock")
p5 <- batsmanMovingAverage(root,"JE Root")
p6 <- batsmanMovingAverage(guptill,"MJ Guptill")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)

ma-1
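
To make the idea of a moving average concrete, here is a small sketch with a simple trailing mean over the last 3 innings. This is an assumption made for illustration: the package function may well use a different smoother, and the runs are invented.

```r
# Hypothetical runs in consecutive innings (not real data)
runs <- c(12, 45, 80, 3, 56, 101, 22, 67)

# Trailing moving average over a window of k innings;
# the first k-1 values are NA because the window is not yet full
k <- 3
movAvg <- as.numeric(stats::filter(runs, rep(1 / k, k), sides = 1))
round(movAvg, 1)
```

A wider window gives a smoother curve but reacts more slowly to a change in form.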

8. Batsman cumulative average

The functions below provide the cumulative average of runs scored. As can be seen, Kohli and Devilliers have a cumulative average of around 48-50 runs. Q de Kock seems to have had a rocky career with several highs and lows, as his cumulative average oscillates between 40 and 45. Root steadily improves to a cumulative average of around 42-43 from his 50th innings.

p1 <-batsmanCumulativeAverageRuns(kohli,"Kohli")
p2 <- batsmanCumulativeAverageRuns(dhoni, "Dhoni")
p3 <- batsmanCumulativeAverageRuns(devilliers, "De Villiers")
p4 <- batsmanCumulativeAverageRuns(deKock,"Q de Kock")
p5 <- batsmanCumulativeAverageRuns(root,"JE Root")
p6 <- batsmanCumulativeAverageRuns(guptill,"MJ Guptill")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)

cAvg-1
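
The cumulative average after n innings is simply the total runs so far divided by n. A minimal base R sketch with invented numbers:

```r
# Hypothetical runs in consecutive innings (not real data)
runs <- c(12, 45, 80, 3, 56, 101, 22, 67)

# Running total of runs divided by the number of innings played so far
cumAvg <- cumsum(runs) / seq_along(runs)
round(cumAvg, 2)
```

Unlike the moving average, every past innings keeps its weight, so the curve flattens out as a career gets longer.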

9. Cumulative Average Strike Rate

The plots below show the cumulative average strike rate of the batsmen. Dhoni and Devilliers have the best cumulative average strike rate of 90%. The rest average around 80% strike rate. Guptill shows a slump towards the latter part of his career.

p1 <-batsmanCumulativeStrikeRate(kohli,"Kohli")
p2 <- batsmanCumulativeStrikeRate(dhoni, "Dhoni")
p3 <- batsmanCumulativeStrikeRate(devilliers, "De Villiers")
p4 <- batsmanCumulativeStrikeRate(deKock,"Q de Kock")
p5 <- batsmanCumulativeStrikeRate(root,"JE Root")
p6 <- batsmanCumulativeStrikeRate(guptill,"MJ Guptill")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)

cSR-1

10. Batsman runs against opposition

Kohli’s best performances are against Australia, West Indies and Sri Lanka

batsmanRunsAgainstOpposition(kohli,"Kohli")

runsOppn1-1

batsmanRunsAgainstOpposition(dhoni, "Dhoni")

runsOppn2-1

De Villiers’s best performances are against Australia, Pakistan and West Indies

batsmanRunsAgainstOpposition(devilliers, "De Villiers")

runsOppn3-1

Quinton de Kock averages almost 100 runs against India and 75 runs against England

batsmanRunsAgainstOpposition(deKock, "Q de Kock")

runsOppn4-1

Root’s best performances are against South Africa, Sri Lanka and West Indies

batsmanRunsAgainstOpposition(root, "JE Root")

runsOppn5-1

batsmanRunsAgainstOpposition(guptill, "MJ Guptill")

runsOppn6-1
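
The charts above boil down to a grouped mean. The sketch below shows that step in base R, assuming one row per innings with an `opposition` column. The data frame is hypothetical.

```r
# Hypothetical innings for one batsman (not real data)
innings <- data.frame(
  opposition = c("Australia", "Australia", "Sri Lanka", "Sri Lanka", "West Indies"),
  runs       = c(100, 50, 20, 40, 60)
)

# Mean runs per opposition, highest first
meanRuns <- aggregate(runs ~ opposition, data = innings, FUN = mean)
meanRuns <- meanRuns[order(-meanRuns$runs), ]
meanRuns
```

Swapping `opposition` for a venue column gives the per-ground summary of the next section in the same way.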

11. Runs at different venues

The plots below give the performances of the batsmen at different grounds.

batsmanRunsVenue(kohli,"Kohli")

runsVenue1-1

batsmanRunsVenue(dhoni, "Dhoni")

runsVenue2-1

batsmanRunsVenue(devilliers, "De Villiers")

runsVenue3-1

batsmanRunsVenue(deKock, "Q de Kock")

runsVenue4-1

batsmanRunsVenue(root, "JE Root")

runsVenue5-1

batsmanRunsVenue(guptill, "MJ Guptill")

runsVenue6-1

12. Predict number of runs to deliveries

The plots below use an rpart classification tree to predict the number of deliveries required to score the runs in the leaf node. For example, Kohli takes 66 deliveries to score 64 runs and, for a higher number of deliveries, scores around 115 runs. Devilliers needs

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsmanRunsPredict(kohli,"Kohli")
batsmanRunsPredict(dhoni, "Dhoni")
batsmanRunsPredict(devilliers, "De Villiers")

runsPredict1,runsVenue1-1

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsmanRunsPredict(deKock,"Q de Kock")
batsmanRunsPredict(root,"JE Root")
batsmanRunsPredict(guptill,"MJ Guptill")

runsPredict2,runsVenue1-1
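
For readers who want to try the tree approach by hand, here is a small sketch with rpart on invented data. The formula `runs ~ ballsPlayed` and the control settings are assumptions for illustration, not necessarily what the package function uses. With a numeric response rpart fits a regression ("anova") tree.

```r
library(rpart)

# Hypothetical innings: balls faced and runs scored (not real data)
innings <- data.frame(
  ballsPlayed = c(5, 8, 10, 12, 15, 18, 40, 45, 50, 60, 66, 70),
  runs        = c(3, 6, 9, 10, 12, 15, 45, 50, 55, 64, 70, 80)
)

# minsplit is lowered only because the toy data set is tiny
fit <- rpart(runs ~ ballsPlayed, data = innings, method = "anova",
             control = rpart.control(minsplit = 4))

# Each leaf predicts the mean runs for a range of deliveries faced
pred <- predict(fit, data.frame(ballsPlayed = c(10, 60)))
pred
```

Printing `fit` lists the split points, which is the same information the tree plots show graphically.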

B. Bowler functions

13. Get bowling details

The function below gets the overall team bowling details based on the RData files available in the ODI matches folder. These are also available on Github at (https://github.com/tvganesh/yorkrData/tree/master/ODI/ODI-matches). The bowling details of the team in each match are created, and a huge data frame is built by rbinding the individual dataframes. This can be saved as an RData file.

setwd("C:/software/cricket-package/york-test/yorkrData/ODI/ODI-matches")
ind_bowling <- getTeamBowlingDetails("India",dir=".",save=TRUE)
dim(ind_bowling)
## [1] 7816   12
aus_bowling <- getTeamBowlingDetails("Australia",dir=".",save=TRUE)
dim(aus_bowling)
## [1] 9191   12
ban_bowling <- getTeamBowlingDetails("Bangladesh",dir=".",save=TRUE)
dim(ban_bowling)
## [1] 5665   12
sa_bowling <- getTeamBowlingDetails("South Africa",dir=".",save=TRUE)
dim(sa_bowling)
## [1] 3806   12
sl_bowling <- getTeamBowlingDetails("Sri Lanka",dir=".",save=TRUE)
dim(sl_bowling)
## [1] 3964   12
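
The "rbind the per-match dataframes" step can be sketched in base R as follows. The two match data frames and the output file name are hypothetical, and the file is written to a temporary folder.

```r
# Hypothetical per-match bowling details (two tiny matches)
match1 <- data.frame(bowler = c("X", "Y"), overs = c(10, 8),
                     runs = c(42, 51), wickets = c(2, 0))
match2 <- data.frame(bowler = c("X", "Z"), overs = c(9, 10),
                     runs = c(38, 60), wickets = c(1, 3))

# Stack the individual data frames into a single large one
allMatches <- do.call(rbind, list(match1, match2))
dim(allMatches)

# Save the combined data frame as an RData file for later use
outFile <- file.path(tempdir(), "toy-BowlingDetails.RData")
save(allMatches, file = outFile)
```

This works only when every match has the same columns, which is why the converted match files share one layout.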

14. Get bowling details of the individual bowlers

This function is used to get the individual bowling record for a specified bowler of the country as in the functions below. For analyzing the bowling performances the following cricketers have been chosen

  1. R A Jadeja (Ind)
  2. Ravichander Ashwin (Ind)
  3. Mitchell Starc (Aus)
  4. Shakib Al Hasan (Ban)
  5. Ajantha Mendis (SL)
  6. Dale Steyn (SA)
jadeja <- getBowlerWicketDetails(team="India",name="Jadeja",dir=".")
ashwin <- getBowlerWicketDetails(team="India",name="Ashwin",dir=".")
starc <-  getBowlerWicketDetails(team="Australia",name="Starc",dir=".")
shakib <-  getBowlerWicketDetails(team="Bangladesh",name="Shakib",dir=".")
mendis <-  getBowlerWicketDetails(team="Sri Lanka",name="Mendis",dir=".")
steyn <-  getBowlerWicketDetails(team="South Africa",name="Steyn",dir=".")

15. Bowler Mean Economy Rate

Shakib Al Hasan is expensive in the first 3 overs, after which he is very economical with an economy rate of 3-4. Starc and Steyn average an ER of around 4.0

p1<-bowlerMeanEconomyRate(jadeja,"RA Jadeja")
p2<-bowlerMeanEconomyRate(ashwin, "R Ashwin")
p3<-bowlerMeanEconomyRate(starc, "MA Starc")
p4<-bowlerMeanEconomyRate(shakib, "Shakib Al Hasan")
p5<-bowlerMeanEconomyRate(mendis, "A Mendis")
p6<-bowlerMeanEconomyRate(steyn, "D Steyn")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)

meanER-1
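
Economy rate is runs conceded per over bowled. The sketch below computes it per spell and then averages it by the number of overs bowled, using made-up spells for a single bowler.

```r
# Hypothetical spells for one bowler: overs bowled and runs conceded
spells <- data.frame(
  overs = c(10, 10, 8, 8, 5),
  runs  = c(40, 50, 36, 44, 30)
)

# Economy rate of each spell
spells$economyRate <- spells$runs / spells$overs

# Mean economy rate for each spell length
meanER <- aggregate(economyRate ~ overs, data = spells, FUN = mean)
meanER
```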

16. Bowler Mean Runs conceded

Ashwin is expensive around 6 & 7 overs

p1<-bowlerMeanRunsConceded(jadeja,"RA Jadeja")
p2<-bowlerMeanRunsConceded(ashwin, "R Ashwin")
p3<-bowlerMeanRunsConceded(starc, "M A Starc")
p4<-bowlerMeanRunsConceded(shakib, "Shakib Al Hasan")
p5<-bowlerMeanRunsConceded(mendis, "A Mendis")
p6<-bowlerMeanRunsConceded(steyn, "D Steyn")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)

meanRunsConceded-1

17. Bowler Moving average

The performances of RA Jadeja and Mendis have dipped considerably, while Ashwin and Shakib are improving. Starc averages around 4 wickets

p1<-bowlerMovingAverage(jadeja,"RA Jadeja")
p2<-bowlerMovingAverage(ashwin, "Ashwin")
p3<-bowlerMovingAverage(starc, "M A Starc")
p4<-bowlerMovingAverage(shakib, "Shakib Al Hasan")
p5<-bowlerMovingAverage(mendis, "Ajantha Mendis")
p6<-bowlerMovingAverage(steyn, "Dale Steyn")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)

bowlerMA-1

17. Bowler cumulative average wickets

Starc is clearly the most consistent performer with 3 wickets on average over his career, while Jadeja averages around 2.0. Ashwin seems to have dropped from 2.4 to 2.0 wickets, while Mendis drops from a high of 3.5 to 2.2 wickets. The fractional wickets only show a tendency to take another wicket.

p1<-bowlerCumulativeAvgWickets(jadeja,"RA Jadeja")
p2<-bowlerCumulativeAvgWickets(ashwin, "Ashwin")
p3<-bowlerCumulativeAvgWickets(starc, "M A Starc")
p4<-bowlerCumulativeAvgWickets(shakib, "Shakib Al Hasan")
p5<-bowlerCumulativeAvgWickets(mendis, "Ajantha Mendis")
p6<-bowlerCumulativeAvgWickets(steyn, "Dale Steyn")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)

cumWkts-1

18. Bowler cumulative Economy Rate (ER)

The plots below are interesting. All of the bowlers seem to average around 4.5 runs/over. RA Jadeja’s ER improves and heads to 4.5, while Mendis is seen to get more expensive as his career progresses: from an ER of 3.0 he rises towards 4.5

p1<-bowlerCumulativeAvgEconRate(jadeja,"RA Jadeja")
p2<-bowlerCumulativeAvgEconRate(ashwin, "Ashwin")
p3<-bowlerCumulativeAvgEconRate(starc, "M A Starc")
p4<-bowlerCumulativeAvgEconRate(shakib, "Shakib Al Hasan")
p5<-bowlerCumulativeAvgEconRate(mendis, "Ajantha Mendis")
p6<-bowlerCumulativeAvgEconRate(steyn, "Dale Steyn")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)

cumER-1

19. Bowler wicket plot

The plot below gives the average wickets versus number of overs

p1<-bowlerWicketPlot(jadeja,"RA Jadeja")
p2<-bowlerWicketPlot(ashwin, "Ashwin")
p3<-bowlerWicketPlot(starc, "M A Starc")
p4<-bowlerWicketPlot(shakib, "Shakib Al Hasan")
p5<-bowlerWicketPlot(mendis, "Ajantha Mendis")
p6<-bowlerWicketPlot(steyn, "Dale Steyn")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)

wktPlot-1

20. Bowler wicket against opposition

#Jadeja's best performances are against England, Pakistan and West Indies
bowlerWicketsAgainstOpposition(jadeja,"RA Jadeja")

wktsOppn1-1

#Ashwin's best performances are against England, Pakistan and South Africa
bowlerWicketsAgainstOpposition(ashwin, "Ashwin")

wktsOppn2-1

#Starc has good performances against India, New Zealand, Pakistan, West Indies
bowlerWicketsAgainstOpposition(starc, "M A Starc")

wktsOppn3-1

bowlerWicketsAgainstOpposition(shakib,"Shakib Al Hasan")

wktsOppn4-1

bowlerWicketsAgainstOpposition(mendis, "Ajantha Mendis")

wktsOppn5-1

#Steyn has good performances against India, Sri Lanka, Pakistan, West Indies
bowlerWicketsAgainstOpposition(steyn, "Dale Steyn")

wktsOppn6-1

21. Bowler wicket at cricket grounds

bowlerWicketsVenue(jadeja,"RA Jadeja")

wktsAve1-1

bowlerWicketsVenue(ashwin, "Ashwin")

wktsAve2-1

bowlerWicketsVenue(starc, "M A Starc")
## Warning: Removed 2 rows containing missing values (geom_bar).

wktsAve3-1

bowlerWicketsVenue(shakib,"Shakib Al Hasan")

wktsAve4-1

bowlerWicketsVenue(mendis, "Ajantha Mendis")

wktsAve5-1

bowlerWicketsVenue(steyn, "Dale Steyn")

wktsAve6-1

22. Get Delivery wickets for bowlers

This function creates a dataframe of deliveries and the wickets taken

setwd("C:/software/cricket-package/york-test/yorkrData/ODI/ODI-matches")
jadeja1 <- getDeliveryWickets(team="India",dir=".",name="Jadeja",save=FALSE)
ashwin1 <- getDeliveryWickets(team="India",dir=".",name="Ashwin",save=FALSE)
starc1 <- getDeliveryWickets(team="Australia",dir=".",name="MA Starc",save=FALSE)
shakib1 <- getDeliveryWickets(team="Bangladesh",dir=".",name="Shakib",save=FALSE)
mendis1 <- getDeliveryWickets(team="Sri Lanka",dir=".",name="Mendis",save=FALSE)
steyn1 <- getDeliveryWickets(team="South Africa",dir=".",name="Steyn",save=FALSE)
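
To picture what a deliveries-versus-wickets table might look like, here is a rough base R sketch. The layout is an assumption and may differ from what the package function returns. It uses an invented ball-by-ball record in which 1 marks a delivery on which a wicket fell.

```r
# Hypothetical ball-by-ball record for one bowler in one match
balls <- data.frame(wicket = c(0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1))

balls$delivery <- seq_len(nrow(balls))   # running count of deliveries bowled
balls$wicketNo <- cumsum(balls$wicket)   # running count of wickets taken

# Keep only the deliveries on which a wicket fell
deliveryWickets <- balls[balls$wicket == 1, c("delivery", "wicketNo")]
deliveryWickets
```

In this toy record the wickets fall on the 4th, 9th and 12th deliveries.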

23. Predict number of deliveries to wickets

#Jadeja and Ashwin need around 22 to 28 deliveries to make a break through
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerWktsPredict(jadeja1,"RA Jadeja")
bowlerWktsPredict(ashwin1,"RAshwin")

wktsPred1-1

#Starc and Shakib provide an early breakthrough, producing a wicket in around 16 balls. Starc's 2nd wicket comes around the 30th delivery
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerWktsPredict(starc1,"MA Starc")
bowlerWktsPredict(shakib1,"Shakib Al Hasan")

wktsPred2-1

#Steyn and Mendis take 20 deliveries to get their 1st wicket
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerWktsPredict(mendis1,"A Mendis")
bowlerWktsPredict(steyn1,"DSteyn")

wktsPred3-1

Conclusion

This concludes the 4-part introduction to my new R cricket package yorkr for ODIs. I will be enhancing the package to handle Twenty20 and IPL matches soon. You can fork/clone the code from Github at yorkr.

The yaml data from Cricsheet has already been converted into R consumable dataframes. The converted data can be downloaded from Github at yorkrData. There are 3 folders – ODI matches, ODI matches between 2 teams (oppnAllMatches), ODI matches between a team and the rest of the world (all matches, all oppositions).

As I have already mentioned, I have around 67 functions for analysis; however, I am certain that the data has a lot more secrets waiting to be tapped. So please do go ahead and run any machine learning or statistical learning algorithms on them. If you do come up with interesting insights, I would appreciate it if you attribute the source to Cricsheet (http://cricsheet.org), my package yorkr and my blog Giga thoughts, besides dropping me a note.

Hope you have a great time with my yorkr package!

Also see

  1. Introducing cricketr! : An R package to analyze performances of cricketers
  2. Cricket analytics with cricketr in paperback and Kindle versions
  3. My TEDx talk on the “Internet of Things”
  4. Bend it like Bluemix,MongoDB with autoscaling – Part 1
  5. The mind of a programmer
  6. Fun simulation of a chain in Android
  7. Taking cricketr for a spin-Part 1
  8. Latency,throughput implications for the cloud
  9. Hand detection through haar-training: A hands-on approach
  10. Cricket analytics with cricketr

Introducing cricket package yorkr: Part 2-Trapped leg before wicket!


“It was a puzzling thing. The truth knocks on the door and you say ‘Go away, I’m looking for the truth,’ and so it goes away. Puzzling.”

“But even though Quality cannot be defined, you know what Quality is!”

“The Buddha, the Godhead, resides quite comfortably in the circuits of a digital computer or the gears of a cycle transmission as he does at the top of a mountain or in the petals of the flower. To think otherwise is to demean the Buddha – which is to demean oneself.”

                Zen and the Art of Motorcycle Maintenance - Robert M Pirsig

Introduction

If we were to extend the last quote from Zen and the Art of Motorcycle Maintenance, by Robert M Pirsig, I think it would be fair to say that the Buddha also comfortably resides in the exquisite backhand cross-court return of Bjorn Borg, the graceful arc of the football in a Lionel Messi free kick and the smashing cover drive of Sunil Gavaskar.

In this post I continue to introduce my latest cricket package yorkr. This post is a continuation of my earlier post – Introducing cricket package yorkr-Part1:Beaten by sheer pace!. This post deals with Class 2 functions, namely the performances of a team in all matches against a single opposition, e.g. all matches of India-Australia, Pakistan-West Indies etc. You can clone/fork the code for my package yorkr from Github at yorkr.

Note 1: The package currently only supports ODI, T20s and IPL T20 matches.

This post has also been published at RPubs yorkr-Part2 and can also be downloaded as a PDF document from yorkr-Part2.pdf

Checkout my interactive Shiny apps GooglyPlus (plots & tables) and Googly (only plots) which can be used to analyze IPL players, teams and matches.

The functions in Class 2 are

  1. teamBatsmenPartnershiOppnAllMatches()
  2. teamBatsmenPartnershipOppnAllMatchesChart()
  3. teamBatsmenVsBowlersOppnAllMatches()
  4. teamBattingScorecardOppnAllMatches()
  5. teamBowlingPerfOppnAllMatches()
  6. teamBowlersWicketsOppnAllMatches()
  7. teamBowlersVsBatsmenOppnAllMatches()
  8. teamBowlersWicketKindOppnAllMatches()
  9. teamBowlersWicketRunsOppnAllMatches()
  10. plotWinLossBetweenTeams()

1. Install the package from CRAN

if (!require("yorkr")) {
  install.packages("yorkr") 
  library("yorkr")
}
library(plotly) 
rm(list=ls())

2. Get data for all matches between 2 teams

We can get all matches between any 2 teams using the function below. The dir parameter should point to the folder which contains the RData files of the individual matches. This function creates a data frame of all the matches and also saves the dataframe as RData.

setwd("C:/software/cricket-package/york-test/yorkrData/ODI/ODI-matches")
matches <- getAllMatchesBetweenTeams("Australia","India",dir=".")
dim(matches)
## [1] 67428    25

I have, however, already saved the matches for all possible combinations of opposing countries. The data for these matches for the individual teams/countries can be obtained from Github in the folder ODI-allmatches-between-two-teams

Note: The dataframes for the different head-to-head matches can be loaded directly into your code. The dataframes are 15000+ rows x 25 columns. While I have 10 functions to process the details between teams, feel free to let loose any statistical or machine learning algorithms on the dataframe. So go ahead with any insights that can be gleaned from random forests, ridge regression, SVM classifiers and so on. If you do come up with something interesting, I would appreciate it if you could drop me a note. Also please do attribute the source to Cricsheet (http://cricsheet.org), the package yorkr and my blog Giga thoughts

3. Save data for all matches between all combination of 2 teams

This can be done locally using the function below. You could use this function to combine all matches between any 2 teams into a single dataframe and save it in the current folder. The current implementation expects that the RData files of individual matches are in the ../data folder. Since I have already converted this, I will not be running this again

#saveAllMatchesBetweenTeams()
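
To get a feel for what "all combinations of 2 teams" means, the sketch below lists every unordered pair from a short, illustrative team list and builds a file name for each. The naming pattern is inferred from the files loaded later in this post. The order of the two team names in the real files may differ, so treat the names as an assumption.

```r
# A small subset of teams, for illustration only
teams <- c("Australia", "India", "Pakistan", "South Africa")

# Every unordered pair of teams, one pair per row
teamPairs <- t(combn(teams, 2))

# File names following a "<team1>-<team2>-allMatches.RData" pattern
files <- paste0(teamPairs[, 1], "-", teamPairs[, 2], "-allMatches.RData")
files
```

With n teams there are n(n-1)/2 such pairs, so 4 teams give 6 files.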

4. Load data directly for all matches between 2 teams

As in my earlier post I pick all matches between 2 random teams. I load the data directly from the stored RData files. When we load the RData file a “matches” object will be created. This object can be stored for the appropriate teams as below

setwd("C:/software/cricket-package/york-test/yorkrData/ODI/ODI-allmatches-between-two-teams")
load("India-Australia-allMatches.RData")
aus_ind_matches <- matches
dim(aus_ind_matches)
## [1] 21909    25
load("England-New Zealand-allMatches.RData")
eng_nz_matches <- matches
dim(eng_nz_matches)
## [1] 15343    25
load("Pakistan-South Africa-allMatches.RData")
pak_sa_matches <- matches
dim(pak_sa_matches)
## [1] 17083    25
load("Sri Lanka-West Indies-allMatches.RData")
sl_wi_matches <- matches
dim(sl_wi_matches)
## [1] 4869   25
load("Bangladesh-Ireland-allMatches.RData")
ban_ire_matches <-matches
dim(ban_ire_matches)
## [1] 1668   25
load("Kenya-Bermuda-allMatches.RData")
ken_ber_matches <- matches
dim(ken_ber_matches)
## [1] 1518   25
load("Scotland-Canada-allMatches.RData")
sco_can_matches <-matches
dim(sco_can_matches)
## [1] 1061   25
load("Netherlands-Afghanistan-allMatches.RData")
nl_afg_matches <- matches
dim(nl_afg_matches)
## [1] 402  25
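
Since every file creates an object with the same name, each load overwrites the previous “matches”. One way to avoid the copy step is a small helper that loads the file into its own environment and returns the object. The helper name `loadMatches` and the demo file are hypothetical and not part of yorkr.

```r
# Load an RData file into a private environment and return its `matches` object
loadMatches <- function(path) {
  env <- new.env()
  load(path, envir = env)
  env[["matches"]]
}

# Demonstration with a throwaway file (invented data, not the real match files)
matches <- data.frame(batsman = c("A", "B"), runs = c(4, 1))
tmp <- file.path(tempdir(), "TeamA-TeamB-allMatches.RData")
save(matches, file = tmp)
rm(matches)

toy_matches <- loadMatches(tmp)
dim(toy_matches)
```

With this helper each head-to-head dataframe can be assigned in a single line, without touching the global workspace.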

5. Team Batsmen partnership (all matches with opposition)

This function will create a report of the batting partnerships in the teams. The report can be brief or detailed depending on the parameter ‘report’. The top batsmen in India-Australia clashes are Ricky Ponting from Australia and Mahendra Singh Dhoni of India.

m<- teamBatsmenPartnershiOppnAllMatches(aus_ind_matches,'Australia',report="summary")
m
## Source: local data frame [47 x 2]
## 
##       batsman totalRuns
##        (fctr)     (dbl)
## 1  RT Ponting       876
## 2  MEK Hussey       753
## 3   GJ Bailey       614
## 4   SR Watson       609
## 5   MJ Clarke       607
## 6   ML Hayden       573
## 7   A Symonds       536
## 8    AJ Finch       525
## 9   SPD Smith       467
## 10  DA Warner       391
## ..        ...       ...
m <-teamBatsmenPartnershiOppnAllMatches(aus_ind_matches,'India',report="summary")
m
## Source: local data frame [44 x 2]
## 
##         batsman totalRuns
##          (fctr)     (dbl)
## 1      MS Dhoni      1156
## 2     RG Sharma       918
## 3  SR Tendulkar       910
## 4       V Kohli       902
## 5     G Gambhir       536
## 6  Yuvraj Singh       524
## 7      SK Raina       509
## 8      S Dhawan       471
## 9      V Sehwag       289
## 10   RV Uthappa       283
## ..          ...       ...
m <-teamBatsmenPartnershiOppnAllMatches(aus_ind_matches,'Australia',report="detailed")
m <-teamBatsmenPartnershiOppnAllMatches(pak_sa_matches,'Pakistan',report="summary")
m
## Source: local data frame [40 x 2]
## 
##            batsman totalRuns
##             (fctr)     (dbl)
## 1    Misbah-ul-Haq       727
## 2      Younis Khan       657
## 3    Shahid Afridi       558
## 4  Mohammad Yousuf       539
## 5  Mohammad Hafeez       477
## 6     Shoaib Malik       452
## 7    Ahmed Shehzad       348
## 8     Abdul Razzaq       246
## 9     Kamran Akmal       241
## 10      Umar Akmal       215
## ..             ...       ...
m <-teamBatsmenPartnershiOppnAllMatches(eng_nz_matches,'England',report="summary")
m
## Source: local data frame [47 x 2]
## 
##           batsman totalRuns
##            (fctr)     (dbl)
## 1         IR Bell       654
## 2         JE Root       612
## 3  PD Collingwood       514
## 4      EJG Morgan       479
## 5         AN Cook       464
## 6       IJL Trott       362
## 7    KP Pietersen       358
## 8      JC Buttler       287
## 9         OA Shah       274
## 10      RS Bopara       222
## ..            ...       ...
m <-teamBatsmenPartnershiOppnAllMatches(sl_wi_matches,'Sri Lanka',report="summary")
m[1:50,]
## Source: local data frame [50 x 2]
## 
##             batsman totalRuns
##              (fctr)     (dbl)
## 1  DPMD Jayawardene       288
## 2     KC Sangakkara       238
## 3        TM Dilshan       224
## 4       WU Tharanga       220
## 5        AD Mathews       161
## 6     ST Jayasuriya       160
## 7       ML Udawatte        87
## 8   HDRL Thirimanne        67
## 9       MDKJ Perera        64
## 10    CK Kapugedera        57
## ..              ...       ...
m <- teamBatsmenPartnershiOppnAllMatches(ban_ire_matches,"Ireland",report="summary")
m
## Source: local data frame [16 x 2]
## 
##             batsman totalRuns
##              (fctr)     (dbl)
## 1   WTS Porterfield       111
## 2        KJ O'Brien        99
## 3        NJ O'Brien        75
## 4         GC Wilson        60
## 5          AR White        38
## 6       DT Johnston        36
## 7           JP Bray        31
## 8         JF Mooney        28
## 9          AC Botha        23
## 10         EC Joyce        16
## 11      PR Stirling        15
## 12      GH Dockrell         9
## 13        WB Rankin         9
## 14 D Langford-Smith         6
## 15       EJG Morgan         5
## 16        AR Cusack         0

6. Team batsmen partnership (all matches with opposition)

This is plotted graphically in the charts below

teamBatsmenPartnershipOppnAllMatchesChart(aus_ind_matches,"India","Australia")

teamBatsmenPartnership-1

teamBatsmenPartnershipOppnAllMatchesChart(pak_sa_matches,main="South Africa",opposition="Pakistan")

teamBatsmenPartnership-2

m<- teamBatsmenPartnershipOppnAllMatchesChart(eng_nz_matches,"New Zealand",opposition="England",plot=FALSE)
m[1:30,]
##          batsman    nonStriker runs
## 1  KS Williamson   LRPL Taylor  354
## 2    BB McCullum    MJ Guptill  275
## 3    LRPL Taylor KS Williamson  273
## 4     MJ Guptill   BB McCullum  227
## 5    BB McCullum      JD Ryder  212
## 6     MJ Guptill KS Williamson  196
## 7  KS Williamson    MJ Guptill  179
## 8       JD Ryder   BB McCullum  175
## 9       JDP Oram     SB Styris  153
## 10   LRPL Taylor    GD Elliott  147
## 11    GD Elliott   LRPL Taylor  143
## 12   LRPL Taylor    MJ Guptill  140
## 13        JM How   BB McCullum  128
## 14    MJ Guptill   LRPL Taylor  125
## 15   BB McCullum        JM How  117
## 16   BB McCullum   LRPL Taylor  116
## 17     SB Styris      JDP Oram  100
## 18   LRPL Taylor        JM How   98
## 19        JM How   LRPL Taylor   98
## 20      JDP Oram   BB McCullum   84
## 21   LRPL Taylor     L Vincent   71
## 22      JDP Oram    DL Vettori   70
## 23   LRPL Taylor   BB McCullum   61
## 24     SB Styris        JM How   55
## 25      DR Flynn     SB Styris   54
## 26    DL Vettori      JDP Oram   53
## 27     L Vincent   LRPL Taylor   53
## 28    MJ Santner   LRPL Taylor   53
## 29    SP Fleming     L Vincent   52
## 30        JM How     SB Styris   50
teamBatsmenPartnershipOppnAllMatchesChart(sl_wi_matches,"Sri Lanka","West Indies")

teamBatsmenPartnership-3

teamBatsmenPartnershipOppnAllMatchesChart(ban_ire_matches,"Bangladesh","Ireland")

teamBatsmenPartnership-4
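
The batsman/nonStriker table above is essentially a grouped sum over ball-by-ball data. A minimal base R sketch follows, with invented rows and the assumption that each row records the runs scored off one delivery.

```r
# Hypothetical ball-by-ball rows: who faced, who was at the other end, runs off the bat
balls <- data.frame(
  batsman    = c("P", "P", "Q", "P", "Q", "Q", "R"),
  nonStriker = c("Q", "Q", "P", "R", "P", "P", "P"),
  runs       = c(4, 1, 6, 2, 0, 1, 3)
)

# Total runs scored by each batsman with each partner, highest first
partnerships <- aggregate(runs ~ batsman + nonStriker, data = balls, FUN = sum)
partnerships <- partnerships[order(-partnerships$runs), ]
partnerships
```

Note that P with Q and Q with P are separate rows, just as in the New Zealand table, because the runs are credited to the batsman on strike.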

7. Team batsmen versus bowler (all matches with opposition)

The plots below provide information on how each of the top batsmen fared against the opposition bowlers

teamBatsmenVsBowlersOppnAllMatches(aus_ind_matches,"India","Australia")

batsmenvsBowler-1

teamBatsmenVsBowlersOppnAllMatches(pak_sa_matches,"South Africa","Pakistan",top=3)

batsmenvsBowler-2

m <- teamBatsmenVsBowlersOppnAllMatches(eng_nz_matches,"England","New Zealand",top=10,plot=FALSE)
m
## Source: local data frame [157 x 3]
## Groups: batsman [1]
## 
##    batsman       bowler  runs
##     (fctr)       (fctr) (dbl)
## 1  IR Bell JEC Franklin    63
## 2  IR Bell      SE Bond    13
## 3  IR Bell MR Gillespie    33
## 4  IR Bell     NJ Astle     0
## 5  IR Bell     JS Patel    20
## 6  IR Bell   DL Vettori    28
## 7  IR Bell     JDP Oram    48
## 8  IR Bell    SB Styris    12
## 9  IR Bell     KD Mills   124
## 10 IR Bell   TG Southee    84
## ..     ...          ...   ...
teamBatsmenVsBowlersOppnAllMatches(sl_wi_matches,"Sri Lanka","West Indies")

batsmenvsBowler-3

teamBatsmenVsBowlersOppnAllMatches(ban_ire_matches,"Bangladesh","Ireland")

batsmenvsBowler-4

8. Team batting scorecard (all matches with opposition)

The following tables give the overall performances of the country’s batsmen against the opposition. For India-Australia matches Dhoni, Rohit Sharma and Tendulkar lead the way. For Australia it is Ricky Ponting, M Hussey and GJ Bailey. In South Africa-Pakistan matches it is AB Devilliers, Hashim Amla etc.

a <-teamBattingScorecardOppnAllMatches(aus_ind_matches,main="India",opposition="Australia")
## Total= 8331
a
## Source: local data frame [44 x 5]
## 
##         batsman ballsPlayed fours sixes  runs
##          (fctr)       (int) (int) (int) (dbl)
## 1      MS Dhoni        1406    78    22  1156
## 2     RG Sharma        1015    73    24   918
## 3  SR Tendulkar        1157   103     6   910
## 4       V Kohli         961    87     6   902
## 5     G Gambhir         677    44     2   536
## 6  Yuvraj Singh         664    52    11   524
## 7      SK Raina         536    43    11   509
## 8      S Dhawan         470    55     6   471
## 9      V Sehwag         305    42     4   289
## 10   RV Uthappa         295    29     7   283
## ..          ...         ...   ...   ...   ...
teamBattingScorecardOppnAllMatches(aus_ind_matches,"Australia","India")
## Total= 9995
## Source: local data frame [47 x 5]
## 
##       batsman ballsPlayed fours sixes  runs
##        (fctr)       (int) (int) (int) (dbl)
## 1  RT Ponting        1107    86     8   876
## 2  MEK Hussey         816    56     5   753
## 3   GJ Bailey         578    51    13   614
## 4   SR Watson         653    81    10   609
## 5   MJ Clarke         786    45     5   607
## 6   ML Hayden         660    72     8   573
## 7   A Symonds         543    43    15   536
## 8    AJ Finch         617    52     9   525
## 9   SPD Smith         431    44     7   467
## 10  DA Warner         385    40     6   391
## ..        ...         ...   ...   ...   ...
teamBattingScorecardOppnAllMatches(pak_sa_matches,"South Africa","Pakistan")
## Total= 6657
## Source: local data frame [36 x 5]
## 
##           batsman ballsPlayed fours sixes  runs
##            (fctr)       (int) (int) (int) (dbl)
## 1  AB de Villiers        1533   128    23  1423
## 2         HM Amla         864    88     3   815
## 3        GC Smith         726    68     3   597
## 4       JH Kallis         710    40     8   543
## 5       JP Duminy         620    35     3   481
## 6       CA Ingram         388    32     1   305
## 7    F du Plessis         363    30     4   278
## 8       Q de Kock         336    28     2   270
## 9       DA Miller         329    20     2   250
## 10       HH Gibbs         252    33     2   228
## ..            ...         ...   ...   ...   ...
teamBattingScorecardOppnAllMatches(sl_wi_matches,"West Indies","Sri Lanka")
## Total= 1800
## Source: local data frame [36 x 5]
## 
##          batsman ballsPlayed fours sixes  runs
##           (fctr)       (int) (int) (int) (dbl)
## 1       DM Bravo         353    20     6   265
## 2      RR Sarwan         315    11     3   205
## 3     MN Samuels         209    19     5   188
## 4       CH Gayle         198    18     8   176
## 5  S Chanderpaul         181     6     7   152
## 6      AB Barath         162     9     2   125
## 7       DJ Bravo         139     7     2   102
## 8       CS Baugh         102     5    NA    78
## 9    LMP Simmons          78     5     4    67
## 10     JO Holder          33     5     3    55
## ..           ...         ...   ...   ...   ...
teamBattingScorecardOppnAllMatches(eng_nz_matches,"England","New Zealand")
## Total= 6472
## Source: local data frame [47 x 5]
## 
##           batsman ballsPlayed fours sixes  runs
##            (fctr)       (int) (int) (int) (dbl)
## 1         IR Bell         871    74     7   654
## 2         JE Root         651    54     5   612
## 3  PD Collingwood         619    34    15   514
## 4      EJG Morgan         445    35    22   479
## 5         AN Cook         616    49     3   464
## 6       IJL Trott         421    26     1   362
## 7    KP Pietersen         481    30     6   358
## 8      JC Buttler         199    28    11   287
## 9         OA Shah         323    17     6   274
## 10      RS Bopara         350    21    NA   222
## ..            ...         ...   ...   ...   ...
teamBatsmenPartnershiOppnAllMatches(sco_can_matches,"Scotland","Canada")
## Source: local data frame [20 x 2]
## 
##          batsman totalRuns
##           (fctr)     (dbl)
## 1     CS MacLeod       177
## 2      MW Machan        68
## 3      CJO Smith        43
## 4    FRJ Coleman        40
## 5      RR Watson        14
## 6     JH Stander        12
## 7       MA Leask        12
## 8     RML Taylor        10
## 9     KJ Coetzer         8
## 10   GM Hamilton         7
## 11        RM Haq         7
## 12    PL Mommsen         6
## 13     CM Wright         5
## 14        JD Nel         5
## 15      MH Cross         4
## 16     SM Sharif         4
## 17     JAR Blain         2
## 18  NFI McCallum         1
## 19 RD Berrington         1
## 20     NS Poonia         0
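
A scorecard of this shape can be derived from ball-by-ball rows with a split-and-summarise step. The sketch below uses invented data and, as a simplifying assumption, counts every row as a ball faced.

```r
# Hypothetical deliveries: batsman on strike and runs off the bat
balls <- data.frame(
  batsman = c("P", "P", "P", "Q", "Q", "P", "Q"),
  runs    = c(4, 0, 6, 1, 4, 1, 4)
)

# One summary row per batsman
scorecard <- do.call(rbind, lapply(split(balls, balls$batsman), function(d) {
  data.frame(
    batsman     = d$batsman[1],
    ballsPlayed = nrow(d),
    fours       = sum(d$runs == 4),
    sixes       = sum(d$runs == 6),
    runs        = sum(d$runs)
  )
}))

# Highest run scorer first
scorecard <- scorecard[order(-scorecard$runs), ]
scorecard
```

Here a batsman without a six gets 0, whereas the tables above show NA in that case.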

9. Team performances of bowlers (all matches with opposition)

Like the function above, the following tables provide the top bowlers of the countries in the matches against the opposition. In India-Australia matches Ishant Sharma leads, in Pakistan-South Africa matches Shahid Afridi tops and so on.

teamBowlingPerfOppnAllMatches(aus_ind_matches,"India","Australia")
## Source: local data frame [36 x 5]
## 
##             bowler overs maidens  runs wickets
##             (fctr) (int)   (int) (dbl)   (dbl)
## 1         I Sharma    44       1   739      20
## 2  Harbhajan Singh    40       0   926      15
## 3        RA Jadeja    39       0   867      14
## 4        IK Pathan    42       1   702      11
## 5         UT Yadav    37       2   606      10
## 6          P Kumar    27       0   501      10
## 7           Z Khan    33       1   500      10
## 8      S Sreesanth    34       0   454      10
## 9         R Ashwin    43       0   684       9
## 10   R Vinay Kumar    31       1   380       9
## ..             ...   ...     ...   ...     ...
teamBowlingPerfOppnAllMatches(pak_sa_matches,main="Pakistan",opposition="South Africa")
## Source: local data frame [24 x 5]
## 
##             bowler overs maidens  runs wickets
##             (fctr) (int)   (int) (dbl)   (dbl)
## 1    Shahid Afridi    38       0  1053      17
## 2      Saeed Ajmal    39       0   658      14
## 3  Mohammad Hafeez    38       0   774      13
## 4   Mohammad Irfan    29       0   467      13
## 5   Iftikhar Anjum    29       1   257      12
## 6       Wahab Riaz    31       0   534      11
## 7      Junaid Khan    32       0   429      10
## 8    Sohail Tanvir    26       1   409       9
## 9    Shoaib Akhtar    22       1   313       9
## 10        Umar Gul    25       2   365       7
## ..             ...   ...     ...   ...     ...
teamBowlingPerfOppnAllMatches(eng_nz_matches,"New Zealand","England")
## Source: local data frame [33 x 5]
## 
##            bowler overs maidens  runs wickets
##            (fctr) (int)   (int) (dbl)   (dbl)
## 1      TG Southee    40       0   684      19
## 2        KD Mills    36       1   742      17
## 3      DL Vettori    35       0   561      16
## 4  MJ McClenaghan    34       0   515      14
## 5         SE Bond    17       1   205      11
## 6      GD Elliott    20       0   194      10
## 7    JEC Franklin    24       0   418       7
## 8   KS Williamson    21       1   225       7
## 9        TA Boult    18       2   195       7
## 10    NL McCullum    30       0   425       6
## ..            ...   ...     ...   ...     ...
teamBowlingPerfOppnAllMatches(sl_wi_matches,"Sri Lanka","West Indies")
## Source: local data frame [24 x 5]
## 
##             bowler overs maidens  runs wickets
##             (fctr) (int)   (int) (dbl)   (dbl)
## 1       SL Malinga    28       1   280      11
## 2       BAW Mendis    15       0   267       8
## 3  KMDN Kulasekara    13       1   185       7
## 4       AD Mathews    14       0   191       6
## 5   M Muralitharan    20       1   157       6
## 6      MF Maharoof     9       2    14       6
## 7       WPUJC Vaas     7       2    82       5
## 8       RAS Lakmal     7       0    55       4
## 9    ST Jayasuriya     1       0    38       4
## 10    HMRKB Herath    10       1   124       3
## ..             ...   ...     ...   ...     ...
teamBowlingPerfOppnAllMatches(ken_ber_matches,"Kenya","Bermuda")
## Source: local data frame [9 x 5]
## 
##        bowler overs maidens  runs wickets
##        (fctr) (int)   (int) (dbl)   (dbl)
## 1  JK Kamande    16       0   122       5
## 2  HA Varaiya    13       1    64       5
## 3   AS Luseno     6       0    32       4
## 4  PJ Ongondo     7       0    39       3
## 5    TM Odoyo     7       0    36       3
## 6  LN Onyango     7       0    37       2
## 7   SO Tikolo    18       0    81       1
## 8 NN Odhiambo    14       1    76       1
## 9    CO Obuya     4       0    20       0
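
Since each table reports runs and wickets per bowler, a bowling average (runs conceded per wicket) is easy to derive from it. The following is a minimal sketch in base R that re-types the Kenya v Bermuda rows printed above; the names `ken`, `average` and `ken_sorted` are my own illustrative choices and are not part of yorkr.

```r
# Kenya bowlers v Bermuda, copied from the printed table above
ken <- data.frame(
  bowler  = c("JK Kamande", "HA Varaiya", "AS Luseno", "PJ Ongondo", "TM Odoyo",
              "LN Onyango", "SO Tikolo", "NN Odhiambo", "CO Obuya"),
  runs    = c(122, 64, 32, 39, 36, 37, 81, 76, 20),
  wickets = c(5, 5, 4, 3, 3, 2, 1, 1, 0),
  stringsAsFactors = FALSE
)

# Runs conceded per wicket; undefined (NA) for a bowler with no wickets
ken$average <- ifelse(ken$wickets > 0, ken$runs / ken$wickets, NA)

# Lowest average first; bowlers without a wicket are placed last
ken_sorted <- ken[order(ken$average, na.last = TRUE), ]
print(ken_sorted)
```

The same few lines should work on the data frame returned by teamBowlingPerfOppnAllMatches(), provided its columns are named `runs` and `wickets` as in the printed output.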

10. Team bowler’s wickets (all matches with opposition)

This function provides a graphical plot of the tables above.

teamBowlersWicketsOppnAllMatches(aus_ind_matches,"India","Australia")

bowlerWicketsOppn-1

teamBowlersWicketsOppnAllMatches(aus_ind_matches,"Australia","India")

bowlerWicketsOppn-2

teamBowlersWicketsOppnAllMatches(pak_sa_matches,"South Africa","Pakistan",top=10)

bowlerWicketsOppn-3

m <- teamBowlersWicketsOppnAllMatches(eng_nz_matches,"England","New Zealand",plot=FALSE)
m
## Source: local data frame [20 x 2]
## 
##            bowler wickets
##            (fctr)   (int)
## 1     JM Anderson      20
## 2       SCJ Broad      13
## 3         ST Finn      12
## 4  PD Collingwood      11
## 5        GP Swann      10
## 6   RJ Sidebottom       8
## 7       CR Woakes       8
## 8      A Flintoff       7
## 9     LE Plunkett       6
## 10      AU Rashid       6
## 11      BA Stokes       6
## 12     MS Panesar       5
## 13      LJ Wright       4
## 14     TT Bresnan       4
## 15      DJ Willey       4
## 16    JC Tredwell       3
## 17    CT Tremlett       2
## 18      RS Bopara       2
## 19      CJ Jordan       2
## 20        J Lewis       1
teamBowlersWicketsOppnAllMatches(ban_ire_matches,"Bangladesh","Ireland",top=7)

bowlerWicketsOppn-4
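
When plot=FALSE is used, the returned data frame can be processed like any other. As a rough sketch, the snippet below re-types the England table printed above and picks out the five leading wicket takers by hand, which is roughly what the `top` argument appears to do for the plot. The names `eng`, `top5` and `share` are illustrative only.

```r
# England bowlers v New Zealand, copied from the printed table above
eng <- data.frame(
  bowler = c("JM Anderson", "SCJ Broad", "ST Finn", "PD Collingwood", "GP Swann",
             "RJ Sidebottom", "CR Woakes", "A Flintoff", "LE Plunkett", "AU Rashid",
             "BA Stokes", "MS Panesar", "LJ Wright", "TT Bresnan", "DJ Willey",
             "JC Tredwell", "CT Tremlett", "RS Bopara", "CJ Jordan", "J Lewis"),
  wickets = c(20, 13, 12, 11, 10, 8, 8, 7, 6, 6, 6, 5, 4, 4, 4, 3, 2, 2, 2, 1),
  stringsAsFactors = FALSE
)

# Keep the five leading wicket takers (the table is already sorted, but sort anyway)
top5 <- head(eng[order(-eng$wickets), ], 5)

# Share of all listed wickets taken by those five bowlers, in percent
share <- 100 * sum(top5$wickets) / sum(eng$wickets)
print(top5)
print(round(share, 1))
```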

11. Team bowler vs batsmen (all matches with opposition)

These plots show how the bowlers fared against the batsmen, i.e. which of the opposing team's batsmen scored the most runs off each bowler.

teamBowlersVsBatsmenOppnAllMatches(aus_ind_matches,'India',"Australia",top=5)

bowlerVsBatsmen-1

teamBowlersVsBatsmenOppnAllMatches(pak_sa_matches,"Pakistan","South Africa",top=3)

bowlerVsBatsmen-2

teamBowlersVsBatsmenOppnAllMatches(eng_nz_matches,"England","New Zealand")

bowlerVsBatsmen-3

teamBowlersVsBatsmenOppnAllMatches(eng_nz_matches,"New Zealand","England")

bowlerVsBatsmen-4

12. Team bowler’s wicket kind (caught,bowled,etc) (all matches with opposition)

The charts below show the kind of wickets taken by each bowler (caught, bowled, lbw etc.)

teamBowlersWicketKindOppnAllMatches(aus_ind_matches,"India","Australia",plot=TRUE)

bowlerWickets-1

m <- teamBowlersWicketKindOppnAllMatches(aus_ind_matches,"Australia","India",plot=FALSE)
m[1:30,]
##        bowler        wicketKind wicketPlayerOut runs
## 1  GD McGrath            caught    SR Tendulkar   69
## 2   SR Watson            caught        D Mongia  532
## 3  MG Johnson               lbw        V Sehwag 1020
## 4       B Lee            caught        R Dravid  671
## 5       B Lee            bowled          M Kaif  671
## 6  NW Bracken            caught        SK Raina  429
## 7  GD McGrath            caught       IK Pathan   69
## 8  NW Bracken               lbw        MS Dhoni  429
## 9  MG Johnson               lbw    SR Tendulkar 1020
## 10 MG Johnson            bowled       G Gambhir 1020
## 11   SR Clark            caught    SR Tendulkar  254
## 12   JR Hopes            caught    Yuvraj Singh  346
## 13   SR Clark               lbw      RV Uthappa  254
## 14    GB Hogg            caught        R Dravid  427
## 15  MJ Clarke           run out       IK Pathan  212
## 16  MJ Clarke           stumped Harbhajan Singh  212
## 17  MJ Clarke            bowled        RR Powar  212
## 18    GB Hogg            caught          Z Khan  427
## 19    GB Hogg            caught        MS Dhoni  427
## 20      B Lee               lbw       G Gambhir  671
## 21 MG Johnson               lbw      RV Uthappa 1020
## 22      B Lee            caught        R Dravid  671
## 23    GB Hogg            bowled    SR Tendulkar  427
## 24      B Lee            caught        MS Dhoni  671
## 25   JR Hopes            caught       RG Sharma  346
## 26    GB Hogg               lbw       IK Pathan  427
## 27 MG Johnson            bowled    Yuvraj Singh 1020
## 28    GB Hogg caught and bowled          Z Khan  427
## 29   SR Clark            bowled     S Sreesanth  254
## 30   JR Hopes            caught      SC Ganguly  346
teamBowlersWicketKindOppnAllMatches(sl_wi_matches,"Sri Lanka",'West Indies',plot=TRUE)

bowlerWickets-2
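
The data frame returned with plot=FALSE has one row per dismissal, so the wicket kinds can be counted with a simple cross-tabulation. Here is a small sketch using only the first ten rows of the Australia v India output printed above; `wk`, `tab` and `kinds` are illustrative names of my own.

```r
# First ten dismissals from the printed Australia v India output above
wk <- data.frame(
  bowler = c("GD McGrath", "SR Watson", "MG Johnson", "B Lee", "B Lee",
             "NW Bracken", "GD McGrath", "NW Bracken", "MG Johnson", "MG Johnson"),
  wicketKind = c("caught", "caught", "lbw", "caught", "bowled",
                 "caught", "caught", "lbw", "lbw", "bowled"),
  stringsAsFactors = FALSE
)

# Cross-tabulate bowler against mode of dismissal
tab <- table(wk$bowler, wk$wicketKind)
print(tab)

# Overall count of each wicket kind across these ten dismissals
kinds <- colSums(tab)
print(kinds)
```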

13. Team bowler’s wicket taken and runs conceded (all matches with opposition)

teamBowlersWicketRunsOppnAllMatches(aus_ind_matches,"India","Australia")

wicketRuns-1

m <-teamBowlersWicketRunsOppnAllMatches(pak_sa_matches,"Pakistan","South Africa",plot=FALSE)
m[1:30,]
## Source: local data frame [30 x 5]
## 
##             bowler overs maidens  runs wickets
##             (fctr) (int)   (int) (dbl)   (dbl)
## 1         Umar Gul    25       2   365       7
## 2   Iftikhar Anjum    29       1   257      12
## 3     Yasir Arafat     5       0    33       1
## 4     Abdul Razzaq    16       0   290       4
## 5  Mohammad Hafeez    38       0   774      13
## 6    Shahid Afridi    38       0  1053      17
## 7     Shoaib Malik    18       0   219       4
## 8    Sohail Tanvir    26       1   409       9
## 9     Abdur Rehman    25       0   301       4
## 10   Mohammad Asif    10       1   204       2
## ..             ...   ...     ...   ...     ...

14. Plot of wins vs losses between teams.

setwd("C:/software/cricket-package/york-test/yorkrData/ODI/ODI-matches")
plotWinLossBetweenTeams("India","Sri Lanka")

winsLosses-1

plotWinLossBetweenTeams('Pakistan',"South Africa",".")

winsLosses-2

plotWinLossBetweenTeams('England',"New Zealand",".")

winsLosses-3

plotWinLossBetweenTeams("Australia","West Indies",".")

winsLosses-4

plotWinLossBetweenTeams('Bangladesh',"Zimbabwe",".")

winsLosses-5

plotWinLossBetweenTeams('Scotland',"Ireland",".")

winsLosses-6

Conclusion

This post included all functions for all matches between any 2 opposing countries. As before, the data frames are already available. You can load the data and begin to use them. If you can draw more insights from the data frames, do go ahead. But please do attribute the source to Cricsheet (http://cricsheet.org), my package yorkr and my blog. Do give the functions a spin for yourself.

There are 2 more posts required for the introduction of my yorkr package. So, Hasta la vista, baby! I’ll be back!

You may also like

  1. Introducing cricketr! : An R package to analyze performances of cricketers
  2. Cricket analytics with cricketr
  3. cricketr adapts to the Twenty20 International!
  4. The making of Total Control Android game
  5. De-blurring revisited with Wiener filter using OpenCV
  6. Rock N’ Roll with Bluemix, Cloudant & NodeExpress

Cricket analytics with cricketr!!!


My ebook “Cricket analytics with cricketr” has been published on Leanpub. You can now download the book (hot off the press!) in all formats to your favorite device (mobile, iPad, tablet, Kindle) from the Leanpub page “Cricket analytics with cricketr”. The book has been published in the following formats:

  • PDF (for your computer)
  • EPUB (for iPad or tablets. Save the file cricketr.epub to Google Drive/Dropbox and choose “Open in” iBooks for iPad)
  • MOBI (for Kindle. For this format, I suggest that you download & install SendToKindle for PC/Mac. You can then right click the downloaded cricketr.mobi and choose SendToKindle. You will need to login to your Kindle account)

From Leanpub
Leanpub uses a variable pricing model. I have priced the book attractively (I think!) at $2.50 with a minimum price of $0.00 (FREE!!! limited time offer!). The link is “Cricket analytics with cricketr”.

This format works with all types of Kindle, Android tablet and iPad.

From Amazon
You can also download for Kindle. The price is $2.50 (Rs 169/-). Cricket analytics with cricketr.

Do download the book and hope you have many happy hours reading it.

I am including my preface in the book below

Preface
Cricket has been the “national passion” of India for decades. As a boy I was also held in thrall by a strong cricketing passion like many. Cricket is a truly fascinating game! I would catch the sporting action with my friends as we crowded around a transistor that brought us live, breathless radio commentary. We also spent many hours glued to live cricket action on the early black and white TVs. This used to be an experience of sorts, as every now and then a part of the body of the players, would detach itself and stretch to the sides. But it was enjoyable all the same.

Nowadays broadcast technology has improved so much that we get detailed visual analysis of how each bowler varies the swing and length of the delivery. We are also able to see the strokes of batsmen in slow motion. Similarly, computing technology has also advanced by leaps and bounds and we can analyze players in great detail with a few lines of code in languages like R, Python etc.

In 2015, I completed Machine Learning from Stanford at Coursera. I was looking around for data to play around with, when it suddenly struck me that I could do some regression analysis of batting records. In the subsequent months, I took the Data Science Specialization from Johns Hopkins University, which triggered more ideas in me. One thing led to another and I managed to put together an R package called ‘cricketr’. I developed this package over 7 months, adding and refining functions. Finally, I managed to submit the package to CRAN. During the development of the package for different formats of the game I wrote a series of posts in my blog.

This book is a collection of those cricket related posts.  There are 6 posts based on my R package cricketr. I have also included 2 earlier posts based on R which I wrote before I created my R package. Finally, I also include another 2 cricket posts based on Machine Learning in which I used the language Octave.

My ‘cricketr’ package is a first for cricket analytics, howzzat! And I am certain that it won’t be the last. Cricket is a wonderful pitch for statisticians, data scientists and machine learning experts. So you can expect some cool packages in the years to come.

I had a great time developing the package. I hope you have a wonderful time reading this book. Do remember to download from “Cricket analytics with cricketr”.

Feel free to get in touch with me anytime through email included below

Tinniam V Ganesh
tvganesh.85@gmail.com
January 28, 2016

cricketr adapts to the Twenty20 International!


Introduction

This should be the last in the series of posts based on my R package cricketr. That is, unless some bright idea comes trotting along and light bulbs go on around my head.

In this post cricketr adapts to the Twenty20 International format. Now cricketr can handle stats from all 3 formats of the game, namely Test matches, ODIs and Twenty20 Internationals, from ESPN Cricinfo. You should be able to install the package from GitHub and use many of the functions available in the package.

Please be mindful of the ESPN Cricinfo Terms of Use

You can also read this post at Rpubs as twenty20-cricketr. Download this report as a PDF file from twenty20-cricketr.pdf

Do check out my interactive Shiny app implementation using the cricketr package – Sixer – R package cricketr’s new Shiny avatar

Note: If you would like to do a similar analysis for a different set of batsmen and bowlers, you can clone/download my skeleton cricketr template from Github (which is the R Markdown file I have used for the analysis below). You will only need to make appropriate changes for the players you are interested in. Only a familiarity with R and R Markdown is needed.

I have chosen the Top 4 batsmen and top 4 bowlers based on ICC rankings and/or number of matches played.

Batsmen

  1. Virat Kohli (Ind)
  2. Faf du Plessis (SA)
  3. A J Finch (Aus)
  4. Brendon McCullum (NZ)

Bowlers

  1. Samuel Badree (WI)
  2. Sunil Narine (WI)
  3. Ravichander Ashwin (Ind)
  4. Ajantha Mendis (SL)

I have explained the plots and added my own observations. Please feel free to draw your conclusions!

The package can be installed directly from CRAN

if (!require("cricketr")){ 
    install.packages("cricketr",lib = "c:/test") 
} 
library(cricketr)

or from Github

library(devtools)
install_github("tvganesh/cricketr")
library(cricketr)

The data for a particular player can be obtained with the getPlayerDataTT() function. To do this you will need to go to ESPN CricInfo Player and type in the name of the player, e.g. Virat Kohli, Sunil Narine etc. This will bring up a page which has the profile number for the player, e.g. for Virat Kohli this would be http://www.espncricinfo.com/india/content/player/253802.html. Hence, Kohli’s profile is 253802. This can be used to get the data for Virat Kohli as shown below

kohli <- getPlayerDataTT(253802,dir="..",file="kohli.csv",type="batting")
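
If you have the profile page URL at hand, the profile number can also be pulled out of it programmatically instead of being copied by eye. This is just a small convenience sketch in base R; the variable names `url` and `profile` are my own, and the pattern assumes the URL ends in `/player/<digits>.html` as in the example above.

```r
# Profile page URL quoted in the text above
url <- "http://www.espncricinfo.com/india/content/player/253802.html"

# Capture the digits between "/player/" and ".html"
profile <- as.integer(sub(".*/player/([0-9]+)\\.html$", "\\1", url))
print(profile)
```

The resulting number is what gets passed as the first argument in the getPlayerDataTT() call shown above.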

The analysis is included below

Analyses of Batsmen

The following plots give the analysis of the 4 Twenty20 batsmen

  1. Virat Kohli (Ind) – Innings-26, Runs-972, Average-46.28,Strike Rate-131.70
  2. Faf du Plessis (SA) – Innings-24, Runs-805, Average-42.36,Strike Rate-135.75
  3. A J Finch (Aus) – Innings-22, Runs-756, Average-39.78,Strike Rate-152.41
  4. Brendon McCullum (NZ) – Innings-70, Runs-2140, Average-35.66,Strike Rate-136.21
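
The career figures above are related by the standard definitions: batting average is runs divided by dismissals, and strike rate is runs per 100 balls faced. As a quick worked example, the sketch below back-calculates the dismissals, not-outs and balls faced implied by the quoted numbers. The data frame `bat` and its column names are illustrative, and the derived values are only as accurate as the rounded figures in the list.

```r
# Career figures quoted in the list above
bat <- data.frame(
  batsman    = c("Kohli", "Du Plessis", "Finch", "McCullum"),
  innings    = c(26, 24, 22, 70),
  runs       = c(972, 805, 756, 2140),
  average    = c(46.28, 42.36, 39.78, 35.66),
  strikeRate = c(131.70, 135.75, 152.41, 136.21),
  stringsAsFactors = FALSE
)

# Average = runs / dismissals, so dismissals = runs / average (rounded)
bat$dismissals <- round(bat$runs / bat$average)
# Innings that did not end in a dismissal
bat$notOuts <- bat$innings - bat$dismissals
# Strike rate = 100 * runs / balls, so balls = 100 * runs / strike rate
bat$ballsFaced <- round(100 * bat$runs / bat$strikeRate)
print(bat)
```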

Plot of 4s, 6s and the scoring rate in Twenty20 Internationals

The 3 charts below give the number of

  1. 4s vs Runs scored
  2. 6s vs Runs scored
  3. Balls faced vs Runs scored

A regression line is fitted in each of these plots for each of the Twenty20 batsmen

A. Virat Kohli
– The 1st plot shows that Kohli hits about 5 4s on his way to the 50s
– The 2nd plot is a box plot of the number of 6s against runs. It shows the range of runs when Kohli hit 1, 2 or 4 6s. The dark line in each box marks the median runs for that number of 6s. So when he hit 1 6, his median score was 45
– The 3rd plot shows the runs scored against the balls faced. It can be seen that when Kohli faced 50 balls he had scored around 70 runs

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./kohli.csv","Kohli")
batsman6s("./kohli.csv","Kohli")
batsmanScoringRateODTT("./kohli.csv","Kohli")

kohli-4s6sSR-1

dev.off()
## null device 
##           1

B. Faf du Plessis

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./plessis.csv","Du Plessis")
batsman6s("./plessis.csv","Du Plessis")
batsmanScoringRateODTT("./plessis.csv","Du Plessis")

plessis-4s6SR-1

dev.off()
## null device 
##           1

C. A J Finch

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./finch.csv","A J Finch")
batsman6s("./finch.csv","A J Finch")
batsmanScoringRateODTT("./finch.csv","A J Finch")

finch-4s6sSR-1

dev.off()
## null device 
##           1

D. Brendon McCullum

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./mccullum.csv","McCullum")
batsman6s("./mccullum.csv","McCullum")
batsmanScoringRateODTT("./mccullum.csv","McCullum")

mccullum-4s6sout-1

dev.off()
## null device 
##           1

Relative Mean Strike Rate

This plot shows the Mean Strike Rate of each batsman in each run range. It can be seen that A J Finch has the best strike rate followed by B McCullum.

par(mar=c(4,4,2,2))
frames <- list("./kohli.csv","./plessis.csv","finch.csv","mccullum.csv")
names <- list("Kohli","Du Plessis","Finch","McCullum")
relativeBatsmanSRODTT(frames,names)

plot-1-1

Relative Runs Frequency Percentage

The plot below provides the average runs scored in each run range 0-5, 5-10, 10-15 etc. Clearly Kohli has the most runs scored in most of the run ranges. This is also evident in the fact that Kohli has the highest average. He is followed by McCullum

frames <- list("./kohli.csv","./plessis.csv","finch.csv","mccullum.csv")
names <- list("Kohli","Du Plessis","Finch","McCullum")
relativeRunsFreqPerfODTT(frames,names)

plot-2-1

Percent 4’s,6’s in total runs scored

The plot below shows the percentage of runs scored by way of 4s and 6s for each batsman. Going by the table printed below, Kohli has the highest percentage of 4s (27.78%) and McCullum has the highest percentage of 6s (15.69%). Finch has the highest combined percentage of 4s & 6s: 25.37 + 15.64 = 41.01%

frames <- list("./kohli.csv","./plessis.csv","finch.csv","mccullum.csv")
names <- list("Kohli","Du Plessis","Finch","McCullum")
runs4s6s <-batsman4s6s(frames,names)

plot-46s-1

print(runs4s6s)
##                Kohli Du Plessis Finch McCullum
## Runs(1s,2s,3s) 64.29      64.55 58.99    61.45
## 4s             27.78      24.38 25.37    22.87
## 6s              7.94      11.07 15.64    15.69
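
The printed percentages can be checked with a few lines of R. The sketch below re-types the matrix shown above, adds up the 4s and 6s rows to get each batsman's boundary percentage, and finds the leader in each row. The names `pct`, `boundary` and `leaders` are illustrative and not part of cricketr.

```r
# Percentages copied from the printed runs4s6s output above
# (matrix() fills column by column, so each line below is one batsman)
pct <- matrix(
  c(64.29, 27.78,  7.94,
    64.55, 24.38, 11.07,
    58.99, 25.37, 15.64,
    61.45, 22.87, 15.69),
  nrow = 3,
  dimnames = list(c("Runs(1s,2s,3s)", "4s", "6s"),
                  c("Kohli", "Du Plessis", "Finch", "McCullum"))
)

# Percentage of runs that came in boundaries (4s plus 6s)
boundary <- colSums(pct[c("4s", "6s"), ])
print(boundary)

# Batsman with the largest share in each row
leaders <- colnames(pct)[apply(pct, 1, which.max)]
names(leaders) <- rownames(pct)
print(leaders)
```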

3D plot of Runs vs Balls Faced and Minutes at Crease

The plot is a scatter plot of Runs vs Balls faced and Minutes at Crease. A prediction plane is then fitted based on the Balls Faced and Minutes at Crease to give the runs scored

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./kohli.csv","Kohli")
battingPerf3d("./plessis.csv","Du Plessis")

plot-3-1

dev.off()
## null device 
##           1
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./finch.csv","A J Finch")
battingPerf3d("./mccullum.csv","McCullum")

plot-4-1

dev.off()
## null device 
##           1

Predicting Runs given Balls Faced and Minutes at Crease

Hypothetical values of Balls Faced and Minutes at Crease are used to predict the runs scored by each batsman from the computed prediction plane

BF <- seq( 5, 70,length=10)
Mins <- seq(5,70,length=10)
newDF <- data.frame(BF,Mins)

kohli <- batsmanRunsPredict("./kohli.csv","Kohli",newdataframe=newDF)
plessis <- batsmanRunsPredict("./plessis.csv","Du Plessis",newdataframe=newDF)
finch <- batsmanRunsPredict("./finch.csv","A J Finch",newdataframe=newDF)
mccullum <- batsmanRunsPredict("./mccullum.csv","McCullum",newdataframe=newDF)

The predicted runs are displayed below. As can be seen, Finch has the best overall strike rate followed by McCullum.

batsmen <-cbind(round(kohli$Runs),round(plessis$Runs),round(finch$Runs),round(mccullum$Runs))
colnames(batsmen) <- c("Kohli","Du Plessis","Finch","McCullum")
newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
colnames(newDF) <- c("BallsFaced","MinsAtCrease")
predictedRuns <- cbind(newDF,batsmen)
predictedRuns
##    BallsFaced MinsAtCrease Kohli Du Plessis Finch McCullum
## 1           5            5     2          1     5        3
## 2          12           12    12         10    22       16
## 3          19           19    22         19    40       28
## 4          27           27    31         28    57       41
## 5          34           34    41         37    74       54
## 6          41           41    51         47    91       66
## 7          48           48    60         56   108       79
## 8          56           56    70         65   125       91
## 9          63           63    79         74   142      104
## 10         70           70    89         84   159      117
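
A prediction plane of this kind can be obtained from an ordinary linear model of runs on balls faced and minutes at crease. The sketch below is my assumption about the general approach, not the actual code inside batsmanRunsPredict(). It uses a small synthetic data set (made-up numbers that lie exactly on a plane, not real player data), fits the model with lm() and predicts on a grid like the one used above.

```r
# Synthetic innings: Runs lie exactly on the plane 1.2*BF + 0.1*Mins - 2
# (made-up numbers for illustration only, not real player data)
innings <- data.frame(BF   = c(10, 20, 30, 40, 50, 60),
                      Mins = c(15, 22, 41, 50, 58, 80))
innings$Runs <- 1.2 * innings$BF + 0.1 * innings$Mins - 2

# Fit the prediction plane
fit <- lm(Runs ~ BF + Mins, data = innings)

# Hypothetical balls faced and minutes at crease, as in the post
newDF <- data.frame(BF   = seq(5, 70, length.out = 10),
                    Mins = seq(5, 70, length.out = 10))
newDF$Runs <- predict(fit, newdata = newDF)
print(round(newDF))
```

With real innings data the points scatter around the plane instead of lying on it, so the fitted coefficients carry some uncertainty that this toy example hides.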

Highest runs likelihood

The plots below show the runs likelihood of each batsman, computed using K-Means. Kohli has the highest likelihood of a big score: he is 34.62% likely to score 66 runs. Du Plessis has a 25% likelihood of scoring 53 runs.

A. Virat Kohli

batsmanRunsLikelihood("./kohli.csv","Kohli")

kohli-lh-1

## Summary of  Kohli 's runs scoring likelihood
## **************************************************
## 
## There is a 23.08 % likelihood that Kohli  will make  10 Runs in  10 balls over 13  Minutes 
## There is a 42.31 % likelihood that Kohli  will make  29 Runs in  23 balls over  30  Minutes 
## There is a 34.62 % likelihood that Kohli  will make  66 Runs in  47 balls over 63  Minutes

B. Faf Du Plessis

batsmanRunsLikelihood("./plessis.csv","Du Plessis")

plessis-l-1

## Summary of  Du Plessis 's runs scoring likelihood
## **************************************************
## 
## There is a 62.5 % likelihood that Du Plessis  will make  14 Runs in  11 balls over 19  Minutes 
## There is a 25 % likelihood that Du Plessis  will make  53 Runs in  40 balls over  50  Minutes 
## There is a 12.5 % likelihood that Du Plessis  will make  94 Runs in  61 balls over 90  Minutes

C. A J Finch

batsmanRunsLikelihood("./finch.csv","A J Finch")

finch-lh,cache-TRUE-1

## Summary of  A J Finch 's runs scoring likelihood
## **************************************************
## 
## There is a 20 % likelihood that A J Finch  will make  95 Runs in  54 balls over 70  Minutes 
## There is a 25 % likelihood that A J Finch  will make  42 Runs in  27 balls over  35  Minutes 
## There is a 55 % likelihood that A J Finch  will make  8 Runs in  8 balls over 12  Minutes

D. Brendon McCullum

batsmanRunsLikelihood("./mccullum.csv","McCullum")

mccullum-1

## Summary of  McCullum 's runs scoring likelihood
## **************************************************
## 
## There is a 50.72 % likelihood that McCullum  will make  11 Runs in  10 balls over 13  Minutes 
## There is a 28.99 % likelihood that McCullum  will make  36 Runs in  27 balls over  37  Minutes 
## There is a 20.29 % likelihood that McCullum  will make  74 Runs in  48 balls over 70  Minutes
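
To give a feel for how such likelihoods can come out of K-Means, here is a rough sketch: cluster the innings into 3 groups on runs, balls faced and minutes, then report each cluster's centre together with the share of innings it contains. This is only my illustration of the idea, not the code inside batsmanRunsLikelihood(), and the eight innings below are made up.

```r
# Made-up innings (runs, balls faced, minutes); not real player data
innings <- data.frame(
  Runs = c(10, 14, 12, 16, 18, 50, 56, 94),
  BF   = c( 9, 11, 10, 13, 12, 38, 42, 61),
  Mins = c(14, 19, 16, 20, 22, 48, 52, 90)
)

set.seed(1)  # k-means starts from random centres
km <- kmeans(innings, centers = 3, nstart = 50)

# Share of innings in each cluster, alongside the rounded cluster centres
likelihood <- 100 * km$size / nrow(innings)
summary_tbl <- cbind(round(km$centers), likelihood)
print(summary_tbl)
```

Each row of the result reads like one line of the summaries above: a typical score, the balls and minutes that go with it, and the percentage of innings that fall in that group.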

Moving Average of runs over career

The moving averages for the 4 batsmen indicate the following. It must be noted that there is not yet sufficient data on Twenty20 Internationals: Kohli, Du Plessis and Finch have only 22 to 26 innings each, while McCullum has 70. So the moving average, while an indication, will regress towards the mean over time.

  1. The moving averages of Kohli and Du Plessis are on the way up.
  2. McCullum has a consistent performance, while Finch had a brief burst in 2013-2014.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanMovingAverage("./kohli.csv","Kohli")
batsmanMovingAverage("./plessis.csv","Du Plessis")
batsmanMovingAverage("./finch.csv","A J Finch")
batsmanMovingAverage("./mccullum.csv","McCullum")

sdgm-ma-1

dev.off()
## null device 
##           1
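
For readers who want to see the idea in its simplest form, the sketch below computes a plain trailing moving average over a short, made-up sequence of scores. batsmanMovingAverage() may well use a different smoother internally, so treat this only as an illustration of what a moving average does.

```r
# Made-up sequence of innings scores, oldest first (illustration only)
runs <- c(10, 20, 30, 40, 50)

# Trailing 3-innings moving average; the first two values are NA
# because fewer than three innings are available at that point
ma <- as.numeric(stats::filter(runs, rep(1 / 3, 3), sides = 1))
print(ma)
```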

Analysis of bowlers

  1. Samuel Badree (WI) – Innings-22, Runs -464, Wickets – 31, Econ Rate : 5.39
  2. Sunil Narine (WI)- Innings-31,Runs-666, Wickets – 38 , Econ Rate : 5.70
  3. Ravichander Ashwin (Ind)- Innings-26, Runs- 732, Wickets – 25, Econ Rate : 7.32
  4. Ajantha Mendis (SL)- Innings-39, Runs – 952,Wickets – 66, Econ Rate : 6.45
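
Two more figures follow directly from the numbers in this list: the bowling average (runs conceded per wicket) and, since economy rate is runs per over, the approximate number of overs bowled. A small sketch, with `bowl` and its column names chosen by me for illustration:

```r
# Career figures quoted in the list above
bowl <- data.frame(
  bowler  = c("Badree", "Narine", "Ashwin", "Mendis"),
  runs    = c(464, 666, 732, 952),
  wickets = c(31, 38, 25, 66),
  econ    = c(5.39, 5.70, 7.32, 6.45),
  stringsAsFactors = FALSE
)

# Bowling average: runs conceded per wicket
bowl$average <- round(bowl$runs / bowl$wickets, 2)
# Economy rate = runs / overs, so overs = runs / economy rate
bowl$overs <- round(bowl$runs / bowl$econ, 1)
print(bowl[order(bowl$average), ])
```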

The plot shows the frequency with which the bowlers have taken 1, 2, 3 etc. wickets. The most wickets in an innings were taken by Ajantha Mendis (6 wickets)

Wicket Frequency percentage

This plot gives the percentage frequency of each wicket haul (1, 2, 3 etc.)

par(mfrow=c(1,4))
par(mar=c(4,4,2,2))
bowlerWktsFreqPercent("./badree.csv","Badree")
bowlerWktsFreqPercent("./mendis.csv","Mendis")
bowlerWktsFreqPercent("./narine.csv","Narine")
bowlerWktsFreqPercent("./ashwin.csv","Ashwin")

relBowlFP-1

dev.off()
## null device 
##           1
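
The quantity being plotted can be sketched with table() and prop.table(). The wickets-per-innings vector below is made up purely for illustration, and the code shows the general calculation, not necessarily how bowlerWktsFreqPercent() does it.

```r
# Made-up wickets per innings for one bowler (illustration only)
wkts <- c(0, 1, 1, 2, 3, 1, 0, 2)

# Percentage of innings in which 0, 1, 2, ... wickets were taken
freq_pct <- 100 * prop.table(table(wkts))
print(freq_pct)
```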

Wickets Runs plot

The plot below gives a boxplot of the runs conceded for each number of wickets taken by the bowlers. The ends of the box indicate the 25th and 75th percentiles of runs conceded for the wickets taken, and the dark black line is the median runs conceded.

par(mfrow=c(1,4))
par(mar=c(4,4,2,2))
bowlerWktsRunsPlot("./badree.csv","Badree")
bowlerWktsRunsPlot("./mendis.csv","Mendis")
bowlerWktsRunsPlot("./narine.csv","Narine")
bowlerWktsRunsPlot("./ashwin.csv","Ashwin")

wktsrun-1

dev.off()
## null device 
##           1

The plot below shows the average number of deliveries needed by the bowler to take the wickets (1, 2, 3 etc.)

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerWktRateTT("./badree.csv","Badree")
bowlerWktRateTT("./mendis.csv","Mendis")

wktsrate1-1

dev.off()
## null device 
##           1
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerWktRateTT("./narine.csv","Narine")
bowlerWktRateTT("./ashwin.csv","Ashwin")

wktsrate2-1

dev.off()
## null device 
##           1

Relative bowling performance

The plot below shows that Narine has the most wickets in the 2-4 range followed by Mendis

frames <- list("./badree.csv","./mendis.csv","narine.csv","ashwin.csv")
names <- list("Badree","Mendis","Narine","Ashwin")
relativeBowlingPerf(frames,names)

relBowlPerf-1

Relative Economy Rate against wickets taken

The economy rate can be deduced as follows from the plot below. Narine has a good economy rate around 1 & 4 wickets, Ashwin around 2 wickets and Badree around 3 wickets

frames <- list("./badree.csv","./mendis.csv","narine.csv","ashwin.csv")
names <- list("Badree","Mendis","Narine","Ashwin")
relativeBowlingERODTT(frames,names)

relBowlER-1

Relative Wicket Rate

The relative wicket rate plots the mean number of deliveries needed to take a given number of wickets (1, 2, 3, 4). For example, Narine needed an average of 22 deliveries to take 1 wicket and 22.5, 23.2 and 24 deliveries to take 2, 3 & 4 wickets respectively

frames <- list("./badree.csv","./mendis.csv","narine.csv","ashwin.csv")
names <- list("Badree","Mendis","Narine","Ashwin")
relativeWktRateTT(frames,names)

relBowlWktRate-1
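
One reading of the wicket rate is the mean number of balls bowled in innings that ended with a given number of wickets. Under that assumption, and with made-up spells used only for illustration, the calculation is a one-line tapply():

```r
# Made-up spells: wickets taken and balls bowled in each innings (illustration only)
spells <- data.frame(wickets = c(1, 1, 2, 2, 3, 1),
                     balls   = c(24, 18, 24, 20, 24, 24))

# Mean number of deliveries bowled in innings with 1, 2, 3, ... wickets
wkt_rate <- tapply(spells$balls, spells$wickets, mean)
print(wkt_rate)
```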

Moving average of wickets over career

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
bowlerMovingAverage("./badree.csv","Badree")
bowlerMovingAverage("./mendis.csv","Mendis")
bowlerMovingAverage("./narine.csv","Narine")
bowlerMovingAverage("./ashwin.csv","Ashwin")

jsba-bowlma-1

dev.off()
## null device 
##           1

Key findings

Here are some key conclusions

Twenty 20 batsmen

  1. Kohli has a very consistent performance, scoring high runs in the different run ranges. Kohli also has a 34.62% likelihood of scoring 66 runs. He is followed by McCullum for consistent performance
  2. Finch has the best strike rate followed by McCullum.
  3. Kohli has the highest percentage of 4s and McCullum has the highest percentage of 6s. Finch is superior in the combined percentage of runs scored in 4s and 6s
  4. For a hypothetical balls faced and minutes at crease, Finch does best followed by McCullum
  5. The Twenty20 careers of Kohli and Du Plessis are on an upswing. Can they maintain the momentum? McCullum is consistent

Twenty20 bowlers

  1. Narine has the highest wickets percentage for different wickets taken followed by Mendis
  2. Mendis has taken 1,2,3,4,6 wickets in 24 deliveries
  3. Narine has the lowest economy rate for 1 & 4 wickets, Ashwin for 2 wickets and Badree for 3 wickets. Mendis is comparatively expensive
  4. Narine needed the least deliveries to get 1 (22.5) & 2 (23.2) wickets, Mendis needed 20.5 deliveries and Ashwin 19 deliveries for 4 wickets

Key takeaways

If all the above batsmen and bowlers were in the same team, we would expect the following:

  1. Finch would be most useful when the run rate has to be greatly accelerated followed by McCullum
  2. If the need is to consolidate, then Kohli is the best man for the job followed by McCullum
  3. Overall McCullum is the best bet for Twenty20
  4. When it comes to bowling Narine wins hands down as he has the most wickets, a good economy rate and a very good attack rate. So Narine is a great bet for providing a vital breakthrough.

Also see my other posts in R

  1. Introducing cricketr! : An R package to analyze performances of cricketers
  2. cricketr plays the ODIs!
  3. A peek into literacy in India: Statistical Learning with R
  4. A crime map of India in R – Crimes against women
  5. Analyzing cricket’s batting legends – Through the mirage with R
  6. Mirror, mirror . the best batsman of them all?

You may also like

  1. A closer look at “Robot Horse on a Trot” in Android
  2. What’s up Watson? Using IBM Watson’s QAAPI with Bluemix, NodeExpress – Part 1
  3. Bend it like Bluemix, MongoDB with autoscaling – Part 2
  4. Informed choices through Machine Learning : Analyzing Kohli, Tendulkar and Dravid
  5. TWS-4: Gossip protocol: Epidemics and rumors to the rescue
  6. Deblurring with OpenCV:Weiner filter reloaded
  7. Architecting a cloud based IP Multimedia System (IMS)

cricketr plays the ODIs!


Published in R bloggers: cricketr plays the ODIs

Introduction

In this post my package ‘cricketr’ takes a swing at One Day Internationals (ODIs). Like Test batsmen who adapt to ODIs with some innovative strokes, the cricketr package has some additional functions and some modified functions to handle the high strike and economy rates in ODIs. As before I have chosen my top 4 ODI batsmen and top 4 ODI bowlers.

Do check out my interactive Shiny app implementation using the cricketr package – Sixer – R package cricketr’s new Shiny avatar

You can also read this post at Rpubs as odi-cricketr. Download this report as a PDF file from odi-cricketr.pdf

Note: If you would like to do a similar analysis for a different set of batsmen and bowlers, you can clone/download my skeleton cricketr template from Github (which is the R Markdown file I have used for the analysis below). You will only need to make appropriate changes for the players you are interested in. Only a familiarity with R and R Markdown is needed.

Batsmen

  1. Virender Sehwag (Ind)
  2. AB Devilliers (SA)
  3. Chris Gayle (WI)
  4. Glenn Maxwell (Aus)

Bowlers

  1. Mitchell Johnson (Aus)
  2. Lasith Malinga (SL)
  3. Dale Steyn (SA)
  4. Tim Southee (NZ)

I have sprinkled the plots with a few of my comments. Feel free to draw your conclusions! The analysis is included below

The profile for Virender Sehwag is 35263. This can be used to get the ODI data for Sehwag. For a batsman the type should be “batting” and for a bowler the type should be “bowling” and the function is getPlayerDataOD()

The package can be installed directly from CRAN

if (!require("cricketr")){ 
    install.packages("cricketr",lib = "c:/test") 
} 
library(cricketr)

or from Github

library(devtools)
install_github("tvganesh/cricketr")
library(cricketr)

The One Day data for a particular player can be obtained with the getPlayerDataOD() function. To do this you will need to go to ESPN CricInfo Player and type in the name of the player, e.g. Virender Sehwag. This will bring up a page which has the profile number for the player, e.g. for Virender Sehwag this would be http://www.espncricinfo.com/india/content/player/35263.html. Hence, Sehwag’s profile is 35263. This can be used to get the data for Virender Sehwag as shown below

# Save the CSV in the current directory, where the plotting calls below read "./sehwag.csv"
sehwag <- getPlayerDataOD(35263,dir=".",file="sehwag.csv",type="batting")
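The profile number can also be pulled out of the player page URL programmatically. The snippet below is only a minimal sketch: the helper name profileFromUrl is hypothetical (it is not part of cricketr) and it assumes the URL ends in /player/<number>.html, as in the Sehwag example above.

```r
# Hypothetical helper (not part of cricketr): extract the numeric profile id
# from a Cricinfo player URL of the form .../player/<number>.html
profileFromUrl <- function(url) {
  pattern <- "^.*/player/([0-9]+)\\.html$"
  if (!grepl(pattern, url)) {
    stop("URL does not look like a player profile page: ", url)
  }
  # Keep only the captured digits and convert them to an integer
  as.integer(sub(pattern, "\\1", url))
}

sehwagProfile <- profileFromUrl(
  "http://www.espncricinfo.com/india/content/player/35263.html"
)
print(sehwagProfile)
```

The resulting number can then be passed as the first argument of getPlayerDataOD().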

Analyses of Batsmen

The following plots give the analysis of the 4 ODI batsmen

  1. Virender Sehwag (Ind) – Innings – 245, Runs = 8586, Average = 35.05, Strike Rate = 104.33
  2. AB Devilliers (SA) – Innings – 179, Runs = 7941, Average = 53.65, Strike Rate = 99.12
  3. Chris Gayle (WI) – Innings – 264, Runs = 9221, Average = 37.65, Strike Rate = 85.11
  4. Glenn Maxwell (Aus) – Innings – 45, Runs = 1367, Average = 35.02, Strike Rate = 126.69

Plot of 4s, 6s and the scoring rate in ODIs

The 3 charts below give the number of

  1. 4s vs Runs scored
  2. 6s vs Runs scored
  3. Balls faced vs Runs scored

A regression line is fitted in each of these plots for each of the ODI batsmen

A. Virender Sehwag

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./sehwag.csv","Sehwag")
batsman6s("./sehwag.csv","Sehwag")
batsmanScoringRateODTT("./sehwag.csv","Sehwag")

sehwag-4s6sSR-1

dev.off()
## null device 
##           1

B. AB Devilliers

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./devilliers.csv","Devilliers")
batsman6s("./devilliers.csv","Devilliers")
batsmanScoringRateODTT("./devilliers.csv","Devilliers")

devillier-4s6SR-1

dev.off()
## null device 
##           1

C. Chris Gayle

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./gayle.csv","Gayle")
batsman6s("./gayle.csv","Gayle")
batsmanScoringRateODTT("./gayle.csv","Gayle")

gayle-4s6sSR-1

dev.off()
## null device 
##           1

D. Glenn Maxwell

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./maxwell.csv","Maxwell")
batsman6s("./maxwell.csv","Maxwell")
batsmanScoringRateODTT("./maxwell.csv","Maxwell")

maxwell-4s6sout-1

dev.off()
## null device 
##           1

Relative Mean Strike Rate

In this first plot I plot the Mean Strike Rate of the batsmen. It can be seen that Maxwell has an awesome strike rate in ODIs. However, we need to keep in mind that Maxwell has played far fewer innings (only 45). He is followed by Sehwag (245 innings), who also has an excellent strike rate up to 100 runs, after which Devilliers roars ahead. This is also reflected in the overall strike rates listed above

par(mar=c(4,4,2,2))
frames <- list("./sehwag.csv","./devilliers.csv","gayle.csv","maxwell.csv")
names <- list("Sehwag","Devilliers","Gayle","Maxwell")
relativeBatsmanSRODTT(frames,names)

plot-1-1

Relative Runs Frequency Percentage

Sehwag leads in the percentage of runs in 10-run ranges up to 50 runs. Maxwell and Devilliers lead in the 55-66 and 66-85 ranges respectively.

frames <- list("./sehwag.csv","./devilliers.csv","gayle.csv","maxwell.csv")
names <- list("Sehwag","Devilliers","Gayle","Maxwell")
relativeRunsFreqPerfODTT(frames,names)

plot-2-1

Percentage of 4s,6s in the runs scored

The plot below shows the percentage of runs made by the batsmen by way of 1s, 2s, 3s, 4s and 6s. It can be seen that Sehwag has the highest percentage of 4s (33.36%) in his overall runs in ODIs. Maxwell has the highest percentage of 6s (13.36%) in his ODI career. If we take the overall 4s+6s then Sehwag leads with 33.36 + 5.95 = 39.31%, followed by Gayle with 27.80 + 10.15 = 37.95%

frames <- list("./sehwag.csv","./devilliers.csv","gayle.csv","maxwell.csv")
names <- list("Sehwag","Devilliers","Gayle","Maxwell")
runs4s6s <-batsman4s6s(frames,names)

plot-46s-1

print(runs4s6s)
##                Sehwag Devilliers Gayle Maxwell
## Runs(1s,2s,3s)  60.69      67.39 62.05   62.11
## 4s              33.36      24.28 27.80   24.53
## 6s               5.95       8.32 10.15   13.36
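The combined 4s+6s percentages quoted above can be re-derived from the printed table. The sketch below simply re-enters the printed values by hand into a matrix (rather than calling batsman4s6s() again), so the names boundaryPct and ranked are only illustrative.

```r
# Values copied from the printed runs4s6s table above
runs4s6s <- matrix(
  c(60.69, 67.39, 62.05, 62.11,
    33.36, 24.28, 27.80, 24.53,
     5.95,  8.32, 10.15, 13.36),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Runs(1s,2s,3s)", "4s", "6s"),
                  c("Sehwag", "Devilliers", "Gayle", "Maxwell"))
)

# Percentage of runs scored in boundaries (4s + 6s) for each batsman
boundaryPct <- runs4s6s["4s", ] + runs4s6s["6s", ]

# Rank the batsmen from highest to lowest boundary percentage
ranked <- sort(boundaryPct, decreasing = TRUE)
print(ranked)
```

Sehwag and Gayle come out on top, with Maxwell only marginally behind Gayle.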
 

Runs forecast

The forecast for each of the batsmen is shown below.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfForecast("./sehwag.csv","Sehwag")
batsmanPerfForecast("./devilliers.csv","Devilliers")
batsmanPerfForecast("./gayle.csv","Gayle")
batsmanPerfForecast("./maxwell.csv","Maxwell")

swcr-perf-1

dev.off()
## null device 
##           1

3D plot of Runs vs Balls Faced and Minutes at Crease

The plot is a scatter plot of Runs vs Balls faced and Minutes at Crease. A prediction plane is fitted

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./sehwag.csv","V Sehwag")
battingPerf3d("./devilliers.csv","AB Devilliers")

plot-3-1

dev.off()
## null device 
##           1
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./gayle.csv","C Gayle")
battingPerf3d("./maxwell.csv","G Maxwell")

plot-4-1

dev.off()
## null device 
##           1

Predicting Runs given Balls Faced and Minutes at Crease

A multi-variate regression plane is fitted between Runs and Balls faced +Minutes at crease.

BF <- seq( 10, 200,length=10)
Mins <- seq(30,220,length=10)
newDF <- data.frame(BF,Mins)

sehwag <- batsmanRunsPredict("./sehwag.csv","Sehwag",newdataframe=newDF)
devilliers <- batsmanRunsPredict("./devilliers.csv","Devilliers",newdataframe=newDF)
gayle <- batsmanRunsPredict("./gayle.csv","Gayle",newdataframe=newDF)
maxwell <- batsmanRunsPredict("./maxwell.csv","Maxwell",newdataframe=newDF)

The fitted model is then used to predict the runs that the batsmen will score for hypothetical values of Balls Faced and Minutes at Crease. It can be seen that Maxwell sets a searing pace in the predicted runs, followed by Sehwag. But we have to keep in mind that Maxwell has only around 1/5th of the innings of Sehwag (45 to Sehwag’s 245 innings). They are followed by Devilliers and then finally Gayle

batsmen <-cbind(round(sehwag$Runs),round(devilliers$Runs),round(gayle$Runs),round(maxwell$Runs))
colnames(batsmen) <- c("Sehwag","Devilliers","Gayle","Maxwell")
newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
colnames(newDF) <- c("BallsFaced","MinsAtCrease")
predictedRuns <- cbind(newDF,batsmen)
predictedRuns
##    BallsFaced MinsAtCrease Sehwag Devilliers Gayle Maxwell
## 1          10           30     11         12    11      18
## 2          31           51     33         32    28      43
## 3          52           72     55         52    46      67
## 4          73           93     77         71    63      92
## 5          94          114    100         91    81     117
## 6         116          136    122        111    98     141
## 7         137          157    144        130   116     166
## 8         158          178    167        150   133     191
## 9         179          199    189        170   151     215
## 10        200          220    211        190   168     240
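To make the idea of the regression plane concrete, here is a minimal sketch in base R. It assumes an ordinary least-squares fit of Runs on Balls Faced and Minutes, and it uses made-up innings data (the innings data frame below is synthetic, not taken from any player's CSV), so the numbers it produces will not match the table above.

```r
set.seed(42)

# Synthetic innings: balls faced, minutes at crease and runs (illustrative only)
BF   <- round(runif(50, 5, 150))
Mins <- round(BF * 1.4 + rnorm(50, 0, 5))
Runs <- round(0.9 * BF + 0.1 * Mins + rnorm(50, 0, 8))
innings <- data.frame(Runs, BF, Mins)

# Fit the plane Runs = b0 + b1 * BF + b2 * Mins
fit <- lm(Runs ~ BF + Mins, data = innings)

# Same hypothetical grid of Balls Faced and Minutes as used in the post
newDF <- data.frame(BF = seq(10, 200, length = 10),
                    Mins = seq(30, 220, length = 10))

# Predicted runs for each (BF, Mins) pair
predicted <- predict(fit, newdata = newDF)
print(cbind(round(newDF), Runs = round(predicted)))
```

The column names in newDF have to match the predictor names used in the model formula, which is why the grid is built with BF and Mins.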

Highest runs likelihood

The plots below show the runs likelihood of each batsman. This uses K-Means. It can be seen that Devilliers has a 29.84% likelihood of making around 91 runs. Sehwag and Gayle have roughly a 33-35% likelihood of making 45+ runs (35.22% for 46 runs and 32.69% for 47 runs respectively).

A. Virender Sehwag

batsmanRunsLikelihood("./sehwag.csv","Sehwag")

smith-1

## Summary of  Sehwag 's runs scoring likelihood
## **************************************************
## 
## There is a 35.22 % likelihood that Sehwag  will make  46 Runs in  44 balls over 67  Minutes 
## There is a 9.43 % likelihood that Sehwag  will make  119 Runs in  106 balls over  158  Minutes 
## There is a 55.35 % likelihood that Sehwag  will make  12 Runs in  13 balls over 18  Minutes

B. AB Devilliers

batsmanRunsLikelihood("./devilliers.csv","Devilliers")

warner-1

## Summary of  Devilliers 's runs scoring likelihood
## **************************************************
## 
## There is a 30.65 % likelihood that Devilliers  will make  44 Runs in  43 balls over 60  Minutes 
## There is a 29.84 % likelihood that Devilliers  will make  91 Runs in  88 balls over  124  Minutes 
## There is a 39.52 % likelihood that Devilliers  will make  11 Runs in  15 balls over 21  Minutes

C. Chris Gayle

batsmanRunsLikelihood("./gayle.csv","Gayle")

cook,cache-TRUE-1

## Summary of  Gayle 's runs scoring likelihood
## **************************************************
## 
## There is a 32.69 % likelihood that Gayle  will make  47 Runs in  51 balls over 72  Minutes 
## There is a 54.49 % likelihood that Gayle  will make  10 Runs in  15 balls over  20  Minutes 
## There is a 12.82 % likelihood that Gayle  will make  109 Runs in  119 balls over 172  Minutes

D. Glenn Maxwell

batsmanRunsLikelihood("./maxwell.csv","Maxwell")

oot-1

## Summary of  Maxwell 's runs scoring likelihood
## **************************************************
## 
## There is a 34.38 % likelihood that Maxwell  will make  39 Runs in  29 balls over 35  Minutes 
## There is a 15.62 % likelihood that Maxwell  will make  89 Runs in  55 balls over  69  Minutes 
## There is a 50 % likelihood that Maxwell  will make  6 Runs in  7 balls over 9  Minutes
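The summaries above list three clusters per batsman whose percentages add up to 100. A rough sketch of how such figures can be obtained with K-Means is given below. It assumes 3 clusters over (Runs, Balls Faced, Minutes) and takes the likelihood to be each cluster's share of the innings; the data is synthetic and this is not necessarily how batsmanRunsLikelihood() is implemented internally.

```r
set.seed(7)

# Synthetic innings in three loose groups: short, medium and long stays
BF   <- c(round(runif(30, 1, 25)), round(runif(20, 30, 70)), round(runif(10, 90, 140)))
Mins <- round(BF * 1.5)
Runs <- BF
innings <- data.frame(Runs, BF, Mins)

# Cluster the innings into 3 groups; nstart guards against a poor random start
km <- kmeans(innings, centers = 3, nstart = 10)

# Share of innings in each cluster, as a percentage
likelihood <- round(100 * km$size / nrow(innings), 2)

# One row per cluster: likelihood plus the cluster centre (typical Runs, BF, Mins)
summaryDF <- data.frame(likelihood, round(km$centers))
print(summaryDF)
```

Each row can be read in the same way as the printed summaries: a percentage likelihood together with the typical runs, balls and minutes for that cluster.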

Average runs at ground and against opposition

A. Virender Sehwag

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./sehwag.csv","Sehwag")
batsmanAvgRunsOpposition("./sehwag.csv","Sehwag")

avgrg-1-1

dev.off()
## null device 
##           1

B. AB Devilliers

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./devilliers.csv","Devilliers")
batsmanAvgRunsOpposition("./devilliers.csv","Devilliers")

avgrg-2-1

dev.off()
## null device 
##           1

C. Chris Gayle

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./gayle.csv","Gayle")
batsmanAvgRunsOpposition("./gayle.csv","Gayle")

avgrg-3-1

dev.off()
## null device 
##           1

D. Glenn Maxwell

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./maxwell.csv","Maxwell")
batsmanAvgRunsOpposition("./maxwell.csv","Maxwell")

avgrg-4-1

dev.off()
## null device 
##           1

Moving Average of runs over career

The moving average for the 4 batsmen indicates the following

1. The moving average of Devilliers and Maxwell is on the way up.
2. Sehwag shows a slight downward trend from his 2nd peak in 2011
3. Gayle maintains a consistent 45 runs for the last few years

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanMovingAverage("./sehwag.csv","Sehwag")
batsmanMovingAverage("./devilliers.csv","Devilliers")
batsmanMovingAverage("./gayle.csv","Gayle")
batsmanMovingAverage("./maxwell.csv","Maxwell")

sdgm-ma-1

dev.off()
## null device 
##           1
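For readers who want to see what a moving average over a career looks like in code, here is a small sketch using a plain trailing average over the last k innings. This is only an illustration on made-up scores; cricketr may well use a different smoother for its plots.

```r
# Made-up runs in chronological order (illustrative only)
runs <- c(12, 45, 3, 78, 33, 101, 0, 56, 24, 67)

# Window length: average over the current and the previous 4 innings
k <- 5

# stats::filter is called explicitly because dplyr, if loaded, masks filter()
ma <- stats::filter(runs, rep(1 / k, k), sides = 1)

# The first k - 1 entries are NA because the window is not yet full
print(as.numeric(ma))
```

A rising tail in such a series is what is described as "on the way up" in the comments above.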

Check batsmen in-form, out-of-form

  1. Maxwell, Devilliers, Sehwag are in-form. This is also evident from the moving average plot
  2. Gayle is out-of-form
checkBatsmanInForm("./sehwag.csv","Sehwag")
## *******************************************************************************************
## 
## Population size: 143  Mean of population: 33.76 
## Sample size: 16  Mean of sample: 37.44 SD of sample: 55.15 
## 
## Null hypothesis H0 : Sehwag 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : Sehwag 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "Sehwag 's Form Status: In-Form because the p value: 0.603525  is greater than alpha=  0.05"
## *******************************************************************************************
checkBatsmanInForm("./devilliers.csv","Devilliers")
## *******************************************************************************************
## 
## Population size: 111  Mean of population: 43.5 
## Sample size: 13  Mean of sample: 57.62 SD of sample: 40.69 
## 
## Null hypothesis H0 : Devilliers 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : Devilliers 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "Devilliers 's Form Status: In-Form because the p value: 0.883541  is greater than alpha=  0.05"
## *******************************************************************************************
checkBatsmanInForm("./gayle.csv","Gayle")
## *******************************************************************************************
## 
## Population size: 140  Mean of population: 37.1 
## Sample size: 16  Mean of sample: 17.25 SD of sample: 20.25 
## 
## Null hypothesis H0 : Gayle 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : Gayle 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "Gayle 's Form Status: Out-of-Form because the p value: 0.000609  is less than alpha=  0.05"
## *******************************************************************************************
checkBatsmanInForm("./maxwell.csv","Maxwell")
## *******************************************************************************************
## 
## Population size: 28  Mean of population: 25.25 
## Sample size: 4  Mean of sample: 64.25 SD of sample: 36.97 
## 
## Null hypothesis H0 : Maxwell 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : Maxwell 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "Maxwell 's Form Status: In-Form because the p value: 0.948744  is greater than alpha=  0.05"
## *******************************************************************************************
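The printed hypotheses suggest a one-sided test of the recent sample mean against the career (population) mean. The sketch below assumes a one-sample, lower-tailed t-test computed from the summary figures printed above; the helper name formPValue is hypothetical and the p values it gives are only approximate, since the printed means and SDs are rounded.

```r
# Hypothetical helper: lower-tailed one-sample t-test from summary statistics
formPValue <- function(popMean, sampleMean, sampleSD, n) {
  # t statistic of the sample mean against the population mean
  tStat <- (sampleMean - popMean) / (sampleSD / sqrt(n))
  # Probability of a t value this low or lower under H0
  pt(tStat, df = n - 1)
}

alpha <- 0.05

# Summary figures taken from the output above
pSehwag <- formPValue(popMean = 33.76, sampleMean = 37.44, sampleSD = 55.15, n = 16)
pGayle  <- formPValue(popMean = 37.10, sampleMean = 17.25, sampleSD = 20.25, n = 16)

# A batsman is flagged out-of-form when the p value falls below alpha
status <- ifelse(c(Sehwag = pSehwag, Gayle = pGayle) < alpha, "Out-of-Form", "In-Form")
print(status)
```

When the sample mean equals the population mean the t statistic is 0 and the p value is exactly 0.5, which is the borderline case between the two outcomes.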

Analysis of bowlers

  1. Mitchell Johnson (Aus) – Innings-150, Wickets – 239, Econ Rate : 4.83
  2. Lasith Malinga (SL)- Innings-182, Wickets – 287, Econ Rate : 5.26
  3. Dale Steyn (SA)- Innings-103, Wickets – 162, Econ Rate : 4.81
  4. Tim Southee (NZ)- Innings-96, Wickets – 135, Econ Rate : 5.33

Malinga has the highest number of innings and wickets followed closely by Mitchell Johnson. Steyn and Southee have relatively fewer innings.

To get the bowler’s data use

malinga <- getPlayerDataOD(49758,dir=".",file="malinga.csv",type="bowling")

Wicket Frequency percentage

This plot gives the percentage frequency of each wicket haul (1, 2, 3, etc.) for each bowler

par(mfrow=c(1,4))
par(mar=c(4,4,2,2))
bowlerWktsFreqPercent("./mitchell.csv","M Johnson")
bowlerWktsFreqPercent("./malinga.csv","Malinga")
bowlerWktsFreqPercent("./steyn.csv","Steyn")
bowlerWktsFreqPercent("./southee.csv","Southee")

relBowlFP-1

dev.off()
## null device 
##           1
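The idea behind a wicket frequency percentage can be sketched in a couple of lines of base R. The wickets vector below is made up for illustration and does not come from any of the bowlers' files.

```r
# Made-up wickets taken per innings (illustrative only)
wkts <- c(0, 1, 1, 2, 3, 1, 0, 2, 4, 1)

# Count how often each wicket haul occurs and convert to percentages
freqPct <- round(100 * prop.table(table(wkts)), 1)
print(freqPct)
```

Here the most common haul is a single wicket, which occurs in 4 of the 10 innings.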

Wickets Runs plot

The plot below gives a boxplot of the runs ranges for each of the wickets taken by the bowlers. M Johnson and Steyn are more economical than Malinga and Southee corroborating the figures above

par(mfrow=c(1,4))
par(mar=c(4,4,2,2))

bowlerWktsRunsPlot("./mitchell.csv","M Johnson")
bowlerWktsRunsPlot("./malinga.csv","Malinga")
bowlerWktsRunsPlot("./steyn.csv","Steyn")
bowlerWktsRunsPlot("./southee.csv","Southee")

wktsrun-1

dev.off()
## null device 
##           1

Average wickets in different grounds and opposition

A. Mitchell Johnson

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./mitchell.csv","M Johnson")
bowlerAvgWktsOpposition("./mitchell.csv","M Johnson")

gr-1-1

dev.off()
## null device 
##           1

B. Lasith Malinga

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./malinga.csv","Malinga")
bowlerAvgWktsOpposition("./malinga.csv","Malinga")

gr-2-1

dev.off()
## null device 
##           1

C. Dale Steyn

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./steyn.csv","Steyn")
bowlerAvgWktsOpposition("./steyn.csv","Steyn")

gr-3-1

dev.off()
## null device 
##           1

D. Tim Southee

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./southee.csv","southee")
bowlerAvgWktsOpposition("./southee.csv","southee")

avgrg-4-1

dev.off()
## null device 
##           1

Relative bowling performance

The plot below shows that Mitchell Johnson and Southee have more wickets in 3-4 wickets range while Steyn and Malinga in 1-2 wicket range

frames <- list("./mitchell.csv","./malinga.csv","steyn.csv","southee.csv")
names <- list("M Johnson","Malinga","Steyn","Southee")
relativeBowlingPerf(frames,names)

relBowlPerf-1

Relative Economy Rate against wickets taken

Steyn has the best economy rate followed by M Johnson. Malinga and Southee have a poorer economy rate

frames <- list("./mitchell.csv","./malinga.csv","steyn.csv","southee.csv")
names <- list("M Johnson","Malinga","Steyn","Southee")
relativeBowlingERODTT(frames,names)

relBowlER-1

Moving average of wickets over career

The moving average of wickets for Johnson and Steyn is on the up-swing. Southee is maintaining a reasonable record while Malinga shows a decline in ODI performance

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
bowlerMovingAverage("./mitchell.csv","M Johnson")
bowlerMovingAverage("./malinga.csv","Malinga")
bowlerMovingAverage("./steyn.csv","Steyn")
bowlerMovingAverage("./southee.csv","Southee")

jmss-bowlma-1

dev.off()
## null device 
##           1

Wickets forecast

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
bowlerPerfForecast("./mitchell.csv","M Johnson")
bowlerPerfForecast("./malinga.csv","Malinga")
bowlerPerfForecast("./steyn.csv","Steyn")
bowlerPerfForecast("./southee.csv","southee")

jsba-pfcst-1

dev.off()
## null device 
##           1

Check bowler in-form, out-of-form

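Mitchell Johnson, Malinga and Steyn are shown to be still in-form, while Southee is flagged as out-of-form (p value 0.044302, just below alpha = 0.05)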

checkBowlerInForm("./mitchell.csv","J Mitchell")
## *******************************************************************************************
## 
## Population size: 135  Mean of population: 1.55 
## Sample size: 15  Mean of sample: 2 SD of sample: 1.07 
## 
## Null hypothesis H0 : J Mitchell 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : J Mitchell 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "J Mitchell 's Form Status: In-Form because the p value: 0.937917  is greater than alpha=  0.05"
## *******************************************************************************************
checkBowlerInForm("./malinga.csv","Malinga")
## *******************************************************************************************
## 
## Population size: 163  Mean of population: 1.58 
## Sample size: 19  Mean of sample: 1.58 SD of sample: 1.22 
## 
## Null hypothesis H0 : Malinga 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : Malinga 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "Malinga 's Form Status: In-Form because the p value: 0.5  is greater than alpha=  0.05"
## *******************************************************************************************
checkBowlerInForm("./steyn.csv","Steyn")
## *******************************************************************************************
## 
## Population size: 93  Mean of population: 1.59 
## Sample size: 11  Mean of sample: 1.45 SD of sample: 0.69 
## 
## Null hypothesis H0 : Steyn 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : Steyn 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "Steyn 's Form Status: In-Form because the p value: 0.257438  is greater than alpha=  0.05"
## *******************************************************************************************
checkBowlerInForm("./southee.csv","southee")
## *******************************************************************************************
## 
## Population size: 86  Mean of population: 1.48 
## Sample size: 10  Mean of sample: 0.8 SD of sample: 1.14 
## 
## Null hypothesis H0 : southee 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : southee 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "southee 's Form Status: Out-of-Form because the p value: 0.044302  is less than alpha=  0.05"
## *******************************************************************************************

***************

Key findings

Here are some key conclusions

ODI batsmen

  1. AB Devilliers has a high frequency of runs in the 60-120 range and the highest average
  2. Sehwag has played a large number of innings (245) and has a good strike rate
  3. Maxwell has the best strike rate but it should be kept in mind that he has 1/5 of the innings of Sehwag. We need to see how he progresses further
  4. Sehwag has the highest percentage of 4s in the runs scored, while Maxwell has the highest percentage of 6s
  5. For a hypothetical Balls Faced and Minutes at Crease Maxwell will score the most runs followed by Sehwag
  6. The moving average indicates that the best is yet to come for Devilliers and Maxwell. Sehwag has a few more years in him while Gayle shows a decline in ODI performance and is indicated to be out of form.

ODI bowlers

  1. Malinga has played the most innings and also has the most wickets, though he has a poor economy rate
  2. M Johnson is the most effective in the 3-4 wicket range followed by Southee
  3. M Johnson and Steyn have the best overall economy rate followed by Malinga and Southee
  4. M Johnson and Steyn’s careers are on the up-swing, Southee maintains a steady consistent performance, while Malinga shows a downward trend

Hasta la vista! I’ll be back!
Watch this space!

cricketr digs the Ashes!


Published in R bloggers: cricketr digs the Ashes

Introduction

In some circles the Ashes is considered the ‘mother of all cricketing battles’. But, being a staunch supporter of all things Indian, cricket or otherwise, I have to say that the Ashes pales in comparison to an India-Pakistan match. After all, what are a few frowns and raised eyebrows at the Ashes in comparison to the seething emotions and reckless exuberance of Indian fans?

Anyway, the Ashes are an interesting duel and I have decided to do some cricketing analysis using my R package cricketr. For this analysis I have chosen the top 2 batsmen and top 2 bowlers from both the Australian and English sides.

Batsmen

  1. Steven Smith (Aus) – Innings – 58 , Ave: 58.52, Strike Rate: 55.90
  2. David Warner (Aus) – Innings – 76, Ave: 46.86, Strike Rate: 73.88
  3. Alastair Cook (Eng) – Innings – 208 , Ave: 46.62, Strike Rate: 46.33
  4. J E Root (Eng) – Innings – 53, Ave: 54.02, Strike Rate: 51.30

Bowlers

  1. Mitchell Johnson (Aus) – Innings-131, Wickets – 299, Econ Rate : 3.28
  2. Peter Siddle (Aus) – Innings – 104 , Wickets- 192, Econ Rate : 2.95
  3. James Anderson (Eng) – Innings – 199 , Wickets- 406, Econ Rate : 3.05
  4. Stuart Broad (Eng) – Innings – 148 , Wickets- 296, Econ Rate : 3.08

In my opinion, if any 2 of the 4 in either team click, they will be able to swing the match in favor of their team.

I have interspersed the plots with a few comments. Feel free to draw your conclusions!

The analysis is included below. Note: This post has also been hosted at Rpubs as cricketr digs the Ashes!
You can also download this analysis as a PDF file from cricketr digs the Ashes!

Do check out my interactive Shiny app implementation using the cricketr package – Sixer – R package cricketr’s new Shiny avatar

Note: If you would like to do a similar analysis for a different set of batsmen and bowlers, you can clone/download my skeleton cricketr template from Github (which is the R Markdown file I have used for the analysis below). You will only need to make appropriate changes for the players you are interested in. Only a familiarity with R and R Markdown is needed.

The package can be installed directly from CRAN

# Install cricketr from CRAN into the default library if it is not already available
if (!require("cricketr")){ 
    install.packages("cricketr") 
} 
library(cricketr)

or from Github

library(devtools)
install_github("tvganesh/cricketr")
library(cricketr)

Analyses of Batsmen

The following plots give the analysis of the 2 Australian and 2 English batsmen. It must be kept in mind that Cook has more innings than all the rest put together. Smith has the best average, and Warner has the best strike rate

Box Histogram Plot

This plot shows a combined boxplot of the Runs ranges and a histogram of the Runs Frequency

batsmanPerfBoxHist("./smith.csv","S Smith")

swcr-boxhist-1

batsmanPerfBoxHist("./warner.csv","D Warner")

swcr-boxhist-2

batsmanPerfBoxHist("./cook.csv","A Cook")

swcr-boxhist-3

batsmanPerfBoxHist("./root.csv","JE Root")

swcr-boxhist-4

Plot of 4s, 6s and the type of dismissals

A. Steven Smith

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./smith.csv","S Smith")
batsman6s("./smith.csv","S Smith")
batsmanDismissals("./smith.csv","S Smith")

smith-4s6sout-1

dev.off()
## null device 
##           1

B. David Warner

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./warner.csv","D Warner")
batsman6s("./warner.csv","D Warner")
batsmanDismissals("./warner.csv","D Warner")

warner-4s6sout-1

dev.off()
## null device 
##           1

C. Alastair Cook

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./cook.csv","A Cook")
batsman6s("./cook.csv","A Cook")
batsmanDismissals("./cook.csv","A Cook")

cook-4s6sout-1

dev.off()
## null device 
##           1

D. J E Root

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./root.csv","JE Root")
batsman6s("./root.csv","JE Root")
batsmanDismissals("./root.csv","JE Root")

root-4s6sout-1

dev.off()
## null device 
##           1

Relative Mean Strike Rate

In this first plot I plot the Mean Strike Rate of the batsmen. It can be seen that Warner has the best strike rate (it goes outside the plot!) followed by Smith in the range 20-100. Root has a good strike rate above a hundred runs. Cook maintains a good strike rate.

par(mar=c(4,4,2,2))
frames <- list("./smith.csv","./warner.csv","cook.csv","root.csv")
names <- list("Smith","Warner","Cook","Root")
relativeBatsmanSR(frames,names)

plot-1-1

Relative Runs Frequency Percentage

The plot below shows the percentage contribution in each 10-run bucket over the entire career. It can be seen that Smith pops up above the rest with remarkable regularity. Cook is consistent over the entire range.

frames <- list("./smith.csv","./warner.csv","cook.csv","root.csv")
names <- list("Smith","Warner","Cook","Root")
relativeRunsFreqPerf(frames,names)

plot-2-1

Moving Average of runs over career

The moving average for the 4 batsmen indicates the following

1. S Smith is the most promising. There is a marked spike in performance.
2. Cook maintains a steady pace and is consistent, averaging around 50 over the years.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanMovingAverage("./smith.csv","S Smith")
batsmanMovingAverage("./warner.csv","D Warner")
batsmanMovingAverage("./cook.csv","A Cook")
batsmanMovingAverage("./root.csv","JE Root")

swcr-ma-1

dev.off()
## null device 
##           1

Runs forecast

The forecast for each of the batsmen is shown below. As before, Cook’s performance is really consistent across the years and the forecast is good for the years ahead. In Cook’s case it can be seen that the forecast runs track the actual runs reasonably closely

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfForecast("./smith.csv","S Smith")
batsmanPerfForecast("./warner.csv","D Warner")
batsmanPerfForecast("./cook.csv","A Cook")
## Warning in HoltWinters(ts.train): optimization difficulties: ERROR:
## ABNORMAL_TERMINATION_IN_LNSRCH
batsmanPerfForecast("./root.csv","JE Root")

swcr-perf-1

dev.off()
## null device 
##           1
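The warning above shows that the forecast relies on HoltWinters() from base R. Below is a minimal sketch of that kind of forecast on a synthetic runs series. It uses plain exponential smoothing (beta and gamma switched off), which is an assumption made here for simplicity; the settings used inside batsmanPerfForecast() may differ.

```r
set.seed(1)

# Synthetic runs per innings as a time series (illustrative only)
runs <- ts(round(40 + 10 * sin(seq_len(60) / 6) + rnorm(60, 0, 8)))

# Train on the first 50 innings and hold back the last 10 for comparison
train <- window(runs, end = 50)

# Exponential smoothing: no trend (beta) and no seasonal (gamma) component
fit <- HoltWinters(train, beta = FALSE, gamma = FALSE)

# Forecast the next 10 innings
fc <- predict(fit, n.ahead = 10)
print(round(as.numeric(fc), 1))
```

With both components switched off the forecast is flat at the last smoothed level, so it is best read as a baseline for comparison with the held-back innings.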

3D plot of Runs vs Balls Faced and Minutes at Crease

The plot is a scatter plot of Runs vs Balls faced and Minutes at Crease. A prediction plane is fitted

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./smith.csv","S Smith")
battingPerf3d("./warner.csv","D Warner")

plot-3-1

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./cook.csv","A Cook")
battingPerf3d("./root.csv","JE Root")

plot-4-1

dev.off()
## null device 
##           1

Predicting Runs given Balls Faced and Minutes at Crease

A multi-variate regression plane is fitted between Runs and Balls faced +Minutes at crease.

BF <- seq( 10, 400,length=15)
Mins <- seq(30,600,length=15)
newDF <- data.frame(BF,Mins)
smith <- batsmanRunsPredict("./smith.csv","S Smith",newdataframe=newDF)
warner <- batsmanRunsPredict("./warner.csv","D Warner",newdataframe=newDF)
cook <- batsmanRunsPredict("./cook.csv","A Cook",newdataframe=newDF)
root <- batsmanRunsPredict("./root.csv","JE Root",newdataframe=newDF)

The fitted model is then used to predict the runs that the batsmen will score for a given Balls Faced and Minutes at Crease. It can be seen that Warner sets a searing pace in the predicted runs, while Smith and Root are neck and neck in the predicted runs

batsmen <-cbind(round(smith$Runs),round(warner$Runs),round(cook$Runs),round(root$Runs))
colnames(batsmen) <- c("Smith","Warner","Cook","Root")
newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
colnames(newDF) <- c("BallsFaced","MinsAtCrease")
predictedRuns <- cbind(newDF,batsmen)
predictedRuns
##    BallsFaced MinsAtCrease Smith Warner Cook Root
## 1          10           30     9     12    6    9
## 2          38           71    25     33   20   25
## 3          66          111    42     53   33   42
## 4          94          152    58     73   47   59
## 5         121          193    75     93   60   75
## 6         149          234    91    114   74   92
## 7         177          274   108    134   88  109
## 8         205          315   124    154  101  125
## 9         233          356   141    174  115  142
## 10        261          396   158    195  128  159
## 11        289          437   174    215  142  175
## 12        316          478   191    235  155  192
## 13        344          519   207    255  169  208
## 14        372          559   224    276  182  225
## 15        400          600   240    296  196  242

Highest runs likelihood

The plots below show the runs likelihood of each batsman. This uses K-Means. It can be seen that Smith has the best likelihood, around 40%, of scoring around 41 runs, followed by Root who has a 28.3% likelihood of scoring around 81 runs

A. Steven Smith

batsmanRunsLikelihood("./smith.csv","S Smith")
smith-1
## Summary of  S Smith 's runs scoring likelihood
## **************************************************
## 
## There is a 40 % likelihood that S Smith  will make  41 Runs in  73 balls over 101  Minutes 
## There is a 36 % likelihood that S Smith  will make  9 Runs in  21 balls over  27  Minutes 
## There is a 24 % likelihood that S Smith  will make  139 Runs in  237 balls over 338  Minutes

B. David Warner

batsmanRunsLikelihood("./warner.csv","D Warner")
warner-1
## Summary of  D Warner 's runs scoring likelihood
## **************************************************
## 
## There is a 11.11 % likelihood that D Warner  will make  134 Runs in  159 balls over 263  Minutes 
## There is a 63.89 % likelihood that D Warner  will make  17 Runs in  25 balls over  37  Minutes 
## There is a 25 % likelihood that D Warner  will make  73 Runs in  105 balls over 156  Minutes

C. Alastair Cook

batsmanRunsLikelihood("./cook.csv","A Cook")
cook,cache-TRUE-1
## Summary of  A Cook 's runs scoring likelihood
## **************************************************
## 
## There is a 27.72 % likelihood that A Cook  will make  64 Runs in  140 balls over 195  Minutes 
## There is a 59.9 % likelihood that A Cook  will make  15 Runs in  32 balls over  46  Minutes 
## There is a 12.38 % likelihood that A Cook  will make  141 Runs in  300 balls over 420  Minutes

D. J E Root

batsmanRunsLikelihood("./root.csv","JE Root")
## Summary of  JE Root 's runs scoring likelihood
## **************************************************
## 
## There is a 28.3 % likelihood that JE Root  will make  81 Runs in  158 balls over 223  Minutes 
## There is a 7.55 % likelihood that JE Root  will make  179 Runs in  290 balls over  425  Minutes 
## There is a 64.15 % likelihood that JE Root  will make  16 Runs in  39 balls over 59  Minutes
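
The three-line summaries above are what you would expect from clustering a batsman's innings into three groups and reporting the share of innings in each group. The sketch below illustrates that idea in base R. It is only an approximation under stated assumptions: the innings data is made up, the column names (Runs, BF, Mins) are hypothetical, and the choice of three clusters is inferred from the printed output rather than taken from the cricketr source.

```r
# Hypothetical innings data, made up for illustration (not real player data)
set.seed(42)
runs <- c(rpois(40, 12), rpois(35, 45), rpois(25, 120))
innings <- data.frame(
  Runs = runs,
  BF   = round(runs * 1.8) + rpois(100, 3),  # balls faced (assumed column name)
  Mins = round(runs * 2.5) + rpois(100, 5)   # minutes at crease (assumed column name)
)

# Cluster the innings into 3 groups (assumption: 3 clusters, as in the summaries)
fit <- kmeans(innings, centers = 3, nstart = 20)

# Share of innings in each cluster, read as the "likelihood" of that kind of innings
likelihood <- round(100 * fit$size / nrow(innings), 2)

# One row per cluster: typical runs, balls, minutes and the likelihood in percent
summary_tbl <- data.frame(round(fit$centers), Likelihood = likelihood)
print(summary_tbl)
```

Each row of summary_tbl reads like one line of the summaries above: a cluster centre (runs, balls, minutes) and the percentage of innings that fall in that cluster.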
 

Average runs at ground and against opposition

A. Steven Smith

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./smith.csv","S Smith")
batsmanAvgRunsOpposition("./smith.csv","S Smith")


dev.off()
## null device 
##           1

B. David Warner

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./warner.csv","D Warner")
batsmanAvgRunsOpposition("./warner.csv","D Warner")


dev.off()
## null device 
##           1

C. Alastair Cook

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./cook.csv","A Cook")
batsmanAvgRunsOpposition("./cook.csv","A Cook")


dev.off()
## null device 
##           1

D. J E Root

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./root.csv","JE Root")
batsmanAvgRunsOpposition("./root.csv","JE Root")


dev.off()
## null device 
##           1

Analysis of bowlers

  1. Mitchell Johnson (Aus) – Innings: 131, Wickets: 299, Econ Rate: 3.28
  2. Peter Siddle (Aus) – Innings: 104, Wickets: 192, Econ Rate: 2.95
  3. James Anderson (Eng) – Innings: 199, Wickets: 406, Econ Rate: 3.05
  4. Stuart Broad (Eng) – Innings: 148, Wickets: 296, Econ Rate: 3.08

Anderson has the highest number of innings and wickets (406), followed by Johnson (299) and Broad (296), who are neck and neck with respect to wickets. Johnson is on the more expensive side though, with an economy rate of 3.28. Siddle has fewer innings but a good economy rate (2.95).

Wicket Frequency percentage

These plots give, for each bowler, the percentage of innings in which he took a given number of wickets (1, 2, 3, etc.)

par(mfrow=c(1,4))
par(mar=c(4,4,2,2))
bowlerWktsFreqPercent("./johnson.csv","Johnson")
bowlerWktsFreqPercent("./siddle.csv","Siddle")
bowlerWktsFreqPercent("./broad.csv","Broad")
bowlerWktsFreqPercent("./anderson.csv","Anderson")


dev.off()
## null device 
##           1
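
As a rough illustration of what a wicket frequency percentage means, the sketch below tabulates a made-up vector of wickets per innings in base R. The data is hypothetical, and whether cricketr includes wicketless innings in its percentages is an assumption here, not something this post confirms.

```r
# Hypothetical wickets taken per innings (made up for illustration)
wkts <- c(0, 1, 1, 2, 3, 2, 1, 0, 4, 5, 2, 1, 3, 2, 1, 0, 2, 3, 1, 6)

# Count how often each wicket haul occurs and convert to a percentage of innings
freq_pct <- 100 * table(wkts) / length(wkts)
print(freq_pct)
```

The resulting table can be passed to barplot() to get a chart of the same general shape as the plots above.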

Wickets Runs plot

The plots below give, for each bowler, a boxplot of the range of runs conceded for each number of wickets taken

par(mfrow=c(1,4))
par(mar=c(4,4,2,2))
bowlerWktsRunsPlot("./johnson.csv","Johnson")
bowlerWktsRunsPlot("./siddle.csv","Siddle")
bowlerWktsRunsPlot("./broad.csv","Broad")
bowlerWktsRunsPlot("./anderson.csv","Anderson")


dev.off()
## null device 
##           1
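
To make the idea of "runs ranges per wicket haul" concrete, here is a minimal base R sketch that groups runs conceded by wickets taken and extracts the boxplot statistics. The data frame and its column names (Wkts, Runs) are hypothetical and are not the format of the CSV files used above.

```r
# Hypothetical bowling spells (made up for illustration)
spells <- data.frame(
  Wkts = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  Runs = c(30, 45, 60, 40, 55, 70, 35, 50, 80)
)

# Compute boxplot statistics of runs conceded for each wicket haul, without drawing
bp <- boxplot(Runs ~ Wkts, data = spells, plot = FALSE)

# Row 3 of bp$stats holds the median of each group
medians <- setNames(bp$stats[3, ], bp$names)
print(medians)
```

Dropping plot = FALSE draws the boxplot itself, one box per wicket haul.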

Average wickets in different grounds and opposition

A. Mitchell Johnson

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./johnson.csv","Johnson")
bowlerAvgWktsOpposition("./johnson.csv","Johnson")


dev.off()
## null device 
##           1

B. Peter Siddle

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./siddle.csv","Siddle")
bowlerAvgWktsOpposition("./siddle.csv","Siddle")


dev.off()
## null device 
##           1

C. Stuart Broad

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./broad.csv","Broad")
bowlerAvgWktsOpposition("./broad.csv","Broad")


dev.off()
## null device 
##           1

D. James Anderson

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./anderson.csv","Anderson")
bowlerAvgWktsOpposition("./anderson.csv","Anderson")


dev.off()
## null device 
##           1

Relative bowling performance

The plot below shows that Mitchell Johnson is the most effective bowler among the lot, with a higher frequency of 3-6 wicket hauls. Broad and Anderson do better than Siddle at 2 wickets, but at 3 wickets Siddle is better than both.

frames <- list("./johnson.csv","./siddle.csv","./broad.csv","./anderson.csv")
names <- list("Johnson","Siddle","Broad","Anderson")
relativeBowlingPerf(frames,names)


Relative Economy Rate against wickets taken

Anderson, followed by Siddle, has the best economy rate in this plot. Johnson is fairly expensive in the 4-8 wicket range.

frames <- list("./johnson.csv","./siddle.csv","./broad.csv","./anderson.csv")
names <- list("Johnson","Siddle","Broad","Anderson")
relativeBowlingER(frames,names)
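
For readers who want to see the underlying arithmetic, the sketch below computes a mean economy rate (runs conceded per over) for each wicket haul and bowler in base R. Everything in it is hypothetical: the bowler labels, the figures, and the column names. Overs are kept as whole numbers to sidestep the overs.balls notation.

```r
# Hypothetical per-innings bowling figures (made up for illustration)
spells <- data.frame(
  Bowler = rep(c("Bowler A", "Bowler B"), each = 6),
  Wkts   = c(0, 1, 2, 2, 3, 4, 0, 1, 1, 2, 3, 3),
  Runs   = c(40, 55, 48, 60, 70, 66, 30, 42, 38, 51, 45, 57),
  Overs  = c(12, 18, 15, 20, 21, 18, 10, 15, 12, 17, 15, 18)
)

# Economy rate = runs conceded per over bowled
spells$Econ <- spells$Runs / spells$Overs

# Mean economy rate for each bowler at each wicket haul
er_by_wkts <- aggregate(Econ ~ Bowler + Wkts, data = spells, FUN = mean)
er_by_wkts$Econ <- round(er_by_wkts$Econ, 2)
print(er_by_wkts[order(er_by_wkts$Bowler, er_by_wkts$Wkts), ])
```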


Moving average of wickets over career

Johnson is on his second peak while Siddle is on the decline with respect to bowling. Broad and Anderson show improving performance over the years.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
bowlerMovingAverage("./johnson.csv","Johnson")
bowlerMovingAverage("./siddle.csv","Siddle")
bowlerMovingAverage("./broad.csv","Broad")
bowlerMovingAverage("./anderson.csv","Anderson")


dev.off()
## null device 
##           1
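
A moving average smooths innings-by-innings wicket counts so that the career trend is easier to see. The sketch below shows a simple trailing mean in base R on made-up data. It is not necessarily the smoother that cricketr uses internally, and the window size k is an arbitrary choice for illustration.

```r
# Hypothetical wickets per innings in chronological order (made up for illustration)
wkts <- c(2, 0, 3, 1, 4, 2, 5, 3)

# Trailing moving average over the last k innings (k is an illustrative choice)
k <- 4
ma <- as.numeric(stats::filter(wkts, rep(1 / k, k), sides = 1))
print(ma)
```

The first k - 1 values are NA because a full window of k innings is not yet available.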

Wickets forecast

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
bowlerPerfForecast("./johnson.csv","Johnson")
bowlerPerfForecast("./siddle.csv","Siddle")
bowlerPerfForecast("./broad.csv","Broad")
bowlerPerfForecast("./anderson.csv","Anderson")


dev.off()
## null device 
##           1

Key findings

Here are some key conclusions

  1. Cook has played the most innings and has been extremely consistent in his scores
  2. Warner has the best strike rate among the lot followed by Smith and Root
  3. The moving average shows a marked improvement over the years for Smith
  4. Johnson is the most effective bowler but is fairly expensive
  5. Anderson has the best economy rate followed by Siddle
  6. Johnson is at his second peak with respect to bowling while Broad and Anderson maintain a steady line and length in their career bowling performance


Also see my other posts in R

  1. Introducing cricketr! : An R package to analyze performances of cricketers
  2. Taking cricketr for a spin – Part 1
  3. A peek into literacy in India: Statistical Learning with R
  4. A crime map of India in R – Crimes against women
  5. Analyzing cricket’s batting legends – Through the mirage with R
  6. Masters of Spin: Unraveling the web with R
  7. Mirror, mirror . the best batsman of them all?

You may also like

  1. A crime map of India in R: Crimes against women
  2. What’s up Watson? Using IBM Watson’s QAAPI with Bluemix, NodeExpress – Part 1
  3. Bend it like Bluemix, MongoDB with autoscaling – Part 2
  4. Informed choices through Machine Learning : Analyzing Kohli, Tendulkar and Dravid
  5. Thinking Web Scale (TWS-3): Map-Reduce – Bring compute to data
  6. Deblurring with OpenCV:Weiner filter reloaded