*“Curiouser and curiouser!” cried Alice*

*“The time has come,” the walrus said, “to talk of many things: Of shoes and ships – and sealing wax – of cabbages and kings”*

*“Begin at the beginning,” the King said, very gravely, “and go on till you come to the end: then stop.”*

*“And what is the use of a book,” thought Alice, “without pictures or conversation?”*

` Excerpts from Alice in Wonderland by Lewis Carroll`

# Introduction

This post is a continuation of my previous post “Introducing cricketr! A R package to analyze the performances of cricketers.” In this post I take my package **cricketr** for a spin. For this analysis I focus on four Indian batting legends:

– Sachin Tendulkar (Master Blaster)

– Rahul Dravid (The Will)

– Sourav Ganguly (The Dada Prince)

– Sunil Gavaskar (Little Master)

This post is also hosted on RPubs – cricketr-1

```
library(devtools)
install_github("tvganesh/cricketr")
library(cricketr)
```

# Relative Mean Strike Rate

This first plot shows the Mean Strike Rate of the batsmen. Tendulkar leads in Mean Strike Rate for scores in the range 100-180. Ganguly has a very good Mean Strike Rate for scores in the range 40-80.

```
frames <- list("./tendulkar.csv","./dravid.csv","./ganguly.csv","./gavaskar.csv")
names <- list("Tendulkar","Dravid","Ganguly","Gavaskar")
relativeBatsmanSR(frames,names)
```
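
To make "mean strike rate per runs range" concrete, here is a minimal base R sketch. It is not how `relativeBatsmanSR` is implemented internally (I am assuming a simple bucket-and-average approach), and the innings data, the 20-run bucket width and the variable names are all made up for illustration.

```r
# Hypothetical innings data (made-up numbers, not from any real career)
innings <- data.frame(
  Runs = c(5, 12, 18, 25, 33, 47, 52, 68, 74, 91),
  BF   = c(10, 30, 30, 50, 60, 94, 80, 136, 100, 130)
)

# Strike rate = runs scored per 100 balls faced
innings$SR <- innings$Runs / innings$BF * 100

# Group innings into 20-run buckets: [0,20), [20,40), ... (assumed width)
innings$Bucket <- cut(innings$Runs, breaks = seq(0, 100, by = 20), right = FALSE)

# Mean strike rate within each bucket
meanSR <- tapply(innings$SR, innings$Bucket, mean)
round(meanSR, 1)
```

Plotting `meanSR` against the bucket mid-points for several batsmen on the same axes would give a comparison of the kind shown above.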

# Relative Runs Frequency Percentage

The plot below shows the percentage contribution in each 10-run bucket over the entire career. The percentage Runs Frequency is fairly close, but Gavaskar seems to lead most of the way.

```
frames <- list("./tendulkar.csv","./dravid.csv","./ganguly.csv","./gavaskar.csv")
names <- list("Tendulkar","Dravid","Ganguly","Gavaskar")
relativeRunsFreqPerf(frames,names)
```
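
As a rough illustration of a runs frequency percentage, the sketch below counts the share of innings that fall into each 10-run bucket. This is one plausible reading of the metric, not the package's actual code, and the scores are invented.

```r
# Hypothetical scores (made-up numbers)
runs <- c(0, 4, 8, 11, 15, 23, 27, 36, 42, 58)

# Assign each innings to a 10-run bucket: [0,10), [10,20), ...
buckets <- cut(runs, breaks = seq(0, 60, by = 10), right = FALSE)

# Percentage of innings falling in each bucket
freqPct <- as.numeric(table(buckets)) / length(runs) * 100
names(freqPct) <- levels(buckets)
freqPct
```

The percentages sum to 100 for each batsman, which is what makes careers of different lengths comparable on one plot.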

# Moving Average of runs over career

The moving average for the 4 batsmen indicates the following:

– Tendulkar’s and Ganguly’s careers show a downward trend, and their retirement didn’t come too soon

– Dravid’s and Gavaskar’s careers definitely show an upswing. They probably had a year or two left

```
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanMovingAverage("./tendulkar.csv","Tendulkar")
batsmanMovingAverage("./dravid.csv","Dravid")
batsmanMovingAverage("./ganguly.csv","Ganguly")
batsmanMovingAverage("./gavaskar.csv","Gavaskar")
```

`dev.off()`

```
## null device
## 1
```
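
For readers who want to see what a moving average does to a noisy run of scores, here is a small base R sketch using `stats::filter`. The window of 3 innings and the scores are assumptions chosen for illustration; `batsmanMovingAverage` may well use a different smoother.

```r
# Hypothetical innings-by-innings scores (made-up numbers)
runs <- c(10, 50, 30, 70, 20, 90, 40, 60)

# Centered moving average over a 3-innings window (assumed window size)
window <- 3
movAvg <- stats::filter(runs, rep(1 / window, window), sides = 2)

# The first and last values are NA because the window runs off the ends
as.numeric(movAvg)
```

A rising or falling tail in the smoothed series is what the plots above are read for.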

# Runs forecast

The forecast for each batsman is shown below. The plots indicate that only Tendulkar maintained consistency over the period, while the rest seem to have scored less than their forecast runs in the last 10% of their careers.

```
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfForecast("./tendulkar.csv","Sachin Tendulkar")
batsmanPerfForecast("./dravid.csv","Rahul Dravid")
batsmanPerfForecast("./ganguly.csv","Sourav Ganguly")
batsmanPerfForecast("./gavaskar.csv","Sunil Gavaskar")
```

`dev.off()`

```
## null device
## 1
```
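
The general idea of training on the first 90% of a career and comparing a forecast with the last 10% can be sketched in base R as below. I am assuming simple exponential smoothing via `HoltWinters` purely for illustration; the forecasting method inside `batsmanPerfForecast` is not shown in this post and may differ. The scores are made up.

```r
# Hypothetical innings-by-innings scores (made-up numbers)
runs <- c(12, 45, 3, 88, 27, 61, 0, 34, 102, 19,
          56, 8, 73, 41, 25, 67, 14, 90, 38, 52)

# Split: first 90% of innings for training, last 10% held out
nTrain <- floor(0.9 * length(runs))
train  <- ts(runs[1:nTrain])
test   <- runs[(nTrain + 1):length(runs)]

# Simple exponential smoothing (no trend, no seasonal component)
fit <- HoltWinters(train, beta = FALSE, gamma = FALSE)

# Forecast as many innings ahead as were held out
forecastRuns <- as.numeric(predict(fit, n.ahead = length(test)))

# Compare actual and forecast runs side by side
data.frame(actual = test, forecast = round(forecastRuns, 1))
```

If the actual scores sit consistently below the forecast, that is the pattern described above as scoring less than the forecast runs.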

# Check for batsman in-form/out-of-form

The following snippet checks whether the batsman is in-form or out-of-form during the last 10% of the innings of his career. The null hypothesis (H0) is that the batsman is in-form. Ha is the alternative hypothesis that he is not in-form. The population is based on the first 90% of career runs. The last 10% is taken as the sample, and a check is made on the lower tail to see whether the sample mean falls below the 95% confidence interval of the population mean. If the resulting p value is less than 0.05, the batsman is considered out-of-form.

The computation shows that Tendulkar was out-of-form while the others weren’t. While Dravid’s and Gavaskar’s moving averages do show an upward trend, the surprise is Ganguly. This could be because Ganguly was able to keep his average in the last 10% within the 95% confidence interval. It has to be noted that Ganguly’s average was much lower than Tendulkar’s.

`checkBatsmanInForm("./tendulkar.csv","Tendulkar")`

```
## *******************************************************************************************
##
## Population size: 294 Mean of population: 50.48
## Sample size: 33 Mean of sample: 32.42 SD of sample: 29.8
##
## Null hypothesis H0 : Tendulkar 's sample average is within 95% confidence interval
## of population average
## Alternative hypothesis Ha : Tendulkar 's sample average is below the 95% confidence
## interval of population average
##
## [1] "Tendulkar 's Form Status: Out-of-Form because the p value: 0.000713 is less than alpha= 0.05"
## *******************************************************************************************
```

`checkBatsmanInForm("./dravid.csv","Dravid")`

```
## *******************************************************************************************
##
## Population size: 256 Mean of population: 46.98
## Sample size: 29 Mean of sample: 43.48 SD of sample: 40.89
##
## Null hypothesis H0 : Dravid 's sample average is within 95% confidence interval
## of population average
## Alternative hypothesis Ha : Dravid 's sample average is below the 95% confidence
## interval of population average
##
## [1] "Dravid 's Form Status: In-Form because the p value: 0.324138 is greater than alpha= 0.05"
## *******************************************************************************************
```

`checkBatsmanInForm("./ganguly.csv","Ganguly")`

```
## *******************************************************************************************
##
## Population size: 169 Mean of population: 38.94
## Sample size: 19 Mean of sample: 33.21 SD of sample: 32.97
##
## Null hypothesis H0 : Ganguly 's sample average is within 95% confidence interval
## of population average
## Alternative hypothesis Ha : Ganguly 's sample average is below the 95% confidence
## interval of population average
##
## [1] "Ganguly 's Form Status: In-Form because the p value: 0.229006 is greater than alpha= 0.05"
## *******************************************************************************************
```

`checkBatsmanInForm("./gavaskar.csv","Gavaskar")`

```
## *******************************************************************************************
##
## Population size: 125 Mean of population: 44.67
## Sample size: 14 Mean of sample: 57.86 SD of sample: 58.55
##
## Null hypothesis H0 : Gavaskar 's sample average is within 95% confidence interval
## of population average
## Alternative hypothesis Ha : Gavaskar 's sample average is below the 95% confidence
## interval of population average
##
## [1] "Gavaskar 's Form Status: In-Form because the p value: 0.793276 is greater than alpha= 0.05"
## *******************************************************************************************
```

`dev.off()`

```
## null device
## 1
```
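
The summary statistics printed in this section are enough to sketch the check by hand. The snippet below assumes the test is a one-sample, lower-tail t-test of the sample mean against the population mean; that is my reading of the output, not a confirmed description of `checkBatsmanInForm`, and the helper name `lowerTailP` is hypothetical.

```r
# Lower-tail one-sample t-test computed from summary statistics.
# Assumption: this mirrors the check in spirit, not necessarily in detail.
lowerTailP <- function(popMean, sampleMean, sampleSD, n) {
  tStat <- (sampleMean - popMean) / (sampleSD / sqrt(n))
  pt(tStat, df = n - 1)  # P(T <= tStat) under H0
}

alpha <- 0.05

# Values copied from the printed output for Tendulkar and Gavaskar
pTendulkar <- lowerTailP(50.48, 32.42, 29.80, 33)
pGavaskar  <- lowerTailP(44.67, 57.86, 58.55, 14)

pTendulkar < alpha  # TRUE: reject H0, out-of-form
pGavaskar < alpha   # FALSE: cannot reject H0, in-form
```

Because the inputs are rounded to two decimals, the p values from this sketch will only approximately match the ones printed above.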

# 3D plot of Runs vs Balls Faced and Minutes at Crease

The plot is a 3D scatter plot of Runs vs Balls Faced and Minutes at Crease. A prediction plane is fitted.

```
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./tendulkar.csv","Tendulkar")
battingPerf3d("./dravid.csv","Dravid")
```

```
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./ganguly.csv","Ganguly")
battingPerf3d("./gavaskar.csv","Gavaskar")
```

`dev.off()`

```
## null device
## 1
```

# Predicting Runs given Balls Faced and Minutes at Crease

A multiple regression plane is fitted with Runs as the response and Balls Faced + Minutes at Crease as the predictors.

```
BF <- seq( 10, 400,length=15)
Mins <- seq(30,600,length=15)
newDF <- data.frame(BF,Mins)
tendulkar <- batsmanRunsPredict("./tendulkar.csv","Tendulkar",newdataframe=newDF)
dravid <- batsmanRunsPredict("./dravid.csv","Dravid",newdataframe=newDF)
ganguly <- batsmanRunsPredict("./ganguly.csv","Ganguly",newdataframe=newDF)
gavaskar <- batsmanRunsPredict("./gavaskar.csv","Gavaskar",newdataframe=newDF)
```

The fitted model is then used to predict the runs that the batsmen will score for a given Balls Faced and Minutes at Crease. It can be seen that Tendulkar has much higher predicted runs than all of the others.

Tendulkar is followed by Ganguly, who we saw earlier had a very good strike rate. However, it must be noted that Dravid and Gavaskar have a better average.

```
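# Combine the rounded predictions for each batsman into one matrix
batsmen <- cbind(round(tendulkar$Runs),round(dravid$Runs),round(ganguly$Runs),round(gavaskar$Runs))
colnames(batsmen) <- c("Tendulkar","Dravid","Ganguly","Gavaskar")
# Round the inputs and give them readable column names
newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
colnames(newDF) <- c("BallsFaced","MinsAtCrease")
# Show the inputs and the predicted runs side by side
predictedRuns <- cbind(newDF,batsmen)
predictedRuns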
```

```
##    BallsFaced MinsAtCrease Tendulkar Dravid Ganguly Gavaskar
## 1          10           30         7      1       7        4
## 2          38           71        23     14      21       17
## 3          66          111        39     27      35       30
## 4          94          152        54     40      50       43
## 5         121          193        70     54      64       56
## 6         149          234        86     67      78       69
## 7         177          274       102     80      93       82
## 8         205          315       118     94     107       95
## 9         233          356       134    107     121      108
## 10        261          396       150    120     136      121
## 11        289          437       165    134     150      134
## 12        316          478       181    147     165      147
## 13        344          519       197    160     179      160
## 14        372          559       213    173     193      173
## 15        400          600       229    187     208      186
```
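
The fit-then-predict step in this section can be sketched in base R with `lm` and `predict`. This is only an illustration of the technique on invented data; I am assuming `batsmanRunsPredict` does something broadly similar, which this post does not confirm.

```r
# Hypothetical innings (made-up numbers): runs, balls faced, minutes at crease
innings <- data.frame(
  Runs = c(4, 15, 32, 48, 60, 85, 110, 131),
  BF   = c(10, 35, 60, 95, 120, 160, 210, 250),
  Mins = c(15, 50, 95, 140, 180, 250, 320, 380)
)

# Fit a regression plane: Runs as a linear function of BF and Mins
fit <- lm(Runs ~ BF + Mins, data = innings)

# New inputs to predict for (same grid as used in this post)
newDF <- data.frame(BF   = seq(10, 400, length.out = 15),
                    Mins = seq(30, 600, length.out = 15))

# Predicted runs for each (BF, Mins) pair
predicted <- predict(fit, newdata = newDF)
round(predicted)
```

Note that the column names in `newDF` must match the predictor names in the model formula, otherwise `predict` cannot line up the inputs.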

# Contribution to matches won and lost

```
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanContributionWonLost(35320,"Tendulkar")
batsmanContributionWonLost(28114,"Dravid")
batsmanContributionWonLost(28779,"Ganguly")
batsmanContributionWonLost(28794,"Gavaskar")
```

# Conclusion

Here are some key conclusions:

1. Tendulkar’s predicted performance for a given number of Balls Faced and Minutes at Crease is superior to the rest
2. Ganguly has a very good Mean Strike Rate for the range 40-80 and Tendulkar for the range 100-180
3. Dravid and Gavaskar probably retired a year or two early, while Tendulkar’s and Ganguly’s time was clearly up

Also see my other posts in R

- A peek into literacy in India: Statistical Learning with R
- A crime map of India in R – Crimes against women
- Analyzing cricket’s batting legends – Through the mirage with R
- Masters of Spin: Unraveling the web with R
- Mirror, mirror . the best batsman of them all?
