Practical Machine Learning with R and Python – Part 2


In this 2nd part of the series “Practical Machine Learning with R and Python”, I continue where I left off in my first post, Practical Machine Learning with R and Python – Part 1. In this post I cover some classification algorithms and cross validation. Specifically I touch on
-Logistic Regression
-K Nearest Neighbors (KNN) classification
-Leave One Out Cross Validation (LOOCV)
-K Fold Cross Validation
in both R and Python.

As in my initial post the algorithms are based on the following courses.

You can download this R Markdown file along with the data from Github. I hope these posts can be used as a quick reference in R and Python and Machine Learning. I have tried to include the coolest part of either course in this post.

The following classification problem is based on Logistic Regression. The data is a data set included in Scikit-Learn, which I have saved as csv so that it can also be used in R. The fit of a classification Machine Learning model depends on how correctly it classifies the data. There are several measures of a model’s classification performance. They are

Accuracy = (TP + TN) / (TP + TN + FP + FN) – Fraction of all samples correctly classified
Precision = TP / (TP + FP) – Fraction of correctly classified positives among those classified as positive
Recall = TP / (TP + FN) – Also known as sensitivity or True Positive Rate (TPR). Fraction of positives correctly classified among all positives in the data
F1 = 2 * Precision * Recall / (Precision + Recall)
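As a quick check of the four formulas above, here is a minimal Python sketch that computes them from raw confusion-matrix counts. The helper name classification_metrics and the counts passed to it are made up for illustration; they are not results from this post.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts, chosen only to illustrate the arithmetic
acc, prec, rec, f1 = classification_metrics(tp=80, tn=50, fp=5, fn=8)
print('Accuracy: {:.3f}'.format(acc))
print('Precision: {:.3f}'.format(prec))
print('Recall: {:.3f}'.format(rec))
print('F1: {:.3f}'.format(f1))
```

Note that which class counts as "positive" matters: caret's confusionMatrix and scikit-learn's precision_score may pick different positive classes by default, so precision and recall from the two are not always directly comparable.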

1a. Logistic Regression – R code

The caret and e1071 packages are required for using the confusionMatrix call

source("RFunctions.R")
library(dplyr)
library(caret)
library(e1071)
# Read the data (from sklearn)
cancer <- read.csv("cancer.csv")
# Rename the target variable
names(cancer) <- c(seq(1,30),"output")
# Split as training and test sets
train_idx <- trainTestSplit(cancer,trainPercent=75,seed=5)
train <- cancer[train_idx, ]
test <- cancer[-train_idx, ]

# Fit a generalized linear logistic model, 
fit=glm(output~.,family=binomial,data=train,control = list(maxit = 50))
# Predict the output from the model
a=predict(fit,newdata=train,type="response")
# Set response >0.5 as 1 and <=0.5 as 0
b=ifelse(a>0.5,1,0)
# Compute the confusion matrix for training data
confusionMatrix(b,train$output)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0 154   0
##          1   0 272
##                                      
##                Accuracy : 1          
##                  95% CI : (0.9914, 1)
##     No Information Rate : 0.6385     
##     P-Value [Acc > NIR] : < 2.2e-16  
##                                      
##                   Kappa : 1          
##  Mcnemar's Test P-Value : NA         
##                                      
##             Sensitivity : 1.0000     
##             Specificity : 1.0000     
##          Pos Pred Value : 1.0000     
##          Neg Pred Value : 1.0000     
##              Prevalence : 0.3615     
##          Detection Rate : 0.3615     
##    Detection Prevalence : 0.3615     
##       Balanced Accuracy : 1.0000     
##                                      
##        'Positive' Class : 0          
## 
m=predict(fit,newdata=test,type="response")
n=ifelse(m>0.5,1,0)
# Compute the confusion matrix for test output
confusionMatrix(n,test$output)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 52  4
##          1  5 81
##                                           
##                Accuracy : 0.9366          
##                  95% CI : (0.8831, 0.9706)
##     No Information Rate : 0.5986          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.8677          
##  Mcnemar's Test P-Value : 1               
##                                           
##             Sensitivity : 0.9123          
##             Specificity : 0.9529          
##          Pos Pred Value : 0.9286          
##          Neg Pred Value : 0.9419          
##              Prevalence : 0.4014          
##          Detection Rate : 0.3662          
##    Detection Prevalence : 0.3944          
##       Balanced Accuracy : 0.9326          
##                                           
##        'Positive' Class : 0               
## 

1b. Logistic Regression – Python code

import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
os.chdir("C:\\Users\\Ganesh\\RandPython")
from sklearn.datasets import make_classification, make_blobs

from sklearn.metrics import confusion_matrix
from matplotlib.colors import ListedColormap
from sklearn.datasets import load_breast_cancer
# Load the cancer data
(X_cancer, y_cancer) = load_breast_cancer(return_X_y = True)
X_train, X_test, y_train, y_test = train_test_split(X_cancer, y_cancer,
                                                   random_state = 0)
# Fit a Logistic Regression model
clf = LogisticRegression().fit(X_train, y_train)

# Compute and print the Accuracy scores
print('Accuracy of Logistic regression classifier on training set: {:.2f}'
     .format(clf.score(X_train, y_train)))
print('Accuracy of Logistic regression classifier on test set: {:.2f}'
     .format(clf.score(X_test, y_test)))
y_predicted=clf.predict(X_test)
# Compute the confusion matrix and print the derived scores
confusion = confusion_matrix(y_test, y_predicted)
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
print('Accuracy: {:.2f}'.format(accuracy_score(y_test, y_predicted)))
print('Precision: {:.2f}'.format(precision_score(y_test, y_predicted)))
print('Recall: {:.2f}'.format(recall_score(y_test, y_predicted)))
print('F1: {:.2f}'.format(f1_score(y_test, y_predicted)))
## Accuracy of Logistic regression classifier on training set: 0.96
## Accuracy of Logistic regression classifier on test set: 0.96
## Accuracy: 0.96
## Precision: 0.99
## Recall: 0.94
## F1: 0.97

2. Dummy variables

The following R and Python code show how dummy variables are handled in R and Python. Dummy variables are derived from categorical variables, which have to be converted into appropriate numeric values before using them in a Machine Learning model. For e.g. if we had currency as ‘dollar’, ‘rupee’ and ‘yen’ then the dummy variables will encode this as
dollar 1 0 0
rupee 0 1 0
yen 0 0 1
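A small sketch of what this conversion looks like with pandas' get_dummies. The toy data frame and its 'currency' column are hypothetical and only serve to show the shape of the result.

```python
import pandas as pd

# Hypothetical toy data with a single categorical column
df = pd.DataFrame({'currency': ['dollar', 'rupee', 'yen', 'rupee']})

# One indicator column per category; exactly one of them is 1 in each row
dummies = pd.get_dummies(df, columns=['currency'], dtype=int)
print(dummies)

# drop_first=True drops one category, which then acts as the baseline level
dropped = pd.get_dummies(df, columns=['currency'], drop_first=True, dtype=int)
print(dropped)
```

Keeping all indicator columns together with an intercept makes the columns linearly dependent, which is one common reason for a "rank-deficient fit" warning in a linear or logistic model; dropping one level avoids that.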

2a. Logistic Regression with dummy variables- R code

# Load the dummies library
library(dummies) 
df <- read.csv("adult1.csv",stringsAsFactors = FALSE,na.strings = c(""," "," ?"))

# Remove rows which have NA
df1 <- df[complete.cases(df),]
dim(df1)
## [1] 30161    16
# Select specific columns
adult <- df1 %>% dplyr::select(age,occupation,education,educationNum,capitalGain,
                               capital.loss,hours.per.week,native.country,salary)
# Set the dummy data with appropriate values
adult1 <- dummy.data.frame(adult, sep = ".")

#Split as training and test
train_idx <- trainTestSplit(adult1,trainPercent=75,seed=1111)
train <- adult1[train_idx, ]
test <- adult1[-train_idx, ]

# Fit a binomial logistic regression
fit=glm(salary~.,family=binomial,data=train)
# Predict response
a=predict(fit,newdata=train,type="response")
# If response >0.5 then it is a 1 and 0 otherwise
b=ifelse(a>0.5,1,0)
confusionMatrix(b,train$salary)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction     0     1
##          0 16065  3145
##          1   968  2442
##                                           
##                Accuracy : 0.8182          
##                  95% CI : (0.8131, 0.8232)
##     No Information Rate : 0.753           
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.4375          
##  Mcnemar's Test P-Value : < 2.2e-16       
##                                           
##             Sensitivity : 0.9432          
##             Specificity : 0.4371          
##          Pos Pred Value : 0.8363          
##          Neg Pred Value : 0.7161          
##              Prevalence : 0.7530          
##          Detection Rate : 0.7102          
##    Detection Prevalence : 0.8492          
##       Balanced Accuracy : 0.6901          
##                                           
##        'Positive' Class : 0               
## 
# Compute and display confusion matrix
m=predict(fit,newdata=test,type="response")
## Warning in predict.lm(object, newdata, se.fit, scale = 1, type =
## ifelse(type == : prediction from a rank-deficient fit may be misleading
n=ifelse(m>0.5,1,0)
confusionMatrix(n,test$salary)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    0    1
##          0 5263 1099
##          1  357  822
##                                           
##                Accuracy : 0.8069          
##                  95% CI : (0.7978, 0.8158)
##     No Information Rate : 0.7453          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.4174          
##  Mcnemar's Test P-Value : < 2.2e-16       
##                                           
##             Sensitivity : 0.9365          
##             Specificity : 0.4279          
##          Pos Pred Value : 0.8273          
##          Neg Pred Value : 0.6972          
##              Prevalence : 0.7453          
##          Detection Rate : 0.6979          
##    Detection Prevalence : 0.8437          
##       Balanced Accuracy : 0.6822          
##                                           
##        'Positive' Class : 0               
## 

2b. Logistic Regression with dummy variables- Python code

Pandas has a get_dummies function for handling dummies

import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
# Read data
df =pd.read_csv("adult1.csv",encoding="ISO-8859-1",na_values=[""," "," ?"])
# Drop rows with NA
df1=df.dropna()
print(df1.shape)
# Select specific columns
adult = df1[['age','occupation','education','educationNum','capitalGain','capital-loss', 
             'hours-per-week','native-country','salary']]

X=adult[['age','occupation','education','educationNum','capitalGain','capital-loss', 
             'hours-per-week','native-country']]
# Set appropriate values for dummy variables
X_adult=pd.get_dummies(X,columns=['occupation','education','native-country'])
y=adult['salary']

X_adult_train, X_adult_test, y_train, y_test = train_test_split(X_adult, y,
                                                   random_state = 0)
clf = LogisticRegression().fit(X_adult_train, y_train)

# Compute and display Accuracy and Confusion matrix
print('Accuracy of Logistic regression classifier on training set: {:.2f}'
     .format(clf.score(X_adult_train, y_train)))
print('Accuracy of Logistic regression classifier on test set: {:.2f}'
     .format(clf.score(X_adult_test, y_test)))
y_predicted=clf.predict(X_adult_test)
confusion = confusion_matrix(y_test, y_predicted)
print('Accuracy: {:.2f}'.format(accuracy_score(y_test, y_predicted)))
print('Precision: {:.2f}'.format(precision_score(y_test, y_predicted)))
print('Recall: {:.2f}'.format(recall_score(y_test, y_predicted)))
print('F1: {:.2f}'.format(f1_score(y_test, y_predicted)))
## (30161, 16)
## Accuracy of Logistic regression classifier on training set: 0.82
## Accuracy of Logistic regression classifier on test set: 0.81
## Accuracy: 0.81
## Precision: 0.68
## Recall: 0.41
## F1: 0.51

3a – K Nearest Neighbors Classification – R code

The Adult data set is taken from UCI Machine Learning Repository

source("RFunctions.R")
df <- read.csv("adult1.csv",stringsAsFactors = FALSE,na.strings = c(""," "," ?"))
# Remove rows which have NA
df1 <- df[complete.cases(df),]
dim(df1)
## [1] 30161    16
# Select specific columns
adult <- df1 %>% dplyr::select(age,occupation,education,educationNum,capitalGain,
                               capital.loss,hours.per.week,native.country,salary)
# Set dummy variables
adult1 <- dummy.data.frame(adult, sep = ".")

#Split train and test as required by KNN classification model
train_idx <- trainTestSplit(adult1,trainPercent=75,seed=1111)
train <- adult1[train_idx, ]
test <- adult1[-train_idx, ]
train.X <- train[,1:76]
train.y <- train[,77]
test.X <- test[,1:76]
test.y <- test[,77]

# Fit a model for 1,3,5,10 and 15 neighbors
cMat <- NULL
neighbors <-c(1,3,5,10,15)
for(i in seq_along(neighbors)){
    # Use the i-th entry of 'neighbors' as k (not the loop index itself)
    fit =knn(train.X,test.X,train.y,k=neighbors[i])
    table(fit,test.y)
    a<-confusionMatrix(fit,test.y)
    cMat[i] <- a$overall[1]
    print(a$overall[1])
}
##  Accuracy 
## 0.7835831 
##  Accuracy 
## 0.8162047 
##  Accuracy 
## 0.8089113 
##  Accuracy 
## 0.8209787 
##  Accuracy 
## 0.8184591
#Plot the Accuracy for each of the KNN models
df <- data.frame(neighbors,Accuracy=cMat)
ggplot(df,aes(x=neighbors,y=Accuracy)) + geom_point() +geom_line(color="blue") +
    xlab("Number of neighbors") + ylab("Accuracy") +
    ggtitle("KNN classification - Accuracy vs Number of Neighbors (Unnormalized)")

3b – K Nearest Neighbors Classification – Python code

import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler

# Read data
df =pd.read_csv("adult1.csv",encoding="ISO-8859-1",na_values=[""," "," ?"])
df1=df.dropna()
print(df1.shape)
# Select specific columns
adult = df1[['age','occupation','education','educationNum','capitalGain','capital-loss', 
             'hours-per-week','native-country','salary']]

X=adult[['age','occupation','education','educationNum','capitalGain','capital-loss', 
             'hours-per-week','native-country']]
             
#Set values for dummy variables
X_adult=pd.get_dummies(X,columns=['occupation','education','native-country'])
y=adult['salary']

X_adult_train, X_adult_test, y_train, y_test = train_test_split(X_adult, y,
                                                   random_state = 0)
                                                   
# KNN is distance based, so the features should be scaled to a common range.
# Scale the data
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_adult_train)
# Apply scaling to test set also
X_test_scaled = scaler.transform(X_adult_test)
# Compute the KNN model for 1,3,5,10 & 15 neighbors
accuracy=[]
neighbors=[1,3,5,10,15]
for i in neighbors:
    knn = KNeighborsClassifier(n_neighbors = i)
    knn.fit(X_train_scaled, y_train)
    accuracy.append(knn.score(X_test_scaled, y_test))
    print('Accuracy test score: {:.3f}'
        .format(knn.score(X_test_scaled, y_test)))

# Plot the models with the Accuracy attained for each of these models    
fig1=plt.plot(neighbors,accuracy)
fig1=plt.title("KNN classification - Accuracy vs Number of neighbors")
fig1=plt.xlabel("Neighbors")
fig1=plt.ylabel("Accuracy")
fig1.figure.savefig('foo1.png', bbox_inches='tight')
## (30161, 16)
## Accuracy test score: 0.749
## Accuracy test score: 0.779
## Accuracy test score: 0.793
## Accuracy test score: 0.804
## Accuracy test score: 0.803
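To see why the scaling step matters for KNN, here is a minimal sketch with two made-up training rows of the form [age, capitalGain]. The numbers and the helper nearest are hypothetical. Without scaling, the feature with the larger numeric range dominates the Euclidean distance, so the nearest neighbor can change once both features are brought to the [0, 1] range.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical rows: [age, capitalGain]
X_train = np.array([[25.0, 5000.0],
                    [60.0, 4000.0]])
query = np.array([[27.0, 4200.0]])

def nearest(train, q):
    """Index of the training row closest to q in Euclidean distance."""
    return int(np.argmin(np.linalg.norm(train - q, axis=1)))

# Fit the scaler on the training rows only, then apply it to the query
scaler = MinMaxScaler().fit(X_train)

idx_raw = nearest(X_train, query)
idx_scaled = nearest(scaler.transform(X_train), scaler.transform(query))
print('Nearest neighbor without scaling:', idx_raw)
print('Nearest neighbor with scaling   :', idx_scaled)
```

On the raw data the query is matched to the 60 year old because capitalGain swamps the age difference; after scaling, the 25 year old is closer.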


4 MPG vs Horsepower

The following scatter plot shows the non-linear relation between mpg and horsepower. This will be used as the data input for computing K Fold Cross Validation Error

4a MPG vs Horsepower scatter plot – R Code

df=read.csv("auto_mpg.csv",stringsAsFactors = FALSE) # Data from UCI
df1 <- as.data.frame(sapply(df,as.numeric))
df2 <- df1 %>% dplyr::select(cylinder,displacement, horsepower,weight, acceleration, year,mpg)
df3 <- df2[complete.cases(df2),]
ggplot(df3,aes(x=horsepower,y=mpg)) + geom_point() + xlab("Horsepower") + 
    ylab("Miles Per gallon") + ggtitle("Miles per Gallon vs Horsepower")

4b MPG vs Horsepower scatter plot – Python Code

import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
autoDF =pd.read_csv("auto_mpg.csv",encoding="ISO-8859-1")
autoDF.shape
autoDF.columns
autoDF1=autoDF[['mpg','cylinder','displacement','horsepower','weight','acceleration','year']]
autoDF2 = autoDF1.apply(pd.to_numeric, errors='coerce')
autoDF3=autoDF2.dropna()
autoDF3.shape
#X=autoDF3[['cylinder','displacement','horsepower','weight']]
X=autoDF3[['horsepower']]
y=autoDF3['mpg']

fig11=plt.scatter(X,y)
fig11=plt.title("Miles per Gallon vs Horsepower")
fig11=plt.xlabel("Horsepower")
fig11=plt.ylabel("Miles Per gallon")
fig11.figure.savefig('foo11.png', bbox_inches='tight')

5 K Fold Cross Validation

K Fold Cross Validation is a technique in which the data set is divided into K folds or K partitions. The Machine Learning model is trained on K-1 folds and tested on the remaining fold, i.e.
we will have K-1 folds for training data and 1 for testing the ML model. Since the test fold can be chosen in C_{1}^{K} or K choose 1 ways, there will be K such partitions. K Fold Cross
Validation estimates the average validation error that we can expect on new, unseen test data.

The formula for K Fold Cross validation is as follows

MSE_{k} = \frac{\sum (y-yhat)^{2}}{n_{k}}
and
n_{k} = \frac{N}{K}
and
CV_{K} = \sum_{k=1}^{K} (\frac{n_{k}}{N}) MSE_{k}

where n_{k} is the number of elements in partition ‘k’ and N is the total number of elements. Substituting n_{k} = \frac{N}{K} gives
CV_{K} =\sum_{k=1}^{K} \frac{1}{K} MSE_{k}

CV_{K} =\frac{\sum_{k=1}^{K} MSE_{k}}{K}
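
The K Fold formula can be sketched with plain NumPy: each fold's MSE is weighted by that fold's share of the data and the weighted values are summed. The data here is synthetic and the variable names are my own; this is an illustration of the formula, not the code used for the plots in this post. When all folds have the same size, the weighted sum is the same as the simple average of the fold MSEs.

```python
import numpy as np

# Synthetic data (an assumption for illustration, not the auto_mpg data)
rng = np.random.RandomState(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + rng.normal(0, 1, size=100)

N, K = len(x), 5
# K contiguous folds, each holding N/K indices
folds = np.array_split(np.arange(N), K)

fold_mse = []
cv_k = 0.0
for test_idx in folds:
    train_idx = np.setdiff1d(np.arange(N), test_idx)
    # Fit a straight line on the K-1 training folds
    slope, intercept = np.polyfit(x[train_idx], y[train_idx], deg=1)
    y_hat = slope * x[test_idx] + intercept
    # MSE on the held-out fold
    mse_k = np.mean((y[test_idx] - y_hat) ** 2)
    fold_mse.append(mse_k)
    # Weight the fold MSE by (fold size / N) and accumulate
    cv_k += (len(test_idx) / N) * mse_k

print('Weighted sum of fold MSEs: {:.4f}'.format(cv_k))
print('Simple mean of fold MSEs : {:.4f}'.format(np.mean(fold_mse)))
```
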
Leave One Out Cross Validation (LOOCV) is a special case of K Fold Cross Validation where N-1 data points are used to train the model and 1 data point is used to test the model. There are N such partitions of N-1 & 1 that are possible. The mean error over these N partitions is measured. For a least squares fit, the Cross Validation Error for LOOCV is

CV_{N} = \frac{1}{N} \sum_{i=1}^{N} (\frac{y_{i}-yhat_{i}}{1-h_{i}})^{2}
where h_{i} is the i-th diagonal element of the hat matrix
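
For a least squares fit, the LOOCV error can be obtained from a single fit on all the data by dividing each residual by 1 - h_i before squaring and averaging. The sketch below checks this shortcut against the brute force approach of refitting N times. It uses synthetic data and a degree 2 polynomial, both of which are assumptions made for illustration.

```python
import numpy as np

# Synthetic data (an assumption for illustration)
rng = np.random.RandomState(1)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x - 0.1 * x ** 2 + rng.normal(0, 1, size=50)

# Design matrix for a degree 2 polynomial: columns 1, x, x^2
A = np.vander(x, 3, increasing=True)

# Shortcut: one fit on all the data, then rescale residuals by 1 - h_i
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
H = A @ np.linalg.inv(A.T @ A) @ A.T   # hat matrix
h = np.diag(H)                         # leverages h_i
cv_shortcut = np.mean(((y - A @ beta) / (1 - h)) ** 2)

# Brute force: refit N times, leaving out one point each time
errors = []
for i in range(len(x)):
    keep = np.arange(len(x)) != i
    b, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
    errors.append((y[i] - A[i] @ b) ** 2)
cv_brute = np.mean(errors)

print('LOOCV via hat matrix : {:.6f}'.format(cv_shortcut))
print('LOOCV via N refits   : {:.6f}'.format(cv_brute))
```

The shortcut only holds for linear least squares; for other models LOOCV needs the N separate fits.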

see [Statistical Learning]

The above formula is also included in this blog post

It took me a day and a half to implement the K Fold Cross Validation formula. I think it is correct. In any case do let me know if you think it is off

5a. Leave one out cross validation (LOOCV) – R Code

R uses the package ‘boot’ for performing Cross Validation error computation

library(boot)
library(reshape2)
# Read data
df=read.csv("auto_mpg.csv",stringsAsFactors = FALSE) # Data from UCI
df1 <- as.data.frame(sapply(df,as.numeric))
# Select complete cases
df2 <- df1 %>% dplyr::select(cylinder,displacement, horsepower,weight, acceleration, year,mpg)
df3 <- df2[complete.cases(df2),]
set.seed(17)
cv.error=rep(0,10)
# For polynomials 1,2,3... 10 fit a LOOCV model
for (i in 1:10){
    glm.fit=glm(mpg~poly(horsepower,i),data=df3)
    cv.error[i]=cv.glm(df3,glm.fit)$delta[1]
    
}
cv.error
##  [1] 24.23151 19.24821 19.33498 19.42443 19.03321 18.97864 18.83305
##  [8] 18.96115 19.06863 19.49093
# Create and display a plot
folds <- seq(1,10)
df <- data.frame(folds,cvError=cv.error)
ggplot(df,aes(x=folds,y=cvError)) + geom_point() +geom_line(color="blue") +
    xlab("Degree of Polynomial") + ylab("Cross Validation Error") +
    ggtitle("Leave one out Cross Validation - Cross Validation Error vs Degree of Polynomial")

5b. Leave one out cross validation (LOOCV) – Python Code

In Python I compute the Cross Validation error by implementing the above formula directly (scikit-learn does also provide helpers such as cross_val_score, which I do not use here). I have done this after several hours. I think it is now in reasonable shape. Do let me know if you think otherwise. For LOOCV I use the K Fold Cross Validation with K=N.

import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error
# Read data
autoDF =pd.read_csv("auto_mpg.csv",encoding="ISO-8859-1")
autoDF.shape
autoDF.columns
autoDF1=autoDF[['mpg','cylinder','displacement','horsepower','weight','acceleration','year']]
autoDF2 = autoDF1.apply(pd.to_numeric, errors='coerce')
# Remove rows with NAs
autoDF3=autoDF2.dropna()
autoDF3.shape
X=autoDF3[['horsepower']]
y=autoDF3['mpg']

# For polynomial degree 1,2,3... 10
def computeCVError(X,y,folds):
    deg=[]
    degree1=[1,2,3,4,5,6,7,8,9,10]
    
    # Split as 'folds' (sklearn.model_selection.KFold replaces the removed
    # sklearn.cross_validation.KFold)
    kf = KFold(n_splits=folds)
    # For degree 'j'
    for j in degree1: 
        # Accumulate the weighted error afresh for every degree
        cv_err=0.0
        for train_index, test_index in kf.split(X):
            # Create the appropriate train and test partitions from the fold index
            X_train, X_test = X.iloc[train_index], X.iloc[test_index]
            y_train, y_test = y.iloc[train_index], y.iloc[test_index]  

            # For the polynomial degree 'j'
            poly = PolynomialFeatures(degree=j)        
            # Fit the transform on X_train and apply it to X_test
            X_train_poly = poly.fit_transform(X_train)
            X_test_poly = poly.transform(X_test)
            # Fit a model on the transformed data
            linreg = LinearRegression().fit(X_train_poly, y_train)
            # Compute yhat or ypred
            y_pred = linreg.predict(X_test_poly)   
            # Compute MSE * n_K/N, where n_K is the size of the test fold
            cv_err += mean_squared_error(y_test, y_pred)*float(len(X_test))/float(len(X))     
        # CV error for degree 'j' is the weighted sum over all folds   
        deg.append(cv_err)
        
    return(deg)


df=pd.DataFrame()
print(len(X))
# Call the function once. For LOOCV K=N. hence len(X) is passed as number of folds
cvError=computeCVError(X,y,len(X))

# Create and plot LOOCV
df=pd.DataFrame(cvError)
fig3=df.plot()
fig3=plt.title("Leave one out Cross Validation - Cross Validation Error vs Degree of Polynomial")
fig3=plt.xlabel("Degree of Polynomial")
fig3=plt.ylabel("Cross validation Error")
fig3.figure.savefig('foo3.png', bbox_inches='tight')

 

6a K Fold Cross Validation – R code

Here K Fold Cross Validation is done for 4, 5 and 10 folds using cv.glm from the R package boot together with glm.

library(boot)
library(reshape2)
set.seed(17)
#Read data
df=read.csv("auto_mpg.csv",stringsAsFactors = FALSE) # Data from UCI
df1 <- as.data.frame(sapply(df,as.numeric))
df2 <- df1 %>% dplyr::select(cylinder,displacement, horsepower,weight, acceleration, year,mpg)
df3 <- df2[complete.cases(df2),]
a=matrix(rep(0,30),nrow=3,ncol=10)
set.seed(17)
# Set the folds as 4,5 and 10
folds<-c(4,5,10)
for(i in seq_along(folds)){
    cv.error.10=rep(0,10)
    for (j in 1:10){
        # Fit a generalized linear model
        glm.fit=glm(mpg~poly(horsepower,j),data=df3)
        # Compute K Fold Validation error
        a[i,j]=cv.glm(df3,glm.fit,K=folds[i])$delta[1]
        
    }
    
}

# Create and display the K Fold Cross Validation Error
b <- t(a)
df <- data.frame(b)
df1 <- cbind(seq(1,10),df)
names(df1) <- c("PolynomialDegree","4-fold","5-fold","10-fold")

df2 <- melt(df1,id="PolynomialDegree")
ggplot(df2) + geom_line(aes(x=PolynomialDegree, y=value, colour=variable),size=2) +
    xlab("Degree of Polynomial") + ylab("Cross Validation Error") +
    ggtitle("K Fold Cross Validation - Cross Validation Error vs Degree of Polynomial")

6b. K Fold Cross Validation – Python code

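The K Fold Cross Validation Error is again computed by hand, and I have done this below. There is a small discrepancy in the shapes of the curves with the R plot above. I am not sure why, though one likely contributor is that cv.glm assigns observations to folds at random, while KFold as used here splits the data in order without shuffling.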

import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error
# Read data
autoDF =pd.read_csv("auto_mpg.csv",encoding="ISO-8859-1")
autoDF.shape
autoDF.columns
autoDF1=autoDF[['mpg','cylinder','displacement','horsepower','weight','acceleration','year']]
autoDF2 = autoDF1.apply(pd.to_numeric, errors='coerce')
# Drop NA rows
autoDF3=autoDF2.dropna()
autoDF3.shape
#X=autoDF3[['cylinder','displacement','horsepower','weight']]
X=autoDF3[['horsepower']]
y=autoDF3['mpg']

# Create Cross Validation function
def computeCVError(X,y,folds):
    deg=[]
    # For degree 1,2,3,..10
    degree1=[1,2,3,4,5,6,7,8,9,10]
    
    # Split the data into 'folds' (sklearn.model_selection.KFold replaces the
    # removed sklearn.cross_validation.KFold)
    kf = KFold(n_splits=folds)
    for j in degree1: 
        # Accumulate the weighted error afresh for every degree
        cv_err=0.0
        for train_index, test_index in kf.split(X):
            # Partition the data according to the fold indices generated
            X_train, X_test = X.iloc[train_index], X.iloc[test_index]
            y_train, y_test = y.iloc[train_index], y.iloc[test_index]  

            # Expand X_train and X_test into polynomial features of degree 'j'
            poly = PolynomialFeatures(degree=j)             
            X_train_poly = poly.fit_transform(X_train)
            X_test_poly = poly.transform(X_test)
            # Fit a polynomial regression
            linreg = LinearRegression().fit(X_train_poly, y_train)
            # Compute yhat or ypred
            y_pred = linreg.predict(X_test_poly)  
            # Compute MSE *(nK/N), where nK is the size of the test fold
            cv_err += mean_squared_error(y_test, y_pred)*float(len(X_test))/float(len(X))  
        # CV error for polynomial 'j' is the weighted sum over all folds 
        deg.append(cv_err)
        
    return(deg)

# Create and display a plot of K -Folds
df=pd.DataFrame()
for folds in [4,5,10]:
    cvError=computeCVError(X,y,folds)
    #print(cvError)
    df1=pd.DataFrame(cvError)
    df=pd.concat([df,df1],axis=1)
    #print(cvError)
    
df.columns=['4-fold','5-fold','10-fold']
# Label the rows with the polynomial degree (reindex would drop degree 1 and add an empty row)
df.index=[1,2,3,4,5,6,7,8,9,10]
df
fig2=df.plot()
fig2=plt.title("K Fold Cross Validation - Cross Validation Error vs Degree of Polynomial")
fig2=plt.xlabel("Degree of Polynomial")
fig2=plt.ylabel("Cross validation Error")
fig2.figure.savefig('foo2.png', bbox_inches='tight')


This concludes this 2nd part of this series. I will look into model tuning and model selection in R and Python in the coming parts. Comments, suggestions and corrections are welcome!
To be continued….
Watch this space!

Also see

  1. Design Principles of Scalable, Distributed Systems
  2. Re-introducing cricketr! : An R package to analyze performances of cricketers
  3. Spicing up a IBM Bluemix cloud app with MongoDB and NodeExpress
  4. Using Linear Programming (LP) for optimizing bowling change or batting lineup in T20 cricket
  5. Simulating an Edge Shape in Android

To see all posts see Index of posts


Practical Machine Learning with R and Python – Part 1


Introduction

This is the 1st part of a series of posts I intend to write on some common Machine Learning Algorithms in R and Python. In this first part I cover the following Machine Learning Algorithms

  • Univariate Regression
  • Multivariate Regression
  • Polynomial Regression
  • K Nearest Neighbors Regression

The code includes the implementation in both R and Python. This series of posts is based on the following 2 MOOC courses I did at Stanford Online and at Coursera

  1. Statistical Learning, Prof Trevor Hastie & Prof Robert Tibshirani, Online Stanford
  2. Applied Machine Learning in Python, Prof Kevyn Collins-Thompson, University Of Michigan, Coursera

I have used the data sets from the UCI Machine Learning repository (Communities and Crime, and Auto MPG). I also use the Boston data set from the MASS package.

While coding in R and Python I found that there were some aspects that were more convenient in one language and some in the other. For example, plotting the fit is straightforward in R, while computing the R squared, splitting as Train & Test sets etc. are already available in Python. In any case, these minor inconveniences can easily be implemented in either language.

R squared in R is computed as follows
RSS=\sum (y-yhat)^{2}
TSS= \sum(y-mean(y))^{2}
Rsquared= 1-\frac{RSS}{TSS}
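
The same computation can be sketched in Python with NumPy. The function name rsquared and the toy values are my own choices for illustration and are not taken from RFunctions.R.

```python
import numpy as np

def rsquared(y, yhat):
    """R squared = 1 - RSS/TSS for observed y and predicted yhat."""
    y = np.asarray(y, dtype=float)
    yhat = np.asarray(yhat, dtype=float)
    rss = np.sum((y - yhat) ** 2)        # residual sum of squares
    tss = np.sum((y - np.mean(y)) ** 2)  # total sum of squares
    return 1 - rss / tss

# Hypothetical observed and predicted values
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.5]
r2 = rsquared(y_true, y_pred)
print('R-squared: {:.4f}'.format(r2))
```

This is the same quantity that scikit-learn's score method returns for a fitted LinearRegression.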

Note: You can download this R Markdown file and the associated data sets from Github at MachineLearning-RandPython
Note 1: This post was created as an R Markdown file in RStudio which has a cool feature of including R and Python snippets. The plot of matplotlib needs a workaround but otherwise this is a real cool feature of RStudio!

1.1a Univariate Regression – R code

Here a simple linear regression line is fitted between a single input feature and the target variable

# Source in the R function library
source("RFunctions.R")
# Read the Boston data file
df=read.csv("Boston.csv",stringsAsFactors = FALSE) # Data from MASS - Statistical Learning

# Split the data into training and test sets (75:25)
train_idx <- trainTestSplit(df,trainPercent=75,seed=5)
train <- df[train_idx, ]
test <- df[-train_idx, ]

# Fit a linear regression line between 'Median value of owner occupied homes' vs 'lower status of 
# population'
fit=lm(medv~lstat,data=df)
# Display details of fit
summary(fit)
## 
## Call:
## lm(formula = medv ~ lstat, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -15.168  -3.990  -1.318   2.034  24.500 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 34.55384    0.56263   61.41   <2e-16 ***
## lstat       -0.95005    0.03873  -24.53   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.216 on 504 degrees of freedom
## Multiple R-squared:  0.5441, Adjusted R-squared:  0.5432 
## F-statistic: 601.6 on 1 and 504 DF,  p-value: < 2.2e-16
# Display the confidence intervals
confint(fit)
##                 2.5 %     97.5 %
## (Intercept) 33.448457 35.6592247
## lstat       -1.026148 -0.8739505
plot(df$lstat,df$medv, xlab="Lower status (%)",ylab="Median value of owned homes ($1000)", main="Median value of homes ($1000) vs Lower status (%)")
abline(fit)
abline(fit,lwd=3)
abline(fit,lwd=3,col="red")

rsquared=Rsquared(fit,test,test$medv)
sprintf("R-squared for uni-variate regression (Boston.csv)  is : %f", rsquared)
## [1] "R-squared for uni-variate regression (Boston.csv)  is : 0.556964"

1.1b Univariate Regression – Python code

import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
#os.chdir("C:\\software\\machine-learning\\RandPython")

# Read the CSV file
df = pd.read_csv("Boston.csv",encoding = "ISO-8859-1")
# Select the feature variable
X=df['lstat']

# Select the target 
y=df['medv']

# Split into train and test sets (75:25)
X_train, X_test, y_train, y_test = train_test_split(X, y,random_state = 0)
X_train=X_train.values.reshape(-1,1)
X_test=X_test.values.reshape(-1,1)

# Fit a linear model
linreg = LinearRegression().fit(X_train, y_train)

# Print the training and test R squared score
print('R-squared score (training): {:.3f}'.format(linreg.score(X_train, y_train)))
print('R-squared score (test): {:.3f}'.format(linreg.score(X_test, y_test)))
     
# Plot the linear regression line
fig=plt.scatter(X_train,y_train)

# Create a range of points. Compute yhat=coeff1*x + intercept and plot
x=np.linspace(0,40,20)
fig1=plt.plot(x, linreg.coef_ * x + linreg.intercept_, color='red')
fig1=plt.title("Median value of homes ($1000) vs Lower status (%)")
fig1=plt.xlabel("Lower status (%)")
fig1=plt.ylabel("Median value of owned homes ($1000)")
fig.figure.savefig('foo.png', bbox_inches='tight')
fig1.figure.savefig('foo1.png', bbox_inches='tight')
print "Finished"
## R-squared score (training): 0.571
## R-squared score (test): 0.458
## Finished
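
The R squared values reported above can be reproduced by hand. The sketch below is a minimal illustration of the usual definition, 1 - SS_res / SS_tot, which is what scikit-learn's score() returns for a regressor. The function name r_squared and the numbers are made up for illustration and are not taken from Boston.csv.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up values purely for illustration
y_true = [3.0, 5.0, 7.0, 9.0]
perfect = [3.0, 5.0, 7.0, 9.0]
mean_only = [6.0, 6.0, 6.0, 6.0]

print(r_squared(y_true, perfect))    # 1.0
print(r_squared(y_true, mean_only))  # 0.0
```

A model that does no better than predicting the mean scores 0, and on a test set the value can even be negative when the predictions are worse than the mean.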

1.2a Multivariate Regression – R code

# Read crimes data
crimesDF <- read.csv("crimes.csv",stringsAsFactors = FALSE)

# Drop the leading identifier columns, which do not impact the output
# (7:length() keeps column 7 onward, so the first 6 columns are dropped)
crimesDF1 <- crimesDF[,7:length(crimesDF)]

# Convert all to numeric
crimesDF2 <- sapply(crimesDF1,as.numeric)

# Check for NAs
a <- is.na(crimesDF2)
# Set to 0 as an imputation
crimesDF2[a] <-0
#Create as a dataframe
crimesDF2 <- as.data.frame(crimesDF2)
#Create a train/test split
train_idx <- trainTestSplit(crimesDF2,trainPercent=75,seed=5)
train <- crimesDF2[train_idx, ]
test <- crimesDF2[-train_idx, ]

# Fit a multivariate regression model between crimesPerPop and all other features
fit <- lm(ViolentCrimesPerPop~.,data=train)

# Compute and print R Squared
rsquared=Rsquared(fit,test,test$ViolentCrimesPerPop)
sprintf("R-squared for multi-variate regression (crimes.csv)  is : %f", rsquared)
## [1] "R-squared for multi-variate regression (crimes.csv)  is : 0.653940"
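
The R snippets call a trainTestSplit helper whose body is not shown in this section. As a rough idea of what an index-based 75:25 split might look like, here is a small Python sketch. The function name train_test_indices and its parameters are hypothetical, chosen only to mirror the trainPercent and seed arguments used above. It is not the author's helper.

```python
import random


def train_test_indices(n_rows, train_percent=75, seed=5):
    """Return (train_idx, test_idx) as lists of row positions."""
    rng = random.Random(seed)          # seeded so the split is reproducible
    n_train = int(n_rows * train_percent / 100)
    train_idx = sorted(rng.sample(range(n_rows), n_train))
    train_set = set(train_idx)
    # Every row not drawn for training goes to the test set
    test_idx = [i for i in range(n_rows) if i not in train_set]
    return train_idx, test_idx


train_idx, test_idx = train_test_indices(n_rows=20)
print(len(train_idx), len(test_idx))  # 15 5
```

Returning indices rather than rows matches how the R code uses the result, selecting with df[train_idx, ] and df[-train_idx, ].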

1.2b Multivariate Regression – Python code

import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# Read the data
crimesDF =pd.read_csv("crimes.csv",encoding="ISO-8859-1")
#Remove the 1st 7 columns
crimesDF1=crimesDF.iloc[:,7:crimesDF.shape[1]]
# Convert to numeric
crimesDF2 = crimesDF1.apply(pd.to_numeric, errors='coerce')
# Impute NA to 0s
crimesDF2.fillna(0, inplace=True)

# Select the X (feature variables - all)
X=crimesDF2.iloc[:,0:120]

# Set the target
y=crimesDF2.iloc[:,121]

X_train, X_test, y_train, y_test = train_test_split(X, y,random_state = 0)
# Fit a multivariate regression model
linreg = LinearRegression().fit(X_train, y_train)

# compute and print the R Square
print('R-squared score (training): {:.3f}'.format(linreg.score(X_train, y_train)))
print('R-squared score (test): {:.3f}'.format(linreg.score(X_test, y_test)))
## R-squared score (training): 0.699
## R-squared score (test): 0.677

1.3a Polynomial Regression – R

For polynomial regression, polynomials of degree 1, 2 and 3 are fitted and R squared is computed on the test set for each. The quadratic model gives the highest test R squared (0.831, against 0.764 for degree 1 and 0.773 for degree 3) and hence the best fit of the three.

# Polynomial degree 1
df=read.csv("auto_mpg.csv",stringsAsFactors = FALSE) # Data from UCI
df1 <- as.data.frame(sapply(df,as.numeric))

# Select key columns
df2 <- df1 %>% select(cylinder,displacement, horsepower,weight, acceleration, year,mpg)
df3 <- df2[complete.cases(df2),]

# Split as train and test sets
train_idx <- trainTestSplit(df3,trainPercent=75,seed=5)
train <- df3[train_idx, ]
test <- df3[-train_idx, ]

# Fit a model of degree 1
fit <- lm(mpg~. ,data=train)
rsquared1 <-Rsquared(fit,test,test$mpg)
sprintf("R-squared for Polynomial regression of degree 1 (auto_mpg.csv)  is : %f", rsquared1)
## [1] "R-squared for Polynomial regression of degree 1 (auto_mpg.csv)  is : 0.763607"
# Polynomial degree 2 - Quadratic
x = as.matrix(df3[1:6])
# Make a  polynomial  of degree 2 for feature variables before split
df4=as.data.frame(poly(x,2,raw=TRUE))
df5 <- cbind(df4,df3[7])

# Split into train and test set
train_idx <- trainTestSplit(df5,trainPercent=75,seed=5)
train <- df5[train_idx, ]
test <- df5[-train_idx, ]

# Fit the quadratic model
fit <- lm(mpg~. ,data=train)
# Compute R squared
rsquared2=Rsquared(fit,test,test$mpg)
sprintf("R-squared for Polynomial regression of degree 2 (auto_mpg.csv)  is : %f", rsquared2)
## [1] "R-squared for Polynomial regression of degree 2 (auto_mpg.csv)  is : 0.831372"
#Polynomial degree 3
x = as.matrix(df3[1:6])
# Make polynomial of degree 3 of feature variables before split
df4=as.data.frame(poly(x,3,raw=TRUE))
df5 <- cbind(df4,df3[7])
train_idx <- trainTestSplit(df5,trainPercent=75,seed=5)

train <- df5[train_idx, ]
test <- df5[-train_idx, ]
# Fit a model of degree 3
fit <- lm(mpg~. ,data=train)
# Compute R squared
rsquared3=Rsquared(fit,test,test$mpg)
sprintf("R-squared for Polynomial regression of degree 3 (auto_mpg.csv)  is : %f", rsquared3)
## [1] "R-squared for Polynomial regression of degree 3 (auto_mpg.csv)  is : 0.773225"
df=data.frame(degree=c(1,2,3),Rsquared=c(rsquared1,rsquared2,rsquared3))
# Make a plot of Rsquared and degree
ggplot(df,aes(x=degree,y=Rsquared)) +geom_point() + geom_line(color="blue") +
    ggtitle("Polynomial regression - R squared vs Degree of polynomial") +
    xlab("Degree") + ylab("R squared")

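1.3b Polynomial Regression – Python

For polynomial regression, polynomials of degree 1, 2 and 3 are fitted and R squared is computed. The quadratic model gives the highest test R squared (0.847) and hence the best fit. The degree 3 model scores higher on the training set (0.933) but drops to 0.710 on the test set, which points to overfitting.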

import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
autoDF =pd.read_csv("auto_mpg.csv",encoding="ISO-8859-1")
autoDF.shape
autoDF.columns
# Select key columns
autoDF1=autoDF[['mpg','cylinder','displacement','horsepower','weight','acceleration','year']]
# Convert columns to numeric
autoDF2 = autoDF1.apply(pd.to_numeric, errors='coerce')
# Drop NAs
autoDF3=autoDF2.dropna()
autoDF3.shape
X=autoDF3[['cylinder','displacement','horsepower','weight','acceleration','year']]
y=autoDF3['mpg']

# Polynomial degree 1
X_train, X_test, y_train, y_test = train_test_split(X, y,random_state = 0)
linreg = LinearRegression().fit(X_train, y_train)
print('R-squared score - Polynomial degree 1 (training): {:.3f}'.format(linreg.score(X_train, y_train)))
# Compute R squared     
rsquared1 =linreg.score(X_test, y_test)
print('R-squared score - Polynomial degree 1 (test): {:.3f}'.format(linreg.score(X_test, y_test)))

# Polynomial degree 2
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X_poly, y,random_state = 0)
linreg = LinearRegression().fit(X_train, y_train)

# Compute R squared
print('R-squared score - Polynomial degree 2 (training): {:.3f}'.format(linreg.score(X_train, y_train)))
rsquared2 =linreg.score(X_test, y_test)
print('R-squared score - Polynomial degree 2 (test): {:.3f}\n'.format(linreg.score(X_test, y_test)))

#Polynomial degree 3

poly = PolynomialFeatures(degree=3)
X_poly = poly.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X_poly, y,random_state = 0)
linreg = LinearRegression().fit(X_train, y_train)
print('(R-squared score -Polynomial degree 3  (training): {:.3f}'
     .format(linreg.score(X_train, y_train)))
# Compute R squared     
rsquared3 =linreg.score(X_test, y_test)
print('R-squared score Polynomial degree 3 (test): {:.3f}\n'.format(linreg.score(X_test, y_test)))
degree=[1,2,3]
rsquared =[rsquared1,rsquared2,rsquared3]
fig2=plt.plot(degree,rsquared)
fig2=plt.title("Polynomial regression - R squared vs Degree of polynomial")
fig2=plt.xlabel("Degree")
fig2=plt.ylabel("R squared")
fig2.figure.savefig('foo2.png', bbox_inches='tight')
print("Finished plotting and saving")
## R-squared score - Polynomial degree 1 (training): 0.811
## R-squared score - Polynomial degree 1 (test): 0.799
## R-squared score - Polynomial degree 2 (training): 0.861
## R-squared score - Polynomial degree 2 (test): 0.847
## 
## (R-squared score -Polynomial degree 3  (training): 0.933
## R-squared score Polynomial degree 3 (test): 0.710
## 
## Finished plotting and saving
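
One way to see why the degree 3 model fits the training data so well yet does worse on the test set is to count how many terms the polynomial expansion creates from the 6 feature columns. The sketch below uses the standard count of monomials of total degree at most d in n variables. The helper name n_poly_terms is made up for illustration. The count includes the constant column that PolynomialFeatures adds by default.

```python
from math import comb


def n_poly_terms(n_features, degree):
    """Number of monomials of total degree <= degree, constant term included."""
    return comb(n_features + degree, degree)


for d in (1, 2, 3):
    print(d, n_poly_terms(6, d))
# 1 7
# 2 28
# 3 84
```

Going from degree 2 to degree 3 triples the number of columns the linear model can use, which gives it far more room to fit noise in the training rows.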

1.4 K Nearest Neighbors

The code below implements KNN regression in both R and Python for several values of the number of neighbors, computing R squared in each case. The exercise is then repeated after feature scaling, which improves the fit considerably: in the Python runs below the best test R squared rises from 0.707 to 0.838. Normalization (min-max scaling) refers to

X_{normalized} = \frac{X-min(X)}{max(X)-min(X)}

Another technique that is used is Standardization which is

X_{standardized} = \frac{X-mean(X)}{sd(X)}
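
The two formulas can be tried out on a single column of numbers. The sketch below is a minimal pure-Python illustration with made-up values. The function names are hypothetical, and the sample standard deviation is used, as R's sd() does (scikit-learn's StandardScaler divides by the population standard deviation instead). The minimum and maximum are taken from the training column only and then reused for the test column.

```python
from statistics import mean, stdev


def min_max_fit(column):
    """Learn the scaling range from the training column."""
    return min(column), max(column)


def min_max_transform(column, lo, hi):
    """Apply (x - min) / (max - min) using a previously learned range."""
    return [(x - lo) / (hi - lo) for x in column]


def standardize(column):
    """Apply (x - mean) / sd using the sample standard deviation."""
    m, s = mean(column), stdev(column)
    return [(x - m) / s for x in column]


# Made-up values purely for illustration
train = [2000.0, 3000.0, 4000.0]
test = [2500.0, 4500.0]

lo, hi = min_max_fit(train)
print(min_max_transform(train, lo, hi))  # [0.0, 0.5, 1.0]
print(min_max_transform(test, lo, hi))   # [0.25, 1.25]
print(standardize(train))                # [-1.0, 0.0, 1.0]
```

A test value outside the training range maps outside [0, 1], which is expected when the scaler is fitted on the training data alone.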

1.4a K Nearest Neighbors Regression – R (Unnormalized)

The R code below does not use feature scaling.

# KNN regression requires the FNN package
library(FNN)
df=read.csv("auto_mpg.csv",stringsAsFactors = FALSE) # Data from UCI
df1 <- as.data.frame(sapply(df,as.numeric))
df2 <- df1 %>% select(cylinder,displacement, horsepower,weight, acceleration, year,mpg)
df3 <- df2[complete.cases(df2),]

# Split train and test
train_idx <- trainTestSplit(df3,trainPercent=75,seed=5)
train <- df3[train_idx, ]
test <- df3[-train_idx, ]
#  Select the feature variables
train.X=train[,1:6]
# Set the target for training
train.Y=train[,7]
# Do the same for test set
test.X=test[,1:6]
test.Y=test[,7]

rsquared <- NULL
# Create a list of neighbors
neighbors <-c(1,2,4,8,10,14)
for(i in seq_along(neighbors)){
    # Perform a KNN regression fit
    knn=knn.reg(train.X,test.X,train.Y,k=neighbors[i])
    # Compute R squared
    rsquared[i]=knnRSquared(knn$pred,test.Y)
}

# Make a dataframe for plotting
df <- data.frame(neighbors,Rsquared=rsquared)
# Plot the number of neighbors vs the R squared
ggplot(df,aes(x=neighbors,y=Rsquared)) + geom_point() +geom_line(color="blue") +
    xlab("Number of neighbors") + ylab("R squared") +
    ggtitle("KNN regression - R squared vs Number of Neighbors (Unnormalized)")

1.4b K Nearest Neighbors Regression – Python (Unnormalized)

The Python code below does not use feature scaling.

import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.neighbors import KNeighborsRegressor
autoDF =pd.read_csv("auto_mpg.csv",encoding="ISO-8859-1")
autoDF.shape
autoDF.columns
autoDF1=autoDF[['mpg','cylinder','displacement','horsepower','weight','acceleration','year']]
autoDF2 = autoDF1.apply(pd.to_numeric, errors='coerce')
autoDF3=autoDF2.dropna()
autoDF3.shape
X=autoDF3[['cylinder','displacement','horsepower','weight','acceleration','year']]
y=autoDF3['mpg']

# Perform a train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 0)
# Create a list of neighbors
rsquared=[]
neighbors=[1,2,4,8,10,14]
for i in neighbors:
        # Fit a KNN model
        knnreg = KNeighborsRegressor(n_neighbors = i).fit(X_train, y_train)
        # Compute R squared
        rsquared.append(knnreg.score(X_test, y_test))
        print('R-squared test score: {:.3f}'
        .format(knnreg.score(X_test, y_test)))
# Plot the number of neighbors vs the R squared
fig3=plt.plot(neighbors,rsquared)
fig3=plt.title("KNN regression - R squared vs Number of neighbors (Unnormalized)")
fig3=plt.xlabel("Neighbors")
fig3=plt.ylabel("R squared")
fig3.figure.savefig('foo3.png', bbox_inches='tight')
print("Finished plotting and saving")
## R-squared test score: 0.527
## R-squared test score: 0.678
## R-squared test score: 0.707
## R-squared test score: 0.684
## R-squared test score: 0.683
## R-squared test score: 0.670
## Finished plotting and saving

1.4c K Nearest Neighbors Regression – R (Normalized)

It can be seen that R squared improves when the features are normalized.

df=read.csv("auto_mpg.csv",stringsAsFactors = FALSE) # Data from UCI
df1 <- as.data.frame(sapply(df,as.numeric))
df2 <- df1 %>% select(cylinder,displacement, horsepower,weight, acceleration, year,mpg)
df3 <- df2[complete.cases(df2),]

# Perform MinMaxScaling of feature variables
# (train.X, test.X, train.Y and test.Y are carried over from the split in 1.4a)
train.X.scaled=MinMaxScaler(train.X)
test.X.scaled=MinMaxScaler(test.X)

# Create a list of neighbors
rsquared <- NULL
neighbors <-c(1,2,4,6,8,10,12,15,20,25,30)
for(i in seq_along(neighbors)){
    # Fit a KNN model with k taken from the neighbors vector
    knn=knn.reg(train.X.scaled,test.X.scaled,train.Y,k=neighbors[i])
    # Compute R squared
    rsquared[i]=knnRSquared(knn$pred,test.Y)
}

df <- data.frame(neighbors,Rsquared=rsquared)
# Plot the number of neighbors vs the R squared
ggplot(df,aes(x=neighbors,y=Rsquared)) + geom_point() +geom_line(color="blue") +
    xlab("Number of neighbors") + ylab("R squared") +
    ggtitle("KNN regression - R squared vs Number of Neighbors (Normalized)")

1.4d K Nearest Neighbors Regression – Python (Normalized)

R squared improves when the features are normalized with MinMaxScaling.

import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import MinMaxScaler
autoDF =pd.read_csv("auto_mpg.csv",encoding="ISO-8859-1")
autoDF.shape
autoDF.columns
autoDF1=autoDF[['mpg','cylinder','displacement','horsepower','weight','acceleration','year']]
autoDF2 = autoDF1.apply(pd.to_numeric, errors='coerce')
autoDF3=autoDF2.dropna()
autoDF3.shape
X=autoDF3[['cylinder','displacement','horsepower','weight','acceleration','year']]
y=autoDF3['mpg']

# Perform a train/ test  split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 0)
# Use MinMaxScaling
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
# Apply scaling on test set
X_test_scaled = scaler.transform(X_test)

# Create a list of neighbors
rsquared=[]
neighbors=[1,2,4,6,8,10,12,15,20,25,30]
for i in neighbors:
    # Fit a KNN model
    knnreg = KNeighborsRegressor(n_neighbors = i).fit(X_train_scaled, y_train)
    # Compute R squared
    rsquared.append(knnreg.score(X_test_scaled, y_test))
    print('R-squared test score: {:.3f}'
        .format(knnreg.score(X_test_scaled, y_test)))

# Plot the number of neighbors vs the R squared
fig4=plt.plot(neighbors,rsquared)
fig4=plt.title("KNN regression - R squared vs Number of neighbors (Normalized)")
fig4=plt.xlabel("Neighbors")
fig4=plt.ylabel("R squared")
fig4.figure.savefig('foo4.png', bbox_inches='tight')
print("Finished plotting and saving")
## R-squared test score: 0.703
## R-squared test score: 0.810
## R-squared test score: 0.830
## R-squared test score: 0.838
## R-squared test score: 0.834
## R-squared test score: 0.828
## R-squared test score: 0.827
## R-squared test score: 0.826
## R-squared test score: 0.816
## R-squared test score: 0.815
## R-squared test score: 0.809
## Finished plotting and saving
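
To see why scaling changes the KNN result, it helps to write the prediction step out by hand: find the k closest training rows by Euclidean distance and average their targets. The sketch below is a toy illustration with made-up rows of (cylinders, weight) and is not how KNeighborsRegressor or knn.reg are implemented internally. The function names are hypothetical.

```python
import math


def knn_predict(train_X, train_y, query, k):
    """Average the targets of the k training rows closest to the query."""
    dists = sorted((math.dist(row, query), y) for row, y in zip(train_X, train_y))
    return sum(y for _, y in dists[:k]) / k


def min_max_scale(rows, lo, hi):
    """Scale every column to [0, 1] using the given per-column min and max."""
    return [tuple((v - l) / (h - l) for v, l, h in zip(row, lo, hi)) for row in rows]


# Made-up rows purely for illustration: (cylinders, weight) -> mpg
train_X = [(4, 2000.0), (4, 2600.0), (8, 2100.0), (8, 4000.0)]
train_y = [32.0, 30.0, 16.0, 14.0]
query = (4, 2150.0)

# Unscaled: weight dominates the distance, so an 8-cylinder car is "nearest"
raw_pred = knn_predict(train_X, train_y, query, k=2)

# Scaled: both columns contribute on the same footing
lo = [min(col) for col in zip(*train_X)]
hi = [max(col) for col in zip(*train_X)]
scaled_train = min_max_scale(train_X, lo, hi)
scaled_query = min_max_scale([query], lo, hi)[0]
scaled_pred = knn_predict(scaled_train, train_y, scaled_query, k=2)

print(raw_pred)     # 24.0
print(scaled_pred)  # 31.0
```

Without scaling, the column with the largest numeric range decides which rows count as neighbors. After scaling, the two 4-cylinder cars become the nearest neighbors of the 4-cylinder query.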

Conclusion

In this initial post I cover regression models for the case where the output is continuous. I intend to touch upon other Machine Learning algorithms in later posts.
Comments, suggestions and corrections are welcome.

Watch this space!

To be continued….

Analysis of International T20 matches with yorkr templates


Introduction

In this post I create yorkr templates for International T20 matches that are available on Cricsheet. With these templates you can convert all T20 data, which is in yaml format, to R dataframes, and then build the derived data needed for analysis. All of these templates can be accessed from Github at yorkrT20Template. The templates are

  1. Template for conversion and setup – T20Template.Rmd
  2. Any T20 match – T20Matchtemplate.Rmd
  3. T20 matches between 2 nations – T20Matches2TeamTemplate.Rmd
  4. A T20 nation’s performance against all other T20 nations – T20AllMatchesAllOppnTemplate.Rmd
  5. Analysis of T20 batsmen and bowlers of all T20 nations – T20BatsmanBowlerTemplate.Rmd

Besides the templates, the repository also includes the converted data for all T20 matches I downloaded from Cricsheet in Dec 2016. You can recreate the files as more matches are added to the Cricsheet site. This post contains all the steps needed for T20 analysis as more matches are played around the world and more data is added to Cricsheet. It will also be my reference if I decide to analyze T20 again in future!

Feel free to download/clone these templates  from Github yorkrT20Template and perform your own analysis

Check out my 2 books on cricket, a) Cricket analytics with cricketr b) Beaten by sheer pace – Cricket analytics with yorkr, now available in both paperback & kindle versions on Amazon!!! Pick up your copies today!

There will be 5 folders at the root

  1. T20data – Match files as yaml from Cricsheet
  2. T20Matches – Yaml match files converted to dataframes
  3. T20MatchesBetween2Teams – All Matches between any 2 T20 teams
  4. allMatchesAllOpposition – A T20 country’s match data against all other teams
  5. BattingBowlingDetails – Batting and bowling details of all countries

library(yorkr)
library(dplyr)

The first few steps take care of the data setup. This needs to be done before any of the analysis of T20 batsmen, bowlers, any T20 match, matches between any 2 T20 countries or analysis of a team’s performance against all other countries.

The source YAML files will be in T20Data folder
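
The folder layout described above could be created with a few lines of scripting before running the conversion. The sketch below is one possible way to do it in Python and is not part of the yorkr templates. The function name create_layout is hypothetical. Note that the list spells the first folder T20data while the conversion call uses T20Data. The sketch follows the list, so adjust the name to whichever exists on your disk.

```python
from pathlib import Path
import tempfile

# Folder names as given in the list above
FOLDERS = [
    "T20data",
    "T20Matches",
    "T20MatchesBetween2Teams",
    "allMatchesAllOpposition",
    "BattingBowlingDetails",
]


def create_layout(root):
    """Create the five working folders under root and return their names."""
    root = Path(root)
    for name in FOLDERS:
        (root / name).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in root.iterdir() if p.is_dir())


# Demonstrated in a throwaway directory; pass your own project root instead
with tempfile.TemporaryDirectory() as tmp:
    created = create_layout(tmp)
    print(created)
```

Using exist_ok=True makes the script safe to rerun when more matches are downloaded later.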

1. Create directory T20Matches

Some files may give conversion errors. You could try to debug the problem or just remove the offending file from the T20data folder. At most 2-4 files will have conversion problems and I usually remove them from the files to be converted.

Also take a look at my Inswinger shiny app which was created after performing the same conversion on the Dec 16 data.

convertAllYaml2RDataframesT20("T20Data","T20Matches")

2. Save all matches between all combinations of T20 nations

This function will create the set of all matches between every T20 country against every other T20 country. This uses the data that was created in T20Matches, with the convertAllYaml2RDataframesT20() function.

setwd("./T20MatchesBetween2Teams")
saveAllMatchesBetweenTeams("../T20Matches")
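
To get a feel for how many team pairings this step covers, the sketch below enumerates every unordered pair from a small sample of teams and builds a file name in the Country1-Country2-allMatches.RData pattern used later in this post. This is only an illustration in Python. The function name pair_files is hypothetical, and the order of the two names within a file, as well as whether a file is written for pairs that have never met, are assumptions rather than documented yorkr behaviour.

```python
from itertools import combinations


def pair_files(teams):
    """One file name per unordered pair of teams (name order is an assumption)."""
    return [f"{a}-{b}-allMatches.RData" for a, b in combinations(teams, 2)]


sample = ["Australia", "England", "India", "South Africa"]
files = pair_files(sample)

print(len(files))  # 6
print(files[0])    # Australia-England-allMatches.RData
```

For n teams there are n(n-1)/2 pairs, so the 22 teams listed in step 4 give 231 possible pairings. The number of files actually written may be smaller.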

3. Save all matches against all opposition

This will create a consolidated dataframe of all matches played by every T20 playing nation against all other nations. This also uses the data that was created in T20Matches, with the convertAllYaml2RDataframesT20() function.

setwd("../allMatchesAllOpposition")
saveAllMatchesAllOpposition("../T20Matches")

4. Create batting and bowling details for each T20 country

These are the current T20 playing nations. You can add to this vector as more countries start playing T20. You can also find all T20 nations by looking at the directory created above, namely allMatchesAllOpposition. This also uses the data that was created in T20Matches, with the convertAllYaml2RDataframesT20() function.

setwd("../BattingBowlingDetails")
teams <-c("Australia","India","Pakistan","West Indies", 'Sri Lanka',
          "England", "Bangladesh","Netherlands","Scotland", "Afghanistan",
          "Zimbabwe","Ireland","New Zealand","South Africa","Canada",
          "Bermuda","Kenya","Hong Kong","Nepal","Oman","Papua New Guinea",
          "United Arab Emirates")

for(i in seq_along(teams)){
    print(teams[i])
    val <- paste(teams[i],"-details",sep="")
    val <- getTeamBattingDetails(teams[i],dir="../T20Matches", save=TRUE)

}

for(i in seq_along(teams)){
    print(teams[i])
    val <- paste(teams[i],"-details",sep="")
    val <- getTeamBowlingDetails(teams[i],dir="../T20Matches", save=TRUE)

}

5. Get the list of batsmen for a particular country

For example, if you wanted to get the batsmen of Canada you would do the following. By replacing Canada with any other country you can get the batsmen of that country. These batsmen names can then be used in the batsmen analysis.

country="Canada"
teamData <- paste(country,"-BattingDetails.RData",sep="")
load(teamData)
countryDF <- battingDetails
bmen <- countryDF %>% distinct(batsman) 
bmen <- as.character(bmen$batsman)
batsmen <- sort(bmen)
batsmen

6. Get the list of bowlers for a particular country

The method below can get the list of bowler names for any T20 nation. These names can then be used in the bowler analysis below

country="Netherlands"
teamData <- paste(country,"-BowlingDetails.RData",sep="")
load(teamData)
countryDF <- bowlingDetails
bwlr <- countryDF %>% distinct(bowler) 
bwlr <- as.character(bwlr$bowler)
bowler <- sort(bwlr)
bowler

Now we are all set

A)  International T20 Match Analysis

Load any match data from the ./T20Matches folder, e.g. Afghanistan-England-2016-03-23.RData

setwd("./T20Matches")
load("Afghanistan-England-2016-03-23.RData")
afg_eng<- overs
# In general, the steps are
load("Country1-Country2-Date.RData")
country1_country2 <- overs

All analysis for this match can be done now

2. Scorecard

teamBattingScorecardMatch(country1_country2,"Country1")
teamBattingScorecardMatch(country1_country2,"Country2")

3. Batting Partnerships

teamBatsmenPartnershipMatch(country1_country2,"Country1","Country2")
teamBatsmenPartnershipMatch(country1_country2,"Country2","Country1")

4. Batsmen vs Bowler Plot

teamBatsmenVsBowlersMatch(country1_country2,"Country1","Country2",plot=TRUE)
teamBatsmenVsBowlersMatch(country1_country2,"Country1","Country2",plot=FALSE)

5. Team bowling scorecard

teamBowlingScorecardMatch(country1_country2,"Country1")
teamBowlingScorecardMatch(country1_country2,"Country2")

6. Team bowling Wicket kind match

teamBowlingWicketKindMatch(country1_country2,"Country1","Country2")
m <-teamBowlingWicketKindMatch(country1_country2,"Country1","Country2",plot=FALSE)
m

7. Team Bowling Wicket Runs Match

teamBowlingWicketRunsMatch(country1_country2,"Country1","Country2")
m <-teamBowlingWicketRunsMatch(country1_country2,"Country1","Country2",plot=FALSE)
m

8. Team Bowling Wicket Match

m <-teamBowlingWicketMatch(country1_country2,"Country1","Country2",plot=FALSE)
m
teamBowlingWicketMatch(country1_country2,"Country1","Country2")

9. Team Bowler vs Batsmen

teamBowlersVsBatsmenMatch(country1_country2,"Country1","Country2")
m <- teamBowlersVsBatsmenMatch(country1_country2,"Country1","Country2",plot=FALSE)
m

10. Match Worm chart

matchWormGraph(country1_country2,"Country1","Country2")

B)  International T20 Matches between 2 teams

Load match data between any 2 teams from ./T20MatchesBetween2Teams, e.g. Australia-India-allMatches.RData

setwd("./T20MatchesBetween2Teams")
load("Australia-India-allMatches.RData")
aus_ind_matches <- matches
#Replace below with your own countries
country1<-"England"
country2 <- "South Africa"
country1VsCountry2 <- paste(country1,"-",country2,"-allMatches.RData",sep="")
load(country1VsCountry2)
country1_country2_matches <- matches

2. Batsmen partnerships

m<- teamBatsmenPartnershiOppnAllMatches(country1_country2_matches,"country1",report="summary")
m
m<- teamBatsmenPartnershiOppnAllMatches(country1_country2_matches,"country2",report="summary")
m
m<- teamBatsmenPartnershiOppnAllMatches(country1_country2_matches,"country1",report="detailed")
m
teamBatsmenPartnershipOppnAllMatchesChart(country1_country2_matches,"country1","country2")

3. Team batsmen vs bowlers

teamBatsmenVsBowlersOppnAllMatches(country1_country2_matches,"country1","country2")

4. Batting scorecard

a <-teamBattingScorecardOppnAllMatches(country1_country2_matches,main="country1",opposition="country2")
a

5. Team bowling performance

teamBowlingPerfOppnAllMatches(country1_country2_matches,main="country1",opposition="country2")

6. Team bowler wickets

teamBowlersWicketsOppnAllMatches(country1_country2_matches,main="country1",opposition="country2")
m <-teamBowlersWicketsOppnAllMatches(country1_country2_matches,main="country1",opposition="country2",plot=FALSE)
teamBowlersWicketsOppnAllMatches(country1_country2_matches,"country1","country2",top=3)
m

7. Team bowler vs batsmen

teamBowlersVsBatsmenOppnAllMatches(country1_country2_matches,"country1","country2",top=5)

8. Team bowler wicket kind

teamBowlersWicketKindOppnAllMatches(country1_country2_matches,"country1","country2",plot=TRUE)
m <- teamBowlersWicketKindOppnAllMatches(country1_country2_matches,"country1","country2",plot=FALSE)
m[1:30,]

9. Team bowler wicket runs

teamBowlersWicketRunsOppnAllMatches(country1_country2_matches,"country1","country2")

10. Plot wins and losses

setwd("./T20Matches")
plotWinLossBetweenTeams("country1","country2")

C)  International T20 Matches for a team against all other teams

Load the data for a T20 team against all other countries from ./allMatchesAllOpposition, e.g. all matches of India

load("allMatchesAllOpposition-India.RData")
india_matches <- matches
country="country1"
allMatches <- paste("allMatchesAllOpposition-",country,".RData",sep="")
load(allMatches)
country1AllMatches <- matches

2. Team’s batting scorecard all Matches

m <-teamBattingScorecardAllOppnAllMatches(country1AllMatches,theTeam="country1")
m

3. Batting scorecard of opposing team

m <-teamBattingScorecardAllOppnAllMatches(matches=country1AllMatches,theTeam="country2")

4. Team batting partnerships

m <- teamBatsmenPartnershipAllOppnAllMatches(country1AllMatches,theTeam="country1")
m
m <- teamBatsmenPartnershipAllOppnAllMatches(country1AllMatches,theTeam='country1',report="detailed")
head(m,30)
m <- teamBatsmenPartnershipAllOppnAllMatches(country1AllMatches,theTeam='country1',report="summary")
m

5. Team batting partnerships plot

teamBatsmenPartnershipAllOppnAllMatchesPlot(country1AllMatches,"country1",main="country1")
teamBatsmenPartnershipAllOppnAllMatchesPlot(country1AllMatches,"country1",main="country2")

6. Team batsmen vs bowlers report

m <-teamBatsmenVsBowlersAllOppnAllMatchesRept(country1AllMatches,"country1",rank=0)
m
m <-teamBatsmenVsBowlersAllOppnAllMatchesRept(country1AllMatches,"country1",rank=1,dispRows=30)
m
m <-teamBatsmenVsBowlersAllOppnAllMatchesRept(matches=country1AllMatches,theTeam="country2",rank=1,dispRows=25)
m

7. Team batsmen vs bowler plot

d <- teamBatsmenVsBowlersAllOppnAllMatchesRept(country1AllMatches,"country1",rank=1,dispRows=50)
d
teamBatsmenVsBowlersAllOppnAllMatchesPlot(d)
d <- teamBatsmenVsBowlersAllOppnAllMatchesRept(country1AllMatches,"country1",rank=2,dispRows=50)
teamBatsmenVsBowlersAllOppnAllMatchesPlot(d)

8. Team bowling scorecard

teamBowlingScorecardAllOppnAllMatchesMain(matches=country1AllMatches,theTeam="country1")
teamBowlingScorecardAllOppnAllMatches(country1AllMatches,'country2')

9. Team bowler vs batsmen

teamBowlersVsBatsmenAllOppnAllMatchesMain(country1AllMatches,theTeam="country1",rank=0)
teamBowlersVsBatsmenAllOppnAllMatchesMain(country1AllMatches,theTeam="country1",rank=2)
teamBowlersVsBatsmenAllOppnAllMatchesRept(matches=country1AllMatches,theTeam="country1",rank=0)

10. Team bowler vs batsmen plot

df <- teamBowlersVsBatsmenAllOppnAllMatchesRept(country1AllMatches,theTeam="country1",rank=1)
teamBowlersVsBatsmenAllOppnAllMatchesPlot(df,"country1","country1")

11. Team bowler wicket kind

teamBowlingWicketKindAllOppnAllMatches(country1AllMatches,t1="country1",t2="All")
teamBowlingWicketKindAllOppnAllMatches(country1AllMatches,t1="country1",t2="country2")

12. Team bowler wicket runs

teamBowlingWicketRunsAllOppnAllMatches(country1AllMatches,t1="country1",t2="All",plot=TRUE)
teamBowlingWicketRunsAllOppnAllMatches(country1AllMatches,t1="country1",t2="country2",plot=TRUE)

D) Batsman functions

Get the batsman’s details for a batsman

setwd("../BattingBowlingDetails")
kohli <- getBatsmanDetails(team="India",name="Kohli",dir=".")
batsmanDF <- getBatsmanDetails(team="country1",name="batsmanName",dir=".")

2. Runs vs deliveries

batsmanRunsVsDeliveries(batsmanDF,"batsmanName")

3. Batsman 4s & 6s

batsman46 <- select(batsmanDF,batsman,ballsPlayed,fours,sixes,runs)
p1 <- batsmanFoursSixes(batsman46,"batsmanName")

4. Batsman dismissals

batsmanDismissals(batsmanDF,"batsmanName")

5. Runs vs Strike rate

batsmanRunsVsStrikeRate(batsmanDF,"batsmanName")

6. Batsman Moving Average

batsmanMovingAverage(batsmanDF,"batsmanName")

7. Batsman cumulative average

batsmanCumulativeAverageRuns(batsmanDF,"batsmanName")
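
As a rough idea of what a cumulative average shows, the sketch below computes a running mean of runs over successive innings. It is a generic Python illustration with made-up scores, not yorkr's implementation, and it ignores cricket-specific details such as not-out innings. The function name cumulative_average is hypothetical.

```python
from itertools import accumulate


def cumulative_average(values):
    """Running mean: average of all values seen up to and including each one."""
    totals = accumulate(values)
    return [total / n for n, total in enumerate(totals, start=1)]


# Made-up runs in four successive innings
runs = [10, 50, 0, 40]
print(cumulative_average(runs))  # [10.0, 30.0, 20.0, 25.0]
```

Early values swing widely and the curve settles as more innings are added, which is why such plots are read mainly for their long-run trend.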

8. Batsman cumulative strike rate

batsmanCumulativeStrikeRate(batsmanDF,"batsmanName")

9. Batsman runs against oppositions

batsmanRunsAgainstOpposition(batsmanDF,"batsmanName")

10. Batsman runs vs venue

batsmanRunsVenue(batsmanDF,"batsmanName")

11. Batsman runs predict

batsmanRunsPredict(batsmanDF,"batsmanName")

12. Bowler functions

For example, to get Ravichandran Ashwin’s bowling details

setwd("../BattingBowlingDetails")
ashwin <- getBowlerWicketDetails(team="India",name="Ashwin",dir=".")
bowlerDF <- getBowlerWicketDetails(team="country1",name="bowlerName",dir=".")

13. Bowler Mean Economy rate

bowlerMeanEconomyRate(bowlerDF,"bowlerName")

14. Bowler mean runs conceded

bowlerMeanRunsConceded(bowlerDF,"bowlerName")

15. Bowler Moving Average

bowlerMovingAverage(bowlerDF,"bowlerName")

16. Bowler cumulative average wickets

bowlerCumulativeAvgWickets(bowlerDF,"bowlerName")

17. Bowler cumulative Economy Rate (ER)

bowlerCumulativeAvgEconRate(bowlerDF,"bowlerName")

18. Bowler wicket plot

bowlerWicketPlot(bowlerDF,"bowlerName")

19. Bowler wicket against opposition

bowlerWicketsAgainstOpposition(bowlerDF,"bowlerName")

20. Bowler wicket at cricket grounds

bowlerWicketsVenue(bowlerDF,"bowlerName")

21. Predict number of deliveries to wickets

setwd("./T20Matches")
bowlerDF1 <- getDeliveryWickets(team="country1",dir=".",name="bowlerName",save=FALSE)
bowlerWktsPredict(bowlerDF1,"bowlerName")

GooglyPlus: yorkr analyzes IPL players, teams, matches with plots and tables


In this post I introduce my new Shiny app, “GooglyPlus”, which is a more evolved version of my earlier Shiny app “Googly”. My R package ‘yorkr’, on which both these Shiny apps are based, can output either a dataframe or a plot, depending on the parameter plot=TRUE or plot=FALSE. My initial version of the app only included plots, and did not exercise the yorkr package fully. Moreover, I am certain there is a set of cricket aficionados who would prefer numbers to charts. Hence I have created this enhanced version of the Googly app and renamed it GooglyPlus. GooglyPlus is based on the yorkr package, which uses data from Cricsheet, and covers all IPL matches from 2008 up to 2016. Feel free to clone/fork or download the code from Github at GooglyPlus.

Click  GooglyPlus to access the Shiny app!

The changes in GooglyPlus over the earlier Googly app are confined to the following 3 tab panels

  • IPL match
  • Head to head
  • Overall Performance

The IPL batsman and IPL bowler analysis tabs are unchanged; their charts are as they were before.

In the three changed tabs, new functionality has been added and existing functions now have the dual option of displaying either a plot or a table.

The changes are

A) IPL Match
The following additions/enhancements have been done

-Match Batting Scorecard – Table
-Batting Partnerships – Plot, Table (New)
-Batsmen vs Bowlers – Plot, Table(New)
-Match Bowling Scorecard   – Table (New)
-Bowling Wicket Kind – Plot, Table (New)
-Bowling Wicket Runs – Plot, Table (New)
-Bowling Wicket Match – Plot, Table (New)
-Bowler vs Batsmen – Plot, Table (New)
-Match Worm Graph – Plot

B) Head to head
The following functions have been added/enhanced

-Team Batsmen Batting Partnerships All Matches – Plot, Table {Summary (New) and Detailed (New)}
-Team Batting Scorecard All Matches – Table (New)
-Team Batsmen vs Bowlers all Matches – Plot, Table (New)
-Team Wickets Opposition All Matches – Plot, Table (New)
-Team Bowling Scorecard All Matches – Table (New)
-Team Bowler vs Batsmen All Matches – Plot, Table (New)
-Team Bowlers Wicket Kind All Matches – Plot, Table (New)
-Team Bowler Wicket Runs All Matches – Plot, Table (New)
-Win Loss All Matches – Plot

C) Overall Performance
The following additions/enhancements have been done in this tab

-Team Batsmen Partnerships Overall – Plot, Table {Summary (New) and Detailed (New)}
-Team Batting Scorecard Overall – Table (New)
-Team Batsmen vs Bowlers Overall – Plot, Table (New)
-Team Bowler vs Batsmen Overall – Plot, Table (New)
-Team Bowling Scorecard Overall – Table (New)
-Team Bowler Wicket Kind Overall – Plot, Table (New)

Included below are some random charts and tables. Feel free to explore the Shiny app further

A) IPL Match
a) Match Batting Scorecard (Table only)
This is the batting scorecard for the Chennai Super Kings vs Deccan Chargers match on 2011-05-11.

b) Match batting partnerships (Plot)
Delhi Daredevils vs Kings XI Punjab – 2011-04-23

c) Match batting partnerships (Table)
The same batting partnerships for Delhi Daredevils vs Kings XI Punjab – 2011-04-23, as a table.

d) Batsmen vs Bowlers (Plot)
Kolkata Knight Riders vs Mumbai Indians – 2010-04-19

e) Match Bowling Scorecard (Table only)

B) Head to head

a) Team Batsmen Partnership (Plot)
Deccan Chargers vs Kolkata Knight Riders all matches

b) Team Batsmen Partnership (Summary – Table)
In the following tables it can be seen that MS Dhoni has performed better than SK Raina in CSK vs DD matches, whereas SK Raina performs better than Dhoni in CSK vs KKR matches.

i) Chennai Super Kings vs Delhi Daredevils (Summary – Table)

ii) Chennai Super Kings vs Kolkata Knight Riders (Summary – Table)

iii) Rising Pune Supergiants vs Gujarat Lions (Detailed – Table)
This table provides the detailed partnerships for RPS vs GL in all matches.

c) Team Bowling Scorecard (Table only)
This table gives the bowling scorecard of Pune Warriors vs Deccan Chargers in all matches.

C) Overall performances
a) Batting Scorecard All Matches (Table only)

This is the batting scorecard of Royal Challengers Bangalore. The top 3 batsmen are V Kohli, C Gayle and AB de Villiers, in that order.

b) Batsman vs Bowlers all Matches (Plot)
This gives the performance of the Mumbai Indians batsman of Rank=1, Rohit Sharma, against bowlers of all other teams.

c) Batsman vs Bowlers all Matches (Table)
The above plot as a table. It can be seen that Rohit Sharma has scored the most runs against M Morkel, then Shakib Al Hasan and then UT Yadav.

d) Bowling scorecard (Table only)
The table below gives the bowling scorecard of CSK. R Ashwin leads with a tally of 98 wickets, followed by DJ Bravo with 88 wickets and then JA Morkel with 83 wickets in all matches against all teams.

This is just a random selection of functions. Do play around with the app and check out how the different IPL batsmen, bowlers and teams stack up against each other. Do read my earlier post Googly: An interactive app for analyzing IPL players, matches and teams using R package yorkr for more details about the app and other functions available.

Click GooglyPlus to access the Shiny app!

You can clone/fork/download the code from Github at GooglyPlus

Hope you have fun playing around with the Shiny app!

Note: In the tabs, for some of the functions, not all controls are required. It is possible to enable the controls selectively, but this has not been done in the current version. I may make the changes some time in the future.

Take a look at my other Shiny apps
a. Revisiting crimes against women in India
b. Natural language processing: What would Shakespeare say?

Check out some of my other posts
1. Analyzing World Bank data with WDI, googleVis Motion Charts
2. Video presentation on Machine Learning, Data Science, NLP and Big Data – Part 1
3. Singularity
4. Design principles of scalable, distributed systems
5. Simulating an Edge shape in Android
6. Dabbling with Wiener filter in OpenCV

To see all posts click Index of Posts

yorkr ranks IPL Players post 2016 season


Here is a short post which ranks IPL batsmen and bowlers after the 2016 IPL season. The rankings are based on match data from Cricsheet. I had already ranked IPL players in my post yorkr ranks IPL batsmen and bowlers, but that was in the middle of the 2016 IPL season. This post gives the final ranking after the 2016 season.

This post has also been published in RPubs RankIPLPlayers2016. You can download this as a pdf file at RankIPLPlayers2016.pdf.

You can take a look at the code at rankIPLPlayers2016

Checkout my interactive Shiny apps GooglyPlus (plots & tables) and Googly (only plots) which can be used to analyze IPL players, teams and matches.

rm(list=ls())
library(yorkr)
library(dplyr)
source('C:/software/cricket-package/cricsheet/ipl2016/final/R/rankIPLBatsmen.R', encoding = 'UTF-8')
source('C:/software/cricket-package/cricsheet/ipl2016/final/R/rankIPLBowlers.R', encoding = 'UTF-8')

Rank IPL batsmen post 2016

Chris Gayle, Shaun Marsh & David Warner are the top 3 IPL batsmen. Gayle towers over everybody, with mean runs of 38.28 and a mean strike rate of 138.85. Virat Kohli comes in 4th, with mean runs of 31.78 per innings and a mean strike rate of 117.51.

iplBatsmanRank <- rankIPLBatsmen()
as.data.frame(iplBatsmanRank[1:30,])
##             batsman matches meanRuns    meanSR
## 1          CH Gayle      92 38.28261 138.85120
## 2          SE Marsh      60 36.40000 118.97783
## 3         DA Warner     104 34.51923 124.88798
## 4           V Kohli     136 31.77941 117.51000
## 5         AM Rahane      89 31.46067 104.62989
## 6    AB de Villiers     109 29.93578 136.48945
## 7      SR Tendulkar      78 29.62821 108.58962
## 8         G Gambhir     133 28.94737 109.61263
## 9         RG Sharma     140 28.68571 117.79057
## 10         SK Raina     143 28.41259 121.55713
## 11        SR Watson      90 28.21111 125.80122
## 12         S Dhawan     110 28.09091 111.97282
## 13         R Dravid      79 27.87342 109.14544
## 14         DR Smith      76 27.55263 120.22329
## 15        JP Duminy      70 27.28571 122.99243
## 16      BB McCullum      94 26.86170 118.55606
## 17        JH Kallis      97 26.83505  95.47866
## 18         V Sehwag     105 26.26667 137.11562
## 19       RV Uthappa     132 26.18182 123.16326
## 20     AC Gilchrist      81 25.77778 122.69074
## 21          M Vijay      99 25.69697 106.02010
## 22    KC Sangakkara      70 25.67143 112.97529
## 23         MS Dhoni     131 25.14504 131.62206
## 24        DA Miller      60 24.76667 133.80983
## 25        AT Rayudu      99 23.35354 121.59313
## 26 DPMD Jayawardene      80 23.05000 114.54712
## 27     Yuvraj Singh     103 22.46602 118.15000
## 28        DJ Hussey      63 22.26984        NA
## 29        YK Pathan     121 22.25620 132.58793
## 30      S Badrinath      66 22.22727 114.97061

Rank IPL bowlers

The top 3 IPL T20 bowlers are SL Malinga, DJ Bravo and SP Narine

Don’t get hung up on the decimals in the average wickets for the bowlers. If 2 bowlers have average wickets of 1.0 and 1.5, all it implies is that over 2 matches the 1st bowler will, on average, take 2 wickets and the 2nd bowler 3 wickets.
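
The arithmetic behind this reading can be sketched in a couple of lines of R. The values 1.0 and 1.5 are the illustrative averages from the paragraph above, not figures from the ranking table, and the variable names are my own.

```r
# Illustrative averages from the text above (not taken from the ranking table)
meanWickets <- c(bowler1 = 1.0, bowler2 = 1.5)
matches <- 2

# Expected number of wickets over the given number of matches
expectedWickets <- meanWickets * matches
print(expectedWickets)
```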

setwd("C:/software/cricket-package/cricsheet/ipl2016/details")
iplBowlersRank <- rankIPLBowlers()
as.data.frame(iplBowlersRank[1:30,])
##             bowler matches meanWickets   meanER
## 1       SL Malinga      96    1.645833 6.545208
## 2         DJ Bravo      58    1.517241 7.929310
## 3        SP Narine      65    1.492308 6.155077
## 4          B Kumar      45    1.422222 7.355556
## 5        YS Chahal      41    1.414634 8.057073
## 6         M Morkel      37    1.405405 7.626216
## 7        IK Pathan      40    1.400000 7.579250
## 8         RP Singh      42    1.357143 7.966429
## 9         MM Patel      31    1.354839 7.282581
## 10   R Vinay Kumar      63    1.317460 8.342540
## 11  Sandeep Sharma      38    1.315789 7.697368
## 12       MM Sharma      46    1.304348 7.740652
## 13         P Awana      33    1.303030 8.325758
## 14        MM Patel      30    1.300000 7.569667
## 15          Z Khan      41    1.292683 7.735854
## 16         PP Ojha      53    1.245283 7.268679
## 17     JP Faulkner      40    1.225000 8.502250
## 18 Shakib Al Hasan      41    1.170732 7.103659
## 19     DS Kulkarni      32    1.156250 8.372188
## 20        UT Yadav      46    1.152174 8.394783
## 21        A Kumble      41    1.146341 6.567073
## 22       JA Morkel      73    1.136986 8.131370
## 23        SK Warne      53    1.132075 7.277170
## 24        A Mishra      55    1.127273 7.319455
## 25        UT Yadav      33    1.090909 8.853636
## 26        L Balaji      34    1.088235 7.186176
## 27       PP Chawla      35    1.085714 8.162000
## 28        R Ashwin      92    1.065217 6.812391
## 29  M Muralitharan      39    1.051282 6.470256
## 30 Harbhajan Singh     120    1.050000 7.134833

yorkr ranks ODI batsmen and bowlers


This is the final post, in which yorkr ranks ODI batsmen and bowlers. The rankings are based on match data from Cricsheet. The ranking is done on

  1. average runs and average strike rate for batsmen and
  2. average wickets and average economy rate for bowlers.
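
To make the ranking idea concrete, here is a minimal sketch in base R. It assumes the ranking simply sorts by mean runs and then by mean strike rate after a minimum-matches cutoff, which is consistent with the tables in this post but is not taken from the yorkr source. The data frame, its column names and the cutoff of 3 are made up for illustration.

```r
# Hypothetical per-innings data; names and numbers are made up for illustration
innings <- data.frame(
  batsman    = c("A", "A", "A", "B", "B", "B", "C"),
  runs       = c(50, 30, 40, 60, 20, 10, 99),
  strikeRate = c(90, 80, 100, 120, 70, 80, 150)
)

# Mean runs and mean strike rate per batsman
ranked <- aggregate(cbind(runs, strikeRate) ~ batsman, data = innings, FUN = mean)
names(ranked) <- c("batsman", "meanRuns", "meanSR")

# Number of innings per batsman, used for the cutoff
ranked$matches <- as.vector(table(innings$batsman)[as.character(ranked$batsman)])

# Apply a minimum-matches cutoff (3 here), then sort by mean runs and mean strike rate
ranked <- ranked[ranked$matches >= 3, ]
ranked <- ranked[order(-ranked$meanRuns, -ranked$meanSR), ]
print(ranked)
```

Batsman C has the highest average but is dropped by the cutoff, which is why a cutoff matters when ranking on averages.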

This post has also been published in RPubs RankODIPlayers. You can download this as a pdf file at RankODIPlayers.pdf.

You can take a look at the code at rankODIPlayers (available in yorkr_0.0.5)

rm(list=ls())
library(yorkr)
library(dplyr)
source("rankODIBatsmen.R")
source("rankODIBowlers.R")

Rank ODI batsmen

The top 3 ODI batsmen are Hashim Amla (SA), Matthew Hayden (Aus) & Virat Kohli (Ind). Note: For ODIs a cutoff of at least 50 matches played was chosen.

ODIBatsmanRank <- rankODIBatsmen()
as.data.frame(ODIBatsmanRank[1:30,])
##            batsman matches meanRuns    meanSR
## 1          HM Amla     185 51.96216  84.15508
## 2        ML Hayden      79 50.08861  81.20646
## 3          V Kohli     279 48.51971  78.55197
## 4   AB de Villiers     253 47.93676  95.05561
## 5     SR Tendulkar     151 45.82119  79.62311
## 6         S Dhawan     116 45.03448  81.54043
## 7         V Sehwag     167 44.49102 106.27563
## 8          JE Root     111 43.64865  81.66054
## 9        Q de Kock      85 43.61176  82.55235
## 10       IJL Trott     113 43.36283  70.69761
## 11   KC Sangakkara     293 42.81911  75.10420
## 12      TM Dilshan     283 41.76678  89.70360
## 13   KS Williamson     146 41.24658  73.49267
## 14   S Chanderpaul      93 40.07527  70.59613
## 15        HH Gibbs      75 40.00000  79.03813
## 16     Salman Butt      57 39.85965  59.29807
## 17    Anamul Haque      58 39.72414  56.45224
## 18      RT Ponting     238 38.88235  71.94294
## 19       JH Kallis     136 38.77941  67.17794
## 20        MS Dhoni     328 38.57927  90.30555
## 21      MJ Guptill     199 38.54774  73.88090
## 22       DA Warner     138 38.52174  87.24978
## 23 Mohammad Yousuf      94 38.44681  72.69851
## 24        JD Ryder      66 38.40909  91.29667
## 25       GJ Bailey     133 38.38346  75.74519
## 26       G Gambhir     209 37.83254  75.15483
## 27      AJ Strauss     122 37.80328  71.54844
## 28       MJ Clarke     301 37.67442  69.78415
## 29       SR Watson     274 37.08029  83.46489
## 30        AJ Finch     103 36.36893  79.49845

Rank ODI bowlers

In the table below the top 3 ODI bowlers are Mustafizur Rahman, JH Davey and RJ Harris (Aus), followed by MA Starc (Aus) and MJ Henry (NZ). Amit Mishra is 7th and Mohammed Shami is 8th. A cutoff of 20 matches was considered for bowlers.

ODIBowlersRank <- rankODIBowlers()
## [1] 35072     3
## [1] "C:/software/cricket-package/york-test/yorkrData/ODI/ODI-matches"
as.data.frame(ODIBowlersRank[1:30,])
##               bowler matches meanWickets   meanER
## 1  Mustafizur Rahman      56    4.000000 4.293214
## 2           JH Davey      53    3.528302 4.455094
## 3          RJ Harris      94    3.276596 4.361489
## 4           MA Starc     208    3.144231 4.425865
## 5           MJ Henry      88    3.125000 4.961250
## 6         A Flintoff     139    2.956835 4.283022
## 7           A Mishra     106    2.886792 4.365849
## 8     Mohammed Shami     144    2.777778 5.609306
## 9     MJ McClenaghan     165    2.751515 5.640424
## 10          CJ McKay     230    2.704348       NA
## 11       MF Maharoof     114    2.701754 4.427018
## 12       Imran Tahir     156    2.660256 4.461923
## 13        BAW Mendis     234    2.641026 4.532308
## 14     RK Kleinveldt      54    2.629630 4.306667
## 15      Arafat Sunny      62    2.612903 4.103226
## 16         JE Taylor     156    2.602564 5.115192
## 17           AJ Hall      55    2.600000 3.879091
## 18        WD Parnell     129    2.596899 5.477597
## 19         CR Woakes     129    2.596899 5.340620
## 20      DE Bollinger     152    2.592105 4.282763
## 21        Wahab Riaz     206    2.567961 5.431748
## 22        PJ Cummins     148    2.567568 5.715405
## 23         R Rampaul     173    2.549133 4.726590
## 24      Taskin Ahmed      56    2.535714 5.325357
## 25          DW Steyn     292    2.534247 4.534007
## 26      JR Hazlewood      64    2.531250 4.392500
## 27        Abdur Rauf      84    2.523810 4.786667
## 28           SW Tait     141    2.517730 5.173191
## 29      Hamid Hassan     106    2.509434 4.686038
## 30        SL Malinga     419    2.498807 4.968974

Hope you have fun with my yorkr package!

yorkr crashes the IPL party! – Part 2


Most people say that it is the intellect which makes a great scientist. They are wrong: it is character.

                 Albert Einstein

Science is organized knowledge. Wisdom is organized life.

                 Immanuel Kant

If I have seen further, it is by standing on the shoulders of giants

                 Isaac Newton
                 

Valid criticism does you a favor.

                 Carl Sagan

Introduction

In this post, my R package ‘yorkr’ continues to bat in the IPL Twenty20s. This post is a continuation of my earlier post – yorkr crashes the IPL party ! – Part 1. This post deals with Class 2 functions, namely the performances of an IPL team in all T20 matches against another IPL team, e.g. all T20 matches of Chennai Super Kings vs Royal Challengers Bangalore or Kochi Tuskers Kerala vs Mumbai Indians etc.

You can clone/fork the code for my package yorkr from Github at yorkr

This post has also been published at RPubs IPLT20-Part2 and can also be downloaded as a PDF document from IPLT20-Part2.pdf

The list of functions in Class 2 is

  1. teamBatsmenPartnershiOppnAllMatches()
  2. teamBatsmenPartnershipOppnAllMatchesChart()
  3. teamBatsmenVsBowlersOppnAllMatches()
  4. teamBattingScorecardOppnAllMatches()
  5. teamBowlingPerfOppnAllMatches()
  6. teamBowlersWicketsOppnAllMatches()
  7. teamBowlersVsBatsmenOppnAllMatches()
  8. teamBowlersWicketKindOppnAllMatches()
  9. teamBowlersWicketRunsOppnAllMatches()
  10. plotWinLossBetweenTeams()

1. Install the package from CRAN

# install.packages("yorkr")   # uncomment to install from CRAN (needed only once)
library(yorkr)
rm(list=ls())

2. Get data for all T20 matches between 2 teams

We can get all IPL T20 matches between any 2 teams using the function below. The dir parameter should point to the folder which has the IPL T20 RData files of the individual matches. This function creates a data frame of all the IPL T20 matches between the 2 teams and also saves the dataframe as RData. The call below gets all matches between Sunrisers Hyderabad and Royal Challengers Bangalore.

setwd("C:/software/cricket-package/york-test/yorkrData/IPL/IPL-T20-matches")
matches <- getAllMatchesBetweenTeams("Sunrisers Hyderabad","Royal Challengers Bangalore",dir=".")
dim(matches)
## [1] 1320   25
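
Conceptually, this step picks out the matches in which both teams took part and stacks them into one data frame. The sketch below shows that idea in base R on made-up data. It is not the yorkr implementation, and the team names, columns and the helper involves() are hypothetical.

```r
# Hypothetical per-match data frames standing in for the individual match RData files
match1 <- data.frame(team = c("Team A", "Team B"), runs = c(160, 155), date = "2013-04-01")
match2 <- data.frame(team = c("Team A", "Team C"), runs = c(140, 141), date = "2013-04-05")
match3 <- data.frame(team = c("Team B", "Team A"), runs = c(170, 120), date = "2014-05-02")
allMatches <- list(match1, match2, match3)

# TRUE when both teams appear in the match
involves <- function(m, team1, team2) all(c(team1, team2) %in% m$team)

# Keep only the matches in which both teams took part
selected <- Filter(function(m) involves(m, "Team A", "Team B"), allMatches)

# Stack the selected matches into a single data frame
matches <- do.call(rbind, selected)
print(dim(matches))
```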

I have however already saved the IPL Twenty20 matches for all possible combinations of opposing IPL teams. The data for these matches can be obtained from Github, in the folder IPL-T20-allmatches-between-two-teams.

Note: You will need to use the function below for future matches! The data in Cricsheet are from 2008-2015.

3. Save data for all matches between all combination of 2 teams

This can be done locally using the function below. You could use this function to combine all IPL Twenty20 matches between any 2 IPL teams into a single dataframe and save it in the current folder. The current implementation expects the RData files of individual matches to be in the ../data folder. Since I have already converted the data, I will not be running this again.

# Available in yorkr_0.0.5. Can be installed from Github though!
#saveAllMatchesBetween2IPLTeams()

4. Load data directly for all matches between 2 IPL teams

As in my earlier post, I pick all IPL Twenty20 matches between 2 random IPL teams. I load the data directly from the stored RData files. When we load an RData file a “matches” object is created. This object can be stored for the appropriate teams as below.

# Load T20 matches between 2 IPL teams
setwd("C:/software/cricket-package/york-test/yorkrData/IPL/IPL-T20-allmatches-between-two-teams")
load("Chennai Super Kings-Delhi Daredevils-allMatches.RData")
csk_dd_matches <- matches
load("Deccan Chargers-Kolkata Knight Riders-allMatches.RData")
dc_kkr_matches <- matches
load("Mumbai Indians-Pune Warriors-allMatches.RData")
mi_pw_matches <- matches
load("Rajasthan Royals-Sunrisers Hyderabad-allMatches.RData")
rr_sh_matches <- matches
load("Kings XI Punjab-Royal Challengers Bangalore-allMatches.RData")
kxip_rcb_matches <-matches
load("Chennai Super Kings-Kochi Tuskers Kerala-allMatches.RData")
csk_ktk_matches <-matches
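
Since every load() call overwrites the same “matches” object, one alternative is to load each file into its own environment and return the object from there. The sketch below uses only base R. The helper loadMatches() is hypothetical and not part of yorkr, and a temporary file stands in for a real "...-allMatches.RData" file.

```r
# Hypothetical helper (not part of yorkr): load an RData file into a private
# environment and return the 'matches' object stored in it
loadMatches <- function(file) {
  env <- new.env()
  load(file, envir = env)
  get("matches", envir = env)
}

# Demonstration with a throwaway file standing in for a real allMatches RData file
matches <- data.frame(team = c("Team A", "Team B"), runs = c(150, 140))
tmp <- tempfile(fileext = ".RData")
save(matches, file = tmp)
rm(matches)

demo_matches <- loadMatches(tmp)
print(dim(demo_matches))
```

With such a helper, an assignment like csk_dd_matches <- loadMatches("Chennai Super Kings-Delhi Daredevils-allMatches.RData") would take one line per pair of teams.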

5. Team Batsmen partnership in Twenty20 (all matches with opposing IPL team)

This function will create a report of the batting partnerships of an IPL team in the matches between the 2 teams. The report can be brief or detailed depending on the parameter ‘report’. As can be seen, MS Dhoni tops the list for CSK, followed by Raina and then Murali Vijay, for matches against Delhi Daredevils. For the Delhi Daredevils it is V Sehwag followed by Gambhir.

m<- teamBatsmenPartnershiOppnAllMatches(csk_dd_matches,'Chennai Super Kings',report="summary")
m
## Source: local data frame [29 x 2]
## 
##         batsman totalRuns
##          (fctr)     (dbl)
## 1      MS Dhoni       364
## 2      SK Raina       335
## 3       M Vijay       290
## 4   S Badrinath       185
## 5     ML Hayden       181
## 6    MEK Hussey       169
## 7  F du Plessis       100
## 8      S Vidyut        94
## 9      DR Smith        81
## 10    JA Morkel        80
## ..          ...       ...
m<- teamBatsmenPartnershiOppnAllMatches(csk_dd_matches,'Delhi Daredevils',report="summary")
m
## Source: local data frame [53 x 2]
## 
##             batsman totalRuns
##              (fctr)     (dbl)
## 1          V Sehwag       233
## 2         G Gambhir       200
## 3         DA Warner       134
## 4    AB de Villiers       133
## 5        KD Karthik       129
## 6  DPMD Jayawardene        89
## 7         JA Morkel        81
## 8        TM Dilshan        79
## 9          S Dhawan        78
## 10          SS Iyer        77
## ..              ...       ...
m <-teamBatsmenPartnershiOppnAllMatches(dc_kkr_matches,'Deccan Chargers',report="summary")
m
## Source: local data frame [29 x 2]
## 
##            batsman totalRuns
##             (fctr)     (dbl)
## 1     AC Gilchrist       166
## 2         HH Gibbs       145
## 3        RG Sharma       116
## 4         S Dhawan       111
## 5        A Symonds       100
## 6  Y Venugopal Rao        92
## 7         B Chipli        60
## 8     DB Ravi Teja        54
## 9         TL Suman        53
## 10      VVS Laxman        32
## ..             ...       ...
m <-teamBatsmenPartnershiOppnAllMatches(mi_pw_matches,'Mumbai Indians',report="detailed")
m[1:30,]
##         batsman   nonStriker partnershipRuns totalRuns
## 1  SR Tendulkar JEC Franklin              24       152
## 2  SR Tendulkar    AT Rayudu              46       152
## 3  SR Tendulkar    RG Sharma               2       152
## 4  SR Tendulkar   KD Karthik              20       152
## 5  SR Tendulkar   RT Ponting              39       152
## 6  SR Tendulkar  AC Blizzard              12       152
## 7  SR Tendulkar  RJ Peterson               9       152
## 8     RG Sharma SR Tendulkar               3       135
## 9     RG Sharma JEC Franklin               0       135
## 10    RG Sharma    AT Rayudu              34       135
## 11    RG Sharma    A Symonds              19       135
## 12    RG Sharma   KD Karthik              19       135
## 13    RG Sharma   KA Pollard              47       135
## 14    RG Sharma     TL Suman               7       135
## 15    RG Sharma   GJ Maxwell               6       135
## 16   KD Karthik SR Tendulkar               8       108
## 17   KD Karthik JEC Franklin              32       108
## 18   KD Karthik    AT Rayudu               3       108
## 19   KD Karthik    RG Sharma              50       108
## 20   KD Karthik   SL Malinga              10       108
## 21   KD Karthik      PP Ojha               0       108
## 22   KD Karthik  RJ Peterson               4       108
## 23   KD Karthik  NLTC Perera               1       108
## 24    AT Rayudu SR Tendulkar              54        92
## 25    AT Rayudu    RG Sharma              37        92
## 26    AT Rayudu   KD Karthik               1        92
## 27 JEC Franklin SR Tendulkar              31        63
## 28 JEC Franklin    RG Sharma               1        63
## 29 JEC Franklin   KD Karthik              15        63
## 30 JEC Franklin     SA Yadav              10        63
m <-teamBatsmenPartnershiOppnAllMatches(rr_sh_matches,'Sunrisers Hyderabad',report="summary")
m
## Source: local data frame [23 x 2]
## 
##         batsman totalRuns
##          (fctr)     (dbl)
## 1      S Dhawan       168
## 2     DJG Sammy        95
## 3    EJG Morgan        90
## 4     DA Warner        83
## 5       NV Ojha        50
## 6      KL Rahul        40
## 7     RS Bopara        40
## 8      DW Steyn        31
## 9      CL White        31
## 10 MC Henriques        29
## ..          ...       ...
m <-teamBatsmenPartnershiOppnAllMatches(kxip_rcb_matches,'Kings XI Punjab',report="summary")
m
## Source: local data frame [47 x 2]
## 
##          batsman totalRuns
##           (fctr)     (dbl)
## 1       SE Marsh       246
## 2      DA Miller       224
## 3      RS Bopara       203
## 4   AC Gilchrist       191
## 5   Yuvraj Singh       126
## 6       MS Bisla       103
## 7  Mandeep Singh       100
## 8      DJ Hussey        99
## 9  Azhar Mahmood        96
## 10 KC Sangakkara        88
## ..           ...       ...
m <-teamBatsmenPartnershiOppnAllMatches(csk_ktk_matches,'Kochi Tuskers Kerala',report="summary")
m
## Source: local data frame [8 x 2]
## 
##            batsman totalRuns
##             (fctr)     (dbl)
## 1      BB McCullum        80
## 2         BJ Hodge        70
## 3         PA Patel        40
## 4        RA Jadeja        35
## 5 Y Gnaneswara Rao        19
## 6 DPMD Jayawardene        16
## 7          OA Shah         3
## 8        KM Jadhav         1
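
The relation between the "detailed" and "summary" reports can be checked by hand. Below, SR Tendulkar's partnership rows from the Mumbai Indians vs Pune Warriors detailed output above are retyped and summed. This is only a sketch of the idea in base R, not how yorkr computes the report internally.

```r
# SR Tendulkar's rows, retyped from the detailed report above
detailed <- data.frame(
  batsman = "SR Tendulkar",
  nonStriker = c("JEC Franklin", "AT Rayudu", "RG Sharma", "KD Karthik",
                 "RT Ponting", "AC Blizzard", "RJ Peterson"),
  partnershipRuns = c(24, 46, 2, 20, 39, 12, 9)
)

# Summing the partnership runs per batsman gives the totalRuns column
summary_report <- aggregate(partnershipRuns ~ batsman, data = detailed, FUN = sum)
names(summary_report)[2] <- "totalRuns"
print(summary_report)
```

The sum agrees with the totalRuns value of 152 shown against SR Tendulkar in the detailed report.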

6. Team batsmen partnership in Twenty20 (all matches with opposing IPL team)

The partnerships are plotted graphically in the charts below. Note: All functions which create a plot also include a parameter plot=TRUE/FALSE. If you set this to FALSE then a data frame is returned. You can use the dataframe to create an interactive plot for the partnerships (mouse over) using packages like plotly, rCharts, googleVis or ggvis.

teamBatsmenPartnershipOppnAllMatchesChart(csk_dd_matches,'Chennai Super Kings',"Delhi Daredevils")

teamBatsmenPartnership-1

teamBatsmenPartnershipOppnAllMatchesChart(dc_kkr_matches,main="Kolkata Knight Riders",opposition="Deccan Chargers")

teamBatsmenPartnership-2

teamBatsmenPartnershipOppnAllMatchesChart(kxip_rcb_matches,"Royal Challengers Bangalore",opposition="Kings XI Punjab")

teamBatsmenPartnership-3

teamBatsmenPartnershipOppnAllMatchesChart(mi_pw_matches,"Mumbai Indians","Pune Warriors")

teamBatsmenPartnership-4

m <- teamBatsmenPartnershipOppnAllMatchesChart(rr_sh_matches,"Rajasthan Royals","Sunrisers Hyderabad",plot=FALSE)
m[1:30,]
##        batsman  nonStriker runs
## 1    SR Watson   STR Binny   60
## 2    AM Rahane   STR Binny   59
## 3    STR Binny   AM Rahane   45
## 4    SR Watson    R Dravid   42
## 5    AM Rahane   SV Samson   41
## 6     BJ Hodge   SV Samson   36
## 7    CH Morris   STR Binny   34
## 8    AM Rahane   SR Watson   31
## 9     R Dravid   SR Watson   30
## 10   SV Samson   AM Rahane   29
## 11   SR Watson   AM Rahane   27
## 12   SPD Smith    DJ Hooda   25
## 13   SPD Smith JP Faulkner   24
## 14   SPD Smith   STR Binny   20
## 15    R Dravid   AM Rahane   18
## 16    BJ Hodge JP Faulkner   18
## 17 JP Faulkner   SPD Smith   18
## 18   SV Samson     KK Nair   14
## 19 JP Faulkner   STR Binny   14
## 20   SV Samson   STR Binny   13
## 21   SPD Smith   AM Rahane   13
## 22   SR Watson   SPD Smith   12
## 23   STR Binny JP Faulkner   12
## 24   STR Binny   SPD Smith   12
## 25 JP Faulkner   SV Samson   12
## 26     KK Nair   SV Samson   12
## 27 JP Faulkner    BJ Hodge   11
## 28   SPD Smith   SR Watson   10
## 29   STR Binny   SR Watson    9
## 30   SV Samson    BJ Hodge    9
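
The plot=TRUE/FALSE convention can be sketched with a small function of my own. The function partnershipRuns() below is hypothetical and is not yorkr code. It draws a base R bar plot when plot=TRUE and returns the sorted data frame when plot=FALSE.

```r
# Hypothetical function illustrating the plot=TRUE/FALSE convention (not yorkr code)
partnershipRuns <- function(df, plot = TRUE) {
  # Order partnerships by runs, highest first
  df <- df[order(-df$runs), ]
  if (plot) {
    # Draw the chart and return nothing
    barplot(df$runs, names.arg = paste(df$batsman, df$nonStriker, sep = "-"),
            las = 2, ylab = "Runs")
    invisible(NULL)
  } else {
    # Return the data so that the caller can build a table or another plot
    df
  }
}

# Three rows retyped from the Rajasthan Royals table above, in shuffled order
rr <- data.frame(
  batsman = c("AM Rahane", "SR Watson", "STR Binny"),
  nonStriker = c("STR Binny", "STR Binny", "AM Rahane"),
  runs = c(59, 60, 45)
)
m <- partnershipRuns(rr, plot = FALSE)
print(m)
```

Calling partnershipRuns(rr) without the plot argument draws the chart instead of returning the data.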

7. Team batsmen versus bowler in Twenty20 (all matches with opposing IPL team)

The plots below provide information on how each of the top batsmen of the IPL teams fared against the opposition bowlers

# Adam Gilchrist was the top performer for Deccan Chargers
teamBatsmenVsBowlersOppnAllMatches(dc_kkr_matches,"Deccan Chargers","Kolkata Knight Riders")

batsmenvsBowler-1

teamBatsmenVsBowlersOppnAllMatches(csk_dd_matches,"Delhi Daredevils","Chennai Super Kings",top=3)

batsmenvsBowler-2

m <- teamBatsmenVsBowlersOppnAllMatches(csk_ktk_matches,"Chennai Super Kings","Kochi Tuskers Kerala",top=10,plot=FALSE)
m
## Source: local data frame [37 x 3]
## Groups: batsman [1]
## 
##     batsman         bowler  runs
##      (fctr)         (fctr) (dbl)
## 1  SK Raina       RP Singh     6
## 2  SK Raina    S Sreesanth    18
## 3  SK Raina M Muralitharan     1
## 4  SK Raina  R Vinay Kumar     4
## 5  SK Raina    NLTC Perera    11
## 6  SK Raina       RR Powar    13
## 7  SK Raina       RV Gomez    16
## 8   WP Saha       RP Singh    15
## 9   WP Saha M Muralitharan    11
## 10  WP Saha       BJ Hodge     1
## ..      ...            ...   ...
teamBatsmenVsBowlersOppnAllMatches(rr_sh_matches,"Sunrisers Hyderabad","Rajasthan Royals")

batsmenvsBowler-3

8. Team batting scorecard in Twenty20 (all matches with opposing IPL team)

The following tables give the overall performances of the IPL team’s batsmen against the opposition.

#Chris Gayle followed by Virat Kohli tops for RCB
a <-teamBattingScorecardOppnAllMatches(kxip_rcb_matches,main="Royal Challengers Bangalore",opposition="Kings XI Punjab")
## Total= 2444
a
## Source: local data frame [55 x 5]
## 
##           batsman ballsPlayed fours sixes  runs
##            (fctr)       (int) (int) (int) (dbl)
## 1        CH Gayle         313    45    41   561
## 2         V Kohli         296    39     8   344
## 3  AB de Villiers         183    23    16   301
## 4       JH Kallis         133    18     7   187
## 5        R Dravid          90    11     1   105
## 6      RV Uthappa          47     7     6    92
## 7       CA Pujara          66    11    NA    70
## 8       MK Pandey          50     5     3    67
## 9    KP Pietersen          43     7     1    66
## 10     MV Boucher          36     4     1    41
## ..            ...         ...   ...   ...   ...
#Tendulkar & Rohit Sharma lead for Mumbai Indians
teamBattingScorecardOppnAllMatches(mi_pw_matches,"Mumbai Indians","Pune Warriors")
## Total= 756
## Source: local data frame [20 x 5]
## 
##            batsman ballsPlayed fours sixes  runs
##             (fctr)       (int) (int) (int) (dbl)
## 1     SR Tendulkar         134    21     1   152
## 2        RG Sharma         121     7     6   135
## 3       KD Karthik         107    10     3   108
## 4        AT Rayudu          93     8     1    92
## 5     JEC Franklin          70     5     2    63
## 6       KA Pollard          43     3     3    55
## 7         TL Suman          16     3     3    36
## 8  Harbhajan Singh          22     3     1    29
## 9       SL Malinga          16     2     1    19
## 10       A Symonds          18     2    NA    19
## 11      RT Ponting          17     2    NA    14
## 12      GJ Maxwell           7     1     1    13
## 13     RJ Peterson          13     1    NA    13
## 14     AC Blizzard           6     1    NA     6
## 15         PP Ojha           2    NA    NA     1
## 16        MM Patel           2    NA    NA     1
## 17         RE Levi           2    NA    NA     0
## 18        SA Yadav           4    NA    NA     0
## 19     NLTC Perera           4    NA    NA     0
## 20        DR Smith           1    NA    NA     0
teamBattingScorecardOppnAllMatches(mi_pw_matches,"Pune Warriors","Mumbai Indians")
## Total= 714
## Source: local data frame [28 x 5]
## 
##         batsman ballsPlayed fours sixes  runs
##          (fctr)       (int) (int) (int) (dbl)
## 1    RV Uthappa         131    13     4   151
## 2     MK Pandey          80     5     4    88
## 3  Yuvraj Singh          62     3     6    77
## 4      M Manhas          36     5    NA    42
## 5     SPD Smith          38     4    NA    41
## 6      MR Marsh          26     2     2    38
## 7      M Kartik          21     2     1    25
## 8      R Sharma          22     2     1    23
## 9      TL Suman          15     5    NA    23
## 10   WD Parnell          24     3    NA    22
## ..          ...         ...   ...   ...   ...
teamBattingScorecardOppnAllMatches(csk_dd_matches,"Delhi Daredevils","Chennai Super Kings")
## Total= 1983
## Source: local data frame [53 x 5]
## 
##             batsman ballsPlayed fours sixes  runs
##              (fctr)       (int) (int) (int) (dbl)
## 1          V Sehwag         147    27     9   233
## 2         G Gambhir         155    23     2   200
## 3         DA Warner         130    11     2   134
## 4    AB de Villiers          80     7     6   133
## 5        KD Karthik          99    15     1   129
## 6  DPMD Jayawardene          77     7     2    89
## 7         JA Morkel          63     8     2    81
## 8        TM Dilshan          65     8     3    79
## 9          S Dhawan          58     8     2    78
## 10          SS Iyer          56    11     1    77
## ..              ...         ...   ...   ...   ...
teamBattingScorecardOppnAllMatches(rr_sh_matches,"Rajasthan Royals","Sunrisers Hyderabad")
## Total= 808
## Source: local data frame [17 x 5]
## 
##          batsman ballsPlayed fours sixes  runs
##           (fctr)       (int) (int) (int) (dbl)
## 1      SR Watson          97    22     4   148
## 2      AM Rahane         145    17     1   148
## 3      SPD Smith          81    11     2   103
## 4      STR Binny          83     6     1    90
## 5      SV Samson          83     3     4    76
## 6    JP Faulkner          41     7     2    59
## 7       BJ Hodge          37     2     5    55
## 8       R Dravid          44     7     1    48
## 9      CH Morris          11     2     3    34
## 10       KK Nair          23     3    NA    17
## 11      R Bhatia          10     1    NA     8
## 12   DS Kulkarni           6     1    NA     7
## 13      DJ Hooda           9    NA    NA     7
## 14      AM Nayar           3     1    NA     4
## 15      PV Tambe           7    NA    NA     3
## 16 KW Richardson           2    NA    NA     1
## 17     DH Yagnik           4    NA    NA     0

9. Team performances of IPL bowlers (all matches with opposing IPL team)

Like the function above, the following tables list the top IPL bowlers of the respective teams in all matches against the opposition.

#Piyush Chawla has the most wickets for KXIP against RCB
teamBowlingPerfOppnAllMatches(kxip_rcb_matches,"Kings XI Punjab","Royal Challengers Bangalore")
## Source: local data frame [38 x 5]
## 
##            bowler overs maidens  runs wickets
##            (fctr) (int)   (int) (dbl)   (dbl)
## 1       PP Chawla    14       0   311      12
## 2       IK Pathan    12       0   159       9
## 3      YA Abdulla     9       1   103       8
## 4       RJ Harris     5       0    87       7
## 5         P Awana    11       0   149       6
## 6     S Sreesanth     6       0   101       5
## 7   Azhar Mahmood     8       0    74       5
## 8  Sandeep Sharma     8       1   101       4
## 9        AR Patel     5       0    94       4
## 10      VRV Singh     6       0    70       4
## ..            ...   ...     ...   ...     ...
#Ashwin is the highest wicket taker for CSK against DD
teamBowlingPerfOppnAllMatches(csk_dd_matches,main="Chennai Super Kings",opposition="Delhi Daredevils")
## Source: local data frame [26 x 5]
## 
##           bowler overs maidens  runs wickets
##           (fctr) (int)   (int) (dbl)   (dbl)
## 1       R Ashwin     9       0   233      17
## 2      JA Morkel    11       0   338      10
## 3       DJ Bravo     5       0   135       8
## 4      SB Jakati     4       0   140       6
## 5       L Balaji    10       0   117       6
## 6      MM Sharma     1       0    99       6
## 7      RA Jadeja     2       0    85       4
## 8      IC Pandey     1       0    80       4
## 9  BW Hilfenhaus     5       0    53       4
## 10       A Nehra     1       0    25       4
## ..           ...   ...     ...   ...     ...
teamBowlingPerfOppnAllMatches(dc_kkr_matches,"Deccan Chargers","Kolkata Knight Riders")
## Source: local data frame [26 x 5]
## 
##            bowler overs maidens  runs wickets
##            (fctr) (int)   (int) (dbl)   (dbl)
## 1        RP Singh    11       0   161       7
## 2         PP Ojha    11       0   196       6
## 3      WPUJC Vaas     4       0    67       5
## 4       A Symonds    12       0   100       4
## 5        DW Steyn     8       0    88       4
## 6        A Mishra     8       0    68       3
## 7  Jaskaran Singh     6       0    53       3
## 8       SB Styris     7       0    79       2
## 9       RJ Harris     4       0    20       2
## 10  Harmeet Singh    10       0    84       1
## ..            ...   ...     ...   ...     ...

10. Team bowler’s wickets in IPL Twenty20 (all matches with opposing IPL team)

This function provides a graphical plot of the tables above.

# Dirk Nannes and Umesh Yadav top for DD against CSK
teamBowlersWicketsOppnAllMatches(csk_dd_matches,"Delhi Daredevils","Chennai Super Kings")

bowlerWicketsOppn-1

# SL Malinga and Munaf Patel lead in MI vs PW clashes
teamBowlersWicketsOppnAllMatches(mi_pw_matches,"Mumbai Indians","Pune Warriors")

bowlerWicketsOppn-2

teamBowlersWicketsOppnAllMatches(dc_kkr_matches,"Kolkata Knight Riders","Deccan Chargers",top=10) 

bowlerWicketsOppn-3

m <-teamBowlersWicketsOppnAllMatches(kxip_rcb_matches,"Royal Challengers Bangalore","Kings XI Punjab",plot=FALSE)
m
## Source: local data frame [20 x 2]
## 
##              bowler wickets
##              (fctr)   (int)
## 1         S Aravind       8
## 2            Z Khan       7
## 3          MA Starc       7
## 4          HV Patel       6
## 5           P Kumar       5
## 6         YS Chahal       5
## 7         JH Kallis       4
## 8     R Vinay Kumar       3
## 9          A Kumble       3
## 10         CH Gayle       3
## 11      AB McDonald       3
## 12         VR Aaron       3
## 13         DW Steyn       2
## 14    CK Langeveldt       2
## 15       DL Vettori       2
## 16         M Kartik       2
## 17 RE van der Merwe       2
## 18        R Rampaul       1
## 19        JA Morkel       1
## 20         AB Dinda       1
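
Because plot=FALSE returns an ordinary data frame, the result can be filtered or re-plotted with base R. The snippet below is a minimal sketch and not part of yorkr: it retypes the first six rows of the table above into a data frame named wkts (a hypothetical name; in practice you would work on m directly).

```r
# First six rows of the table above, retyped so the example is self-contained
wkts <- data.frame(
  bowler  = c("S Aravind", "Z Khan", "MA Starc", "HV Patel", "P Kumar", "YS Chahal"),
  wickets = c(8, 7, 7, 6, 5, 5),
  stringsAsFactors = FALSE
)

# Keep only the bowlers with at least 7 wickets
top <- wkts[wkts$wickets >= 7, ]
print(top)

# Horizontal bar chart of wickets, with the largest bar at the top
ord <- order(wkts$wickets)
barplot(wkts$wickets[ord], names.arg = wkts$bowler[ord], horiz = TRUE,
        las = 1, xlab = "Wickets", main = "RCB bowlers vs KXIP (sample rows)")
```

The same subsetting works on the full data frame m, since it has the same bowler and wickets columns.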

11. Team bowler vs batsmen in Twenty20(all matches with opposing IPL team)

These plots show how the IPL bowlers of a team fared against the opposing batsmen, and which of the opposing team’s batsmen scored the most runs against them.

teamBowlersVsBatsmenOppnAllMatches(rr_sh_matches,'Rajasthan Royals',"Sunrisers Hyderabad",top=5)

bowlerVsBatsmen-1

teamBowlersVsBatsmenOppnAllMatches(kxip_rcb_matches,"Kings XI Punjab","Royal Challengers Bangalore",top=3)

bowlerVsBatsmen-2

teamBowlersVsBatsmenOppnAllMatches(dc_kkr_matches,"Deccan Chargers","Kolkata Knight Riders")

bowlerVsBatsmen-3

12. Team bowler’s wicket kind in Twenty20(caught,bowled,etc) (all matches with opposing IPL team)

The charts below show the kind of wicket (caught, bowled, lbw etc.) taken by the bowlers of the IPL team.

teamBowlersWicketKindOppnAllMatches(csk_dd_matches,"Delhi Daredevils","Chennai Super Kings",plot=TRUE)

bowlerWickets-1

m <- teamBowlersWicketKindOppnAllMatches(mi_pw_matches,"Pune Warriors","Mumbai Indians",plot=FALSE)
m[1:30,]
##          bowler wicketKind wicketPlayerOut runs
## 1       SB Wagh     caught    JEC Franklin   31
## 2      R Sharma     caught    SR Tendulkar   64
## 3     AC Thomas     caught       AT Rayudu   69
## 4      M Kartik    stumped         RE Levi   70
## 5      AB Dinda     caught       AT Rayudu  150
## 6      AB Dinda     caught       RG Sharma  150
## 7      M Kartik    stumped      KD Karthik   70
## 8    MN Samuels     bowled        SA Yadav   21
## 9      R Sharma     bowled      KA Pollard   64
## 10     AB Dinda     caught    JEC Franklin  150
## 11   WD Parnell     caught      SL Malinga   64
## 12     AB Dinda        lbw Harbhajan Singh  150
## 13 Yuvraj Singh     caught      RT Ponting   61
## 14     AJ Finch     caught    SR Tendulkar   11
## 15     MR Marsh        lbw      KD Karthik   24
## 16    AC Thomas     caught     AC Blizzard   69
## 17 Yuvraj Singh     caught    SR Tendulkar   61
## 18 Yuvraj Singh     caught       AT Rayudu   61
## 19     R Sharma     caught       RG Sharma   64
## 20     R Sharma     caught        TL Suman   64
## 21    JE Taylor     caught       A Symonds   34
## 22    JE Taylor     caught      KA Pollard   34
## 23      B Kumar     caught    JEC Franklin   50
## 24    MJ Clarke    run out       RG Sharma    9
## 25      A Nehra     caught    SR Tendulkar   19
## 26      A Nehra     caught     RJ Peterson   19
## 27      B Kumar     bowled       AT Rayudu   50
## 28      A Nehra    run out     NLTC Perera   19
## 29     AB Dinda     caught Harbhajan Singh  150
## 30   WD Parnell    run out      SL Malinga   64
teamBowlersWicketKindOppnAllMatches(dc_kkr_matches,"Kolkata Knight Riders",'Deccan Chargers',plot=TRUE)

bowlerWickets-2
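
The data frame returned with plot=FALSE has one row per dismissal, so the wicket kinds can be tallied with table(). The following is a minimal sketch that assumes the first ten rows of the m[1:30,] output above, retyped by hand into a data frame called wk (a hypothetical name).

```r
# First ten rows of the wicket kind table above (bowler and wicketKind only)
wk <- data.frame(
  bowler     = c("SB Wagh", "R Sharma", "AC Thomas", "M Kartik", "AB Dinda",
                 "AB Dinda", "M Kartik", "MN Samuels", "R Sharma", "AB Dinda"),
  wicketKind = c("caught", "caught", "caught", "stumped", "caught",
                 "caught", "stumped", "bowled", "bowled", "caught"),
  stringsAsFactors = FALSE
)

# Number of dismissals of each kind
kind_counts <- table(wk$wicketKind)
print(kind_counts)

# Dismissals of each kind per bowler (bowlers in rows, kinds in columns)
by_bowler <- table(wk$bowler, wk$wicketKind)
print(by_bowler)
```

With the full data frame, table(m$bowler, m$wicketKind) gives the same breakdown that the plot shows graphically.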

13. Team bowler’s wicket taken and runs conceded in Twenty20(all matches with opposing IPL team)

teamBowlersWicketRunsOppnAllMatches(csk_ktk_matches,"Kochi Tuskers Kerala","Chennai Super Kings")

wicketRuns-1

m <-teamBowlersWicketRunsOppnAllMatches(mi_pw_matches,"Mumbai Indians","Pune Warriors",plot=FALSE)
m[1:30,]
## Source: local data frame [30 x 5]
## 
##             bowler overs maidens  runs wickets
##             (fctr) (int)   (int) (dbl)   (dbl)
## 1       AG Murtaza     4       0    18       2
## 2       SL Malinga     9       1   143      10
## 3         AN Ahmed     5       0    40       4
## 4         MM Patel     6       1    88       7
## 5       KA Pollard     6       0    99       5
## 6     JEC Franklin     4       0    64       1
## 7  Harbhajan Singh     7       0    85       6
## 8          PP Ojha     8       0    95       4
## 9       MG Johnson     5       0    41       4
## 10        R Dhawan     1       0    27       0
## ..             ...   ...     ...   ...     ...

14. Plot of wins vs losses between teams in IPL T20 confrontations

setwd("C:/software/cricket-package/york-test/yorkrData/IPL/IPL-T20-matches")
plotWinLossBetweenTeams("Chennai Super Kings","Delhi Daredevils")

winsLosses-1

plotWinLossBetweenTeams("Deccan Chargers","Kolkata Knight Riders",".")

winsLosses-2

plotWinLossBetweenTeams('Kings XI Punjab',"Royal Challengers Bangalore",".")

winsLosses-3

plotWinLossBetweenTeams("Mumbai Indians","Pune Warriors",".")

winsLosses-4

plotWinLossBetweenTeams('Rajasthan Royals',"Sunrisers Hyderabad",".")

winsLosses-5

plotWinLossBetweenTeams('Chennai Super Kings',"Mumbai Indians",".")

winsLosses-6

Conclusion

This post covered all the functions for analyzing all IPL Twenty20 matches between any 2 IPL teams. As before, the data frames are already available, so you can load the data and begin to use them. If you can draw more insights from the data frames, do go ahead. But please do attribute the source to Cricsheet (http://cricsheet.org), my package yorkr and my blog. Do give the functions a spin for yourself!


yorkr crashes the IPL party ! – Part 1


Where tireless striving stretches its arms towards perfection

Where the clear stream of reason has not lost its way

Into the dreary desert sand of dead habit

                Rabindranath Tagore

Introduction

In this post, my R package yorkr crashes the IPL party! In my earlier posts I had already created functions for handling Twenty20 matches. I now use these functions to analyze the IPL T20 matches. This package is based on data from Cricsheet. The T20 functionality was added in the following posts

  1. yorkr pads up for the Twenty20s: Part 1- Analyzing team’s match performance.
  2. yorkr pads up for the Twenty20s: Part 2-Head to head confrontation between teams
  3. yorkr pads up for the Twenty20s:Part 3:Overall team performance against all oppositions!
  4. yorkr pads up for the Twenty20s:Part 4- Individual batting and bowling performances

The yorkr package provides functions to convert the yaml files into data frames, which are easier to work with in R. All converted files for ODI, T20 and IPL are available for use at yorkrData.

The IPL T20 matches can be downloaded from IPL-T20-matches

This post can be viewed at RPubs at yorkrIPLT20-Part1 or can also be downloaded as a PDF document yorkrIPLT20-1.pdf

Check out my 2 books on cricket, a) Cricket analytics with cricketr b) Beaten by sheer pace – Cricket analytics with yorkr, now available in both paperback & kindle versions on Amazon!!! Pick up your copies today!

Checkout my interactive Shiny apps GooglyPlus (plots & tables) and Googly (only plots) which can be used to analyze IPL players, teams and matches.

2. Install the package from CRAN

# install.packages("yorkr")  # run once if yorkr is not yet installed
library(yorkr)
rm(list=ls())

2a. New functionality for Twenty20

The functions that were used to convert the Twenty20 yaml files to RData are

  1. convertYaml2RDataframeT20
  2. convertAllYaml2RDataframesT20

Note 1: While I have already converted the IPL T20 files, you will need to use these functions for future IPL matches

Note 2: This post includes some cosmetic changes made over yorkr_0.0.4, where I make the plot title more explicit. The functionality will be available in a few weeks from now in yorkr_0.0.5

3. Convert and save T20 yaml file to dataframe

This function converts an IPL T20 yaml file, in the format specified by Cricsheet, to a data frame. The data frame is saved as an RData file in the target directory. The name of the file will have the format team1-team2-date.RData. An example of how a yaml file can be converted to a data frame and saved is shown below.

convertYaml2RDataframeT20("335982.yaml",".",".") 
## [1] "./335982.yaml"
## [1] "first loop"
## [1] "second loop"
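
The naming convention above can be reproduced with paste0(), which is handy when you need the name of a converted file in order to load it. This is a small sketch; the helper matchFileName is hypothetical and not part of yorkr.

```r
# Hypothetical helper (not part of yorkr): build the RData file name
# in the team1-team2-date.RData format described above
matchFileName <- function(team1, team2, date) {
  paste0(team1, "-", team2, "-", date, ".RData")
}

f <- matchFileName("Kolkata Knight Riders", "Mumbai Indians", "2014-05-14")
print(f)
```

The resulting string can be passed straight to load(), provided the working directory contains the converted files.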

4. Convert and save all T20 yaml files to dataframes

This function converts all IPL T20 yaml files in a source directory to data frames and saves them in the target directory, with names in the format mentioned above. Since I have already done this, I will not be executing it again. You can download the zip of all the converted RData files from Github at IPL-T20-matches.

#convertAllYaml2RDataframesT20("./IPL","./data")

5. yorkrData – A Github repository

Cricsheet had a total of 518 IPL Twenty20 matches, of which 9 files seemed to have problems. The remaining 509 T20 matches have been converted to RData.

All the converted RData files can be accessed from my Github link yorkrData under the folder IPL-T20-matches

You can download the zip of the files and use it directly in the functions as follows

6. Load the match data as dataframes

For this post I will be using the IPL Twenty20 match data from 5 random matches between 10 different opposing IPL teams. For this I will directly use the converted RData files rather than getting the data through the getMatchDetails() as shown below

With the RData we can load the data in 2 ways

A. With getMatchDetails()

  1. With getMatchDetails() using the 2 teams and the date on which the match occurred
sh_rcb <- getMatchDetails("Sunrisers Hyderabad","Royal Challengers Bangalore","2014-05-20",dir=".")
dim(sh_rcb)
## [1] 244  25

or

B. Directly load RData into your code.

The match details will be loaded into a dataframe called ’overs’ which you can assign to a suitable name as below

The randomly selected IPL T20 matches are

  • Sunrisers Hyderabad vs Royal Challengers Bangalore, 2014-05-20
  • Rajasthan Royals vs Pune Warriors, 2013-05-05
  • Deccan Chargers vs Chennai Super Kings, 2008-05-27
  • Kings XI Punjab vs Delhi Daredevils, 2014-05-25
  • Kolkata Knight Riders vs Mumbai Indians, 2014-05-14
setwd("C:/software/cricket-package/cricsheet/cleanup/IPL/part1")
load("Sunrisers Hyderabad-Royal Challengers Bangalore-2014-05-20.RData")
sh_rcb <- overs
load("Rajasthan Royals-Pune Warriors-2013-05-05.RData")
rr_pw <- overs
load("Deccan Chargers-Chennai Super Kings-2008-05-27.RData")
dc_csk <- overs
load("Kings XI Punjab-Delhi Daredevils-2014-05-25.RData")
kxp_dd <-overs
load("Kolkata Knight Riders-Mumbai Indians-2014-05-14.RData")
kkr_mi <- overs
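
Because every converted file stores its data frame under the same name, overs, each load() replaces the previous one, which is why the code above reassigns overs after every load. One way to avoid the repeated reassignment is to load into a private environment and return the object. This is sketched below with a dummy file; the helper loadMatch and the dummy columns are assumptions for illustration and are not part of yorkr.

```r
# Hypothetical helper (not part of yorkr): load an RData file into a private
# environment and return the object named 'overs' stored in it
loadMatch <- function(file) {
  e <- new.env()
  load(file, envir = e)
  get("overs", envir = e)
}

# Self-contained demo: save a dummy 'overs' data frame to a temporary file
overs <- data.frame(over = c(1, 1), runs = c(1, 4))
tmp <- tempfile(fileext = ".RData")
save(overs, file = tmp)
rm(overs)

# The match data comes back under a name of our choosing
demo_match <- loadMatch(tmp)
print(dim(demo_match))
```

With the real files, a call such as kkr_mi <- loadMatch("Kolkata Knight Riders-Mumbai Indians-2014-05-14.RData") would replace the load and assignment pair.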

7. Team batting scorecard

Compute and display the batting scorecard of the teams in the match.

teamBattingScorecardMatch(kkr_mi,'Mumbai Indians')
## Total= 134
## Source: local data frame [7 x 5]
## 
##       batsman ballsPlayed fours sixes  runs
##        (fctr)       (int) (dbl) (dbl) (dbl)
## 1 LMP Simmons          13     2     0    12
## 2   CM Gautam           9     1     0     8
## 3   AT Rayudu          26     3     1    33
## 4   RG Sharma          45     4     2    51
## 5 CJ Anderson          12     1     1    18
## 6  KA Pollard          11     0     0    10
## 7     AP Tare           3     0     0     2
teamBattingScorecardMatch(kkr_mi,'Kolkata Knight Riders')
## Total= 137
## Source: local data frame [5 x 5]
## 
##           batsman ballsPlayed fours sixes  runs
##            (fctr)       (int) (dbl) (dbl) (dbl)
## 1      RV Uthappa          52     9     3    80
## 2       G Gambhir          17     1     0    14
## 3       MK Pandey          21     0     0    14
## 4       YK Pathan          13     3     0    20
## 5 Shakib Al Hasan           8     1     0     9
teamBattingScorecardMatch(sh_rcb,'Sunrisers Hyderabad')
## Total= 154
## Source: local data frame [5 x 5]
## 
##     batsman ballsPlayed fours sixes  runs
##      (fctr)       (int) (dbl) (dbl) (dbl)
## 1  S Dhawan          39     7     1    50
## 2 DA Warner          43     3     4    59
## 3   NV Ojha          19     0     2    24
## 4  AJ Finch           9     1     0    11
## 5 DJG Sammy           4     0     1    10
teamBattingScorecardMatch(rr_pw,'Pune Warriors')
## Total= 167
## Source: local data frame [5 x 5]
## 
##        batsman ballsPlayed fours sixes  runs
##         (fctr)       (int) (int) (dbl) (dbl)
## 1   RV Uthappa          41     8     1    54
## 2     AJ Finch          32     7     0    45
## 3 Yuvraj Singh          11     1     1    15
## 4     MR Marsh          21     2     3    35
## 5   AD Mathews          15     2     0    18
teamBattingScorecardMatch(dc_csk,'Chennai Super Kings')
## Total= 137
## Source: local data frame [5 x 5]
## 
##      batsman ballsPlayed fours sixes  runs
##       (fctr)       (int) (int) (dbl) (dbl)
## 1   PA Patel          27     3     0    20
## 2 SP Fleming           9     3     0    14
## 3   SK Raina          41     5     2    54
## 4   MS Dhoni          24     4     1    37
## 5  JA Morkel          12     1     0    12
teamBattingScorecardMatch(kxp_dd,'Kings XI Punjab')
## Total= 104
## Source: local data frame [5 x 5]
## 
##      batsman ballsPlayed fours sixes  runs
##       (fctr)       (int) (dbl) (dbl) (dbl)
## 1   V Sehwag           7     2     0     9
## 2    M Vohra          37     4     2    47
## 3 GJ Maxwell           2     0     0     0
## 4  DA Miller          34     4     2    47
## 5  GJ Bailey           1     0     0     1

8. Plot the team batting partnerships

The functions below plot the team batting partnerships in the match.

Note: Many of the plot functions take an additional parameter, plot, which is either TRUE or FALSE. The default value is plot=TRUE. When plot=TRUE the plot is displayed. When plot=FALSE the data frame is returned to the user, who can use it to create an interactive chart with one of the packages like rCharts, ggvis, googleVis or plotly.

teamBatsmenPartnershipMatch(kkr_mi,'Mumbai Indians','Kolkata Knight Riders')

batsmenPartnership-1

teamBatsmenPartnershipMatch(sh_rcb,'Sunrisers Hyderabad','Royal Challengers Bangalore',plot=TRUE)

batsmenPartnership-2

teamBatsmenPartnershipMatch(rr_pw,'Pune Warriors','Rajasthan Royals')

batsmenPartnership-3

teamBatsmenPartnershipMatch(dc_csk,'Chennai Super Kings','Deccan Chargers',plot=FALSE)
##      batsman nonStriker runs
## 1   PA Patel SP Fleming   10
## 2   PA Patel   SK Raina   10
## 3 SP Fleming   PA Patel   14
## 4   SK Raina   PA Patel   19
## 5   SK Raina   MS Dhoni   14
## 6   SK Raina  JA Morkel   21
## 7   MS Dhoni   SK Raina   37
## 8  JA Morkel   SK Raina   12
teamBatsmenPartnershipMatch(kxp_dd,'Kings XI Punjab','Delhi Daredevils',plot=TRUE)

batsmenPartnership-4
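
As an illustration of working with the data frame returned by plot=FALSE, the sketch below retypes the Chennai Super Kings partnership table shown above and draws a stacked bar chart. Base R graphics are used here only to keep the example self-contained; the same data frame could be handed to any of the interactive packages instead. The name p is hypothetical.

```r
# Partnership data returned by the plot=FALSE call above, retyped by hand
p <- data.frame(
  batsman    = c("PA Patel", "PA Patel", "SP Fleming", "SK Raina",
                 "SK Raina", "SK Raina", "MS Dhoni", "JA Morkel"),
  nonStriker = c("SP Fleming", "SK Raina", "PA Patel", "PA Patel",
                 "MS Dhoni", "JA Morkel", "SK Raina", "SK Raina"),
  runs       = c(10, 10, 14, 19, 14, 21, 37, 12)
)

# Cross-tabulate runs: one column per batsman, one row per batting partner
tab <- xtabs(runs ~ nonStriker + batsman, data = p)

# Stacked bar chart: each bar is a batsman, each segment a partner
barplot(tab, las = 2, legend.text = TRUE, ylab = "Runs",
        main = "CSK partnerships (from the table above)")

# Column sums give the total runs scored by each batsman
totals <- colSums(tab)
print(totals)
```

The column totals agree with the Chennai Super Kings batting scorecard in section 7, for example 54 for SK Raina and 37 for MS Dhoni.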

9. Batsmen vs Bowler

The function below computes and plots the performances of the batsmen against the bowlers. As before, the plot parameter can be set to TRUE or FALSE. By default plot=TRUE.

teamBatsmenVsBowlersMatch(sh_rcb,"Sunrisers Hyderabad","Royal Challengers Bangalore", plot=TRUE)

batsmenVsBowler-1

teamBatsmenVsBowlersMatch(kkr_mi,'Kolkata Knight Riders','Mumbai Indians')

batsmenVsBowler-2

m <- teamBatsmenVsBowlersMatch(rr_pw,'Pune Warriors','Rajasthan Royals',plot=FALSE)
m
## Source: local data frame [20 x 3]
## Groups: batsman [?]
## 
##         batsman      bowler runsConceded
##          (fctr)      (fctr)        (dbl)
## 1    RV Uthappa  A Chandila           12
## 2    RV Uthappa JP Faulkner            1
## 3    RV Uthappa   SR Watson           13
## 4    RV Uthappa   KK Cooper            2
## 5    RV Uthappa  SK Trivedi           18
## 6    RV Uthappa   STR Binny            8
## 7      AJ Finch  A Chandila           11
## 8      AJ Finch JP Faulkner           12
## 9      AJ Finch   SR Watson            5
## 10     AJ Finch   KK Cooper            8
## 11     AJ Finch  SK Trivedi            9
## 12 Yuvraj Singh   KK Cooper            0
## 13 Yuvraj Singh  SK Trivedi            5
## 14 Yuvraj Singh   STR Binny           10
## 15     MR Marsh JP Faulkner           13
## 16     MR Marsh   SR Watson            7
## 17     MR Marsh   KK Cooper           15
## 18   AD Mathews JP Faulkner            7
## 19   AD Mathews   SR Watson            3
## 20   AD Mathews   KK Cooper            8
teamBatsmenVsBowlersMatch(dc_csk,"Chennai Super Kings","Deccan Chargers")

batsmenVsBowler-3

teamBatsmenVsBowlersMatch(kxp_dd,"Kings XI Punjab","Delhi Daredevils")

batsmenVsBowler-4
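
The batsman vs bowler data frame can also be summarized further. As a minimal sketch, the rows of the Pune Warriors table above are retyped below and the runs conceded are totalled per bowler with aggregate(). The name bvb is hypothetical; with the real output you would use m directly.

```r
# Rows of the Pune Warriors batsmen vs Rajasthan Royals bowlers table above
bvb <- data.frame(
  batsman = rep(c("RV Uthappa", "AJ Finch", "Yuvraj Singh", "MR Marsh", "AD Mathews"),
                times = c(6, 5, 3, 3, 3)),
  bowler = c("A Chandila", "JP Faulkner", "SR Watson", "KK Cooper", "SK Trivedi", "STR Binny",
             "A Chandila", "JP Faulkner", "SR Watson", "KK Cooper", "SK Trivedi",
             "KK Cooper", "SK Trivedi", "STR Binny",
             "JP Faulkner", "SR Watson", "KK Cooper",
             "JP Faulkner", "SR Watson", "KK Cooper"),
  runsConceded = c(12, 1, 13, 2, 18, 8,
                   11, 12, 5, 8, 9,
                   0, 5, 10,
                   13, 7, 15,
                   7, 3, 8)
)

# Total runs conceded by each bowler, most expensive first
by_bowler <- aggregate(runsConceded ~ bowler, data = bvb, FUN = sum)
by_bowler <- by_bowler[order(-by_bowler$runsConceded), ]
print(by_bowler)
```

The grand total of the runsConceded column is 167, the same as the Pune Warriors total in the batting scorecard.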

10. Bowling Scorecard

This function provides the bowling performance of a team in the match: the number of overs bowled, maidens, runs conceded and wickets taken by each bowler.

teamBowlingScorecardMatch(kkr_mi,'Kolkata Knight Riders')
## Source: local data frame [6 x 5]
## 
##            bowler overs maidens  runs wickets
##            (fctr) (int)   (int) (dbl)   (dbl)
## 1        M Morkel     4       0    35       2
## 2        UT Yadav     3       0    24       0
## 3 Shakib Al Hasan     4       0    21       1
## 4       SP Narine     4       0    18       1
## 5       PP Chawla     4       0    32       1
## 6       YK Pathan     1       0    10       0
teamBowlingScorecardMatch(kkr_mi,'Mumbai Indians')
## Source: local data frame [6 x 5]
## 
##            bowler overs maidens  runs wickets
##            (fctr) (int)   (int) (dbl)   (dbl)
## 1      SL Malinga     4       0    30       1
## 2       JJ Bumrah     3       0    23       0
## 3 Harbhajan Singh     4       0    22       2
## 4         PP Ojha     4       0    25       0
## 5     LMP Simmons     3       0    34       1
## 6      KA Pollard     1       0     7       0
teamBowlingScorecardMatch(sh_rcb,"Sunrisers Hyderabad")
## Source: local data frame [7 x 5]
## 
##            bowler overs maidens  runs wickets
##            (fctr) (int)   (int) (dbl)   (dbl)
## 1         B Kumar     4       0    27       2
## 2        DW Steyn     4       0    23       1
## 3   Parvez Rasool     4       0    26       1
## 4       KV Sharma     3       0    27       1
## 5 Y Venugopal Rao     1       0     7       0
## 6       IK Pathan     3       0    28       1
## 7       DJG Sammy     1       0    19       0
teamBowlingScorecardMatch(rr_pw,'Pune Warriors')
## Source: local data frame [6 x 5]
## 
##         bowler overs maidens  runs wickets
##         (fctr) (int)   (int) (dbl)   (dbl)
## 1      B Kumar     4       0    38       1
## 2   K Upadhyay     3       0    29       0
## 3   WD Parnell     4       0    27       3
## 4     R Sharma     4       0    38       0
## 5 Yuvraj Singh     2       0    16       0
## 6   AD Mathews     3       0    34       1
teamBowlingScorecardMatch(dc_csk,"Chennai Super Kings")
## Source: local data frame [5 x 5]
## 
##           bowler overs maidens  runs wickets
##           (fctr) (int)   (int) (dbl)   (int)
## 1        M Ntini     4       0    24       1
## 2        MS Gony     4       0    21       1
## 3      JA Morkel     4       0    37       3
## 4 M Muralitharan     4       0    22       1
## 5       L Balaji     4       0    34       2
teamBowlingScorecardMatch(kxp_dd,"Kings XI Punjab")
## Source: local data frame [5 x 5]
## 
##            bowler overs maidens  runs wickets
##            (fctr) (int)   (int) (dbl)   (int)
## 1         P Awana     3       1    15       2
## 2        AR Patel     4       0    28       2
## 3      MG Johnson     4       1    27       2
## 4 Karanveer Singh     4       0    22       2
## 5        R Dhawan     4       0    22       2
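
A common figure derived from a bowling scorecard is the economy rate, that is, the runs conceded per over. The sketch below computes it for the Kolkata Knight Riders scorecard shown above, retyped by hand into a data frame called sc (a hypothetical name). It assumes the overs column counts completed overs, as the integer values in the output suggest.

```r
# Kolkata Knight Riders bowling scorecard from the output above
sc <- data.frame(
  bowler  = c("M Morkel", "UT Yadav", "Shakib Al Hasan",
              "SP Narine", "PP Chawla", "YK Pathan"),
  overs   = c(4, 3, 4, 4, 4, 1),
  runs    = c(35, 24, 21, 18, 32, 10),
  wickets = c(2, 0, 1, 1, 1, 0)
)

# Economy rate = runs conceded per over bowled
sc$economy <- sc$runs / sc$overs

# Most economical bowler first
print(sc[order(sc$economy), ])
```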

11. Wicket Kind

The plots below show the kind of wicket taken by each bowler (caught, bowled, lbw etc.)

teamBowlingWicketKindMatch(kkr_mi,'Kolkata Knight Riders','Mumbai Indians')

bowlingWicketKind-1

m <- teamBowlingWicketKindMatch(rr_pw,'Pune Warriors','Rajasthan Royals',plot=FALSE)
m
##         bowler wicketKind wicketPlayerOut runs
## 1   AD Mathews     caught        R Dravid   34
## 2   WD Parnell     bowled       SR Watson   27
## 3      B Kumar     caught       AM Rahane   38
## 4   WD Parnell     caught        BJ Hodge   27
## 5   WD Parnell     caught       SV Samson   27
## 6   K Upadhyay   noWicket        noWicket   29
## 7     R Sharma   noWicket        noWicket   38
## 8 Yuvraj Singh   noWicket        noWicket   16
teamBowlingWicketKindMatch(dc_csk,"Chennai Super Kings","Deccan Chargers")

bowlingWicketKind-2

teamBowlingWicketKindMatch(kxp_dd,"Kings XI Punjab","Delhi Daredevils",plot=TRUE)

bowlingWicketKind-3

teamBowlingWicketKindMatch(sh_rcb,"Royal Challengers Bangalore","Sunrisers Hyderabad")

bowlingWicketKind-4

12. Wicket vs Runs conceded

The plots below provide the wickets taken and the runs conceded by the bowler in the match

teamBowlingWicketRunsMatch(dc_csk,"Deccan Chargers", "Chennai Super Kings")

wicketRuns-1

teamBowlingWicketRunsMatch(kxp_dd,"Kings XI Punjab","Delhi Daredevils",plot=TRUE)

wicketRuns-2

teamBowlingWicketRunsMatch(sh_rcb,"Sunrisers Hyderabad","Royal Challengers Bangalore")

wicketRuns-3

teamBowlingWicketRunsMatch(kkr_mi,'Kolkata Knight Riders','Mumbai Indians')

wicketRuns-4

13. Wickets taken by bowler

The plots below show the wickets taken by each bowler

teamBowlingWicketMatch(kkr_mi,'Kolkata Knight Riders','Mumbai Indians')

bowlingWickets-1

m <- teamBowlingWicketMatch(rr_pw,'Pune Warriors','Rajasthan Royals',plot=FALSE)
m
##         bowler wicketKind wicketPlayerOut runs
## 1   AD Mathews     caught        R Dravid   34
## 2   WD Parnell     bowled       SR Watson   27
## 3      B Kumar     caught       AM Rahane   38
## 4   WD Parnell     caught        BJ Hodge   27
## 5   WD Parnell     caught       SV Samson   27
## 6   K Upadhyay   noWicket        noWicket   29
## 7     R Sharma   noWicket        noWicket   38
## 8 Yuvraj Singh   noWicket        noWicket   16
teamBowlingWicketMatch(sh_rcb,"Royal Challengers Bangalore","Sunrisers Hyderabad")

bowlingWickets-2

teamBowlingWicketMatch(dc_csk,"Deccan Chargers", "Chennai Super Kings")

bowlingWickets-3

teamBowlingWicketMatch(kxp_dd,"Kings XI Punjab","Delhi Daredevils",plot=TRUE)

bowlingWickets-4

14. Bowler Vs Batsmen

The functions compute and display how the different bowlers of the IPL team performed against the opposition batsmen.

teamBowlersVsBatsmenMatch(dc_csk,"Deccan Chargers", "Chennai Super Kings")

bowlerVsBatsmen-1

teamBowlersVsBatsmenMatch(kxp_dd,"Kings XI Punjab","Delhi Daredevils",plot=TRUE)

bowlerVsBatsmen-2

m <-teamBowlersVsBatsmenMatch(sh_rcb,"Sunrisers Hyderabad","Royal Challengers Bangalore",plot=FALSE)
m
## Source: local data frame [26 x 3]
## Groups: bowler [?]
## 
##      bowler        batsman runsConceded
##      (fctr)         (fctr)        (dbl)
## 1   B Kumar       CH Gayle            5
## 2   B Kumar       PA Patel            4
## 3   B Kumar        V Kohli            6
## 4   B Kumar AB de Villiers            6
## 5   B Kumar         S Rana            1
## 6   B Kumar       MA Starc            5
## 7  DW Steyn       CH Gayle            7
## 8  DW Steyn        V Kohli            4
## 9  DW Steyn AB de Villiers            4
## 10 DW Steyn         S Rana            7
## ..      ...            ...          ...
teamBowlersVsBatsmenMatch(rr_pw,'Pune Warriors','Rajasthan Royals')

bowlerVsBatsmen-3

teamBowlersVsBatsmenMatch(kkr_mi,'Kolkata Knight Riders','Mumbai Indians')

bowlerVsBatsmen-4

15. Match worm graph

The plots below provide the match worm graph for the IPL Twenty20 matches

matchWormGraph(dc_csk,"Deccan Chargers", "Chennai Super Kings")

matchWorm-1

matchWormGraph(kxp_dd,"Kings XI Punjab","Delhi Daredevils")

matchWorm-2

matchWormGraph(sh_rcb,"Sunrisers Hyderabad","Royal Challengers Bangalore")

matchWorm-3

matchWormGraph(rr_pw,'Pune Warriors','Rajasthan Royals')

matchWorm-4

matchWormGraph(kkr_mi,'Kolkata Knight Riders','Mumbai Indians')

matchWorm-5
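
A worm graph plots the cumulative runs of each team against the overs bowled. The idea can be sketched in base R with cumsum(). The per-over runs below are made-up numbers for two hypothetical teams, and the code is only an illustration, not the implementation of matchWormGraph().

```r
# Made-up runs scored in each of the first 10 overs by two hypothetical teams
teamA <- c(4, 7, 2, 11, 6, 3, 9, 5, 8, 12)
teamB <- c(6, 3, 8, 4, 10, 7, 2, 9, 6, 5)

# The "worm" is the running total of runs after each over
wormA <- cumsum(teamA)
wormB <- cumsum(teamB)

# Plot both running totals on the same axes
plot(seq_along(wormA), wormA, type = "l", col = "blue",
     xlab = "Overs", ylab = "Cumulative runs",
     ylim = range(0, wormA, wormB), main = "Worm graph (made-up data)")
lines(seq_along(wormB), wormB, col = "red")
legend("topleft", legend = c("Team A", "Team B"),
       col = c("blue", "red"), lty = 1)
```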

Conclusion

This post covered all the functions from the package yorkr for analyzing an IPL Twenty20 match between 2 IPL teams. As mentioned above, the yaml match files have already been converted to data frames and are available for download from Github. Go ahead and give it a try.

To be continued. Watch this space!


yorkr pads up for Twenty20s:Part 4- Individual batting and bowling performances!


Introduction

In theory, theory and practice are the same. In practice, they’re not.

                      Yogi Berra

There are two ways to write error-free programs; only the third one works.

                      Alan Perlis

Simplicity does not precede complexity, but follows it.

                      Alan Perlis

Talk is cheap. Show me the code.

                      Linus Torvalds

This post is the 4th and last part of yorkr padding up for the Twenty20s. In this post I look at the top individual batting and bowling performances in the Twenty20s. Also please take a look at my 3 earlier posts on yorkr’s handling of Twenty20 matches

  1. yorkr pads up for the Twenty20s: Part 1- Analyzing team’s match performance.
  2. yorkr pads up for the Twenty20s: Part 2-Head to head confrontation between teams
  3. yorkr pads up for the Twenty20s:Part 3:Overall team performance against all oppositions!

The 1st part included functions dealing with a specific T20 match, the 2nd part dealt with functions between 2 opposing teams in T20 confrontations. The 3rd part dealt with functions between a team and all T20 matches with all oppositions. This 4th part includes individual batting and bowling performances in T20 matches and deals with Class 4 functions.

This post has also been published at RPubs yorkrT20-Part4 and can also be downloaded as a PDF document from yorkrT20-Part4.pdf.

You can clone/fork the code for the package yorkr from Github at yorkr-package

The list of Class 4 functions is shown below. The Twenty20 features will be available from yorkr_0.0.4

Note: To do similar analysis you can use my yorkrT20templates. See my post Analysis of International T20 matches with yorkr templates

Batsman functions

  1. batsmanRunsVsDeliveries
  2. batsmanFoursSixes
  3. batsmanDismissals
  4. batsmanRunsVsStrikeRate
  5. batsmanMovingAverage
  6. batsmanCumulativeAverageRuns
  7. batsmanCumulativeStrikeRate
  8. batsmanRunsAgainstOpposition
  9. batsmanRunsVenue
  10. batsmanRunsPredict

Bowler functions

  1. bowlerMeanEconomyRate
  2. bowlerMeanRunsConceded
  3. bowlerMovingAverage
  4. bowlerCumulativeAvgWickets
  5. bowlerCumulativeAvgEconRate
  6. bowlerWicketPlot
  7. bowlerWicketsAgainstOpposition
  8. bowlerWicketsVenue
  9. bowlerWktsPredict

Note: The yorkr package in its current avatar only supports ODI & Twenty20 matches. I will be upgrading the package to handle IPL in the months to come.

library(yorkr)
library(gridExtra)
library(rpart.plot)
library(dplyr)
library(ggplot2)
rm(list=ls())

A. Batsman functions

1. Get Team Batting details

The function below gets the overall team batting details based on the RData files available in T20 matches. These files are currently also available on Github at yorkrData (https://github.com/tvganesh/yorkrData/tree/master/Twenty20/T20-matches). The batting details of the team in each match are computed, and one large data frame is created by rbinding the individual dataframes. This can be saved as an RData file.

setwd("C:/software/cricket-package/york-test/yorkrData/Twenty20/T20-matches")
india_details <- getTeamBattingDetails("India",dir=".", save=TRUE)
sa_details <- getTeamBattingDetails("South Africa",dir=".",save=TRUE)
nz_details <- getTeamBattingDetails("New Zealand",dir=".",save=TRUE)
eng_details <- getTeamBattingDetails("England",dir=".",save=TRUE)
pak_details <- getTeamBattingDetails("Pakistan",dir=".",save=TRUE)
aus_details <- getTeamBattingDetails("Australia",dir=".",save=TRUE)
wi_details <- getTeamBattingDetails("West Indies",dir=".",save=TRUE)
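
The "rbinding" step mentioned above can be sketched in base R as follows. This is only an illustration: the per-match data frames contain made-up numbers, and the helper combineMatches is a hypothetical name, not a yorkr function. The column names simply mirror those used later in this post.

```r
# Hypothetical per-match batting data frames (made-up numbers)
match1 <- data.frame(batsman = c("A", "B"),
                     ballsPlayed = c(30, 12),
                     runs = c(45, 10))
match2 <- data.frame(batsman = c("A", "C"),
                     ballsPlayed = c(18, 25),
                     runs = c(22, 31))

# Stack a list of per-match data frames into one large data frame
combineMatches <- function(matchList) {
  do.call(rbind, matchList)
}

details <- combineMatches(list(match1, match2))
dim(details)
```

Using do.call(rbind, ...) on a list avoids growing the data frame one match at a time inside a loop.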

2. Get batsman details

This function is used to get the individual T20 batting record for the specified batsman of the country, as in the calls below. For analyzing the batting performances, the top T20 batsmen from different countries have been chosen. The batting scorecard functions from yorkr pads up for the Twenty20s:Part 3:Overall team performance against all oppositions! were used for selecting these batsmen

  1. Virat Kohli (Ind)
  2. DA Warner (Aus)
  3. Umar Akmal (Pak)
  4. BB McCullum (NZ)
  5. EJG Morgan (Eng)
  6. CH Gayle (WI)

setwd("C:/software/cricket-package/cricsheet/cleanup/T20/rmd/part4")
kohli <- getBatsmanDetails(team="India",name="Kohli",dir=".")
## [1] "./India-BattingDetails.RData"
warner <- getBatsmanDetails(team="Australia",name="DA Warner")
## [1] "./Australia-BattingDetails.RData"
akmal <-  getBatsmanDetails(team="Pakistan",name="Umar Akmal",dir=".")
## [1] "./Pakistan-BattingDetails.RData"
mccullum <-  getBatsmanDetails(team="New Zealand",name="BB McCullum",dir=".")
## [1] "./New Zealand-BattingDetails.RData"
emorgan <-  getBatsmanDetails(team="England",name="EJG Morgan",dir=".")
## [1] "./England-BattingDetails.RData"
gayle <-  getBatsmanDetails(team="West Indies",name="CH Gayle",dir=".")
## [1] "./West Indies-BattingDetails.RData"

3. Runs versus deliveries

Chris Gayle and BB McCullum have astounding strike rates and touch close to 120 runs in 60 balls. David Warner also has a great strike rate.

p1 <-batsmanRunsVsDeliveries(kohli,"Kohli")
p2 <-batsmanRunsVsDeliveries(warner, "DA Warner")
p3 <-batsmanRunsVsDeliveries(akmal,"U Akmal")
p4 <-batsmanRunsVsDeliveries(mccullum,"BB McCullum")
p5 <-batsmanRunsVsDeliveries(emorgan,"EJG Morgan")
p6 <-batsmanRunsVsDeliveries(gayle,"CH Gayle")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)

runsVsDeliveries-1
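
For reference, strike rate is the number of runs scored per 100 balls faced, so 120 runs in 60 balls corresponds to a strike rate of 200. A minimal sketch (strikeRate is a hypothetical helper, not part of yorkr):

```r
# Strike rate = runs scored per 100 balls faced
strikeRate <- function(runs, balls) {
  100 * runs / balls
}

sr <- strikeRate(120, 60)
sr
```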

4. Batsman Total runs, Fours and Sixes

The plots below show the total runs, fours and sixes by the batsmen. Gayle tops in the runs from sixes

kohli46 <- select(kohli,batsman,ballsPlayed,fours,sixes,runs)
p1 <- batsmanFoursSixes(kohli46,"Kohli")
warner46 <- select(warner,batsman,ballsPlayed,fours,sixes,runs)
p2 <- batsmanFoursSixes(warner46,"DA Warner")
akmal46 <- select(akmal,batsman,ballsPlayed,fours,sixes,runs)
p3 <- batsmanFoursSixes(akmal46, "U Akmal")
mccullum46 <- select(mccullum,batsman,ballsPlayed,fours,sixes,runs)
p4 <- batsmanFoursSixes(mccullum46,"BB McCullum")
emorgan46 <- select(emorgan,batsman,ballsPlayed,fours,sixes,runs)
p5 <- batsmanFoursSixes(emorgan46,"EJG Morgan")
gayle46 <- select(gayle,batsman,ballsPlayed,fours,sixes,runs)
p6 <- batsmanFoursSixes(gayle46,"CH Gayle")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)

foursSixes-1

5. Batsman dismissals

The type of dismissal for each batsman is shown below

p1 <-batsmanDismissals(kohli,"Kohli")
p2 <-batsmanDismissals(warner, "DA Warner")
p3 <-batsmanDismissals(akmal,"U Akmal")
p4 <-batsmanDismissals(mccullum,"BB McCullum")
p5 <-batsmanDismissals(emorgan,"EJG Morgan")
p6 <-batsmanDismissals(gayle,"CH Gayle")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)

dismissal-1

6. Runs versus Strike Rate

Gayle’s and McCullum’s strike rates touch 120% for runs in the range of 130-150

p1 <-batsmanRunsVsStrikeRate(kohli,"Kohli")
p2 <-batsmanRunsVsStrikeRate(warner, "DA Warner")
p3 <-batsmanRunsVsStrikeRate(akmal,"U Akmal")
p4 <-batsmanRunsVsStrikeRate(mccullum,"BB McCullum")
p5 <-batsmanRunsVsStrikeRate(emorgan,"EJG Morgan")
p6 <-batsmanRunsVsStrikeRate(gayle,"CH Gayle")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)

runsSR-1

7. Batsman moving average

Kohli’s and Gayle’s T20 averages are on the increase, touching 50. Eoin Morgan and BB McCullum average around 40.

p1 <-batsmanMovingAverage(kohli,"Kohli")
p2 <-batsmanMovingAverage(warner, "DA Warner")
p3 <-batsmanMovingAverage(akmal,"U Akmal")
p4 <-batsmanMovingAverage(mccullum,"BB McCullum")
p5 <-batsmanMovingAverage(emorgan,"EJG Morgan")
p6 <-batsmanMovingAverage(gayle,"CH Gayle")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)

ma-1
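
To make the idea of a moving average concrete, here is a small base R sketch of a trailing moving average over the last k innings. The runs are made up and movingAverage is a hypothetical helper; batsmanMovingAverage may well smooth the data differently, so treat this only as an illustration of the concept.

```r
# Made-up runs for consecutive innings
runs <- c(10, 20, 30, 40, 50)

# Trailing moving average over the last k innings;
# the first k-1 values are NA because the window is incomplete
movingAverage <- function(x, k) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}

ma <- movingAverage(runs, 3)
ma
```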

8. Batsman cumulative average

Kohli’s cumulative average steadies around 40, while McCullum shows a gentle decline from 40+ to 35+. Gayle oscillates between 30+ and 40-.

p1 <-batsmanCumulativeAverageRuns(kohli,"Kohli")
p2 <-batsmanCumulativeAverageRuns(warner, "DA Warner")
p3 <-batsmanCumulativeAverageRuns(akmal,"U Akmal")
p4 <-batsmanCumulativeAverageRuns(mccullum,"BB McCullum")
p5 <-batsmanCumulativeAverageRuns(emorgan,"EJG Morgan")
p6 <-batsmanCumulativeAverageRuns(gayle,"CH Gayle")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)

cAvg-1
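
A cumulative average after n innings is simply the total runs so far divided by n. A minimal sketch with made-up scores (cumulativeAverage is a hypothetical helper, and this simple version ignores not-outs):

```r
# Made-up runs for consecutive innings
runs <- c(10, 20, 30, 40)

# Running total divided by the number of innings so far
cumulativeAverage <- function(x) {
  cumsum(x) / seq_along(x)
}

cumAvg <- cumulativeAverage(runs)
cumAvg
```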

9. Cumulative Average Strike Rate

BB McCullum has the best overall cumulative strike rate, which hovered around 150 and steadies around 130. Gayle has a rocky cumulative strike rate between the 150s and 130s. Warner is steady around 120.

p1 <-batsmanCumulativeStrikeRate(kohli,"Kohli")
p2 <-batsmanCumulativeStrikeRate(warner, "DA Warner")
p3 <-batsmanCumulativeStrikeRate(akmal,"U Akmal")
p4 <-batsmanCumulativeStrikeRate(mccullum,"BB McCullum")
p5 <-batsmanCumulativeStrikeRate(emorgan,"EJG Morgan")
p6 <-batsmanCumulativeStrikeRate(gayle,"CH Gayle")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)

cSR-1
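
One way to compute a cumulative strike rate is to divide the running total of runs by the running total of balls faced. This is an assumption on my part for illustration; the package function could instead average the per-innings strike rates. The numbers below are made up and cumulativeStrikeRate is a hypothetical helper.

```r
# Made-up runs and balls faced for consecutive innings
runs  <- c(30, 10, 50)
balls <- c(20, 10, 30)

# Running runs per 100 balls over all innings so far
cumulativeStrikeRate <- function(runs, balls) {
  100 * cumsum(runs) / cumsum(balls)
}

csr <- cumulativeStrikeRate(runs, balls)
csr
```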

10. Batsman runs against opposition

#Kohli's best performances are against New Zealand and Sri Lanka
batsmanRunsAgainstOpposition(kohli,"Kohli")

runsOppn1-1

batsmanRunsAgainstOpposition(warner, "DA Warner")

runsOppn2-1

batsmanRunsAgainstOpposition(akmal,"U Akmal")

runsOppn3-1

batsmanRunsAgainstOpposition(mccullum,"BB McCullum")

runsOppn4-1

batsmanRunsAgainstOpposition(emorgan,"EJG Morgan")

runsOppn5-1

# Gayle's best performance is against India and South Africa
batsmanRunsAgainstOpposition(gayle,"CH Gayle")

runsOppn6-1

11. Runs at different venues

The plots below give the performances of the batsmen at different grounds.

batsmanRunsVenue(kohli,"Kohli")

runsVenue1-1

batsmanRunsVenue(warner, "DA Warner")

runsVenue2-1

batsmanRunsVenue(akmal,"U Akmal")

runsVenue3-1

batsmanRunsVenue(mccullum,"BB McCullum")

runsVenue4-1

batsmanRunsVenue(emorgan,"EJG Morgan")

runsVenue5-1

batsmanRunsVenue(gayle,"CH Gayle")

runsVenue6-1

12. Predict number of runs to deliveries

The plots below use an rpart classification tree to predict the number of deliveries required to score the runs in the leaf node. For e.g. Kohli takes

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsmanRunsPredict(kohli,"Kohli")
batsmanRunsPredict(warner, "DA Warner")
batsmanRunsPredict(akmal,"U Akmal")

runsPredict1,runsVenue1-1

# BB McCullum needs >32 deliveries to score 69+ runs while Gayle needs >28 deliveries to score 67 runs
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsmanRunsPredict(mccullum,"BB McCullum")
batsmanRunsPredict(emorgan,"EJG Morgan")
batsmanRunsPredict(gayle,"CH Gayle")

runsPredict2,runsVenue1-1
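
The sketch below shows how a tree of this kind can be fitted with rpart. It assumes the rpart package is installed and that the tree models runs as a function of deliveries faced; the data is synthetic and the column names are only illustrative, so this is not necessarily what batsmanRunsPredict does internally.

```r
library(rpart)

set.seed(42)
# Synthetic innings: balls faced and runs scored (not real data)
ballsPlayed <- sample(1:60, 80, replace = TRUE)
runs <- pmax(0, round(1.3 * ballsPlayed + rnorm(80, sd = 5)))
innings <- data.frame(ballsPlayed = ballsPlayed, runs = runs)

# Fit a tree that splits on deliveries; each leaf holds the mean runs
fit <- rpart(runs ~ ballsPlayed, data = innings)

# Predicted runs for an innings of 40 deliveries
pred <- predict(fit, newdata = data.frame(ballsPlayed = 40))
pred
```

Each split in such a tree is a threshold on deliveries, and the leaf value is the average runs of the innings that fall on that side of the threshold.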

B. Bowler functions

13. Get bowling details

The function below gets the overall team T20 bowling details based on the RData files available in T20 matches. These files are currently also available on Github at yorkrData (https://github.com/tvganesh/yorkrData/tree/master/Twenty20/T20-matches). The T20 bowling details of the team in each match are computed, and one large data frame is created by rbinding the individual dataframes. This can be saved as an RData file.

setwd("C:/software/cricket-package/york-test/yorkrData/Twenty20/T20-matches")
ind_bowling <- getTeamBowlingDetails("India",dir=".",save=TRUE)
dim(ind_bowling)
## [1] 872  12
aus_bowling <- getTeamBowlingDetails("Australia",dir=".",save=TRUE)
dim(aus_bowling)
## [1] 1364   12
eng_bowling <- getTeamBowlingDetails("England",dir=".",save=TRUE)
dim(eng_bowling)
## [1] 1183   12
sa_bowling <- getTeamBowlingDetails("South Africa",dir=".",save=TRUE)
dim(sa_bowling)
## [1] 995  12
pak_bowling <- getTeamBowlingDetails("Pakistan",dir=".",save=TRUE)
dim(pak_bowling)
## [1] 1186   12
nz_bowling <- getTeamBowlingDetails("New Zealand",dir=".",save=TRUE)
dim(nz_bowling)
## [1] 1295   12

14. Get bowling details of the individual bowlers

This function is used to get the individual bowling record for a specified bowler of the country as in the functions below. For analyzing the bowling performances the following cricketers have been chosen

  1. Ravichandran Ashwin (Ind)
  2. SR Watson (Aus)
  3. SCJ Broad (Eng)
  4. Saeed Ajmal (Pak)
  5. Dale Steyn (SA)
  6. NL McCullum (NZ)

ashwin <- getBowlerWicketDetails(team="India",name="Ashwin",dir=".")
watson <-  getBowlerWicketDetails(team="Australia",name="SR Watson",dir=".")
broad <-  getBowlerWicketDetails(team="England",name="SCJ Broad",dir=".")
ajmal <-  getBowlerWicketDetails(team="Pakistan",name="Saeed Ajmal",dir=".")
steyn <-  getBowlerWicketDetails(team="South Africa",name="Steyn",dir=".")
nmccullum <-  getBowlerWicketDetails(team="New Zealand",name="NL McCullum",dir=".")

15. Bowler Mean Economy Rate

Ashwin has a mean economy rate of 5.0 for 3 & 4 overs. Saeed Ajmal is more expensive

p1<-bowlerMeanEconomyRate(ashwin,"R Ashwin")
p2<-bowlerMeanEconomyRate(watson, "SR Watson")
p3<-bowlerMeanEconomyRate(broad, "SCJ Broad")
p4<-bowlerMeanEconomyRate(ajmal, "Saeed Ajmal")
p5<-bowlerMeanEconomyRate(steyn, "D Steyn")
p6<-bowlerMeanEconomyRate(nmccullum, "NL Mccullum")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)

meanER-1
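
Economy rate is the number of runs conceded per over bowled. The following sketch, using made-up bowling spells and illustrative column names, computes the economy rate of each spell and then the mean economy rate grouped by the number of overs bowled:

```r
# Made-up bowling spells: overs bowled and runs conceded
spells <- data.frame(overs = c(3, 3, 4, 4),
                     runs  = c(15, 21, 20, 28))

# Economy rate of each spell = runs conceded per over
spells$economyRate <- spells$runs / spells$overs

# Mean economy rate for each number of overs bowled
meanER <- tapply(spells$economyRate, spells$overs, mean)
meanER
```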

16. Bowler Mean Runs conceded

p1<-bowlerMeanRunsConceded(ashwin,"R Ashwin")
p2<-bowlerMeanRunsConceded(watson, "SR Watson")
p3<-bowlerMeanRunsConceded(broad, "SCJ Broad")
p4<-bowlerMeanRunsConceded(ajmal, "Saeed Ajmal")
p5<-bowlerMeanRunsConceded(steyn, "D Steyn")
p6<-bowlerMeanRunsConceded(nmccullum, "NL Mccullum")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)

meanRunsConceded-1

17. Bowler Moving average

Ashwin, SCJ Broad and Steyn show an improving performance in T20s. NL McCullum shows a drop and Ajmal’s performance is on the decline

p1<-bowlerMovingAverage(ashwin,"R Ashwin")
p2<-bowlerMovingAverage(watson, "SR Watson")
p3<-bowlerMovingAverage(broad, "SCJ Broad")
p4<-bowlerMovingAverage(ajmal, "Saeed Ajmal")
p5<-bowlerMovingAverage(steyn, "D Steyn")
p6<-bowlerMovingAverage(nmccullum, "NL Mccullum")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)

bowlerMA-1

17. Bowler cumulative average wickets

Interestingly, Ajmal and NL McCullum have cumulative average wickets of around 2.0. Steyn also has a cumulative average of 2.0+

p1<-bowlerCumulativeAvgWickets(ashwin,"R Ashwin")
p2<-bowlerCumulativeAvgWickets(watson, "SR Watson")
p3<-bowlerCumulativeAvgWickets(broad, "SCJ Broad")
p4<-bowlerCumulativeAvgWickets(ajmal, "Saeed Ajmal")
p5<-bowlerCumulativeAvgWickets(steyn, "D Steyn")
p6<-bowlerCumulativeAvgWickets(nmccullum, "NL Mccullum")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)

cumWkts-1

18. Bowler cumulative Economy Rate (ER)

Ajmal’s economy rate deteriorates from an excellent rate of 5.5, while Ashwin’s economy rate improves from a terrible rate of 9.0+.

p1<-bowlerCumulativeAvgEconRate(ashwin,"R Ashwin")
p2<-bowlerCumulativeAvgEconRate(watson, "SR Watson")
p3<-bowlerCumulativeAvgEconRate(broad, "SCJ Broad")
p4<-bowlerCumulativeAvgEconRate(ajmal, "Saeed Ajmal")
p5<-bowlerCumulativeAvgEconRate(steyn, "D Steyn")
p6<-bowlerCumulativeAvgEconRate(nmccullum, "NL Mccullum")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)

cumER-1

19. Bowler wicket plot

The plot below gives the average wickets versus number of overs

p1<-bowlerWicketPlot(ashwin,"R Ashwin")
p2<-bowlerWicketPlot(watson, "SR Watson")
p3<-bowlerWicketPlot(broad, "SCJ Broad")
p4<-bowlerWicketPlot(ajmal, "Saeed Ajmal")
p5<-bowlerWicketPlot(steyn, "D Steyn")
p6<-bowlerWicketPlot(nmccullum, "NL Mccullum")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)

wktPlot-1

20. Bowler wicket against opposition

#Ashwin's best performances are against South Africa, Sri Lanka, Bangladesh and Afghanistan
bowlerWicketsAgainstOpposition(ashwin,"R Ashwin")

wktsOppn1-1

#Watson's best performances are against England, Ireland and New Zealand
bowlerWicketsAgainstOpposition(watson, "SR Watson")

wktsOppn2-1

bowlerWicketsAgainstOpposition(broad, "SCJ Broad")

wktsOppn3-1

#Ajmal's best performances are against Sri Lanka, New Zealand and South Africa
bowlerWicketsAgainstOpposition(ajmal, "Saeed Ajmal")

wktsOppn4-1

#Steyn has good performances against New Zealand, Sri Lanka, Pakistan, West Indies
bowlerWicketsAgainstOpposition(steyn, "D Steyn")

wktsOppn5-1

bowlerWicketsAgainstOpposition(nmccullum, "NL Mccullum")

wktsOppn6-1

21. Bowler wicket at cricket grounds

bowlerWicketsVenue(ashwin,"R Ashwin")

wktsAve1-1

bowlerWicketsVenue(watson, "SR Watson")

wktsAve2-1

bowlerWicketsVenue(broad, "SCJ Broad")

wktsAve3-1

bowlerWicketsVenue(ajmal, "Saeed Ajmal")

wktsAve4-1

bowlerWicketsVenue(steyn, "D Steyn")

wktsAve5-1

bowlerWicketsVenue(nmccullum, "NL Mccullum")

wktsAve6-1

22. Get Delivery wickets for bowlers

This function creates a dataframe of deliveries and the wickets taken

setwd("C:/software/cricket-package/york-test/yorkrData/Twenty20/T20-matches")
ashwin1 <- getDeliveryWickets(team="India",dir=".",name="Ashwin",save=FALSE)
watson1 <- getDeliveryWickets(team="Australia",dir=".",name="SR Watson",save=FALSE)
broad1 <- getDeliveryWickets(team="England",dir=".",name="SCJ Broad",save=FALSE)
ajmal1 <- getDeliveryWickets(team="Pakistan",dir=".",name="Saeed Ajmal",save=FALSE)
steyn1 <- getDeliveryWickets(team="South Africa",dir=".",name="Steyn",save=FALSE)
nmccullum1 <- getDeliveryWickets(team="New Zealand",dir=".",name="NL McCullum",save=FALSE)

23. Predict number of deliveries to wickets

#Ashwin takes 
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerWktsPredict(ashwin1,"R Ashwin")
bowlerWktsPredict(watson1,"SR Watson")

wktsPred1-1

#Broad and Ajmal need around 8 deliveries for a wicket
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerWktsPredict(broad1,"SCJ Broad")
bowlerWktsPredict(ajmal1,"Saeed Ajmal")

wktsPred2-1

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerWktsPredict(steyn1,"D Steyn")
bowlerWktsPredict(nmccullum1,"NL Mccullum")

wktsPred3-1

yorkr pads up for the Twenty20s:Part 3:Overall team performance against all oppositions!


Introduction

“So in war, the way is to avoid what is strong, and strike at what is weak.”

“Thus the expert in battle moves the enemy, and is not moved by him.”

“Appear weak when you are strong, and strong when you are weak.”

                                         The Art of War - Sun Tzu

This post is a continuation of my 2 earlier posts based on the enhancement of my R package yorkr to include functions to handle Twenty20 matches. This is the 3rd part of the Twenty20 based functions, the 2 earlier ones were

  1. yorkr pads up for the Twenty20s: Part 1- Analyzing team’s match performance.
  2. yorkr pads up for the Twenty20s: Part 2-Head to head confrontation between teams

This post deals with Class 3 functions, namely the performances of a team in all T20 matches against all oppositions, e.g. India, Australia or South Africa against all oppositions in all matches. In other words it is the performance of the team against the rest of the world.

This post has also been published at RPubs yorkrT20-Part3 (http://rpubs.com/tvganesh/yorkrT20-Part3) and can also be downloaded as a PDF document from yorkrT20-Part3.pdf.

You can clone/fork the code for the package yorkr from Github at yorkr-package

Check out my 2 books on cricket, a) Cricket analytics with cricketr b) Beaten by sheer pace – Cricket analytics with yorkr, now available in both paperback & kindle versions on Amazon!!! Pick up your copies today!

Check out my interactive Shiny apps GooglyPlus (plots & tables) and Googly (only plots) which can be used to analyze IPL players, teams and matches.

Note: To do similar analysis you can use my yorkrT20templates. See my post Analysis of International T20 matches with yorkr templates

The list of functions in Class 3 are

  1. teamBattingScorecardAllOppnAllMatches()
  2. teamBatsmenPartnershipAllOppnAllMatches()
  3. teamBatsmenPartnershipAllOppnAllMatchesPlot()
  4. teamBatsmenVsBowlersAllOppnAllMatchesRept()
  5. teamBatsmenVsBowlersAllOppnAllMatchesPlot()
  6. teamBowlingScorecardAllOppnAllMatchesMain()
  7. teamBowlersVsBatsmenAllOppnAllMatchesRept()
  8. teamBowlersVsBatsmenAllOppnAllMatchesPlot()
  9. teamBowlingWicketKindAllOppnAllMatches()
  10. teamBowlingWicketRunsAllOppnAllMatches()

Note 1: The yorkr package in its current avatar only supports ODI & Twenty20 matches. I will be upgrading the package to handle IPL in the months to come.

Note 2: As in the previous parts, the plot functions usually have a plot=TRUE/FALSE parameter. Setting plot=FALSE returns the underlying dataframe instead of drawing the plot. The user can then choose to plot this in any way he/she likes, e.g. in interactive charts using rcharts, ggvis, googleVis, plotly etc.

1. Install the package from CRAN

The yorkr package can be installed directly from CRAN now! Install the yorkr package.

if (!require("yorkr")) {
  install.packages("yorkr") 
  library("yorkr")
}
rm(list=ls())

2. Get data for all matches against all oppositions for a team

We can get all matches against all oppositions for a team/country using the function below. The dir parameter should point to the folder in which the RData files of the individual T20 matches exist. This function creates a data frame of all the matches and also saves the resulting dataframe as RData

setwd("C:/software/cricket-package/york-test/yorkrData/Twenty20/T20-team-allmatches-allOppositions")

# Get all matches against all oppositions for India and save as RData
matches <-getAllMatchesAllOpposition("India",dir=".",save=TRUE)
dim(matches)
## [1] 14380    25


3. Save data for all matches against all oppositions

This can be done locally using the function below. This function gets all the T20 matches of the country/team against all other countries/teams, combines them into a single dataframe and saves it in the current folder. The current implementation expects that the RData files of individual matches are in the ../data folder. Since I have already converted this data, I will not be running this again

#saveAllMatchesAllOpposition()

4. Load data directly for all matches against all oppositions

As in my earlier posts (yorkr-Part1 & yorkr-Part2) I have already saved the data for all matches of the individual countries against all oppositions. The data for these matches for the individual teams/countries can be downloaded directly from the Github folder at T20-team-allmatches-allOppositions

Note: The dataframe for all the matches of a country against all oppositions can be loaded directly into your code. Feel free to download the zip of the data and to perform any data mining on it.

If you do come up with interesting insights, I would appreciate it if you attribute the source to Cricsheet (http://cricsheet.org), my package yorkr and my blog Giga thoughts, besides dropping me a note.

As in my earlier post I will be directly loading the saved files. For the illustration of the functions, I will use India in all the functions, (for obvious reasons) and will randomly use the data from the rest of the top 8 teams

setwd("C:/software/cricket-package/york-test/yorkrData/Twenty20/T20-team-allmatches-allOppositions")
load("allMatchesAllOpposition-India.RData")
ind_matches <- matches
load("allMatchesAllOpposition-Australia.RData")
aus_matches <- matches
load("allMatchesAllOpposition-New Zealand.RData")
nz_matches <- matches
load("allMatchesAllOpposition-Pakistan.RData")
pak_matches <- matches
load("allMatchesAllOpposition-England.RData")
eng_matches <- matches
load("allMatchesAllOpposition-Sri Lanka.RData")
sl_matches <- matches
load("allMatchesAllOpposition-West Indies.RData")
wi_matches <- matches
load("allMatchesAllOpposition-South Africa.RData")
sa_matches <- matches
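
Note that each load() call above overwrites the same variable, matches, which is why it is copied to a team-specific name immediately afterwards. An alternative is to load each file into its own environment and return the object. The helper below, loadMatches, is a hypothetical convenience function and not part of yorkr; the demonstration uses a temporary file with made-up contents in place of a real team file.

```r
# Load an .RData file into a private environment and return one object,
# so that nothing in the calling workspace is overwritten
loadMatches <- function(file, objName = "matches") {
  env <- new.env()
  load(file, envir = env)
  get(objName, envir = env)
}

# Demonstration with a temporary file standing in for a saved team file
matches <- data.frame(team = "India", runs = 1:3)
tmp <- tempfile(fileext = ".RData")
save(matches, file = tmp)
rm(matches)

ind_matches <- loadMatches(tmp)
dim(ind_matches)
```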

5. Team T20 Batting Scorecard (all matches with opposition)

The following function shows the batting scorecard for each country. It returns a dataframe with the top batsmen of that country

#Top Twenty20 performers for India
m <-teamBattingScorecardAllOppnAllMatches(ind_matches,theTeam="India")
## Total= 8663
m
## Source: local data frame [46 x 5]
## 
##         batsman ballsPlayed fours sixes  runs
##          (fctr)       (int) (int) (int) (dbl)
## 1       V Kohli         882   124    27  1215
## 2      SK Raina         806   102    39  1114
## 3     RG Sharma         800    91    37  1053
## 4  Yuvraj Singh         656    59    60   933
## 5     G Gambhir         739   110    10   911
## 6      MS Dhoni         723    60    24   864
## 7      V Sehwag         228    36    13   330
## 8     AM Rahane         256    28     6   302
## 9    RV Uthappa         204    26     6   249
## 10     S Dhawan         193    28     8   248
## ..          ...         ...   ...   ...   ...
#Top Twenty20 batsmen for Australia
m <-teamBattingScorecardAllOppnAllMatches(aus_matches,theTeam="Australia")
## Total= 11743
m
## Source: local data frame [70 x 5]
## 
##       batsman ballsPlayed fours sixes  runs
##        (fctr)       (int) (int) (int) (dbl)
## 1   DA Warner        1030   139    67  1465
## 2   SR Watson         888   103    76  1315
## 3    CL White         726    71    44   984
## 4    AJ Finch         566    95    36   874
## 5   DJ Hussey         615    41    34   756
## 6  MEK Hussey         518    58    25   721
## 7   MJ Clarke         467    29    10   488
## 8   GJ Bailey         331    37    20   470
## 9   BJ Haddin         342    30    13   402
## 10 RT Ponting         294    42    11   401
## ..        ...         ...   ...   ...   ...
#Top Twenty20 batsmen for Pakistan
m <-teamBattingScorecardAllOppnAllMatches(pak_matches,theTeam="Pakistan")
## Total= 12943
m
## Source: local data frame [58 x 5]
## 
##            batsman ballsPlayed fours sixes  runs
##             (fctr)       (int) (int) (int) (dbl)
## 1       Umar Akmal        1184   112    48  1506
## 2  Mohammad Hafeez        1254   156    36  1466
## 3    Shahid Afridi         812    89    63  1255
## 4     Shoaib Malik        1068   101    23  1206
## 5    Ahmed Shehzad         799   102    24   941
## 6     Kamran Akmal         705    88    27   871
## 7    Misbah-ul-Haq         685    46    26   770
## 8      Imran Nazir         338    55    21   468
## 9      Salman Butt         410    58     6   467
## 10     Younis Khan         331    32    12   427
## ..             ...         ...   ...   ...   ...
#Top Twenty20 batsmen for New Zealand
m <-teamBattingScorecardAllOppnAllMatches(nz_matches,theTeam="New Zealand")
## Total= 11656
m
## Source: local data frame [62 x 5]
## 
##          batsman ballsPlayed fours sixes  runs
##           (fctr)       (int) (int) (int) (dbl)
## 1    BB McCullum        1501   199    89  2106
## 2     MJ Guptill        1259   155    68  1665
## 3    LRPL Taylor         900    76    45  1126
## 4  KS Williamson         650   101    10   844
## 5      SB Styris         443    38    25   554
## 6   JEC Franklin         371    28    23   463
## 7       JDP Oram         318    37    20   451
## 8       JD Ryder         336    45    16   434
## 9        C Munro         227    23    27   377
## 10      RJ Nicol         283    33    11   327
## ..           ...         ...   ...   ...   ...
#Top Twenty20 batsmen for England
m <-teamBattingScorecardAllOppnAllMatches(eng_matches,theTeam="England")
## Total= 11215
m
## Source: local data frame [65 x 5]
## 
##           batsman ballsPlayed fours sixes  runs
##            (fctr)       (int) (int) (int) (dbl)
## 1      EJG Morgan         938   108    53  1285
## 2    KP Pietersen         810   119    32  1176
## 3        AD Hales         790   116    35  1111
## 4       LJ Wright         540    68    31   759
## 5       RS Bopara         588    54    17   711
## 6  PD Collingwood         451    38    24   583
## 7      JC Buttler         410    43    23   562
## 8         MJ Lumb         394    64    21   552
## 9    C Kieswetter         456    47    23   526
## 10        OA Shah         276    26    13   347
## ..            ...         ...   ...   ...   ...
#Top Twenty20 batsmen for West Indies
m <-teamBattingScorecardAllOppnAllMatches(wi_matches,theTeam="West Indies")
## Total= 9292
m
## Source: local data frame [54 x 5]
## 
##          batsman ballsPlayed fours sixes  runs
##           (fctr)       (int) (int) (int) (dbl)
## 1       CH Gayle         958   122    87  1406
## 2       DJ Bravo         792    57    43   979
## 3     MN Samuels         764    76    46   952
## 4    LMP Simmons         633    68    25   732
## 5     KA Pollard         453    50    36   664
## 6       DR Smith         463    62    31   582
## 7      DJG Sammy         330    43    27   526
## 8      J Charles         382    59    12   456
## 9   ADS Fletcher         341    26    19   387
## 10 S Chanderpaul         337    34     5   343
## ..           ...         ...   ...   ...   ...
#Top Twenty20 batsmen for Sri Lanka
m <-teamBattingScorecardAllOppnAllMatches(sl_matches,theTeam="Sri Lanka")
## Total= 9572
m
## Source: local data frame [54 x 5]
## 
##             batsman ballsPlayed fours sixes  runs
##              (fctr)       (int) (int) (int) (dbl)
## 1        TM Dilshan        1235   186    26  1556
## 2  DPMD Jayawardene        1005   157    28  1346
## 3     KC Sangakkara        1088   132    18  1320
## 4        AD Mathews         640    54    24   794
## 5       MDKJ Perera         435    60    23   596
## 6     ST Jayasuriya         428    70    21   581
## 7       NLTC Perera         310    38    22   480
## 8     CK Kapugedera         329    36    16   417
## 9      LD Chandimal         371    31     7   380
## 10  HDRL Thirimanne         240    26     5   277
## ..              ...         ...   ...   ...   ...
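
Conceptually, a scorecard like the ones above is a per-batsman aggregation of ball-by-ball records, sorted by total runs. The following base R sketch shows the idea on made-up deliveries with illustrative column names; it is not the package's implementation.

```r
# Made-up ball-by-ball records (one row per delivery faced)
deliveries <- data.frame(
  batsman = c("A", "A", "B", "A", "B", "C"),
  runs    = c(4, 1, 6, 0, 2, 1)
)

# Total runs per batsman, highest scorer first
scorecard <- aggregate(runs ~ batsman, data = deliveries, FUN = sum)
scorecard <- scorecard[order(-scorecard$runs), ]
scorecard
```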

6. Team Batting Scorecard in Twenty20 matches against all oppositions

The following function calls show the best batsmen from the opposition ‘theTeam’ in the ‘matches’. For e.g. when matches=ind_matches and theTeam=“England”, the returned dataframe shows the best English batsmen against India

#Top T20 England batsmen against India
m <-teamBattingScorecardAllOppnAllMatches(matches=ind_matches,theTeam="England")
## Total= 1169
m
## Source: local data frame [26 x 5]
## 
##         batsman ballsPlayed fours sixes  runs
##          (fctr)       (int) (int) (int) (dbl)
## 1    EJG Morgan          96    15    10   176
## 2  KP Pietersen         107    18     5   171
## 3      AD Hales         112    14     6   149
## 4     RS Bopara          94     9     2   103
## 5      SR Patel          73     6     2    79
## 6    JC Buttler          54     2     4    69
## 7  C Kieswetter          47     8     3    65
## 8     LJ Wright          50     4     3    62
## 9       MJ Lumb          39     6     2    51
## 10   VS Solanki          30     5     1    43
## ..          ...         ...   ...   ...   ...
#Top T20 Australian batsmen against India
m <-teamBattingScorecardAllOppnAllMatches(matches=ind_matches,theTeam="Australia")
## Total= 1767
m
## Source: local data frame [40 x 5]
## 
##         batsman ballsPlayed fours sixes  runs
##          (fctr)       (int) (int) (int) (dbl)
## 1     SR Watson         173    16    20   284
## 2      AJ Finch         164    33     5   249
## 3     DA Warner         134    14    14   204
## 4       MS Wade          93     6     5   125
## 5     DJ Hussey          81     5     6   101
## 6     ML Hayden          63     5     6    79
## 7    RT Ponting          52    13    NA    76
## 8     MJ Clarke          54     3     1    65
## 9     A Symonds          43     4     2    63
## 10 AC Gilchrist          38     7     3    59
## ..          ...         ...   ...   ...   ...
#Top T20 New Zealand batsmen against Australia
m <-teamBattingScorecardAllOppnAllMatches(aus_matches,theTeam="New Zealand")
## Total= 727
m
## Source: local data frame [27 x 5]
## 
##         batsman ballsPlayed fours sixes  runs
##          (fctr)       (int) (int) (int) (dbl)
## 1   BB McCullum         138    22    12   228
## 2     SB Styris          54     9     3    84
## 3      JDP Oram          34     5     6    67
## 4    GJ Hopkins          30     5     3    57
## 5  JEC Franklin          52     3     2    53
## 6    MJ Guptill          47     7    NA    47
## 7      NT Broom          26     6    NA    36
## 8   NL McCullum          13     2     2    25
## 9    GD Elliott          26     2    NA    23
## 10   SP Fleming          13     3    NA    18
## ..          ...         ...   ...   ...   ...
#Top T20 Sri Lankan batsmen against West Indies
m <-teamBattingScorecardAllOppnAllMatches(wi_matches,theTeam="Sri Lanka")
## Total= 1225
m
## Source: local data frame [21 x 5]
## 
##             batsman ballsPlayed fours sixes  runs
##              (fctr)       (int) (int) (int) (dbl)
## 1        TM Dilshan         224    42     4   334
## 2  DPMD Jayawardene         149    21     5   202
## 3     KC Sangakkara         119    12     3   135
## 4     ST Jayasuriya          91    14     3   111
## 5        AD Mathews          52     6     7    98
## 6       MDKJ Perera          52    10     3    78
## 7  DSNFG Jayasuriya          52     5     3    66
## 8   HDRL Thirimanne          42     3     2    48
## 9      LD Chandimal          20     4     2    41
## 10  KMDN Kulasekara          18     3     1    30
## ..              ...         ...   ...   ...   ...

7. Team Batting Partnerships in Twenty20 matches against all oppositions

This gives the top batting partnerships in each team in all its matches against all oppositions. The report can either be a ‘summary’ or a ‘detailed’ breakup of the batting partnerships.

# The function gives the names of highest T20 partnership for India. The default report parameter is "summary"
m <- teamBatsmenPartnershipAllOppnAllMatches(ind_matches,theTeam='India')
m
## Source: local data frame [46 x 2]
## 
##         batsman totalRuns
##          (fctr)     (dbl)
## 1       V Kohli      1215
## 2      SK Raina      1114
## 3     RG Sharma      1053
## 4  Yuvraj Singh       933
## 5     G Gambhir       911
## 6      MS Dhoni       864
## 7      V Sehwag       330
## 8     AM Rahane       302
## 9    RV Uthappa       249
## 10     S Dhawan       248
## ..          ...       ...
# When the report parameter is 'detailed' then the detailed break up of the T20 partnership is returned as a data frame
m <- teamBatsmenPartnershipAllOppnAllMatches(matches,theTeam='India',report="detailed")
head(m,30)
##      batsman      nonStriker partnershipRuns totalRuns
## 1  RG Sharma       G Gambhir              26       309
## 2  RG Sharma        SK Raina              25       309
## 3  RG Sharma    Yuvraj Singh              41       309
## 4  RG Sharma        MS Dhoni              31       309
## 5  RG Sharma        V Sehwag               0       309
## 6  RG Sharma         V Kohli             110       309
## 7  RG Sharma       AM Rahane              24       309
## 8  RG Sharma        S Dhawan              33       309
## 9  RG Sharma      RV Uthappa              13       309
## 10 RG Sharma       IK Pathan               6       309
## 11  SK Raina       RG Sharma              37       250
## 12  SK Raina    Yuvraj Singh              50       250
## 13  SK Raina        MS Dhoni              73       250
## 14  SK Raina       YK Pathan              51       250
## 15  SK Raina      KD Karthik              16       250
## 16  SK Raina         V Kohli              21       250
## 17  SK Raina        AR Patel               0       250
## 18  SK Raina       AT Rayudu               2       250
## 19   V Kohli       RG Sharma              70       146
## 20   V Kohli        SK Raina              11       146
## 21   V Kohli    Yuvraj Singh              37       146
## 22   V Kohli        MS Dhoni               9       146
## 23   V Kohli         M Vijay               2       146
## 24   V Kohli        V Sehwag               2       146
## 25   V Kohli       AM Rahane              15       146
## 26  MS Dhoni       RG Sharma              45       124
## 27  MS Dhoni        SK Raina              53       124
## 28  MS Dhoni    Yuvraj Singh               5       124
## 29  MS Dhoni Harbhajan Singh               8       124
## 30  MS Dhoni         V Kohli               0       124
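
The difference between the ‘summary’ and ‘detailed’ reports can be sketched in base R on a toy ball-by-ball data frame. This is only an illustration under stated assumptions, not yorkr's actual implementation: the toy data and the column names batsman, nonStriker and runs are assumed here.

```r
# Toy ball-by-ball data (hypothetical values; column names are assumptions)
balls <- data.frame(
  batsman    = c("A", "A", "A", "B", "B", "C"),
  nonStriker = c("B", "B", "C", "A", "A", "A"),
  runs       = c(4, 1, 6, 2, 0, 1),
  stringsAsFactors = FALSE
)

# "detailed": runs scored by each batsman with each partner at the other end
detailed <- aggregate(runs ~ batsman + nonStriker, data = balls, FUN = sum)
names(detailed)[3] <- "partnershipRuns"

# "summary": total runs per batsman across all partners
summary_df <- aggregate(partnershipRuns ~ batsman, data = detailed, FUN = sum)
names(summary_df)[2] <- "totalRuns"
summary_df <- summary_df[order(-summary_df$totalRuns), ]

# Attach each batsman's total to every partnership row, highest total first
detailed <- merge(detailed, summary_df, by = "batsman")
detailed <- detailed[order(-detailed$totalRuns, detailed$batsman), ]

summary_df
detailed
```

In this sketch the totalRuns column is repeated on every row of a batsman, which is the same shape as the detailed output shown above.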

9. More Team Batting Partnerships in Twenty20 matches against all oppositions

When we use the dataframe ind_matches (matches of India against all oppositions) and choose another country as theTeam, we get the top batsmen of that country against India.

# Top T20 England batting partnerships against India (report="summary")
m <- teamBatsmenPartnershipAllOppnAllMatches(ind_matches,theTeam='England')
m
## Source: local data frame [26 x 2]
## 
##         batsman totalRuns
##          (fctr)     (dbl)
## 1    EJG Morgan       176
## 2  KP Pietersen       171
## 3      AD Hales       149
## 4     RS Bopara       103
## 5      SR Patel        79
## 6    JC Buttler        69
## 7  C Kieswetter        65
## 8     LJ Wright        62
## 9       MJ Lumb        51
## 10   VS Solanki        43
## ..          ...       ...
# Top T20 South Africa batting partnerships against India (report="detailed")
m <- teamBatsmenPartnershipAllOppnAllMatches(ind_matches,theTeam='South Africa', report="detailed")
m[1:30,]
##           batsman      nonStriker partnershipRuns totalRuns
## 1  AB de Villiers        GC Smith              28       208
## 2  AB de Villiers       JP Duminy              40       208
## 3  AB de Villiers      MV Boucher              19       208
## 4  AB de Villiers       JA Morkel              11       208
## 5  AB de Villiers       JH Kallis              17       208
## 6  AB de Villiers    F du Plessis              24       208
## 7  AB de Villiers         HM Amla              49       208
## 8  AB de Villiers         JM Kemp               6       208
## 9  AB de Villiers      MN van Wyk              14       208
## 10      JP Duminy  AB de Villiers              20       173
## 11      JP Duminy      MV Boucher               4       173
## 12      JP Duminy    F du Plessis              33       173
## 13      JP Duminy     F Behardien              88       173
## 14      JP Duminy       DA Miller              28       173
## 15      JP Duminy      MN van Wyk               0       173
## 16   F du Plessis  AB de Villiers              45       143
## 17   F du Plessis       JP Duminy              86       143
## 18   F du Plessis         HM Amla              12       143
## 19      JH Kallis        GC Smith              59       140
## 20      JH Kallis  AB de Villiers               8       140
## 21      JH Kallis       LE Bosman              12       140
## 22      JH Kallis       CA Ingram              59       140
## 23      JH Kallis         RE Levi               2       140
## 24      JA Morkel  AB de Villiers              12       109
## 25      JA Morkel      MV Boucher              34       109
## 26      JA Morkel         J Botha               8       109
## 27      JA Morkel     F Behardien              16       109
## 28      JA Morkel        DW Steyn               7       109
## 29      JA Morkel         JM Kemp               3       109
## 30      JA Morkel JJ van der Wath              28       109
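
The idea that one ‘matches’ data frame holds the balls of both sides, and that theTeam selects whose batsmen are reported, can be sketched as below. This is a rough illustration and not the package's code: the helper top_batsmen(), the team column and the run values are all hypothetical.

```r
# Toy data: balls faced by both sides in India's matches (hypothetical values)
balls <- data.frame(
  team    = c("India", "India", "England", "England", "England"),
  batsman = c("V Kohli", "SK Raina", "EJG Morgan", "EJG Morgan", "AD Hales"),
  runs    = c(4, 2, 6, 1, 4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only the balls batted by theTeam, then total per batsman
top_batsmen <- function(matches, theTeam) {
  team_balls <- matches[matches$team == theTeam, ]
  totals <- aggregate(runs ~ batsman, data = team_balls, FUN = sum)
  names(totals)[2] <- "totalRuns"
  totals <- totals[order(-totals$totalRuns), ]
  rownames(totals) <- NULL
  totals
}

top_batsmen(balls, "India")    # India's batsmen in India's matches
top_batsmen(balls, "England")  # England's batsmen against India
```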

10. Team Batting partnerships of other countries in Twenty20 matches against all oppositions

# Top Indian T20 batting partnerships in matches against England
m <- teamBatsmenPartnershipAllOppnAllMatches(eng_matches,theTeam='India',report="detailed")
head(m,30)
##      batsman   nonStriker partnershipRuns totalRuns
## 1    V Kohli    G Gambhir              78       184
## 2    V Kohli   RV Uthappa               4       184
## 3    V Kohli Yuvraj Singh              10       184
## 4    V Kohli    RG Sharma               2       184
## 5    V Kohli     SK Raina              42       184
## 6    V Kohli    AM Rahane               3       184
## 7    V Kohli     S Dhawan              45       184
## 8   MS Dhoni   RV Uthappa               5       167
## 9   MS Dhoni Yuvraj Singh               4       167
## 10  MS Dhoni    IK Pathan               2       167
## 11  MS Dhoni    RG Sharma               9       167
## 12  MS Dhoni     SK Raina              71       167
## 13  MS Dhoni    RA Jadeja              11       167
## 14  MS Dhoni    YK Pathan              36       167
## 15  MS Dhoni    AT Rayudu              18       167
## 16  MS Dhoni     R Ashwin              11       167
## 17 G Gambhir     V Sehwag              51       162
## 18 G Gambhir   RV Uthappa               7       162
## 19 G Gambhir Yuvraj Singh               2       162
## 20 G Gambhir    IK Pathan              11       162
## 21 G Gambhir    RG Sharma              25       162
## 22 G Gambhir     SK Raina              10       162
## 23 G Gambhir    RA Jadeja              15       162
## 24 G Gambhir    AM Rahane              18       162
## 25 G Gambhir      V Kohli              23       162
## 26  SK Raina    G Gambhir               2       161
## 27  SK Raina     MS Dhoni              80       161
## 28  SK Raina    RG Sharma              16       161
## 29  SK Raina      V Kohli              34       161
## 30  SK Raina      P Kumar               0       161
#Top South Africa T20 batting partnerships 
m <- teamBatsmenPartnershipAllOppnAllMatches(sa_matches,theTeam='South Africa', report="detailed")
head(m,30)
##           batsman       nonStriker partnershipRuns totalRuns
## 1       JP Duminy        LE Bosman               3      1528
## 2       JP Duminy         GC Smith              78      1528
## 3       JP Duminy        JH Kallis              77      1528
## 4       JP Duminy   AB de Villiers             207      1528
## 5       JP Duminy       MV Boucher              93      1528
## 6       JP Duminy        JA Morkel             143      1528
## 7       JP Duminy RE van der Merwe              12      1528
## 8       JP Duminy         DW Steyn              10      1528
## 9       JP Duminy          J Botha              34      1528
## 10      JP Duminy VB van Jaarsveld              50      1528
## 11      JP Duminy         HH Gibbs               0      1528
## 12      JP Duminy          HM Amla             104      1528
## 13      JP Duminy      ND McKenzie              20      1528
## 14      JP Duminy      F Behardien             117      1528
## 15      JP Duminy      RJ Peterson              32      1528
## 16      JP Duminy        Q de Kock              13      1528
## 17      JP Duminy        DA Miller             284      1528
## 18      JP Duminy     RR Hendricks              15      1528
## 19      JP Duminy        R McLaren               1      1528
## 20      JP Duminy       WD Parnell               5      1528
## 21      JP Duminy          D Wiese               6      1528
## 22      JP Duminy     F du Plessis             119      1528
## 23      JP Duminy        JL Ontong              17      1528
## 24      JP Duminy        CA Ingram              67      1528
## 25      JP Duminy          HG Kuhn               6      1528
## 26      JP Duminy       MN van Wyk               7      1528
## 27      JP Duminy      AN Petersen               8      1528
## 28 AB de Villiers         GC Smith              72      1167
## 29 AB de Villiers        JH Kallis              81      1167
## 30 AB de Villiers        JP Duminy             263      1167
#Top Sri Lanka T20 batting partnerships 
m <- teamBatsmenPartnershipAllOppnAllMatches(sl_matches,theTeam='Sri Lanka',report="summary")
m
## Source: local data frame [54 x 2]
## 
##             batsman totalRuns
##              (fctr)     (dbl)
## 1        TM Dilshan      1556
## 2  DPMD Jayawardene      1346
## 3     KC Sangakkara      1320
## 4        AD Mathews       794
## 5       MDKJ Perera       596
## 6     ST Jayasuriya       581
## 7       NLTC Perera       480
## 8     CK Kapugedera       417
## 9      LD Chandimal       380
## 10  HDRL Thirimanne       277
## ..              ...       ...
#Top England T20 batting partnerships 
m <- teamBatsmenPartnershipAllOppnAllMatches(eng_matches,theTeam='England',report="summary")
m
## Source: local data frame [65 x 2]
## 
##           batsman totalRuns
##            (fctr)     (dbl)
## 1      EJG Morgan      1285
## 2    KP Pietersen      1176
## 3        AD Hales      1111
## 4       LJ Wright       759
## 5       RS Bopara       711
## 6  PD Collingwood       583
## 7      JC Buttler       562
## 8         MJ Lumb       552
## 9    C Kieswetter       526
## 10        OA Shah       347
## ..            ...       ...
#Top Australian T20 batting partnerships in West Indian matches
m <- teamBatsmenPartnershipAllOppnAllMatches(wi_matches,theTeam='Australia',report="summary")
m
## Source: local data frame [31 x 2]
## 
##       batsman totalRuns
##        (fctr)     (dbl)
## 1   DA Warner       311
## 2   SR Watson       240
## 3  MEK Hussey       147
## 4   BJ Haddin       146
## 5   GJ Bailey       135
## 6   DJ Hussey        57
## 7    AC Voges        51
## 8    SE Marsh        50
## 9  GJ Maxwell        45
## 10   L Ronchi        36
## ..        ...       ...
#Top England T20 batting partnerships in New Zealand  matches
m <- teamBatsmenPartnershipAllOppnAllMatches(nz_matches,theTeam='England',report="summary")
m
## Source: local data frame [35 x 2]
## 
##           batsman totalRuns
##            (fctr)     (dbl)
## 1       LJ Wright       273
## 2        AD Hales       194
## 3         MJ Lumb       188
## 4      EJG Morgan       152
## 5      JC Buttler       140
## 6    KP Pietersen       112
## 7         OA Shah        91
## 8  PD Collingwood        86
## 9         IR Bell        73
## 10        JE Root        68
## ..            ...       ...

11. Team Batting Partnership plots in Twenty20 matches against all oppositions

Graphical plot of batting partnerships for the countries

# Plot of T20 batting partnerships of India (Virat Kohli and Suresh Raina have the best T20 partnerships)
teamBatsmenPartnershipAllOppnAllMatchesPlot(ind_matches,"India",main="India")

batsmenPartnership1-1

# Plot of T20 batting partnerships of Pakistan (Umar Akmal and Mohammad Hafeez lead)
teamBatsmenPartnershipAllOppnAllMatchesPlot(pak_matches,"Pakistan",main="Pakistan")

batsmenPartnership1-2

# Plot of T20 batting partnerships of Australia (David Warner and Shane Watson head the list)
teamBatsmenPartnershipAllOppnAllMatchesPlot(aus_matches,"Australia",main="Australia")

batsmenPartnership1-3

12. Top opposition batting partnerships in Twenty20 matches against all oppositions

This gives the best performance of the team against a specified country

# Top India T20 partnerships against West Indies
teamBatsmenPartnershipAllOppnAllMatchesPlot(ind_matches,"India",main="West Indies")

batsmenPartnership2-1

# Top Sri Lanka T20 partnerships against India
teamBatsmenPartnershipAllOppnAllMatchesPlot(sl_matches,"Sri Lanka",main="India")

batsmenPartnership2-2

# Top New Zealand T20 partnerships against South Africa
teamBatsmenPartnershipAllOppnAllMatchesPlot(nz_matches,"New Zealand",main="South Africa")

batsmenPartnership2-3

13. Batsmen vs Bowlers in Twenty20 matches against all oppositions

The function below gives the top performance of batsmen against the opposition countries

# Top T20 batsmen against bowlers when rank=0
m <-teamBatsmenVsBowlersAllOppnAllMatchesRept(ind_matches,"India",rank=0)
m
## Source: local data frame [46 x 2]
## 
##         batsman runsScored
##          (fctr)      (dbl)
## 1       V Kohli       1215
## 2      SK Raina       1114
## 3     RG Sharma       1053
## 4  Yuvraj Singh        933
## 5     G Gambhir        911
## 6      MS Dhoni        864
## 7      V Sehwag        330
## 8     AM Rahane        302
## 9    RV Uthappa        249
## 10     S Dhawan        248
## ..          ...        ...
# Performance of India batsman in T20 with rank=1 against international bowlers and runs scored against bowlers. This is Virat Kohli for India
m <-teamBatsmenVsBowlersAllOppnAllMatchesRept(ind_matches,"India",rank=1,dispRows=30)
m
## Source: local data frame [30 x 3]
## Groups: batsman [1]
## 
##    batsman        bowler  runs
##     (fctr)        (fctr) (dbl)
## 1  V Kohli Shahid Afridi    43
## 2  V Kohli     SR Watson    39
## 3  V Kohli   Imran Tahir    34
## 4  V Kohli      CJ Boyce    32
## 5  V Kohli   Saeed Ajmal    32
## 6  V Kohli        AJ Tye    31
## 7  V Kohli  HMRKB Herath    29
## 8  V Kohli    TT Bresnan    28
## 9  V Kohli KW Richardson    27
## 10 V Kohli     SM Boland    27
## ..     ...           ...   ...
# Performance of India batsman in T20 with rank=2 against international bowlers and runs scored against these bowlers. This is Suresh Raina for India
m <-teamBatsmenVsBowlersAllOppnAllMatchesRept(ind_matches,"India",rank=2,dispRows=50)
m
## Source: local data frame [50 x 3]
## Groups: batsman [1]
## 
##     batsman        bowler  runs
##      (fctr)        (fctr) (dbl)
## 1  SK Raina RK Kleinveldt    33
## 2  SK Raina     JH Kallis    31
## 3  SK Raina    T Thushara    31
## 4  SK Raina        AJ Tye    29
## 5  SK Raina    TT Bresnan    29
## 6  SK Raina     SR Watson    26
## 7  SK Raina   JC Tredwell    26
## 8  SK Raina    IE O'Brien    26
## 9  SK Raina Mohammad Nabi    22
## 10 SK Raina      GP Swann    22
## ..      ...           ...   ...
# Performance of the England batsman in T20 with rank=1 and the runs scored against each bowler. This returns a data frame of theTeam's batsmen against the bowlers in the 'matches' dataframe (here, India's bowlers). This is EJG Morgan of England.
m <-teamBatsmenVsBowlersAllOppnAllMatchesRept(matches=ind_matches,theTeam="England",rank=1,dispRows=25)
m
## Source: local data frame [15 x 3]
## Groups: batsman [1]
## 
##       batsman          bowler  runs
##        (fctr)          (fctr) (dbl)
## 1  EJG Morgan        R Ashwin    24
## 2  EJG Morgan        AB Dinda    22
## 3  EJG Morgan       KV Sharma    18
## 4  EJG Morgan       RA Jadeja    17
## 5  EJG Morgan       MM Sharma    16
## 6  EJG Morgan       RG Sharma    15
## 7  EJG Morgan         V Kohli    14
## 8  EJG Morgan  Mohammed Shami    13
## 9  EJG Morgan    Yuvraj Singh    11
## 10 EJG Morgan         P Kumar    10
## 11 EJG Morgan       PP Chawla     7
## 12 EJG Morgan         P Awana     7
## 13 EJG Morgan       IK Pathan     1
## 14 EJG Morgan        MM Patel     1
## 15 EJG Morgan Harbhajan Singh     0
# All the top T20 Australian batsmen against India in all of Indian matches
m <-teamBatsmenVsBowlersAllOppnAllMatchesRept(ind_matches,"Australia",rank=0)
m
## Source: local data frame [40 x 2]
## 
##         batsman runsScored
##          (fctr)      (dbl)
## 1     SR Watson        284
## 2      AJ Finch        249
## 3     DA Warner        204
## 4       MS Wade        125
## 5     DJ Hussey        101
## 6     ML Hayden         79
## 7    RT Ponting         76
## 8     MJ Clarke         65
## 9     A Symonds         63
## 10 AC Gilchrist         59
## ..          ...        ...
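
The behaviour of the rank parameter seen above (rank=0 returns the list of batsmen by runs, rank=n returns the bowler-wise breakdown for the n-th batsman in that list) can be sketched in base R. This is a minimal sketch under assumptions: batsmen_vs_bowlers() is a hypothetical helper, the toy data is invented, and the real function may be implemented differently.

```r
# Toy ball-by-ball data (hypothetical values; column names are assumptions)
balls <- data.frame(
  batsman = c("A", "A", "A", "B", "B", "C"),
  bowler  = c("X", "Y", "X", "X", "Y", "Y"),
  runs    = c(4, 6, 1, 2, 2, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper mimicking the rank semantics described in the text
batsmen_vs_bowlers <- function(balls, rank = 0) {
  # Total runs per batsman, highest first
  totals <- aggregate(runs ~ batsman, data = balls, FUN = sum)
  totals <- totals[order(-totals$runs), ]
  rownames(totals) <- NULL
  if (rank == 0) {
    names(totals)[2] <- "runsScored"
    return(totals)
  }
  # rank >= 1: pick the batsman at that position and break runs down by bowler
  chosen <- totals$batsman[rank]
  one <- balls[balls$batsman == chosen, ]
  out <- aggregate(runs ~ batsman + bowler, data = one, FUN = sum)
  out <- out[order(-out$runs), ]
  rownames(out) <- NULL
  out
}

batsmen_vs_bowlers(balls, rank = 0)  # all batsmen, ordered by runs
batsmen_vs_bowlers(balls, rank = 1)  # top batsman, runs against each bowler
```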

14. Batsmen vs Bowlers in Twenty20 matches against all oppositions (continued)

# The best India T20 batsman (rank=1) against England and his performance against England bowlers
m <-teamBatsmenVsBowlersAllOppnAllMatchesRept(eng_matches,"India",rank=1,dispRows=30)
m
## Source: local data frame [13 x 3]
## Groups: batsman [1]
## 
##    batsman      bowler  runs
##     (fctr)      (fctr) (dbl)
## 1  V Kohli  TT Bresnan    28
## 2  V Kohli   LJ Wright    26
## 3  V Kohli   CR Woakes    25
## 4  V Kohli JC Tredwell    21
## 5  V Kohli     ST Finn    17
## 6  V Kohli      MM Ali    15
## 7  V Kohli   SCJ Broad    14
## 8  V Kohli JW Dernbach    10
## 9  V Kohli   RS Bopara     9
## 10 V Kohli   SC Meaker     7
## 11 V Kohli   HF Gurney     6
## 12 V Kohli    GP Swann     5
## 13 V Kohli   DR Briggs     1
# All the top Sri Lanka T20 batsmen (rank=0) against Australia and performances against Australian bowlers
m <-teamBatsmenVsBowlersAllOppnAllMatchesRept(aus_matches,"Sri Lanka",rank=0)
m
## Source: local data frame [24 x 2]
## 
##             batsman runsScored
##              (fctr)      (dbl)
## 1        TM Dilshan        247
## 2  DPMD Jayawardene        209
## 3     KC Sangakkara        177
## 4       NLTC Perera         80
## 5      LD Chandimal         55
## 6       BMAJ Mendis         55
## 7         J Mubarak         49
## 8        AD Mathews         48
## 9       MDKJ Perera         48
## 10       WPUJC Vaas         21
## ..              ...        ...
#All the top England T20 batsmen (rank=0) and their performances against South African bowlers
m <-teamBatsmenVsBowlersAllOppnAllMatchesRept(sa_matches,"England",rank=0)
m
## Source: local data frame [30 x 2]
## 
##           batsman runsScored
##            (fctr)      (dbl)
## 1      EJG Morgan        145
## 2    C Kieswetter        117
## 3    KP Pietersen        116
## 4  PD Collingwood         90
## 5       IJL Trott         84
## 6         OA Shah         74
## 7      JC Buttler         72
## 8        AD Hales         60
## 9        MJ Prior         42
## 10      RS Bopara         39
## ..            ...        ...

15. Batsmen vs Bowlers Plot in Twenty20 matches against all oppositions

The following functions plot the performance of the batsman of the chosen rank against opposition bowlers. Note: the rank has to be > 0.

#The following plot displays the performance of the top India T20 batsman (rank=1) against all opposition bowlers. This is Virat Kohli for India

d <- teamBatsmenVsBowlersAllOppnAllMatchesRept(ind_matches,"India",rank=1,dispRows=50)
d
## Source: local data frame [50 x 3]
## Groups: batsman [1]
## 
##    batsman        bowler  runs
##     (fctr)        (fctr) (dbl)
## 1  V Kohli Shahid Afridi    43
## 2  V Kohli     SR Watson    39
## 3  V Kohli   Imran Tahir    34
## 4  V Kohli      CJ Boyce    32
## 5  V Kohli   Saeed Ajmal    32
## 6  V Kohli        AJ Tye    31
## 7  V Kohli  HMRKB Herath    29
## 8  V Kohli    TT Bresnan    28
## 9  V Kohli KW Richardson    27
## 10 V Kohli     SM Boland    27
## ..     ...           ...   ...
teamBatsmenVsBowlersAllOppnAllMatchesPlot(d)

batsmenVsBowler1-1

e <- teamBatsmenVsBowlersAllOppnAllMatchesPlot(d,plot=FALSE)
e
## Source: local data frame [50 x 3]
## Groups: batsman [1]
## 
##    batsman        bowler  runs
##     (fctr)        (fctr) (dbl)
## 1  V Kohli Shahid Afridi    43
## 2  V Kohli     SR Watson    39
## 3  V Kohli   Imran Tahir    34
## 4  V Kohli      CJ Boyce    32
## 5  V Kohli   Saeed Ajmal    32
## 6  V Kohli        AJ Tye    31
## 7  V Kohli  HMRKB Herath    29
## 8  V Kohli    TT Bresnan    28
## 9  V Kohli KW Richardson    27
## 10 V Kohli     SM Boland    27
## ..     ...           ...   ...
# The following plot displays the performance of the India T20 batsman (rank=2) against all opposition bowlers. This is Suresh Raina for India
d <- teamBatsmenVsBowlersAllOppnAllMatchesRept(ind_matches,"India",rank=2,dispRows=50)
teamBatsmenVsBowlersAllOppnAllMatchesPlot(d)

batsmenVsBowler1-2

# Best T20 batsman of South Africa against Indian  bowlers
d <- teamBatsmenVsBowlersAllOppnAllMatchesRept(ind_matches,"South Africa",rank=1,dispRows=30)
d
## Source: local data frame [21 x 3]
## Groups: batsman [1]
## 
##           batsman          bowler  runs
##            (fctr)          (fctr) (dbl)
## 1  AB de Villiers    Yuvraj Singh    27
## 2  AB de Villiers        R Ashwin    21
## 3  AB de Villiers       S Aravind    18
## 4  AB de Villiers        RP Singh    14
## 5  AB de Villiers Harbhajan Singh    13
## 6  AB de Villiers         B Kumar    13
## 7  AB de Villiers       MM Sharma    12
## 8  AB de Villiers       RA Jadeja    11
## 9  AB de Villiers       YK Pathan    11
## 10 AB de Villiers          Z Khan     9
## ..            ...             ...   ...
teamBatsmenVsBowlersAllOppnAllMatchesPlot(d)

batsmenVsBowler1-3

# Best T20 batsman of England (rank=1) against Indian bowlers (matches=ind_matches)
d <-teamBatsmenVsBowlersAllOppnAllMatchesRept(matches=ind_matches,"England",rank=1,dispRows=50)
d
## Source: local data frame [15 x 3]
## Groups: batsman [1]
## 
##       batsman          bowler  runs
##        (fctr)          (fctr) (dbl)
## 1  EJG Morgan        R Ashwin    24
## 2  EJG Morgan        AB Dinda    22
## 3  EJG Morgan       KV Sharma    18
## 4  EJG Morgan       RA Jadeja    17
## 5  EJG Morgan       MM Sharma    16
## 6  EJG Morgan       RG Sharma    15
## 7  EJG Morgan         V Kohli    14
## 8  EJG Morgan  Mohammed Shami    13
## 9  EJG Morgan    Yuvraj Singh    11
## 10 EJG Morgan         P Kumar    10
## 11 EJG Morgan       PP Chawla     7
## 12 EJG Morgan         P Awana     7
## 13 EJG Morgan       IK Pathan     1
## 14 EJG Morgan        MM Patel     1
## 15 EJG Morgan Harbhajan Singh     0
teamBatsmenVsBowlersAllOppnAllMatchesPlot(d)

batsmenVsBowler1-4
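
The plot=FALSE option used above follows a common pattern: the same function either draws the chart or hands back the data frame it would have drawn. A minimal sketch of that pattern with base graphics follows. The helper plot_or_return() is hypothetical and is not how the package draws its plots; the three rows are taken from the Virat Kohli output above.

```r
# Hypothetical helper: draw a bar chart, or return the data when plot = FALSE
plot_or_return <- function(df, plot = TRUE) {
  if (plot) {
    barplot(df$runs, names.arg = df$bowler, las = 2,
            ylab = "Runs", main = paste(df$batsman[1], "vs bowlers"))
    invisible(df)  # still hand the data back, but do not print it
  } else {
    df
  }
}

# First three rows of the rank=1 report for India shown earlier
d <- data.frame(
  batsman = "V Kohli",
  bowler  = c("Shahid Afridi", "SR Watson", "Imran Tahir"),
  runs    = c(43, 39, 34),
  stringsAsFactors = FALSE
)

# No plot is drawn; the data frame comes back unchanged
e <- plot_or_return(d, plot = FALSE)
e
```

Returning the data frame makes it possible to build a custom plot from the same numbers.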

15. Batsmen vs Bowlers Plot in Twenty20 matches against all oppositions (continued)

# Top T20 batsman of South Africa and performance against opposition bowlers of all countries
d <- teamBatsmenVsBowlersAllOppnAllMatchesRept(sa_matches,"South Africa",rank=1,dispRows=50)
d
## Source: local data frame [50 x 3]
## Groups: batsman [1]
## 
##      batsman        bowler  runs
##       (fctr)        (fctr) (dbl)
## 1  JP Duminy   Saeed Ajmal    63
## 2  JP Duminy    BAW Mendis    63
## 3  JP Duminy      JR Hopes    58
## 4  JP Duminy     DJ Hussey    48
## 5  JP Duminy      KD Mills    46
## 6  JP Duminy    TG Southee    43
## 7  JP Duminy      CB Mpofu    42
## 8  JP Duminy Shahid Afridi    40
## 9  JP Duminy       SW Tait    38
## 10 JP Duminy   NL McCullum    32
## ..       ...           ...   ...
teamBatsmenVsBowlersAllOppnAllMatchesPlot(d)

batsmenVsBowler2-1

# Do not display plot but return dataframe
e <- teamBatsmenVsBowlersAllOppnAllMatchesPlot(d,plot=FALSE)
e
## Source: local data frame [50 x 3]
## Groups: batsman [1]
## 
##      batsman        bowler  runs
##       (fctr)        (fctr) (dbl)
## 1  JP Duminy   Saeed Ajmal    63
## 2  JP Duminy    BAW Mendis    63
## 3  JP Duminy      JR Hopes    58
## 4  JP Duminy     DJ Hussey    48
## 5  JP Duminy      KD Mills    46
## 6  JP Duminy    TG Southee    43
## 7  JP Duminy      CB Mpofu    42
## 8  JP Duminy Shahid Afridi    40
## 9  JP Duminy       SW Tait    38
## 10 JP Duminy   NL McCullum    32
## ..       ...           ...   ...
# Top T20 batsman of Sri Lanka against bowlers of all countries
d <- teamBatsmenVsBowlersAllOppnAllMatchesRept(sl_matches,"Sri Lanka",rank=1,dispRows=50)
teamBatsmenVsBowlersAllOppnAllMatchesPlot(d)

batsmenVsBowler2-2

# Best T20 West Indies batsman against English bowlers
d <- teamBatsmenVsBowlersAllOppnAllMatchesRept(eng_matches,"West Indies",rank=1,dispRows=50)
teamBatsmenVsBowlersAllOppnAllMatchesPlot(d)

batsmenVsBowler2-3

16. Team bowling T20 scorecard against all opposition

The function lists the top T20 bowlers of each country across its matches. It returns a dataframe when ‘matches’ holds the matches of a country and ‘theTeam’ is that same country, as in the calls below.

teamBowlingScorecardAllOppnAllMatchesMain(matches=ind_matches,theTeam="India")
## Source: local data frame [41 x 5]
## 
##             bowler overs maidens  runs wickets
##             (fctr) (int)   (int) (dbl)   (dbl)
## 1         R Ashwin    18       0   900      41
## 2        IK Pathan    16       0   618      29
## 3  Harbhajan Singh    18       0   622      27
## 4     Yuvraj Singh    13       0   418      24
## 5        RA Jadeja    17       0   635      23
## 6          A Nehra    11       0   373      22
## 7         RP Singh     8       0   225      19
## 8           Z Khan    16       0   448      18
## 9         AB Dinda    11       0   245      17
## 10         B Kumar     9       0   294      14
## ..             ...   ...     ...   ...     ...
teamBowlingScorecardAllOppnAllMatchesMain(matches=aus_matches,theTeam="Australia")
## Source: local data frame [56 x 5]
## 
##        bowler overs maidens  runs wickets
##        (fctr) (int)   (int) (dbl)   (dbl)
## 1   SR Watson    21       0  1062      49
## 2  MG Johnson    19       0   797      42
## 3       B Lee    13       0   714      30
## 4     SW Tait    15       0   589      30
## 5   DP Nannes    14       0   403      29
## 6    MA Starc    18       0   508      28
## 7  PJ Cummins    19       0   395      22
## 8  NW Bracken    12       0   438      21
## 9   DJ Hussey    13       1   392      21
## 10  SPD Smith    10       0   377      18
## ..        ...   ...     ...   ...     ...
teamBowlingScorecardAllOppnAllMatchesMain(eng_matches,"England")
## Source: local data frame [47 x 5]
## 
##            bowler overs maidens  runs wickets
##            (fctr) (int)   (int) (dbl)   (dbl)
## 1       SCJ Broad    21       0  1491      68
## 2        GP Swann    16       0   859      53
## 3     JW Dernbach    17       0  1020      45
## 4         ST Finn    16       0   583      30
## 5      TT Bresnan    11       0   887      27
## 6   RJ Sidebottom    13       0   437      26
## 7     JM Anderson    16       0   552      20
## 8       LJ Wright    11       0   465      18
## 9       RS Bopara    12       0   387      17
## 10 PD Collingwood    10       0   329      16
## ..            ...   ...     ...   ...     ...
teamBowlingScorecardAllOppnAllMatchesMain(pak_matches,"Pakistan")
## Source: local data frame [37 x 5]
## 
##             bowler overs maidens  runs wickets
##             (fctr) (int)   (int) (dbl)   (dbl)
## 1    Shahid Afridi    16       0  2095      96
## 2      Saeed Ajmal    17       0  1516      94
## 3         Umar Gul    20       0  1400      91
## 4    Sohail Tanvir    18       0  1212      53
## 5  Mohammad Hafeez    19       0  1093      47
## 6    Mohammad Amir    11       0   557      27
## 7     Abdul Razzaq    13       0   367      22
## 8     Shoaib Malik    12       0   435      19
## 9    Shoaib Akhtar    10       0   421      19
## 10      Wahab Riaz    14       1   392      19
## ..             ...   ...     ...   ...     ...
teamBowlingScorecardAllOppnAllMatchesMain(sa_matches,"South Africa")
## Source: local data frame [40 x 5]
## 
##         bowler overs maidens  runs wickets
##         (fctr) (int)   (int) (dbl)   (dbl)
## 1     DW Steyn    17       0   879      59
## 2     M Morkel    18       0  1022      52
## 3   WD Parnell    21       0   891      41
## 4      J Botha    18       0   823      40
## 5    JA Morkel    19       0   835      30
## 6  Imran Tahir    13       0   426      27
## 7  RJ Peterson    18       1   451      26
## 8      D Wiese    16       0   267      22
## 9  LL Tsotsobe    14       0   541      20
## 10   R McLaren    16       0   332      19
## ..         ...   ...     ...   ...     ...
teamBowlingScorecardAllOppnAllMatchesMain(nz_matches,"New Zealand")
## Source: local data frame [48 x 5]
## 
##            bowler overs maidens  runs wickets
##            (fctr) (int)   (int) (dbl)   (dbl)
## 1     NL McCullum    18       0  1240      59
## 2      TG Southee    20       0  1182      52
## 3        KD Mills    16       0  1190      49
## 4      DL Vettori    12       0   748      39
## 5       IG Butler    15       0   481      27
## 6  MJ McClenaghan    19       0   642      25
## 7         SE Bond    12       0   518      25
## 8    JEC Franklin    16       0   417      25
## 9        JDP Oram    17       0   793      21
## 10      SB Styris    11       0   349      20
## ..            ...   ...     ...   ...     ...
teamBowlingScorecardAllOppnAllMatchesMain(sl_matches,"Sri Lanka")
## Source: local data frame [42 x 5]
## 
##             bowler overs maidens  runs wickets
##             (fctr) (int)   (int) (dbl)   (dbl)
## 1       SL Malinga    19       0  1522      86
## 2       BAW Mendis    17       0   911      61
## 3  KMDN Kulasekara    15       0  1052      52
## 4       AD Mathews    17       0   814      35
## 5      NLTC Perera    17       0   769      35
## 6  SMSM Senanayake    16       0   442      26
## 7    ST Jayasuriya    13       0   415      20
## 8     CRD Fernando    16       0   377      16
## 9     HMRKB Herath    12       2   174      15
## 10  M Muralitharan    13       0   297      14
## ..             ...   ...     ...   ...     ...
teamBowlingScorecardAllOppnAllMatchesMain(wi_matches,"West Indies")
## Source: local data frame [37 x 5]
## 
##        bowler overs maidens  runs wickets
##        (fctr) (int)   (int) (dbl)   (dbl)
## 1   DJG Sammy    20       0  1037      49
## 2    DJ Bravo    18       0  1127      46
## 3   SP Narine    15       0   692      40
## 4    S Badree    11       0   464      34
## 5   R Rampaul    16       0   705      29
## 6   JE Taylor    14       0   529      28
## 7  MN Samuels     6       0   561      24
## 8  KA Pollard    16       0   598      23
## 9  FH Edwards    14       0   497      19
## 10 K Santokie     9       0   278      19
## ..        ...   ...     ...   ...     ...
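
A bowling scorecard of this kind is an aggregation of runs conceded and wickets taken per bowler, sorted by wickets. The base R sketch below shows the idea on toy data. It makes assumptions: the column names and values are invented, and the overs and maidens columns of the real scorecard are left out.

```r
# Toy ball-by-ball data: runs conceded and a 0/1 wicket flag per ball (hypothetical)
balls <- data.frame(
  bowler  = c("P", "P", "P", "Q", "Q", "R"),
  runs    = c(1, 4, 0, 0, 6, 2),
  wickets = c(0, 0, 1, 1, 1, 0),
  stringsAsFactors = FALSE
)

# Sum runs and wickets per bowler
card <- aggregate(cbind(runs, wickets) ~ bowler, data = balls, FUN = sum)

# Most wickets first; fewer runs conceded breaks ties
card <- card[order(-card$wickets, card$runs), ]
rownames(card) <- NULL
card
```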

17. Team bowling T20 scorecard against all opposition (continued)

The function lists the top bowlers of a country (‘matches’) against the opposition country

# Best Indian bowlers in matches against Australia
teamBowlingScorecardAllOppnAllMatches(ind_matches,'Australia')
## Source: local data frame [26 x 5]
## 
##             bowler overs maidens  runs wickets
##             (fctr) (int)   (int) (dbl)   (dbl)
## 1         R Ashwin    13       1   232      10
## 2        RA Jadeja     7       0   219       9
## 3        JJ Bumrah     6       0   103       6
## 4    R Vinay Kumar     1       0    79       6
## 5     Yuvraj Singh     3       0    72       5
## 6         R Sharma     1       0    56       5
## 7          A Nehra     5       0   127       4
## 8        IK Pathan     8       0   115       4
## 9          B Kumar     4       0    42       4
## 10 Harbhajan Singh     8       0    83       3
## ..             ...   ...     ...   ...     ...
# Best Australian bowlers in matches against India
teamBowlingScorecardAllOppnAllMatches(aus_matches,'India')
## Source: local data frame [36 x 5]
## 
##        bowler overs maidens  runs wickets
##        (fctr) (int)   (int) (dbl)   (dbl)
## 1   SR Watson    13       0   201      11
## 2  MG Johnson     5       0    54       5
## 3       B Lee     6       0   133       4
## 4     SW Tait     5       0   112       3
## 5  NW Bracken     6       0    68       3
## 6   DP Nannes     1       0    25       3
## 7   DJ Hussey     4       0    24       3
## 8  PJ Cummins     4       0    16       3
## 9    CJ McKay     1       0    75       2
## 10    GB Hogg     5       0    69       2
## ..        ...   ...     ...   ...     ...
# Best New Zealand bowlers in matches against England
teamBowlingScorecardAllOppnAllMatches(nz_matches,'England')
## Source: local data frame [26 x 5]
## 
##            bowler overs maidens  runs wickets
##            (fctr) (int)   (int) (dbl)   (dbl)
## 1  MJ McClenaghan     9       0   189       8
## 2        KD Mills     6       0   199       7
## 3     NL McCullum    15       0   281       5
## 4      TG Southee     3       0   183       5
## 5       CS Martin     5       0   116       5
## 6      DL Vettori    11       0    91       5
## 7    JEC Franklin     7       0    53       5
## 8         SE Bond     8       0    49       5
## 9       IG Butler    11       0    95       4
## 10      SB Styris     8       0    80       3
## ..            ...   ...     ...   ...     ...
# Best Sri Lankan bowlers in matches against West Indies
teamBowlingScorecardAllOppnAllMatches(sl_matches,"West Indies")
## Source: local data frame [16 x 5]
## 
##              bowler overs maidens  runs wickets
##              (fctr) (int)   (int) (dbl)   (dbl)
## 1        BAW Mendis    10       0    82      13
## 2        SL Malinga    12       0   217      12
## 3        AD Mathews     9       0    87       6
## 4   TAM Siriwardana     5       0    58       5
## 5   SMSM Senanayake     7       0    90       4
## 6    M Muralitharan     9       0    76       4
## 7   KMDN Kulasekara    11       0   158       3
## 8      PVD Chameera     4       0    66       2
## 9           I Udana     7       0    56       1
## 10 DSNFG Jayasuriya     2       0    38       1
## 11      BMAJ Mendis     1       0    32       1
## 12      A Dananjaya     1       0    16       1
## 13       S Prasanna     2       0    15       1
## 14     HMRKB Herath     5       0    43       0
## 15    ST Jayasuriya     3       0    34       0
## 16      NLTC Perera     2       0    13       0

18. Team Bowlers versus Batsmen (in T20 against all oppositions)

The functions below give the performance of bowlers versus batsmen. With rank=0 they list the top bowlers and the total runs each conceded; with rank=1, 2, ... they show, for the bowler at that rank, the batsmen against whom the runs were conceded.
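
The role of the rank argument can be sketched in a few lines of base R. The helper below is hypothetical (the name bowlersVsBatsmenSketch and the toy column names are my assumptions for illustration, not the yorkr code): rank = 0 returns the total runs conceded per bowler, and rank = k > 0 returns the batsman-wise breakdown for the k-th bowler in that list.

```r
# Toy delivery-level data; column names are illustrative assumptions,
# not the actual yorkr data frame schema.
deliveries <- data.frame(
  bowler  = c("A", "A", "A", "B", "B", "C"),
  batsman = c("x", "y", "x", "x", "z", "y"),
  runs    = c(4, 1, 6, 2, 0, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of yorkr
bowlersVsBatsmenSketch <- function(df, rank = 0) {
  # Total runs conceded per bowler, highest first
  totals <- aggregate(runs ~ bowler, data = df, FUN = sum)
  totals <- totals[order(-totals$runs), ]
  if (rank == 0) {
    return(totals)
  }
  stopifnot(rank >= 1, rank <= nrow(totals))
  # Pick the bowler at the requested position and break the runs down by batsman
  chosen <- totals$bowler[rank]
  detail <- aggregate(runs ~ bowler + batsman,
                      data = df[df$bowler == chosen, ], FUN = sum)
  names(detail)[names(detail) == "runs"] <- "runsConceded"
  detail[order(-detail$runsConceded), ]
}

bowlersVsBatsmenSketch(deliveries, rank = 0)  # one row per bowler
bowlersVsBatsmenSketch(deliveries, rank = 1)  # batsman-wise rows for the top bowler
```

This also shows why the two kinds of result have different shapes: the rank = 0 frame has two columns, while the rank > 0 frame adds a batsman column.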

# Best T20 bowlers overall from India against all opposition (rank=0)
teamBowlersVsBatsmenAllOppnAllMatchesMain(ind_matches,theTeam="India",rank=0)
## Source: local data frame [10 x 2]
## 
##             bowler  runs
##             (fctr) (dbl)
## 1         R Ashwin   868
## 2        RA Jadeja   619
## 3        IK Pathan   598
## 4  Harbhajan Singh   591
## 5           Z Khan   424
## 6     Yuvraj Singh   415
## 7        YK Pathan   406
## 8          A Nehra   368
## 9         I Sharma   349
## 10         B Kumar   275
# Top T20 bowler of India and runs conceded against different opposition batsmen (rank=1)
m <-teamBowlersVsBatsmenAllOppnAllMatchesMain(ind_matches,theTeam="India",rank=1)
m
## Source: local data frame [95 x 3]
## Groups: bowler [1]
## 
##      bowler     batsman runsConceded
##      (fctr)      (fctr)        (dbl)
## 1  R Ashwin    AD Hales           43
## 2  R Ashwin    AJ Finch           42
## 3  R Ashwin   SR Watson           41
## 4  R Ashwin   DA Warner           37
## 5  R Ashwin     MS Wade           37
## 6  R Ashwin BB McCullum           26
## 7  R Ashwin   JP Duminy           26
## 8  R Ashwin  GJ Maxwell           24
## 9  R Ashwin  EJG Morgan           24
## 10 R Ashwin   CA Ingram           23
## ..      ...         ...          ...
# Second-ranked T20 bowler of India and runs conceded against different opposition batsmen (rank=2)
m <-teamBowlersVsBatsmenAllOppnAllMatchesMain(ind_matches,theTeam="India",rank=2)
m
## Source: local data frame [66 x 3]
## Groups: bowler [1]
## 
##       bowler       batsman runsConceded
##       (fctr)        (fctr)        (dbl)
## 1  RA Jadeja     SR Watson           59
## 2  RA Jadeja      AJ Finch           34
## 3  RA Jadeja       MS Wade           32
## 4  RA Jadeja CK Kapugedera           24
## 5  RA Jadeja   LMP Simmons           23
## 6  RA Jadeja      AD Hales           22
## 7  RA Jadeja     DA Warner           20
## 8  RA Jadeja     JH Kallis           19
## 9  RA Jadeja    EJG Morgan           17
## 10 RA Jadeja  LD Chandimal           17
## ..       ...           ...          ...

18. Team Bowlers versus Batsmen (in T20 matches against all oppositions continued)

# Top T20 bowlers versus batsmen of South Africa (rank=0)
teamBowlersVsBatsmenAllOppnAllMatchesMain(sa_matches,theTeam="South Africa",rank=0)
## Source: local data frame [10 x 2]
## 
##         bowler  runs
##         (fctr) (dbl)
## 1     M Morkel   967
## 2   WD Parnell   858
## 3     DW Steyn   833
## 4    JA Morkel   807
## 5      J Botha   802
## 6  LL Tsotsobe   523
## 7  RJ Peterson   443
## 8  Imran Tahir   410
## 9    JP Duminy   406
## 10   KJ Abbott   353
# Top T20 bowlers versus batsmen of Pakistan (rank=0)
teamBowlersVsBatsmenAllOppnAllMatchesMain(pak_matches,theTeam="Pakistan",rank=0)
## Source: local data frame [10 x 2]
## 
##             bowler  runs
##             (fctr) (dbl)
## 1    Shahid Afridi  2054
## 2      Saeed Ajmal  1475
## 3         Umar Gul  1330
## 4    Sohail Tanvir  1147
## 5  Mohammad Hafeez  1060
## 6    Mohammad Amir   546
## 7     Shoaib Malik   407
## 8    Shoaib Akhtar   402
## 9       Wahab Riaz   369
## 10    Abdul Razzaq   364
# Top T20 bowler of Sri Lanka and runs conceded against different opposition batsmen (rank=1)
teamBowlersVsBatsmenAllOppnAllMatchesMain(sl_matches,theTeam="Sri Lanka",rank=1)
## Source: local data frame [168 x 3]
## Groups: bowler [1]
## 
##        bowler       batsman runsConceded
##        (fctr)        (fctr)        (dbl)
## 1  SL Malinga Shahid Afridi           66
## 2  SL Malinga    MN Samuels           55
## 3  SL Malinga   BB McCullum           38
## 4  SL Malinga    MJ Guptill           37
## 5  SL Malinga     G Gambhir           35
## 6  SL Malinga   NL McCullum           35
## 7  SL Malinga      JDP Oram           31
## 8  SL Malinga  Shoaib Malik           31
## 9  SL Malinga    MEK Hussey           30
## 10 SL Malinga  ADS Fletcher           30
## ..        ...           ...          ...

19. Team bowlers versus batsmen report (in T20 matches against all oppositions)

#Top T20 bowlers of other countries against India
teamBowlersVsBatsmenAllOppnAllMatchesRept(matches=ind_matches,theTeam="India",rank=0)
## Source: local data frame [10 x 2]
## 
##           bowler  runs
##           (fctr) (dbl)
## 1      SR Watson   190
## 2  Shahid Afridi   180
## 3       Umar Gul   171
## 4      SCJ Broad   151
## 5    JW Dernbach   149
## 6     SL Malinga   144
## 7     TT Bresnan   135
## 8    JP Faulkner   127
## 9          B Lee   123
## 10     JA Morkel   121
# The best T20 bowler against India is Shane Watson
a <- teamBowlersVsBatsmenAllOppnAllMatchesRept(ind_matches,theTeam="India",rank=1)
a
## Source: local data frame [12 x 3]
## Groups: bowler [1]
## 
##       bowler         batsman runsConceded
##       (fctr)          (fctr)        (dbl)
## 1  SR Watson       RG Sharma           41
## 2  SR Watson         V Kohli           39
## 3  SR Watson        SK Raina           26
## 4  SR Watson    Yuvraj Singh           23
## 5  SR Watson        MS Dhoni           21
## 6  SR Watson       IK Pathan           14
## 7  SR Watson        S Dhawan           10
## 8  SR Watson Harbhajan Singh            7
## 9  SR Watson       RA Jadeja            4
## 10 SR Watson        R Ashwin            4
## 11 SR Watson       AM Rahane            1
## 12 SR Watson         B Kumar            0

20. Team bowlers versus batsmen report (in T20s against all oppositions continued)

#Top T20 Indian bowlers against Sri Lanka 
teamBowlersVsBatsmenAllOppnAllMatchesRept(matches=ind_matches,theTeam="Sri Lanka",rank=0)
## Source: local data frame [10 x 2]
## 
##             bowler  runs
##             (fctr) (dbl)
## 1          A Nehra   140
## 2        YK Pathan   100
## 3        RA Jadeja    80
## 4         R Ashwin    74
## 5         I Sharma    60
## 6         SK Raina    58
## 7        IK Pathan    56
## 8         AB Dinda    53
## 9  Harbhajan Singh    35
## 10       JJ Bumrah    35
#Top T20 Indian bowlers against England
teamBowlersVsBatsmenAllOppnAllMatchesRept(ind_matches,"England",rank=0)
## Source: local data frame [10 x 2]
## 
##             bowler  runs
##             (fctr) (dbl)
## 1         R Ashwin   160
## 2        RA Jadeja    86
## 3         AB Dinda    86
## 4          P Awana    71
## 5        PP Chawla    68
## 6  Harbhajan Singh    61
## 7     Yuvraj Singh    58
## 8  Joginder Sharma    57
## 9    R Vinay Kumar    53
## 10       IK Pathan    52

21. Team T20 bowlers versus batsmen report (all oppositions continued-1)

# Top T20 opposition bowlers against New Zealand
teamBowlersVsBatsmenAllOppnAllMatchesRept(nz_matches,theTeam="New Zealand",rank=0)
## Source: local data frame [10 x 2]
## 
##             bowler  runs
##             (fctr) (dbl)
## 1    Shahid Afridi   333
## 2         M Morkel   283
## 3        SCJ Broad   279
## 4       SL Malinga   260
## 5         Umar Gul   240
## 6  KMDN Kulasekara   199
## 7    Mohammad Amir   192
## 8       BAW Mendis   190
## 9      Saeed Ajmal   170
## 10        P Utseya   159
# Top T20 opposition bowlers against Australia
teamBowlersVsBatsmenAllOppnAllMatchesRept(aus_matches,"Australia",rank=0)
## Source: local data frame [10 x 2]
## 
##             bowler  runs
##             (fctr) (dbl)
## 1      Saeed Ajmal   265
## 2       WD Parnell   254
## 3    Shahid Afridi   249
## 4        SCJ Broad   239
## 5         R Ashwin   222
## 6         Umar Gul   222
## 7        RA Jadeja   218
## 8          J Botha   210
## 9  Mohammad Hafeez   207
## 10     JW Dernbach   188
# Top T20 bowlers against Sri Lanka
teamBowlersVsBatsmenAllOppnAllMatchesRept(sl_matches,"Sri Lanka",rank=0)
## Source: local data frame [10 x 2]
## 
##             bowler  runs
##             (fctr) (dbl)
## 1    Shahid Afridi   291
## 2    Sohail Tanvir   273
## 3      Saeed Ajmal   223
## 4         KD Mills   204
## 5         Umar Gul   179
## 6         DJ Bravo   173
## 7  Mohammad Hafeez   170
## 8       DL Vettori   160
## 9         JDP Oram   159
## 10      KA Pollard   157

22. Team bowlers versus batsmen report (in T20s against all oppositions) plot

This function can only be used for rank > 0 (rank=1,2,3..)
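
A plausible reason for this restriction, judging from the outputs earlier in this post, is that the rank=0 result only has bowler and runs columns, while a plot of bowler versus batsmen needs the batsman-wise breakdown that rank > 0 returns. The base R sketch below illustrates that check. The helper plotBowlerVsBatsmenSketch is hypothetical and is not how yorkr draws its plots.

```r
# Hypothetical checker + plot helper; not part of yorkr.
plotBowlerVsBatsmenSketch <- function(df, plot = TRUE) {
  needed <- c("bowler", "batsman", "runsConceded")
  if (!all(needed %in% names(df))) {
    stop("Expected a rank > 0 style data frame with columns: ",
         paste(needed, collapse = ", "))
  }
  # Largest bars first
  df <- df[order(-df$runsConceded), ]
  if (plot) {
    barplot(df$runsConceded, names.arg = df$batsman, las = 2,
            main = paste("Runs conceded by", df$bowler[1]),
            ylab = "Runs conceded")
  }
  invisible(df)
}

# A few rows copied from the rank=1 report for SR Watson shown earlier
watson <- data.frame(
  bowler       = "SR Watson",
  batsman      = c("RG Sharma", "V Kohli", "SK Raina"),
  runsConceded = c(41, 39, 26),
  stringsAsFactors = FALSE
)
# Draw the chart only when running interactively
plotBowlerVsBatsmenSketch(watson, plot = interactive())

# A rank=0 style frame has no batsman column, so the helper refuses it
totals <- data.frame(bowler = c("SR Watson", "Shahid Afridi"),
                     runs   = c(190, 180),
                     stringsAsFactors = FALSE)
result <- tryCatch(plotBowlerVsBatsmenSketch(totals),
                   error = function(e) "rejected")
```

The plot = FALSE switch follows the same convention that several yorkr functions in this post expose for returning the data frame without drawing anything.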

# Top T20 bowler against India (Shane Watson of Australia)
df <- teamBowlersVsBatsmenAllOppnAllMatchesRept(ind_matches,theTeam="India",rank=1)
teamBowlersVsBatsmenAllOppnAllMatchesPlot(df,"India","India")

bowlerVsbatsmen1-1

# Top T20 Indian bowler versus England (R Ashwin)
df <- teamBowlersVsBatsmenAllOppnAllMatchesRept(ind_matches,theTeam="England",rank=1)
teamBowlersVsBatsmenAllOppnAllMatchesPlot(df,"India","England")

bowlerVsbatsmen1-2

#Top T20 Indian bowler against West Indies (Yusuf Pathan)
df <- teamBowlersVsBatsmenAllOppnAllMatchesRept(ind_matches,theTeam="West Indies",rank=1)
teamBowlersVsBatsmenAllOppnAllMatchesPlot(df,"India","West Indies")

bowlerVsbatsmen1-3

23. Team bowlers versus batsmen plot (in Twenty20 matches against all oppositions)

#Top T20 bowler against South Africa (NL McCullum of New Zealand)
df <- teamBowlersVsBatsmenAllOppnAllMatchesRept(sa_matches,theTeam="South Africa",rank=1)
teamBowlersVsBatsmenAllOppnAllMatchesPlot(df,"South Africa","South Africa")

bowlerVsbatsmen2-1

# Top  T20 bowler versus Pakistan (SL Malinga)
df <- teamBowlersVsBatsmenAllOppnAllMatchesRept(pak_matches,theTeam="Pakistan",rank=1)
teamBowlersVsBatsmenAllOppnAllMatchesPlot(df,"Pakistan","Pakistan")

bowlerVsbatsmen2-2

24. Team Bowler Wicket Kind in Twenty20 matches against all oppositions

# Top opposition T20 bowlers against India and the kind of wickets
teamBowlingWicketKindAllOppnAllMatches(ind_matches,t1="India",t2="All")

bowlingWicketkind1-1

# Get the data frame. Do not plot
m <-teamBowlingWicketKindAllOppnAllMatches(ind_matches,t1="India",t2="All",plot=FALSE)
m
## Source: local data frame [21 x 3]
## Groups: bowler [?]
## 
##         bowler wicketKind     m
##         (fctr)      (chr) (int)
## 1   MG Johnson     caught     3
## 2   MG Johnson    run out     2
## 3    SR Watson     caught     8
## 4    SR Watson    run out     3
## 5   TT Bresnan     caught     6
## 6  JW Dernbach     bowled     1
## 7  JW Dernbach     caught     6
## 8  JW Dernbach    run out     3
## 9      ST Finn     bowled     2
## 10     ST Finn     caught     4
## ..         ...        ...   ...
# Best Indian T20 bowlers against South Africa
teamBowlingWicketKindAllOppnAllMatches(ind_matches,t1="India",t2="South Africa")

bowlingWicketkind1-2

# Best Indian bowlers against Pakistan
teamBowlingWicketKindAllOppnAllMatches(ind_matches,t1="India",t2="Pakistan")

bowlingWicketkind1-3

25. Team Bowler Wicket Kind in Twenty20 matches against all oppositions (continued)

# Best T20 opposition bowlers against England
teamBowlingWicketKindAllOppnAllMatches(eng_matches,t1="England",t2="All")

bowlingWicketkind2-1

# Best T20 opposition bowlers against Australia
teamBowlingWicketKindAllOppnAllMatches(aus_matches,t1="Australia",t2="All")

bowlingWicketkind2-2

# Best T20 bowlers against Sri Lanka
teamBowlingWicketKindAllOppnAllMatches(sl_matches,t1="Sri Lanka",t2="All")

bowlingWicketkind2-3

26. Team Bowler Wicket Runs in Twenty20 matches against all oppositions

# Opposition T20 bowlers against India and runs conceded
teamBowlingWicketRunsAllOppnAllMatches(ind_matches,t1="India",t2="All",plot=TRUE)

bowlingWicketRuns1-1

# Opposition T20 bowlers against India and runs conceded returned as dataframe
m <-teamBowlingWicketRunsAllOppnAllMatches(ind_matches,t1="India",t2="All",plot=FALSE)
m
## Source: local data frame [10 x 3]
## 
##           bowler runsConceded wickets
##           (fctr)        (dbl)   (dbl)
## 1      SR Watson          201      11
## 2       Umar Gul          178      11
## 3    JW Dernbach          157      10
## 4        ST Finn           83       7
## 5       CB Mpofu          119       7
## 6     TT Bresnan          140       6
## 7     DL Vettori           96       6
## 8     MG Johnson           54       5
## 9  Mohammad Asif           43       5
## 10 Shahid Afridi          184       5
# Top T20 Indian bowlers against Australia and runs conceded
teamBowlingWicketRunsAllOppnAllMatches(ind_matches,t1="India",t2="Australia",plot=TRUE)

bowlingWicketRuns1-2

27. Team Bowler Wicket Runs in Twenty20 matches against all oppositions (continued)

#Top opposition T20 bowlers against Pakistan
teamBowlingWicketRunsAllOppnAllMatches(pak_matches,t1="Pakistan",t2="All",plot=TRUE)

bowlingWicketRuns2-1

#Top opposition T20 bowlers against West Indies
teamBowlingWicketRunsAllOppnAllMatches(wi_matches,t1="West Indies",t2="All",plot=TRUE)

bowlingWicketRuns2-2

#Top opposition T20 bowlers against Sri Lanka
teamBowlingWicketRunsAllOppnAllMatches(sl_matches,t1="Sri Lanka",t2="All",plot=TRUE)

bowlingWicketRuns2-3

#Top opposition T20 bowlers against New Zealand
teamBowlingWicketRunsAllOppnAllMatches(nz_matches,t1="New Zealand",t2="All",plot=TRUE)

bowlingWicketRuns2-4

Conclusion

This post covered all the functions for a team in all Twenty20 matches against all oppositions. As before, the data frames for the T20 matches are already available, so you can load the data and start using them right away. If you can draw more insights from the data frames, do go ahead, but please attribute the source to Cricsheet (http://cricsheet.org), my package yorkr and my blog. Do give the functions a spin for yourself.

The 4th part of the yorkr package’s handling of Twenty20 will follow soon.

Watch this space!

You may also like

  1. Introducing cricket package yorkr-Part1:Beaten by sheer pace!
  2. Introducing cricket package yorkr:Part 4-In the block hole!
  3. Literacy in India: A deepR dive
  4. Simulating an Edge shape in Android
  5. Re-working the Lucy Richardson algorithm in OpenCV
  6. Introducing cricketr! : An R package to analyze performances of cricketers
  7. Design principles of scalable distributed systems
  8. OpenCV: Fun with filters and convolution
  9. Getting started with memcached-libmemcached