Sixer – R package cricketr’s new Shiny avatar

In this post I create a Shiny app, Sixer, based on my R package cricketr. I developed the R package cricketr a few months back for analyzing the performances of batsmen and bowlers in all formats of the game (Test, ODI and Twenty20). This package uses the statistics available in ESPN Cricinfo Statsguru. I had written a series of posts using the cricketr package in which I chose a few batsmen and bowlers and compared their performances. Here I have created a complete Shiny app with a lot more players and with almost all the features of the cricketr package. The motivation for creating the Shiny app was to

• Showcase the ‘cricketr’ package and highlight its functionalities
• Perform analysis of more batsmen and bowlers
• Allow users to interact with the package, try out its different features and functions, and check the performances of some of their favorite cricketers

a) You can try out the interactive  Shiny app Sixer at – Sixer
b) The code for this Shiny app project can be cloned/forked from GitHub – Sixer

In this Shiny app I have 4 tabs which perform the following functions
1.  Analyze Batsman
This tab analyzes batsmen based on different functions and plots the performances of the selected batsman. There are functions that compute and display the batsman’s run-frequency ranges, Mean Strike Rate, number of 4s, dismissals, a 3-D plot of Runs scored vs Balls Faced and Minutes at Crease, Contribution to wins & losses, Home-Away record etc. The analyses can be done for Test, ODI and Twenty20 batsmen. I have included most of the Test batting giants including Tendulkar, Dravid, Sir Don Bradman, Viv Richards, Lara, Ponting etc. Similarly the ODI list includes Sehwag, de Villiers, Afridi, Maxwell etc. The Twenty20 list includes the Top 10 Twenty20 batsmen based on their ICC rankings

2. Analyze bowler
This tab analyzes the bowling performances of bowlers: Wickets percentages, Mean Economy Rate, Wickets at different venues, Moving average of wickets etc. As earlier I have all the Top bowlers including Warne, Muralitharan and Kumble, the famed Indian spin quartet of Bedi, Chandrasekhar, Prasanna and Venkataraghavan, the deadly West Indies trio of Marshall, Roberts and Holding, and the lethal combination of Imran Khan, Wasim Akram and Waqar Younis, besides the dangerous Dennis Lillee and Jeff Thomson. Do give the functions a try and see for yourself the performances of these individual bowlers

3. Relative performances of batsman
This tab allows the selection of multiple batsmen (Test, ODI and Twenty20) for comparisons. There are 2 main functions: Relative Runs Frequency performance and Relative Mean Strike Rate

4. Relative performances of bowlers
Here we can compare bowling performances of multiple bowlers, which include functions Relative Bowling Performance and Relative Economy Rate. This can be done for Test, ODI and Twenty20 formats
Some of my earlier posts based on the R package cricketr include
1. Introducing cricketr!: An R package for analyzing performances of cricketers
2. Taking cricketr for a spin – Part 1
3. cricketr plays the ODIs
4. cricketr adapts to the Twenty20 International
5. cricketr digs the Ashes

Do try out the interactive Sixer Shiny app – Sixer
You can clone the code from Github – Sixer

There is not much in the way of explanation. The Shiny app’s use is self-explanatory. You can choose a match type (Test, ODI or Twenty20), choose a batsman/bowler from the drop down list and select the plot you would like to see. Here are a few sample plots
A. Analyze batsman tab
i) Batsman – Brian Lara , Match Type – Test, Function – Mean Strike Rate
ii) Batsman – Shahid Afridi, Match Type –  ODI, Function – Runs vs Balls faced
The plot below shows that if Afridi faces around 50 balls he is likely to score around 60 runs in ODIs.
iii)   Batsman – Chris Gayle, Match Type – Twenty20  Function – Moving Average
B. Analyze bowler tab

i) Bowler – B S Chandrasekhar, Match Type – Test, Function – Wickets vs Runs
ii) Bowler – Malcolm Marshall, Match Type – Test, Function – Mean Economy Rate
iii) Bowler – Sunil Narine, Match Type – Twenty20, Function – Bowler Wicket Rate

C. Relative performance of batsman (you can select more than 1)
The plot below gives the Mean Strike Rate of batsmen. Viv Richards, Brian Lara, Sanath Jayasuriya and David Warner are the best strikers of the ball.

Here are some of the great strikers of the ball in ODIs
D. Relative performance of bowlers (you can select more than 1)
Finally a look at the famed Indian spin quartet. From the plot below it can be seen that B S Bedi & Venkataraghavan were more economical than Chandrasekhar and Prasanna.

But the latter two have more 4-5 wicket hauls than the former two, as seen in the plot below

Finally a look at the average number of balls to take a wicket by the Top 4 Twenty 20 bowlers.

Do give the Shiny app Sixer a try.

Literacy in India – A deepR dive

Published in R-bloggers: Literacy in India – A deepR dive
You can do magic!
You can have anything,
That you desire
Magic…
You can do magic – song by America (1982)

That is exactly how I feel when I write code in R. A few lines of R and, lo and behold, hundreds of rows and columns are magically transformed into easily understandable graphs, regression curves or choropleth maps. (By the way, the song is really cool! Listen to it if you have not heard it before). You really can do magic with R

In this post I do a deep dive into literacy in India. The dataset is taken from the Open Government Data (OGD) Platform India. This data is based on the 2001 census. Though the data is a little dated, it is extremely rich with literacy details across different age groups, and over all Indian States. The data includes the total number of persons/males/females who are in the primary, middle, matric, college, technical diploma, non-technical diploma levels and so on. In fact the data also includes the educational background of people in the districts in each state. I slice and dice the data across multiple parameters. I have created an interactive Shiny app which provides very detailed visualization based on the parameters chosen

Do try out my interactive Shiny app : IndiaLiteracy

The entire code for this app is on GitHub. Feel free to download/clone/fork/modify or enhance the code – literacyInIndia

For analyzing such a rich data set as the Census data of 2001, I create 4 tabs
1) State Literacy
2) Educational Levels vs Age
3) India Literacy and
4) District Literacy

Here are the details of these 4 tabs in my Shiny app

A) State Literacy
This tab provides the age wise distribution of people (Persons/Males/Females) who attend educational institutions. This is shown as a barplot. The plot also includes the national average. In the plot below which is for entire India we see that the national average

The distribution of females attending primary school in the state of Haryana is shown. Also included is the national average. As can be seen there are options for (Total/Urban/Rural) against (Persons/Males/Females) and whether these people attend educational institutions, are illiterate or literate.

I also have another option under “Who” which is “All”. This will plot the age wise distribution of males/females/persons in urban/rural or the entire state.

B. Educational Institutions vs Age plot

This plot displays the educational institutions attended by people in a particular age group. So for example in the state of Orissa for the 18 year age group we can see that there are persons who are in (Primary, Matric, Higher Secondary, Non-Technical Diploma and Technical Diploma). The bar length for each color is the percentage of the total persons at that level of education

C. Literacy across India
This tab plots a choropleth map for a region (Urban+Rural, Urban, Rural), Who (Persons, Males, Females) and the literacy level (attending educational institutions, primary, higher secondary, Matric etc.) across the whole of India.

D. Literacy within a state
This tab plots a choropleth map of literacy in the districts of a state. A sample plot for Karnataka is shown below

E. Key observations

There is a wealth of insights you can glean by looking at the various charts. Here are a few insights from my initial observations
1) The literacy in Kerala across ages is higher than the national average while in Bihar it is less than the national average

a) Kerala

b) Bihar

2) In Rajasthan the percentage of males attending educational institutions is higher than the national average, while for females it is less than the national average. However the situation is reversed in Chandigarh, where the percentage of females attending educational institutions is higher than both the national average and that of the males

a) Rajasthan

b) Chandigarh

3) When we look at the number of persons attending educational institution across India the north-eastern states lead with Manipur, Nagaland and Sikkim in the top 3.

We have heard that Kerala is the most literate state. But  it looks like Manipur, Nagaland, Sikkim actually edge Kerala out. If we look at the State literacy chart for Kerala and Manipur this becomes more clear

a) Kerala

b) Manipur

It can be seen that in Manipur the number of persons attending educational institutions in the age range 13-24 years is much higher than the national average and much higher than in Kerala

4) If we take a look at the district-wise literacy for the state of Bihar we see that the literacy is lower in the north eastern districts.

5) Here is another interesting observation I made. The top 3 states which are most ‘literate with no education’ are i) Rajasthan ii) Madhya Pradesh iii) Chhattisgarh

While I have included several charts with accompanying explanation, this is largely unnecessary as  most of the charts are self-explanatory.

Do try out the Shiny app and see for yourself the literacy in each state/district/age group educational  level etc – IndiaLiteracy

Feel free to clone/fork my code and make your own enhancements –literacyInIndia

Revisiting crimes against women in India

Here I go again, raking the muck about crimes against women in India. My earlier post “A crime map of India in R: Crimes against women in India” garnered a lot of responses from readers. In fact one of the readers even volunteered to create the only choropleth map in that post. The data for this post is taken from http://data.gov.in. You can download the data from the link “Crimes against women in India”.

I was so impressed by the choropleth map that I decided to do that for all crimes against women.(Wikipedia definition: A choropleth map is a thematic map in which areas are shaded or patterned in proportion to the measurement of the statistical variable being displayed on the map). Personally, I think pictures tell the story better. I am sure you will agree!

So here I have a Shiny app which will plot choropleth maps for a chosen crime in a given year.

You can try out my interactive Shiny app at  Crimes against women in India

In the picture below are the details of ‘Rape’ in the year 2015.

Interestingly the ‘Total Crime against women’ in 2001 shows the Top 5 as

But in 2015 West Bengal tops the list, as the real heavy weight in crimes against women. The new pecking order in 2015 for ‘Total Crimes against Women’ is

1) West Bengal 2) Andhra Pradesh 3) Uttar Pradesh  4) Rajasthan 5) Maharashtra

Similarly for rapes, West Bengal is nowhere in the top 5 list in 2001. In 2015, it is second only to the national rape leader Madhya Pradesh. Also in 2001 West Bengal is not in the top 5 for any of the 6 crime heads. But in 2015, West Bengal is in the top 5 of 6 crime heads. The emergence of West Bengal as the leader in Crimes against Women is due to the steep increase in crime rate over the years. Clearly the law and order situation in West Bengal is heading south.

In Dowry Deaths, UP, Bihar, MP, West Bengal lead the pack, and in that order in 2015.

The usual suspects for most crime categories are West Bengal, UP, MP, AP & Maharashtra.

The state-wise crime charts plot the incidence of the crime (rape, dowry death, assault on women etc.) over the years. Data for each state and for each crime was available for 2001-2013. The data for the period 2014-2018 are projected using linear regression. The shaded portion in the plots indicates the 95% confidence interval of the prediction (in other words, we can be 95% confident that the true mean of the crime rate in the projected years will lie within the shaded region)
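The projection described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the yearly counts are made-up numbers (not the data.gov.in data), the names crime, fit and projection are hypothetical, and the app itself may well draw the band differently (for example through a plotting library).

```r
# Hypothetical yearly counts for one state and one crime head (made-up numbers)
set.seed(42)
crime <- data.frame(year = 2001:2013)
crime$count <- 500 + 40 * (crime$year - 2001) + rnorm(nrow(crime), sd = 25)

# Fit a straight line through the observed years
fit <- lm(count ~ year, data = crime)

# Project 2014-2018 with a 95% confidence interval for the mean response
future <- data.frame(year = 2014:2018)
proj <- predict(fit, newdata = future, interval = "confidence", level = 0.95)

# Columns: year, fit (projected value), lwr and upr (bounds of the band)
projection <- cbind(future, proj)
print(projection)
```

The lwr and upr columns correspond to the lower and upper edges of a shaded band. The band widens as the projection moves further from the observed years.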

Several interesting requests came from readers of my earlier post. Some of them were to plot the crimes as a function of population and per capita income of the State/Union Territory to see if the plots throw up new crime leaders. I have not got the relevant state-wise population distribution data yet. I intend to update this when I get my hands on this data.

I have included the crimes.csv which has been used to generate the visualization. However for the Shiny app I save this as .RData for better performance of the app.
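As a rough sketch of the CSV to .RData step, the snippet below converts a small stand-in file. The rows are made up and written to temporary files, so nothing here reflects the real crimes.csv layout; only the read.csv(), save() and load() calls are the point.

```r
# Hypothetical stand-in for crimes.csv (made-up rows, written to a temp file)
csvFile <- tempfile(fileext = ".csv")
write.csv(data.frame(state = c("A", "B"), year = c(2001, 2001), count = c(10, 20)),
          csvFile, row.names = FALSE)

# One-time conversion: parse the CSV and store the data frame in binary form
crimes <- read.csv(csvFile)
rdataFile <- tempfile(fileext = ".RData")
save(crimes, file = rdataFile)

# At app start-up: load() restores the object under its original name
rm(crimes)
load(rdataFile)
str(crimes)
```

Loading the binary file skips the text parsing that read.csv() has to do on every start.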

You can clone/download  the code for the Shiny app from GitHub at  crimesAgainWomenIndia

Please checkout my Shiny app : Crimes against women

I also intend to add further interactivity to my visualizations in a future version. Watch this space. I’ll be back!

Natural language processing: What would Shakespeare say?

Here is a scene from Christopher Nolan’s classic movie Interstellar. In this scene Cooper, a crew member of the Endurance spaceship which is on its way to 3 distant planets via a wormhole, is conversing with TARS, one of the US Marine Corps’ former robots, some years in the future.

TARS (flippantly): “Everybody good? Plenty of slaves for my robot colony?”
TARS: [as Cooper repairs him] Settings. General settings. Security settings.
TARS: Honesty, new setting: ninety-five percent.
TARS: Confirmed. Additional settings.
Cooper: Humor, seventy-five percent.
TARS: Confirmed. Self-destruct sequence in T minus 10, 9…
Cooper: Let’s make that sixty percent.
TARS: Sixty percent, confirmed. Knock knock.
Cooper: You want fifty-five?

Natural language has been an area of serious research for several decades, ever since Alan Turing proposed a test in 1950 in which a human evaluator would judge natural language conversations between another human and a machine designed to generate human-like responses, with both kept behind closed doors. If the responses of the human and machine were indistinguishable then we can say that the machine has passed the Turing test, signifying machine intelligence.

How cool would it be if we could converse with machines using natural language, with all the subtleties of language including irony, sarcasm and humor? While considerable progress has been made in Natural Language Processing, e.g. Watson, Siri and Cortana, the ability to handle nuances like humor and sarcasm is probably many years away.

This post looks at one aspect of Natural Language Processing, particularly in dealing with the ability to predict the next word(s) given a word or phrase.

The title of this post should really be ‘Natural language Processing: What would Shakespeare say, and what would you say’ because this post includes two interactive apps that can predict the next word

a) The first app given a (Shakespearean) phrase will predict the most likely word that Shakespeare would have said
Try the Shiny app : What would Shakespeare have said?

b) The second app will, given a regular phrase  predict the next word(s)  in regular day to day English usage
Try the Shiny app: What would you say?

Natural Language Processing (NLP) is a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human (natural) languages. NLP encompasses many areas of computer science, besides inputs from the domains of linguistics, psychology, information theory, mathematics and statistics

However NLP is a difficult domain as each language has its own quirkiness and ambiguities,  and English is no different. Let us take the following 2 sentences

Time flies like an arrow.
Fruit flies like a banana.

Clearly the 2 sentences mean  entirely different things when referencing  the words ‘flies like’. The English language is filled with many such ambiguous constructions

There have been 2 main approaches to Natural Language Processing: the rationalist approach and the empiricist approach. The empiricists approached natural language as a data driven problem based on statistics, while the rationalist school led by Noam Chomsky, the linguist, strongly believed that sentence structure should be analyzed at a deeper level than mere surface statistics.

In his book Syntactic Structures, Chomsky introduces a famous example in his criticism of finite-state probabilistic models. He cites 2 sentences (a) ‘colorless green ideas sleep furiously’ (b) ‘furiously sleep ideas green colorless’. Chomsky’s contention is that while neither sentence, nor any of its parts, has ever occurred in the past linguistic experience of an English speaker, it can be easily inferred that (a) is grammatical, while (b) is not. Chomsky’s argument is that sentence structure is critical to Natural Language Processing of any kind. Here is a good post by Peter Norvig ‘On Chomsky and the two cultures of statistical learning’. In fact, from 1950 to the 1980s the empiricists’ approach fell out of favor while reasonable progress was made based on the rationalist approach to NLP.

The return of the empiricists
But thanks to great strides in processing power and the significant drop in hardware costs, the empiricists’ approach to Natural Language Processing made a comeback in the mid 1980s. The use of probabilistic language models combined with the increase in processing power saw the rise of the empiricists again. Also there had been significant improvement in machine learning algorithms which allowed the computing resources to be used more efficiently.

In this post I showcase 2 Shiny apps written in R that predict the next word given a phrase using  statistical approaches, belonging to the empiricist school of thought. The 1st one will try to predict what Shakespeare would have said  given a phrase (Shakespearean or otherwise)  and the 2nd is a regular app that will predict what we would say in our regular day to day conversation. These apps will predict the next word as you keep typing in each word.

In NLP the first step is to build a language model. In order to build a language model the program ingests a large corpus of documents. For a) the Shakespearean app, the corpus is the “Complete Works of Shakespeare”. This is also available in Free ebooks by Project Gutenberg but you will have to do some cleaning and tokenizing before using it. For b) the regular English next word predicting app, the corpus is composed of several hundred MBs of tweets, news items and blogs.

Once the corpus is ingested the software then creates an n-gram model. A 1-gram (unigram) model is a representation of all unique single words and their counts. Similarly a bigram model is a representation of all 2-word sequences and their counts found in the corpus. Similarly we can have trigram, quadgram and higher n-gram models as required. Typically language models don’t go beyond 5-grams as the processing power needed increases for these larger n-gram models.
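To make the idea of an n-gram model concrete, here is a minimal base R sketch that counts unigrams, bigrams and trigrams in one toy sentence. The helper ngramCounts is a hypothetical name introduced only for this illustration; it is not how the apps are implemented (those were built with packages such as tm and RWeka), and real corpora need proper cleaning and tokenizing first.

```r
# Count the n-grams in a vector of tokens (hypothetical helper, base R only)
ngramCounts <- function(tokens, n) {
  if (length(tokens) < n) return(table(character(0)))
  starts <- seq_len(length(tokens) - n + 1)
  # Paste together each window of n consecutive tokens
  grams <- vapply(starts,
                  function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
                  character(1))
  table(grams)
}

text <- "the quick brown fox jumped over the quick brown dog"
tokens <- strsplit(tolower(text), "\\s+")[[1]]

unigrams <- ngramCounts(tokens, 1)
bigrams  <- ngramCounts(tokens, 2)
trigrams <- ngramCounts(tokens, 3)

# Most frequent bigrams first
print(sort(bigrams, decreasing = TRUE))
```

Each table maps an n-gram to the number of times it occurs, which is all the later probability estimates need.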

The probability of a sentence can be determined using the chain rule. This is shown below, where P(s) is the probability of a sentence ‘s’
P(The quick brown fox jumped) =
P(The|BOS) * P(quick|The) * P(brown|The quick) * P(fox|The quick brown) * P(jumped|The quick brown fox)
where BOS is the beginning of the sentence and

P(quick|The) is the probability of the word being ‘quick’ given that the previous word was ‘The’. The longer conditional probabilities can be approximated using the Markov assumption, under which the conditional probability of a word depends only on a small number of its preceding words. For the bigram model only the immediately preceding word is used
$P(w_{i}|w_{i-1})$

Hence this allows the following approximation
$P(w_{i}|w_{1}w_{2}w_{3}..w_{i-1}) \approx P(w_{i}|w_{i-1})$

The Maximum Likelihood Estimate (MLE) is given as follows for a bigram
$P_{MLE}(w_{i}|w_{i-1}) = count(w_{i-1},w_{i})/count(w_{i-1})$
or, writing c for count,
$P_{MLE}(w_{i}|w_{i-1}) = c(w_{i-1},w_{i})/c(w_{i-1})$

Hence for a corpus we can calculate the maximum likelihood estimate of a given word from its previous word. This computation of the MLE can be extended to the trigram and the quadgram

For a trigram
$P_{MLE}(w_{i}|w_{i-2}w_{i-1}) = c(w_{i-2}w_{i-1},w_{i})/c(w_{i-2}w_{i-1})$

Smoothing techniques
The MLE for many bigrams and trigrams will be 0, because we may not yet have seen certain combinations. But the fact that we have not seen these combinations in the corpus should not mean that they can never occur. So the MLE for the bigrams, trigrams etc. has to be smoothed so that it does not yield a conditional probability of 0. One such method is ‘Laplace smoothing’. This smoothing takes some of the probability mass from the word combinations that occur in the corpus and redistributes it to the combinations that do not occur in the corpus. This is the simplest smoothing technique; it is also known as the ‘add +1’ smoothing technique and requires that 1 be added to all counts

So the MLE below
$P_{MLE}(w_{i}|w_{i-1}) = c(w_{i-1},w_{i})/c(w_{i-1})$

With the add +1 smoothing this becomes
$P_{Add1}(w_{i}|w_{i-1}) = (c(w_{i-1},w_{i})+1)/(c(w_{i-1})+V)$
where V is the number of unique words (the vocabulary size) in the corpus
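A small worked example may help here. The sketch below computes the bigram MLE and its add +1 smoothed version on a toy token list; the function names pMLE and pAdd1 and the toy sentence are assumptions made for this illustration, not code from the apps.

```r
tokens <- c("the", "quick", "brown", "fox", "jumped", "over",
            "the", "quick", "brown", "dog")

# Turn a table into a plain named vector of counts
asCounts <- function(x) {
  t <- table(x)
  setNames(as.vector(t), names(t))
}

uni <- asCounts(tokens)                                    # c(w)
bi  <- asCounts(paste(head(tokens, -1), tail(tokens, -1))) # c(w1, w2)
V   <- length(uni)                                         # vocabulary size

# c(w1, w2), or 0 when the bigram was never seen
bigramCount <- function(w1, w2) {
  key <- paste(w1, w2)
  if (key %in% names(bi)) bi[[key]] else 0
}

# Maximum likelihood estimate: c(w1, w2) / c(w1)
pMLE <- function(w1, w2) bigramCount(w1, w2) / uni[[w1]]

# Add +1 (Laplace) estimate: (c(w1, w2) + 1) / (c(w1) + V)
pAdd1 <- function(w1, w2) (bigramCount(w1, w2) + 1) / (uni[[w1]] + V)

pMLE("the", "quick")   # seen bigram
pMLE("the", "dog")     # unseen bigram, so the MLE is 0
pAdd1("the", "dog")    # unseen bigram, now small but non-zero
```

Here ‘the’ occurs twice and the vocabulary has 7 words, so the unseen bigram ‘the dog’ gets (0 + 1)/(2 + 7) after smoothing, while the seen bigram ‘the quick’ drops from 2/2 to 3/9.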

This smoothing is done for bigrams, trigrams and quadgrams. Smoothing is usually used with an associated technique called ‘backoff’. If the phrase is not found in an n-gram model then we need to back off to an (n-1)-gram model. For example, a lookup will first be done in the quadgrams; if the phrase is not found, the algorithm will back off to the trigrams, then the bigrams and finally the unigrams.

Hence if we had the phrase
“on my way”

The smoothed MLE for a quadgram will be checked for the next word. If this is not found, the algorithm backs off by searching the smoothed MLEs for trigrams for the phrase ‘my way’, and if this is not found it searches the bigrams for the next word to ‘way’.
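The lookup order just described can be sketched as below. This is a deliberately simplified backoff that picks the most frequent continuation and ignores smoothing and discounting altogether, so it is not Katz backoff. The tiny tables and the function predictNext are hypothetical and exist only for this example.

```r
# Hypothetical n-gram tables: history -> counts of candidate next words (made-up)
quadgrams <- list("i am on" = c(my = 3))
trigrams  <- list("my way" = c(home = 4, to = 2))
bigrams   <- list("way" = c(to = 5, of = 3))
unigrams  <- c(the = 10, to = 8)

predictNext <- function(phrase) {
  words  <- strsplit(tolower(trimws(phrase)), "\\s+")[[1]]
  models <- list(quadgrams, trigrams, bigrams)   # histories of 3, 2 and 1 words
  for (k in 3:1) {
    if (length(words) < k) next
    history    <- paste(tail(words, k), collapse = " ")
    candidates <- models[[4 - k]][[history]]
    # Found the history at this order: return its most frequent continuation
    if (!is.null(candidates)) return(names(which.max(candidates)))
  }
  # Nothing matched: fall back to the most frequent single word
  names(which.max(unigrams))
}

predictNext("on my way")
```

For ‘on my way’ the three-word history is not in the quadgram table, so the function backs off to the trigram history ‘my way’ and returns its most frequent continuation.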

One such method is the Katz backoff, which works as follows
Bigrams with a nonzero count r are discounted according to the discount ratio d_{r}, which is computed from the Good-Turing adjusted count r^{*}, where n_{r} is the number of bigrams that occur exactly r times
$r^{*}=(r+1)n_{r+1}/n_{r}$
$d_{r} = r^{*}/r$

The count mass subtracted from the nonzero counts is redistributed among the zero-count bigrams according to the next lower-order distribution (i.e. the unigram model)

Better performance is obtained with the Kneser-Ney algorithm, which computes the continuation probability of words. The Kneser-Ney estimate for a bigram is given below, where $\delta$ is a fixed discount
$P_{\mathit{KN}}(w_i \mid w_{i-1}) = \dfrac{\max(c(w_{i-1} w_i) - \delta, 0)}{\sum_{w'} c(w_{i-1} w')} + \lambda(w_{i-1}) \dfrac{\left| \{ w' : c(w', w_i) > 0 \} \right|}{\left| \{ (w', w'') : c(w', w'') > 0 \} \right|}$

where
$\lambda(w_{i-1}) = \dfrac{\delta}{c(w_{i-1})} \left| \{w' : c(w_{i-1}, w') > 0\} \right|$
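The bigram Kneser-Ney estimate can be tried out on a toy example. The sketch below is an illustration only: the discount of 0.75 is an assumed value, pKN is a hypothetical function name, c(w_{i-1}) is taken to be the number of bigrams that start with w_{i-1}, and the function expects both words to appear in the toy bigram table.

```r
tokens <- c("the", "quick", "brown", "fox", "jumped", "over",
            "the", "quick", "brown", "dog")
delta  <- 0.75                      # assumed discount value

first  <- head(tokens, -1)          # w_{i-1} of every bigram
second <- tail(tokens, -1)          # w_i of every bigram
bi     <- table(first, second)      # matrix of bigram counts c(w_{i-1}, w_i)
vocab  <- colnames(bi)              # words that can follow another word

pKN <- function(prev, word) {
  row    <- bi[prev, ]                           # counts of words following prev
  cPrev  <- sum(row)                             # c(w_{i-1}) as a history
  lambda <- delta / cPrev * sum(row > 0)         # mass freed by discounting
  pCont  <- sum(bi[, word] > 0) / sum(bi > 0)    # continuation probability
  max(row[[word]] - delta, 0) / cPrev + lambda * pCont
}

pKN("brown", "dog")   # seen bigram
pKN("the", "dog")     # unseen bigram still gets some probability

# Sanity check: the estimates for one history should sum to 1
sum(sapply(vocab, function(w) pKN("brown", w)))
```

The continuation probability rewards words that follow many different words, rather than words that are merely frequent, which is the key difference from the add +1 approach.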

This post was inspired by the final Capstone Project, in which I had to create a Shiny app for predicting the next word, as a part of the Data Science Specialization conducted by Johns Hopkins University, Bloomberg School of Public Health, at Coursera.

I further extended this concept  where I try to predict what Shakespeare would have said.  For this I ingest the Complete Works of Shakespeare which is the corpus. The +1 Add smoothing with Katz backoff and the Kneser-Ney algorithm on the unigram, bigram, trigram and quadgrams were then implemented.

Note: This post in no way tries to belittle the genius of Shakespeare. From the table below it can be seen that our day to day conversation has approximately 210K, 181K & 65K unique bigrams, trigrams and quadgrams. On the other hand, Shakespearean literature has 271K, 505K & 517K bigrams, trigrams and quadgrams. It can be seen that Shakespeare had a rich and complex set of word combinations.

Not surprisingly the computation of the conditional and continuation probabilities for the Shakespearean literature is orders of magnitude larger.
Here is a small table as comparison

This implementation was done entirely using R. The main R packages used for this implementation were tm, RWeka and dplyr. Here is a slide deck on the implementation details of the apps and key lessons learnt: PredictNextWord
Unfortunately I will not be able to include the implementation details as I am bound by The Coursera Honor Code.

If you have not already given the apps a try, do give them a try
Try the Shiny apps
What would Shakespeare say?
What would you say?

You may like
1. Introducing cricketr! : An R package to analyze performances of cricketers
2. cricketr digs the Ashes!
3. A peek into literacy in India: Statistical Learning with R
4. A crime map of India in R – Crimes against women
5. Analyzing cricket’s batting legends – Through the mirage with R
6. Informed choices through Machine Learning : Analyzing Kohli, Tendulkar and Dravid

Introduction

This should be the last in the series of posts based on my R package cricketr. That is, unless some bright idea comes trotting along and light bulbs go on around my head.

In this post cricketr adapts to the Twenty20 International format. Now cricketr can handle stats from all 3 formats of the game, namely Test matches, ODIs and Twenty20 Internationals, from ESPN Cricinfo. You should be able to install the package from GitHub and use many of the functions available in the package.

You can also read this post at Rpubs as twenty20-cricketr. Download this report as a PDF file from twenty20-cricketr.pdf

Do check out my interactive Shiny app implementation using the cricketr package – Sixer – R package cricketr’s new Shiny avatar

I have chosen the Top 4 batsmen and top 4 bowlers based on ICC rankings and/or number of matches played.

Batsmen

1. Virat Kohli (Ind)
2. Faf du Plessis (SA)
3. A J Finch (Aus)
4. Brendon McCullum (NZ)

Bowlers

1. Samuel Badree (WI)
2. Sunil Narine (WI)
3. Ravichander Ashwin (Ind)
4. Ajantha Mendis (SL)

I have explained the plots and added my own observations. Please feel free to draw your conclusions!

The package can be installed from GitHub and loaded as shown below

library(devtools)
install_github("tvganesh/cricketr")
library(cricketr)

The Twenty20 data for a particular player can be obtained with the getPlayerDataTT() function. To do this you will need to go to ESPN CricInfo Player and type in the name of the player, e.g. Virat Kohli, Sunil Narine etc. This will bring up a page which has the profile number for the player, e.g. for Virat Kohli this would be http://www.espncricinfo.com/india/content/player/253802.html. Hence, Kohli’s profile is 253802. This can be used to get the data for Virat Kohli as shown below

# Save to the current directory so that the later calls can read "./kohli.csv"
kohli <- getPlayerDataTT(253802,dir=".",file="kohli.csv",type="batting")

The analysis is included below

Analyses of Batsmen

The following plots give the analysis of the 4 Twenty20 batsmen

1. Virat Kohli (Ind) – Innings-26, Runs-972, Average-46.28,Strike Rate-131.70
2. Faf du Plessis (SA) – Innings-24, Runs-805, Average-42.36,Strike Rate-135.75
3. A J Finch (Aus) – Innings-22, Runs-756, Average-39.78,Strike Rate-152.41
4. Brendon McCullum (NZ) – Innings-70, Runs-2140, Average-35.66,Strike Rate-136.21

Plot of 4s, 6s and the scoring rate in Twenty20s

The 3 charts below give

1. 4s vs Runs scored
2. 6s vs Runs scored
3. Balls faced vs Runs scored

A regression line is fitted in each of these plots for each of the Twenty20 batsmen

A. Virat Kohli
– The 1st plot shows that Kohli hits about 5 4s on his way to a 50
– The 2nd plot, a box plot of the number of 6s against runs, shows the range of runs when Kohli hit 1, 2 or 4 6s. The dark line in the box shows the median runs when he hit that number of 6s. So when he hit 1 6, his median score was 45
– The 3rd plot shows the number of runs scored against the balls faced. It can be seen that when Kohli faced 50 balls he had scored around 70 runs

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./kohli.csv","Kohli")
batsman6s("./kohli.csv","Kohli")
batsmanScoringRateODTT("./kohli.csv","Kohli")

dev.off()
## null device
##           1

B. Faf du Plessis

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./plessis.csv","Du Plessis")
batsman6s("./plessis.csv","Du Plessis")
batsmanScoringRateODTT("./plessis.csv","Du Plessis")

dev.off()
## null device
##           1

C. A J Finch

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./finch.csv","A J Finch")
batsman6s("./finch.csv","A J Finch")
batsmanScoringRateODTT("./finch.csv","A J Finch")

dev.off()
## null device
##           1

D. Brendon McCullum

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./mccullum.csv","McCullum")
batsman6s("./mccullum.csv","McCullum")
batsmanScoringRateODTT("./mccullum.csv","McCullum")

dev.off()
## null device
##           1

Relative Mean Strike Rate

This plot shows the Mean Strike Rate of the batsmen in each run range. It can be seen that A J Finch has the best strike rate followed by B McCullum.

par(mar=c(4,4,2,2))
frames <- list("./kohli.csv","./plessis.csv","finch.csv","mccullum.csv")
names <- list("Kohli","Du Plessis","Finch","McCullum")
relativeBatsmanSRODTT(frames,names)

Relative Runs Frequency Percentage

The plot below provides the frequency percentage of runs scored in each run range 0-5, 5-10, 10-15 etc. Clearly Kohli has the highest values in most of the run ranges. This is also evident in the fact that Kohli has the highest average. He is followed by McCullum

frames <- list("./kohli.csv","./plessis.csv","finch.csv","mccullum.csv")
names <- list("Kohli","Du Plessis","Finch","McCullum")
relativeRunsFreqPerfODTT(frames,names)

Percent 4’s,6’s in total runs scored

The plot below shows the percentage of runs scored by way of 4s and 6s for each batsman. Kohli has the highest percentage of 4s, McCullum has the highest percentage of 6s. Finch has the highest combined percentage of 4s & 6s – 25.37 + 15.64 = 41.01%

frames <- list("./kohli.csv","./plessis.csv","finch.csv","mccullum.csv")
names <- list("Kohli","Du Plessis","Finch","McCullum")
runs4s6s <-batsman4s6s(frames,names)

print(runs4s6s)
##                Kohli Du Plessis Finch McCullum
## Runs(1s,2s,3s) 64.29      64.55 58.99    61.45
## 4s             27.78      24.38 25.37    22.87
## 6s              7.94      11.07 15.64    15.69

3D plot of Runs vs Balls Faced and Minutes at Crease

The plot is a scatter plot of Runs vs Balls faced and Minutes at Crease. A prediction plane is then fitted based on the Balls Faced and Minutes at Crease to give the runs scored

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./kohli.csv","Kohli")
battingPerf3d("./plessis.csv","Du Plessis")

dev.off()
## null device
##           1
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./finch.csv","A J Finch")
battingPerf3d("./mccullum.csv","McCullum")

dev.off()
## null device
##           1

Predicting Runs given Balls Faced and Minutes at Crease

A hypothetical Balls faced and Minutes at Crease is used to predict the runs scored by each batsman based on the computed prediction plane

BF <- seq( 5, 70,length=10)
Mins <- seq(5,70,length=10)
newDF <- data.frame(BF,Mins)

kohli <- batsmanRunsPredict("./kohli.csv","Kohli",newdataframe=newDF)
plessis <- batsmanRunsPredict("./plessis.csv","Du Plessis",newdataframe=newDF)
finch <- batsmanRunsPredict("./finch.csv","A J Finch",newdataframe=newDF)
mccullum <- batsmanRunsPredict("./mccullum.csv","McCullum",newdataframe=newDF)

The predicted runs are displayed. As can be seen Finch has the best overall strike rate followed by McCullum.

batsmen <-cbind(round(kohli$Runs),round(plessis$Runs),round(finch$Runs),round(mccullum$Runs))
colnames(batsmen) <- c("Kohli","Du Plessis","Finch","McCullum")
newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
colnames(newDF) <- c("BallsFaced","MinsAtCrease")
predictedRuns <- cbind(newDF,batsmen)
predictedRuns
##    BallsFaced MinsAtCrease Kohli Du Plessis Finch McCullum
## 1           5            5     2          1     5        3
## 2          12           12    12         10    22       16
## 3          19           19    22         19    40       28
## 4          27           27    31         28    57       41
## 5          34           34    41         37    74       54
## 6          41           41    51         47    91       66
## 7          48           48    60         56   108       79
## 8          56           56    70         65   125       91
## 9          63           63    79         74   142      104
## 10         70           70    89         84   159      117
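The prediction plane can be pictured as an ordinary two-predictor linear regression. The sketch below uses made-up innings data with base R's `lm()` and `predict()`. Whether `batsmanRunsPredict()` is implemented exactly this way is an assumption, and the data frame `innings` is hypothetical.

```r
# Hypothetical innings data (not taken from any real batsman)
set.seed(42)
innings <- data.frame(BF = sample(5:70, 30, replace = TRUE))
innings$Mins <- round(innings$BF * 1.3 + rnorm(30, sd = 3))
innings$Runs <- round(innings$BF * 1.2 + rnorm(30, sd = 5))

# Fit a plane: Runs as a linear function of Balls Faced and Minutes at Crease
fit <- lm(Runs ~ BF + Mins, data = innings)

# Predict runs for a grid of hypothetical Balls Faced / Minutes values
newDF <- data.frame(BF   = seq(5, 70, length.out = 10),
                    Mins = seq(5, 70, length.out = 10))
predicted <- round(predict(fit, newdata = newDF))
cbind(round(newDF), Runs = predicted)
```

Because the grid is evenly spaced and the model is linear, the predicted runs rise in near-constant steps, which is the same pattern visible in the table above.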

Highest runs likelihood

The plots below show the runs likelihood of each batsman, computed using K-Means. Kohli has the highest likelihood of scoring runs: he is 34.62% likely to score 66 runs. Du Plessis has a 25% likelihood of scoring 53 runs.

A. Virat Kohli

batsmanRunsLikelihood("./kohli.csv","Kohli")

## Summary of  Kohli 's runs scoring likelihood
## **************************************************
##
## There is a 23.08 % likelihood that Kohli  will make  10 Runs in  10 balls over 13  Minutes
## There is a 42.31 % likelihood that Kohli  will make  29 Runs in  23 balls over  30  Minutes
## There is a 34.62 % likelihood that Kohli  will make  66 Runs in  47 balls over 63  Minutes

B. Faf Du Plessis

batsmanRunsLikelihood("./plessis.csv","Du Plessis")

## Summary of  Du Plessis 's runs scoring likelihood
## **************************************************
##
## There is a 62.5 % likelihood that Du Plessis  will make  14 Runs in  11 balls over 19  Minutes
## There is a 25 % likelihood that Du Plessis  will make  53 Runs in  40 balls over  50  Minutes
## There is a 12.5 % likelihood that Du Plessis  will make  94 Runs in  61 balls over 90  Minutes

C. A J Finch

batsmanRunsLikelihood("./finch.csv","A J Finch")

## Summary of  A J Finch 's runs scoring likelihood
## **************************************************
##
## There is a 20 % likelihood that A J Finch  will make  95 Runs in  54 balls over 70  Minutes
## There is a 25 % likelihood that A J Finch  will make  42 Runs in  27 balls over  35  Minutes
## There is a 55 % likelihood that A J Finch  will make  8 Runs in  8 balls over 12  Minutes

D. Brendon McCullum

batsmanRunsLikelihood("./mccullum.csv","McCullum")

## Summary of  McCullum 's runs scoring likelihood
## **************************************************
##
## There is a 50.72 % likelihood that McCullum  will make  11 Runs in  10 balls over 13  Minutes
## There is a 28.99 % likelihood that McCullum  will make  36 Runs in  27 balls over  37  Minutes
## There is a 20.29 % likelihood that McCullum  will make  74 Runs in  48 balls over 70  Minutes
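The three likelihood lines printed for each batsman have the shape of a three-cluster K-Means summary: each cluster centre gives typical Runs, Balls Faced and Minutes, and the cluster's share of innings gives the percentage. A rough sketch with base R's `kmeans()` on invented data (the data frame `innings` and the choice of three centres are assumptions made for illustration, not a description of the package internals):

```r
# Hypothetical innings: runs, balls faced and minutes at crease
set.seed(7)
bf <- sample(5:60, 40, replace = TRUE)
innings <- data.frame(Runs = round(bf * 1.3 + rnorm(40, sd = 4)),
                      BF   = bf,
                      Mins = round(bf * 1.4 + rnorm(40, sd = 4)))

# Group the innings into 3 clusters
km <- kmeans(innings, centers = 3, nstart = 10)

# Share of innings falling in each cluster, as a percentage
likelihood <- round(100 * km$size / nrow(innings), 2)

# One summary line per cluster, in the style of the output above
for (i in 1:3) {
  cat("There is a", likelihood[i], "% likelihood of",
      round(km$centers[i, "Runs"]), "Runs in",
      round(km$centers[i, "BF"]), "balls over",
      round(km$centers[i, "Mins"]), "Minutes\n")
}
```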

Moving Average of runs over career

The moving average for the 4 batsmen indicates the following. It must be noted that there is not yet sufficient data on Twenty20 Internationals. Kohli, Du Plessis and Finch average only 26 innings while McCullum has close to 70. So the moving average, while an indication, will regress towards the mean over time.

1. The moving average of Kohli and Du Plessis is on the way up.
2. McCullum has a consistent performance while Finch had a brief burst in 2013-2014
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanMovingAverage("./kohli.csv","Kohli")
batsmanMovingAverage("./plessis.csv","Du Plessis")
batsmanMovingAverage("./finch.csv","A J Finch")
batsmanMovingAverage("./mccullum.csv","McCullum")

dev.off()
## null device
##           1
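As a rough illustration of what a moving average over a career means, the sketch below smooths a made-up sequence of innings scores with a simple trailing window using base R's `stats::filter()`. The window of 5 innings and the scores are assumptions; `batsmanMovingAverage()` may well use a different smoother.

```r
# Hypothetical runs in consecutive innings
runs <- c(12, 45, 3, 67, 28, 90, 15, 33, 71, 8, 54, 39)

# Trailing moving average over the last 5 innings
window <- 5
movavg <- as.numeric(stats::filter(runs, rep(1 / window, window), sides = 1))

# The first window - 1 entries are NA because the window is not yet full
print(round(movavg, 1))
```

With so few innings each new score moves the average noticeably, which is why the trend lines for batsmen with around 26 innings should be read with caution.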

Analysis of bowlers

1. Samuel Badree (WI) – Innings-22, Runs -464, Wickets – 31, Econ Rate : 5.39
2. Sunil Narine (WI)- Innings-31,Runs-666, Wickets – 38 , Econ Rate : 5.70
3. Ravichander Ashwin (Ind)- Innings-26, Runs- 732, Wickets – 25, Econ Rate : 7.32
4. Ajantha Mendis (SL)- Innings-39, Runs – 952,Wickets – 66, Econ Rate : 6.45

The plot shows the frequency with which the bowlers have taken 1,2,3 etc wickets. The most wickets taken is by Ajantha Mendis (6 wickets)

Wicket Frequency percentage

This plot gives the percentage of innings for each wicket count (1, 2, 3, etc.)

par(mfrow=c(1,4))
par(mar=c(4,4,2,2))
bowlerWktsFreqPercent("./mendis.csv","Mendis")
bowlerWktsFreqPercent("./narine.csv","Narine")
bowlerWktsFreqPercent("./ashwin.csv","Ashwin")

dev.off()
## null device
##           1

Wickets Runs plot

The plot below gives a boxplot of the runs ranges for each of the wickets taken by the bowlers. The ends of the box indicate the 25th and 75th percentiles of runs conceded for the wickets taken, and the dark black line is the median runs conceded.

par(mfrow=c(1,4))
par(mar=c(4,4,2,2))
bowlerWktsRunsPlot("./mendis.csv","Mendis")
bowlerWktsRunsPlot("./narine.csv","Narine")
bowlerWktsRunsPlot("./ashwin.csv","Ashwin")

dev.off()
## null device
##           1
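The box edges and the dark line of a boxplot come from the five-number summary of the data. A small sketch with base R's `fivenum()` on invented figures (the vector `runs_conceded` is hypothetical):

```r
# Hypothetical runs conceded in spells where a bowler took 2 wickets
runs_conceded <- c(14, 18, 21, 22, 25, 27, 30, 34)

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(runs_conceded)
names(five) <- c("min", "lower", "median", "upper", "max")
print(five)
```

The lower and upper hinges are the ends of the box and the median is the dark line inside it.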

The plot below shows the average number of deliveries needed by the bowler to take the wickets (1, 2, 3, etc.)

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerWktRateTT("./mendis.csv","Mendis")

dev.off()
## null device
##           1
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerWktRateTT("./narine.csv","Narine")
bowlerWktRateTT("./ashwin.csv","Ashwin")

dev.off()
## null device
##           1

Relative bowling performance

The plot below shows that Narine has the most wickets in the 2-4 range followed by Mendis

frames <- list("./badree.csv","./mendis.csv","narine.csv","ashwin.csv")
names <- list("Badree","Mendis","Narine","Ashwin")
relativeBowlingPerf(frames,names)

Relative Economy Rate against wickets taken

The economy rate can be deduced as follows from the plot below. Narine has a good economy rate around 1 & 4 wickets, Ashwin around 2 wickets and Badree around 3 wickets

frames <- list("./badree.csv","./mendis.csv","narine.csv","ashwin.csv")
names <- list("Badree","Mendis","Narine","Ashwin")
relativeBowlingERODTT(frames,names)

Relative Wicket Rate

The relative wicket rate plots the mean number of deliveries needed to take a given number of wickets (1, 2, 3, 4). For example, Narine needed an average of 22 deliveries to take 1 wicket and 22.5, 23.2 and 24 deliveries to take 2, 3 & 4 wickets respectively

frames <- list("./badree.csv","./mendis.csv","narine.csv","ashwin.csv")
names <- list("Badree","Mendis","Narine","Ashwin")
relativeWktRateTT(frames,names)
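The mean number of deliveries per wicket count can be computed by grouping spells by wickets taken. A sketch on invented spell data (the data frame `spells` and its column names are hypothetical, not the layout of the cricketr CSV files):

```r
# Hypothetical bowling spells: wickets taken and deliveries bowled
spells <- data.frame(
  Wkts  = c(1, 1, 2, 2, 2, 3, 3, 4),
  Balls = c(24, 18, 24, 24, 21, 24, 22, 24)
)

# Mean number of deliveries bowled in spells with 1, 2, 3 and 4 wickets
wktRate <- tapply(spells$Balls, spells$Wkts, mean)
print(wktRate)
```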

Moving average of wickets over career

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
bowlerMovingAverage("./mendis.csv","Mendis")
bowlerMovingAverage("./narine.csv","Narine")
bowlerMovingAverage("./ashwin.csv","Ashwin")

dev.off()
## null device
##           1

Key findings

Here are some key conclusions

Twenty 20 batsmen

1. Kohli has a very consistent performance, scoring high runs in the different run ranges. Kohli also has a 34.62% likelihood of scoring 66 runs. He is followed by McCullum for consistent performance
2. Finch has the best strike rate followed by McCullum.
3. Kohli has the highest percentage of 4s and McCullum has the highest percentage of 6s. Finch is superior in the combined percentage of runs scored in 4s and 6s
4. For a hypothetical balls faced and minutes at crease, Finch does best followed by McCullum
5. Kohli’s & Du Plessis’ Twenty20 careers are on an upswing. Can they maintain the momentum? McCullum is consistent

Twenty20 bowlers

1. Narine has the highest wickets percentage for different wickets taken followed by Mendis
2. Mendis has taken 1,2,3,4,6 wickets in 24 deliveries
3. Narine has the lowest economy rate for 1 & 4 wickets, Ashwin for 2 wickets and Badree for 3 wickets. Mendis is comparatively expensive
4. Narine needed the least deliveries to get 1 (22.5) & 2 (23.2) wickets, Mendis needed 20.5 deliveries and Ashwin 19 deliveries for 4 wickets

Key takeaways

If all the above batsmen and bowlers were in the same team we would expect

1. Finch would be most useful when the run rate has to be greatly accelerated followed by McCullum
2. If the need is to consolidate, then Kohli is the best man for the job followed by McCullum
3. Overall McCullum is the best bet for Twenty20
4. When it comes to bowling Narine wins hands down as he has the most wickets, a good economy rate and a very good attack rate. So Narine is a great bet for providing a vital breakthrough.

cricketr plays the ODIs!

Published in R bloggers: cricketr plays the ODIs

Introduction

In this post my package ‘cricketr’ takes a swing at One Day Internationals (ODIs). Like Test batsmen who adapt to ODIs with some innovative strokes, the cricketr package has some additional functions and some modified functions to handle the high strike and economy rates in ODIs. As before I have chosen my top 4 ODI batsmen and top 4 ODI bowlers.

Do check out my interactive Shiny app implementation using the cricketr package – Sixer – R package cricketr’s new Shiny avatar

You can also read this post at Rpubs as odi-cricketr. Download this report as a PDF file from odi-cricketr.pdf

Batsmen

1. Virender Sehwag (Ind)
2. AB Devilliers (SA)
3. Chris Gayle (WI)
4. Glenn Maxwell (Aus)

Bowlers

1. Mitchell Johnson (Aus)
2. Lasith Malinga (SL)
3. Dale Steyn (SA)
4. Tim Southee (NZ)

I have sprinkled the plots with a few of my comments. Feel free to draw your conclusions! The analysis is included below

The profile for Virender Sehwag is 35263. This can be used to get the ODI data for Sehwag. For a batsman the type should be “batting” and for a bowler the type should be “bowling” and the function is getPlayerDataOD()

library(devtools)
install_github("tvganesh/cricketr")
library(cricketr)


The One Day data for a particular player can be obtained with the getPlayerDataOD() function. To do this you will need to go to ESPN CricInfo Player and type in the name of the player, e.g. Virender Sehwag. This will bring up a page which has the profile number for the player, e.g. for Virender Sehwag this would be http://www.espncricinfo.com/india/content/player/35263.html. Hence, Sehwag’s profile is 35263. This can be used to get the data for Virender Sehwag as shown below

sehwag <- getPlayerDataOD(35263,dir=".",file="sehwag.csv",type="batting")
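The profile number is simply the run of digits at the end of the player page URL. A small sketch that pulls it out with base R's `sub()` (the variable names `url` and `profile` are illustrative, and the pattern assumes the URL format quoted above):

```r
# Profile URL quoted in the text
url <- "http://www.espncricinfo.com/india/content/player/35263.html"

# Keep only the digits between the last "/" and ".html"
profile <- as.integer(sub(".*/([0-9]+)\\.html$", "\\1", url))
profile
```

The resulting number can then be passed as the first argument of getPlayerDataOD().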

Analyses of Batsmen

The following plots give the analysis of the 4 ODI batsmen

1. Virender Sehwag (Ind) – Innings – 245, Runs = 8586, Average=35.05, Strike Rate= 104.33
2. AB Devilliers (SA) – Innings – 179, Runs= 7941, Average=53.65, Strike Rate= 99.12
3. Chris Gayle (WI) – Innings – 264, Runs= 9221, Average=37.65, Strike Rate= 85.11
4. Glenn Maxwell (Aus) – Innings – 45, Runs= 1367, Average=35.02, Strike Rate= 126.69

Plot of 4s, 6s and the scoring rate in ODIs

The 3 charts below give the number of

1. 4s vs Runs scored
2. 6s vs Runs scored
3. Balls faced vs Runs scored

A regression line is fitted in each of these plots for each of the ODI batsmen

A. Virender Sehwag

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./sehwag.csv","Sehwag")
batsman6s("./sehwag.csv","Sehwag")
batsmanScoringRateODTT("./sehwag.csv","Sehwag")

dev.off()
## null device
##           1

B. AB Devilliers

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./devilliers.csv","Devilliers")
batsman6s("./devilliers.csv","Devilliers")
batsmanScoringRateODTT("./devilliers.csv","Devilliers")

dev.off()
## null device
##           1

C. Chris Gayle

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./gayle.csv","Gayle")
batsman6s("./gayle.csv","Gayle")
batsmanScoringRateODTT("./gayle.csv","Gayle")

dev.off()
## null device
##           1

D. Glenn Maxwell

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./maxwell.csv","Maxwell")
batsman6s("./maxwell.csv","Maxwell")
batsmanScoringRateODTT("./maxwell.csv","Maxwell")

dev.off()
## null device
##           1

Relative Mean Strike Rate

In this first plot I plot the Mean Strike Rate of the batsmen. It can be seen that Maxwell has an awesome strike rate in ODIs. However, we need to keep in mind that Maxwell has played far fewer innings (only 45). He is followed by Sehwag (most innings: 245), who also has an excellent strike rate till 100 runs, and then we have Devilliers who roars ahead. This is also seen in the overall strike rates listed above

par(mar=c(4,4,2,2))
frames <- list("./sehwag.csv","./devilliers.csv","gayle.csv","maxwell.csv")
names <- list("Sehwag","Devilliers","Gayle","Maxwell")
relativeBatsmanSRODTT(frames,names)
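To make the idea of a mean strike rate per runs range concrete, here is a minimal sketch on invented innings. The data frame `innings`, its column names and the 10-run bins are assumptions for illustration and may differ from what relativeBatsmanSRODTT() does internally.

```r
# Hypothetical innings: runs scored and balls faced
innings <- data.frame(Runs = c(5, 12, 18, 25, 27, 34, 41, 48),
                      BF   = c(8, 10, 20, 20, 30, 34, 40, 40))

# Strike rate = runs scored per 100 balls faced
innings$SR <- 100 * innings$Runs / innings$BF

# Bin runs into 10-run ranges and average the strike rate within each range
innings$Range <- cut(innings$Runs, breaks = seq(0, 50, by = 10), right = FALSE)
meanSR <- tapply(innings$SR, innings$Range, mean)
print(round(meanSR, 2))
```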

Relative Runs Frequency Percentage

Sehwag leads in the percentage of runs in 10-run ranges up to 50 runs. Maxwell and Devilliers lead in 55-66 & 66-85 respectively.

frames <- list("./sehwag.csv","./devilliers.csv","gayle.csv","maxwell.csv")
names <- list("Sehwag","Devilliers","Gayle","Maxwell")
relativeRunsFreqPerfODTT(frames,names)
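A runs frequency percentage is the share of innings that fall in each runs range. A short sketch with base R's `cut()`, `table()` and `prop.table()` on made-up scores (the vector `runs` and the 10-run bins are assumptions):

```r
# Hypothetical scores in 10 innings
runs <- c(3, 8, 15, 22, 27, 35, 41, 44, 58, 63)

# Assign each score to a 10-run range, then convert counts to percentages
ranges  <- cut(runs, breaks = seq(0, 70, by = 10), right = FALSE)
freqPct <- 100 * prop.table(table(ranges))
print(round(freqPct, 1))
```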

Percentage of 4s,6s in the runs scored

The plot below shows the percentage of runs made by the batsmen by way of 1s, 2s, 3s, 4s and 6s. It can be seen that Sehwag has the highest percentage of 4s (33.36%) in his overall runs in ODIs. Maxwell has the highest percentage of 6s (13.36%) in his ODI career. If we take the overall 4s+6s then Sehwag leads with 33.36 + 5.95 = 39.31%, followed by Gayle (27.80 + 10.15 = 37.95%)

frames <- list("./sehwag.csv","./devilliers.csv","gayle.csv","maxwell.csv")
names <- list("Sehwag","Devilliers","Gayle","Maxwell")
runs4s6s <-batsman4s6s(frames,names)

print(runs4s6s)
##                Sehwag Devilliers Gayle Maxwell
## Runs(1s,2s,3s)  60.69      67.39 62.05   62.11
## 4s              33.36      24.28 27.80   24.53
## 6s               5.95       8.32 10.15   13.36
 

Runs forecast

The forecast for the batsman is shown below.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfForecast("./sehwag.csv","Sehwag")
batsmanPerfForecast("./devilliers.csv","Devilliers")
batsmanPerfForecast("./gayle.csv","Gayle")
batsmanPerfForecast("./maxwell.csv","Maxwell")

dev.off()
## null device
##           1

3D plot of Runs vs Balls Faced and Minutes at Crease

The plot is a scatter plot of Runs vs Balls faced and Minutes at Crease. A prediction plane is fitted

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./sehwag.csv","V Sehwag")
battingPerf3d("./devilliers.csv","AB Devilliers")

dev.off()
## null device
##           1
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./gayle.csv","C Gayle")
battingPerf3d("./maxwell.csv","G Maxwell")

dev.off()
## null device
##           1

Predicting Runs given Balls Faced and Minutes at Crease

A multi-variate regression plane is fitted between Runs and Balls faced +Minutes at crease.

BF <- seq( 10, 200,length=10)
Mins <- seq(30,220,length=10)
newDF <- data.frame(BF,Mins)

sehwag <- batsmanRunsPredict("./sehwag.csv","Sehwag",newdataframe=newDF)
devilliers <- batsmanRunsPredict("./devilliers.csv","Devilliers",newdataframe=newDF)
gayle <- batsmanRunsPredict("./gayle.csv","Gayle",newdataframe=newDF)
maxwell <- batsmanRunsPredict("./maxwell.csv","Maxwell",newdataframe=newDF)

The fitted model is then used to predict the runs that the batsmen will score for hypothetical values of Balls Faced and Minutes at Crease. It can be seen that Maxwell sets a searing pace in the predicted runs for a given Balls Faced and Minutes at Crease, followed by Sehwag. But we have to keep in mind that Maxwell has only around 1/5th of the innings of Sehwag (45 to Sehwag’s 245 innings). They are followed by Devilliers and then finally Gayle

batsmen <-cbind(round(sehwag$Runs),round(devilliers$Runs),round(gayle$Runs),round(maxwell$Runs))
colnames(batsmen) <- c("Sehwag","Devilliers","Gayle","Maxwell")
newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
colnames(newDF) <- c("BallsFaced","MinsAtCrease")
predictedRuns <- cbind(newDF,batsmen)
predictedRuns
##    BallsFaced MinsAtCrease Sehwag Devilliers Gayle Maxwell
## 1          10           30     11         12    11      18
## 2          31           51     33         32    28      43
## 3          52           72     55         52    46      67
## 4          73           93     77         71    63      92
## 5          94          114    100         91    81     117
## 6         116          136    122        111    98     141
## 7         137          157    144        130   116     166
## 8         158          178    167        150   133     191
## 9         179          199    189        170   151     215
## 10        200          220    211        190   168     240

Highest runs likelihood

The plots below show the runs likelihood of each batsman, computed using K-Means. It can be seen that Devilliers has a 29.84% likelihood of making around 90+ runs. Sehwag and Gayle have a 35.22% and 32.69% likelihood respectively of making 40+ runs.

A. Virender Sehwag

batsmanRunsLikelihood("./sehwag.csv","Sehwag")

## Summary of  Sehwag 's runs scoring likelihood
## **************************************************
##
## There is a 35.22 % likelihood that Sehwag  will make  46 Runs in  44 balls over 67  Minutes
## There is a 9.43 % likelihood that Sehwag  will make  119 Runs in  106 balls over  158  Minutes
## There is a 55.35 % likelihood that Sehwag  will make  12 Runs in  13 balls over 18  Minutes

B. AB Devilliers

batsmanRunsLikelihood("./devilliers.csv","Devilliers")

## Summary of  Devilliers 's runs scoring likelihood
## **************************************************
##
## There is a 30.65 % likelihood that Devilliers  will make  44 Runs in  43 balls over 60  Minutes
## There is a 29.84 % likelihood that Devilliers  will make  91 Runs in  88 balls over  124  Minutes
## There is a 39.52 % likelihood that Devilliers  will make  11 Runs in  15 balls over 21  Minutes

C. Chris Gayle

batsmanRunsLikelihood("./gayle.csv","Gayle")

## Summary of  Gayle 's runs scoring likelihood
## **************************************************
##
## There is a 32.69 % likelihood that Gayle  will make  47 Runs in  51 balls over 72  Minutes
## There is a 54.49 % likelihood that Gayle  will make  10 Runs in  15 balls over  20  Minutes
## There is a 12.82 % likelihood that Gayle  will make  109 Runs in  119 balls over 172  Minutes

D. Glenn Maxwell

batsmanRunsLikelihood("./maxwell.csv","Maxwell")

## Summary of  Maxwell 's runs scoring likelihood
## **************************************************
##
## There is a 34.38 % likelihood that Maxwell  will make  39 Runs in  29 balls over 35  Minutes
## There is a 15.62 % likelihood that Maxwell  will make  89 Runs in  55 balls over  69  Minutes
## There is a 50 % likelihood that Maxwell  will make  6 Runs in  7 balls over 9  Minutes

Average runs at ground and against opposition

A. Virender Sehwag

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./sehwag.csv","Sehwag")
batsmanAvgRunsOpposition("./sehwag.csv","Sehwag")

dev.off()
## null device
##           1

B. AB Devilliers

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./devilliers.csv","Devilliers")
batsmanAvgRunsOpposition("./devilliers.csv","Devilliers")

dev.off()
## null device
##           1

C. Chris Gayle

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./gayle.csv","Gayle")
batsmanAvgRunsOpposition("./gayle.csv","Gayle")

dev.off()
## null device
##           1

D. Glenn Maxwell

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./maxwell.csv","Maxwell")
batsmanAvgRunsOpposition("./maxwell.csv","Maxwell")

dev.off()
## null device
##           1

Moving Average of runs over career

The moving average for the 4 batsmen indicates the following

1. The moving average of Devilliers and Maxwell is on the way up.
2. Sehwag shows a slight downward trend from his 2nd peak in 2011
3. Gayle maintains a consistent 45 runs for the last few years

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanMovingAverage("./sehwag.csv","Sehwag")
batsmanMovingAverage("./devilliers.csv","Devilliers")
batsmanMovingAverage("./gayle.csv","Gayle")
batsmanMovingAverage("./maxwell.csv","Maxwell")

dev.off()
## null device
##           1

Check batsmen in-form, out-of-form

1. Maxwell, Devilliers, Sehwag are in-form. This is also evident from the moving average plot
2. Gayle is out-of-form
checkBatsmanInForm("./sehwag.csv","Sehwag")
## *******************************************************************************************
##
## Population size: 143  Mean of population: 33.76
## Sample size: 16  Mean of sample: 37.44 SD of sample: 55.15
##
## Null hypothesis H0 : Sehwag 's sample average is within 95% confidence interval
##         of population average
## Alternative hypothesis Ha : Sehwag 's sample average is below the 95% confidence
##         interval of population average
##
## [1] "Sehwag 's Form Status: In-Form because the p value: 0.603525  is greater than alpha=  0.05"
## *******************************************************************************************
checkBatsmanInForm("./devilliers.csv","Devilliers")
## *******************************************************************************************
##
## Population size: 111  Mean of population: 43.5
## Sample size: 13  Mean of sample: 57.62 SD of sample: 40.69
##
## Null hypothesis H0 : Devilliers 's sample average is within 95% confidence interval
##         of population average
## Alternative hypothesis Ha : Devilliers 's sample average is below the 95% confidence
##         interval of population average
##
## [1] "Devilliers 's Form Status: In-Form because the p value: 0.883541  is greater than alpha=  0.05"
## *******************************************************************************************
checkBatsmanInForm("./gayle.csv","Gayle")
## *******************************************************************************************
##
## Population size: 140  Mean of population: 37.1
## Sample size: 16  Mean of sample: 17.25 SD of sample: 20.25
##
## Null hypothesis H0 : Gayle 's sample average is within 95% confidence interval
##         of population average
## Alternative hypothesis Ha : Gayle 's sample average is below the 95% confidence
##         interval of population average
##
## [1] "Gayle 's Form Status: Out-of-Form because the p value: 0.000609  is less than alpha=  0.05"
## *******************************************************************************************
checkBatsmanInForm("./maxwell.csv","Maxwell")
## *******************************************************************************************
##
## Population size: 28  Mean of population: 25.25
## Sample size: 4  Mean of sample: 64.25 SD of sample: 36.97
##
## Null hypothesis H0 : Maxwell 's sample average is within 95% confidence interval
##         of population average
## Alternative hypothesis Ha : Maxwell 's sample average is below the 95% confidence
##         interval of population average
##
## [1] "Maxwell 's Form Status: In-Form because the p value: 0.948744  is greater than alpha=  0.05"
## *******************************************************************************************
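The printed summary has the shape of a one-sided, one-sample t-test: the mean of the most recent innings (the sample) is compared with the career mean (the population). The sketch below recomputes a p-value from the summary figures printed above for Sehwag and Gayle. Treating the check as a lower-tailed t-test with n - 1 degrees of freedom is an assumption about how checkBatsmanInForm() works, and `formPValue` is a hypothetical helper, not a cricketr function.

```r
# Lower-tailed one-sample t-test computed from summary statistics
formPValue <- function(popMean, n, sampleMean, sampleSD) {
  tStat <- (sampleMean - popMean) / (sampleSD / sqrt(n))
  pt(tStat, df = n - 1)   # P(T <= tStat)
}

alpha <- 0.05

# Figures copied from the printed output above
pSehwag <- formPValue(popMean = 33.76, n = 16, sampleMean = 37.44, sampleSD = 55.15)
pGayle  <- formPValue(popMean = 37.10, n = 16, sampleMean = 17.25, sampleSD = 20.25)

# A p-value above alpha reads as "In-Form", below alpha as "Out-of-Form"
ifelse(c(Sehwag = pSehwag, Gayle = pGayle) > alpha, "In-Form", "Out-of-Form")
```

Because the inputs are the rounded figures from the printout, the recomputed p-values will only approximately match the printed ones.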

Analysis of bowlers

1. Mitchell Johnson (Aus) – Innings-150, Wickets – 239, Econ Rate : 4.83
2. Lasith Malinga (SL)- Innings-182, Wickets – 287, Econ Rate : 5.26
3. Dale Steyn (SA)- Innings-103, Wickets – 162, Econ Rate : 4.81
4. Tim Southee (NZ)- Innings-96, Wickets – 135, Econ Rate : 5.33

Malinga has the highest number of innings and wickets, followed closely by Mitchell Johnson. Steyn and Southee have relatively fewer innings.

To get the bowler’s data use

malinga <- getPlayerDataOD(49758,dir=".",file="malinga.csv",type="bowling")

Wicket Frequency percentage

This plot gives the percentage of innings for each wicket count (1, 2, 3, etc.)

par(mfrow=c(1,4))
par(mar=c(4,4,2,2))
bowlerWktsFreqPercent("./mitchell.csv","J Mitchell")
bowlerWktsFreqPercent("./malinga.csv","Malinga")
bowlerWktsFreqPercent("./steyn.csv","Steyn")
bowlerWktsFreqPercent("./southee.csv","southee")

dev.off()
## null device
##           1

Wickets Runs plot

The plot below gives a boxplot of the runs ranges for each of the wickets taken by the bowlers. M Johnson and Steyn are more economical than Malinga and Southee corroborating the figures above

par(mfrow=c(1,4))
par(mar=c(4,4,2,2))

bowlerWktsRunsPlot("./mitchell.csv","J Mitchell")
bowlerWktsRunsPlot("./malinga.csv","Malinga")
bowlerWktsRunsPlot("./steyn.csv","Steyn")
bowlerWktsRunsPlot("./southee.csv","southee")

dev.off()
## null device
##           1

Average wickets in different grounds and opposition

A. Mitchell Johnson

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./mitchell.csv","J Mitchell")
bowlerAvgWktsOpposition("./mitchell.csv","J Mitchell")

dev.off()
## null device
##           1

B. Lasith Malinga

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./malinga.csv","Malinga")
bowlerAvgWktsOpposition("./malinga.csv","Malinga")

dev.off()
## null device
##           1

C. Dale Steyn

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./steyn.csv","Steyn")
bowlerAvgWktsOpposition("./steyn.csv","Steyn")

dev.off()
## null device
##           1

D. Tim Southee

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./southee.csv","southee")
bowlerAvgWktsOpposition("./southee.csv","southee")

dev.off()
## null device
##           1

Relative bowling performance

The plot below shows that Mitchell Johnson and Southee have more wickets in 3-4 wickets range while Steyn and Malinga in 1-2 wicket range

frames <- list("./mitchell.csv","./malinga.csv","steyn.csv","southee.csv")
names <- list("M Johnson","Malinga","Steyn","Southee")
relativeBowlingPerf(frames,names)

Relative Economy Rate against wickets taken

Steyn had the best economy rate followed by M Johnson. Malinga and Southee have a poorer economy rate

frames <- list("./mitchell.csv","./malinga.csv","steyn.csv","southee.csv")
names <- list("M Johnson","Malinga","Steyn","Southee")
relativeBowlingERODTT(frames,names)

Moving average of wickets over career

The moving average of wickets over the careers of Johnson and Steyn is on the up-swing. Southee is maintaining a reasonable record while Malinga shows a decline in ODI performance

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
bowlerMovingAverage("./mitchell.csv","M Johnson")
bowlerMovingAverage("./malinga.csv","Malinga")
bowlerMovingAverage("./steyn.csv","Steyn")
bowlerMovingAverage("./southee.csv","Southee")

dev.off()
## null device
##           1

Wickets forecast

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
bowlerPerfForecast("./mitchell.csv","M Johnson")
bowlerPerfForecast("./malinga.csv","Malinga")
bowlerPerfForecast("./steyn.csv","Steyn")
bowlerPerfForecast("./southee.csv","southee")

dev.off()
## null device
##           1

Check bowler in-form, out-of-form

Johnson, Malinga and Steyn are shown to be still in-form, while Southee is shown to be out-of-form (p value 0.044302, just below alpha = 0.05)

checkBowlerInForm("./mitchell.csv","J Mitchell")
## *******************************************************************************************
##
## Population size: 135  Mean of population: 1.55
## Sample size: 15  Mean of sample: 2 SD of sample: 1.07
##
## Null hypothesis H0 : J Mitchell 's sample average is within 95% confidence interval
##         of population average
## Alternative hypothesis Ha : J Mitchell 's sample average is below the 95% confidence
##         interval of population average
##
## [1] "J Mitchell 's Form Status: In-Form because the p value: 0.937917  is greater than alpha=  0.05"
## *******************************************************************************************
checkBowlerInForm("./malinga.csv","Malinga")
## *******************************************************************************************
##
## Population size: 163  Mean of population: 1.58
## Sample size: 19  Mean of sample: 1.58 SD of sample: 1.22
##
## Null hypothesis H0 : Malinga 's sample average is within 95% confidence interval
##         of population average
## Alternative hypothesis Ha : Malinga 's sample average is below the 95% confidence
##         interval of population average
##
## [1] "Malinga 's Form Status: In-Form because the p value: 0.5  is greater than alpha=  0.05"
## *******************************************************************************************
checkBowlerInForm("./steyn.csv","Steyn")
## *******************************************************************************************
##
## Population size: 93  Mean of population: 1.59
## Sample size: 11  Mean of sample: 1.45 SD of sample: 0.69
##
## Null hypothesis H0 : Steyn 's sample average is within 95% confidence interval
##         of population average
## Alternative hypothesis Ha : Steyn 's sample average is below the 95% confidence
##         interval of population average
##
## [1] "Steyn 's Form Status: In-Form because the p value: 0.257438  is greater than alpha=  0.05"
## *******************************************************************************************
checkBowlerInForm("./southee.csv","southee")
## *******************************************************************************************
##
## Population size: 86  Mean of population: 1.48
## Sample size: 10  Mean of sample: 0.8 SD of sample: 1.14
##
## Null hypothesis H0 : southee 's sample average is within 95% confidence interval
##         of population average
## Alternative hypothesis Ha : southee 's sample average is below the 95% confidence
##         interval of population average
##
## [1] "southee 's Form Status: Out-of-Form because the p value: 0.044302  is less than alpha=  0.05"
## *******************************************************************************************

***************

Key findings

Here are some key conclusions

ODI batsmen

1. AB Devilliers has high frequency of runs in the 60-120 range and the highest average
2. Sehwag has the most number of innings and good strike rate
3. Maxwell has the best strike rate but it should be kept in mind that he has 1/5 of the innings of Sehwag. We need to see how he progresses further
4. Sehwag has the highest percentage of 4s in the runs scored, while Maxwell has the highest percentage of 6s
5. For a hypothetical Balls Faced and Minutes at Crease Maxwell will score the most runs followed by Sehwag
6. The moving average indicates that the best is yet to come for Devilliers and Maxwell. Sehwag has a few more years in him while Gayle shows a decline in ODI performance and is indicated to be out of form.

ODI bowlers

1. Malinga has played the most innings and also has the most wickets, though he has a poor economy rate
2. M Johnson is the most effective in the 3-4 wicket range followed by Southee
3. M Johnson and Steyn have the best overall economy rate followed by Malinga and Southee
4. M Johnson and Steyn’s careers are on the up-swing, Southee maintains a steady, consistent performance, while Malinga shows a downward trend

Hasta la vista! I’ll be back!
Watch this space!

cricketr digs the Ashes!

Published in R bloggers: cricketr digs the Ashes

Introduction

In some circles the Ashes is considered the ‘mother of all cricketing battles’. But, being a staunch supporter of all things Indian, cricket or otherwise, I have to say that the Ashes pales in comparison to an India-Pakistan match. After all, what are a few frowns and raised eyebrows at the Ashes in comparison to the seething emotions and reckless exuberance of Indian fans?

Anyway, the Ashes are an interesting duel and I have decided to do some cricketing analysis using my R package cricketr. For this analysis I have chosen the top 2 batsmen and top 2 bowlers from both the Australian and English sides.

Batsmen

1. Steven Smith (Aus) – Innings – 58 , Ave: 58.52, Strike Rate: 55.90
2. David Warner (Aus) – Innings – 76, Ave: 46.86, Strike Rate: 73.88
3. Alistair Cook (Eng) – Innings – 208 , Ave: 46.62, Strike Rate: 46.33
4. J E Root (Eng) – Innings – 53, Ave: 54.02, Strike Rate: 51.30

Bowlers

1. Mitchell Johnson (Aus) – Innings-131, Wickets – 299, Econ Rate : 3.28
2. Peter Siddle (Aus) – Innings – 104 , Wickets- 192, Econ Rate : 2.95
3. James Anderson (Eng) – Innings – 199 , Wickets- 406, Econ Rate : 3.05
4. Stuart Broad (Eng) – Innings – 148 , Wickets- 296, Econ Rate : 3.08

It is my opinion if any 2 of the 4 in either team click then they will be able to swing the match in favor of their team.

I have interspersed the plots with a few comments. Feel free to draw your own conclusions!

The analysis is included below. Note: This post has also been hosted at Rpubs as cricketr digs the Ashes!
You can also download this analysis as a PDF file from cricketr digs the Ashes!

Do check out my interactive Shiny app implementation using the cricketr package – Sixer – R package cricketr’s new Shiny avatar

library(devtools)
install_github("tvganesh/cricketr")
library(cricketr)

Analyses of Batsmen

The following plots give the analysis of the 2 Australian and 2 English batsmen. It must be kept in mind that Cook has more innings than all the rest put together. Smith has the best average, and Warner has the best strike rate.

Box Histogram Plot

This plot shows a combined boxplot of the Runs ranges and a histogram of the Runs Frequency

batsmanPerfBoxHist("./smith.csv","S Smith")

batsmanPerfBoxHist("./warner.csv","D Warner")

batsmanPerfBoxHist("./cook.csv","A Cook")

batsmanPerfBoxHist("./root.csv","JE Root")

Plot of 4s, 6s and the type of dismissals

A. Steven Smith

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./smith.csv","S Smith")
batsman6s("./smith.csv","S Smith")
batsmanDismissals("./smith.csv","S Smith")

dev.off()
## null device
##           1

B. David Warner

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./warner.csv","D Warner")
batsman6s("./warner.csv","D Warner")
batsmanDismissals("./warner.csv","D Warner")

dev.off()
## null device
##           1

C. Alastair Cook

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./cook.csv","A Cook")
batsman6s("./cook.csv","A Cook")
batsmanDismissals("./cook.csv","A Cook")

dev.off()
## null device
##           1

D. J E Root

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./root.csv","JE Root")
batsman6s("./root.csv","JE Root")
batsmanDismissals("./root.csv","JE Root")

dev.off()
## null device
##           1

Relative Mean Strike Rate

In this first plot I plot the Mean Strike Rate of the batsmen. It can be seen that Warner has the best strike rate (he hits outside the plot!), followed by Smith in the range 20-100. Root has a good strike rate above a hundred runs. Cook maintains a good strike rate.

par(mar=c(4,4,2,2))
frames <- list("./smith.csv","./warner.csv","cook.csv","root.csv")
names <- list("Smith","Warner","Cook","Root")
relativeBatsmanSR(frames,names)
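
To make the idea of a mean strike rate per runs range concrete, here is a minimal base-R sketch. It does not call cricketr. The innings values, the column names (Runs, BF) and the 20-run bucket width are all assumptions made up for illustration. Strike rate is taken as runs scored per 100 balls faced.

```r
# Made-up innings for illustration only (not from any player's CSV file)
innings <- data.frame(
  Runs = c(5, 12, 34, 48, 61, 77, 103, 120),
  BF   = c(10, 20, 50, 60, 100, 110, 150, 160)
)

# Strike rate = runs scored per 100 balls faced
innings$SR <- innings$Runs / innings$BF * 100

# Group innings into 20-run buckets: [0,20), [20,40), ...
innings$Bucket <- cut(innings$Runs, breaks = seq(0, 140, by = 20), right = FALSE)

# Mean strike rate within each bucket (empty buckets come out as NA)
meanSR <- tapply(innings$SR, innings$Bucket, mean)
print(round(meanSR, 1))
```

Plotting meanSR for several batsmen on the same axes gives a picture similar in spirit to the relative strike rate plot, though the package's own bucketing may differ.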

Relative Runs Frequency Percentage

The plot below shows the percentage contribution in each 10-run bucket over the entire career. It can be seen that Smith pops up above the rest with remarkable regularity. Cook is consistent over the entire range.

frames <- list("./smith.csv","./warner.csv","cook.csv","root.csv")
names <- list("Smith","Warner","Cook","Root")
relativeRunsFreqPerf(frames,names)
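
The percentage contribution per 10-run bucket can be sketched in a few lines of base R. The scores below are invented, and the choice of half-open buckets such as [0,10) is my assumption rather than something taken from the package.

```r
# Made-up career scores for illustration only
runs <- c(0, 3, 8, 15, 17, 22, 29, 35, 41, 47,
          52, 68, 74, 95, 101, 113, 7, 12, 26, 44)

# Assign every score to a 10-run bucket: [0,10), [10,20), ...
buckets <- cut(runs, breaks = seq(0, 120, by = 10), right = FALSE)

# Share of innings in each bucket, as a percentage of all innings
freqPct <- prop.table(table(buckets)) * 100
print(round(freqPct, 1))
```

The percentages add up to 100, so curves for batsmen with very different numbers of innings can be compared on the same scale.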

Moving Average of runs over career

The moving average for the 4 batsmen indicates the following: S Smith is the most promising, with a marked spike in performance. Cook maintains a steady pace and is consistent over the years, averaging 50.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanMovingAverage("./smith.csv","S Smith")
batsmanMovingAverage("./warner.csv","D Warner")
batsmanMovingAverage("./cook.csv","A Cook")
batsmanMovingAverage("./root.csv","JE Root")

dev.off()
## null device
##           1
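
For readers who want to see what a moving average does to a run sequence, here is a small base-R sketch. It uses a plain trailing average over 4 innings on made-up scores. The window size is an assumption, and the smoothing used inside cricketr may well be different.

```r
# Made-up scores in career order (illustration only)
runs <- c(10, 20, 30, 40, 50, 60, 70, 80)

# Trailing moving average: mean of the current and the previous 3 innings
window <- 4
ma <- stats::filter(runs, rep(1 / window, window), sides = 1)

# The first window - 1 values are NA because there is not enough history yet
print(as.numeric(ma))
```

A longer window gives a smoother curve but reacts more slowly to a change in form.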

Runs forecast

The forecast for the batsmen is shown below. As before, Cook’s performance is really consistent across the years and the forecast is good for the years ahead. In Cook’s case it can be seen that the forecasted runs track the actual runs reasonably closely.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfForecast("./smith.csv","S Smith")
batsmanPerfForecast("./warner.csv","D Warner")
batsmanPerfForecast("./cook.csv","A Cook")
## Warning in HoltWinters(ts.train): optimization difficulties: ERROR:
## ABNORMAL_TERMINATION_IN_LNSRCH
batsmanPerfForecast("./root.csv","JE Root")

dev.off()
## null device
##           1
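
The warning above comes from a HoltWinters() call on a training series. As a rough sketch of that kind of forecast, the snippet below trains on the first 90% of a synthetic run sequence and predicts the last 10%. The data are simulated, and switching off the seasonal component with gamma = FALSE is my assumption, not necessarily what the package does.

```r
set.seed(42)
# Synthetic per-innings scores with a gentle upward trend (not real data)
runs <- pmax(0, round(30 + 0.2 * (1:100) + rnorm(100, sd = 15)))

# First 90% of innings form the training set, the last 10% the test set
nTrain <- floor(0.9 * length(runs))
train  <- ts(runs[1:nTrain])
test   <- runs[(nTrain + 1):length(runs)]

# Holt-Winters with level and trend only; gamma = FALSE disables seasonality
fit <- HoltWinters(train, gamma = FALSE)
fc  <- predict(fit, n.ahead = length(test))

# Put the held-out scores next to the forecast
print(data.frame(actual = test, forecast = round(as.numeric(fc), 1)))
```

Individual scores are noisy, so the forecast is best read as a trend line against which the test innings can be compared, not as a score prediction.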

3D plot of Runs vs Balls Faced and Minutes at Crease

The plot is a scatter plot of Runs vs Balls faced and Minutes at Crease. A prediction plane is fitted

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./smith.csv","S Smith")
battingPerf3d("./warner.csv","D Warner")

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./cook.csv","A Cook")
battingPerf3d("./root.csv","JE Root")

dev.off()
## null device
##           1

Predicting Runs given Balls Faced and Minutes at Crease

A multi-variate regression plane is fitted between Runs and Balls faced +Minutes at crease.

BF <- seq( 10, 400,length=15)
Mins <- seq(30,600,length=15)
newDF <- data.frame(BF,Mins)
smith <- batsmanRunsPredict("./smith.csv","S Smith",newdataframe=newDF)
warner <- batsmanRunsPredict("./warner.csv","D Warner",newdataframe=newDF)
cook <- batsmanRunsPredict("./cook.csv","A Cook",newdataframe=newDF)
root <- batsmanRunsPredict("./root.csv","JE Root",newdataframe=newDF)

The fitted model is then used to predict the runs that the batsmen will score for a given Balls faced and Minutes at crease. It can be seen that Warner sets a searing pace in the predicted runs for a given Balls Faced and Minutes at crease while Smith and Root are neck and neck in the predicted runs.

batsmen <-cbind(round(smith$Runs),round(warner$Runs),round(cook$Runs),round(root$Runs))
colnames(batsmen) <- c("Smith","Warner","Cook","Root")
newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
colnames(newDF) <- c("BallsFaced","MinsAtCrease")
predictedRuns <- cbind(newDF,batsmen)
predictedRuns
##    BallsFaced MinsAtCrease Smith Warner Cook Root
## 1          10           30     9     12    6    9
## 2          38           71    25     33   20   25
## 3          66          111    42     53   33   42
## 4          94          152    58     73   47   59
## 5         121          193    75     93   60   75
## 6         149          234    91    114   74   92
## 7         177          274   108    134   88  109
## 8         205          315   124    154  101  125
## 9         233          356   141    174  115  142
## 10        261          396   158    195  128  159
## 11        289          437   174    215  142  175
## 12        316          478   191    235  155  192
## 13        344          519   207    255  169  208
## 14        372          559   224    276  182  225
## 15        400          600   240    296  196  242
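
The prediction table above comes from a regression of Runs on Balls Faced and Minutes. A minimal base-R sketch of the same idea, using lm() on simulated innings, is shown below. The data and the coefficients used to simulate them are invented. Only the BF/Mins grid mirrors the newDF used in this post.

```r
set.seed(1)
# Simulated innings: balls faced, minutes at crease and runs (not real data)
BF   <- round(runif(60, min = 5, max = 300))
Mins <- pmax(1, round(BF * 1.4 + rnorm(60, sd = 10)))
Runs <- pmax(0, round(0.5 * BF + 0.05 * Mins + rnorm(60, sd = 8)))
innings <- data.frame(Runs, BF, Mins)

# Fit the regression plane Runs ~ BF + Mins
fit <- lm(Runs ~ BF + Mins, data = innings)

# Predict runs on a grid of new BF/Mins values
newDF <- data.frame(BF   = seq(10, 400, length.out = 15),
                    Mins = seq(30, 600, length.out = 15))
newDF$Runs <- predict(fit, newdata = newDF)
print(round(newDF))
```

Because Balls Faced and Minutes move together, the individual coefficients are hard to interpret on their own; the predictions along a realistic BF/Mins grid are the more useful output.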



Highest runs likelihood

The plots below show the runs likelihood of each batsman. This uses K-Means. It can be seen that Smith has the best likelihood, around 40%, of scoring around 41 runs, followed by Root who has a 28.3% likelihood of scoring around 81 runs.

A. Steven Smith

batsmanRunsLikelihood("./smith.csv","S Smith")


## Summary of  S Smith 's runs scoring likelihood
## **************************************************
##
## There is a 40 % likelihood that S Smith  will make  41 Runs in  73 balls over 101  Minutes
## There is a 36 % likelihood that S Smith  will make  9 Runs in  21 balls over  27  Minutes
## There is a 24 % likelihood that S Smith  will make  139 Runs in  237 balls over 338  Minutes

B. David Warner

batsmanRunsLikelihood("./warner.csv","D Warner")


## Summary of  D Warner 's runs scoring likelihood
## **************************************************
##
## There is a 11.11 % likelihood that D Warner  will make  134 Runs in  159 balls over 263  Minutes
## There is a 63.89 % likelihood that D Warner  will make  17 Runs in  25 balls over  37  Minutes
## There is a 25 % likelihood that D Warner  will make  73 Runs in  105 balls over 156  Minutes

C. Alastair Cook

batsmanRunsLikelihood("./cook.csv","A Cook")


## Summary of  A Cook 's runs scoring likelihood
## **************************************************
##
## There is a 27.72 % likelihood that A Cook  will make  64 Runs in  140 balls over 195  Minutes
## There is a 59.9 % likelihood that A Cook  will make  15 Runs in  32 balls over  46  Minutes
## There is a 12.38 % likelihood that A Cook  will make  141 Runs in  300 balls over 420  Minutes

D. J E Root

batsmanRunsLikelihood("./root.csv","JE Root")


## Summary of  JE Root 's runs scoring likelihood
## **************************************************
##
## There is a 28.3 % likelihood that JE Root  will make  81 Runs in  158 balls over 223  Minutes
## There is a 7.55 % likelihood that JE Root  will make  179 Runs in  290 balls over  425  Minutes
## There is a 64.15 % likelihood that JE Root  will make  16 Runs in  39 balls over 59  Minutes
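
The likelihood summaries above report three clusters whose percentages add up to 100. One way to get numbers of that shape is to run K-Means with 3 centers on Runs, Balls Faced and Minutes and read each cluster's share of innings as its likelihood. The sketch below does this on simulated data; that the package computes it exactly this way is an assumption on my part.

```r
set.seed(7)
# Simulated innings in three rough groups: short, medium and long stays (not real data)
BF   <- c(rnorm(60, 25, 8), rnorm(30, 110, 20), rnorm(10, 250, 30))
Mins <- BF * 1.4 + rnorm(100, sd = 10)
Runs <- BF * 0.55 + rnorm(100, sd = 8)
innings <- data.frame(Runs, BF, Mins)

# Cluster the innings into 3 groups on Runs, BF and Mins
fit <- kmeans(innings, centers = 3, nstart = 20)

# Share of innings in each cluster, read as the likelihood of that kind of innings
likelihood <- round(fit$size / sum(fit$size) * 100, 2)
for (i in seq_len(3)) {
  cat(sprintf("%.2f%% likelihood: about %.0f runs in %.0f balls over %.0f minutes\n",
              likelihood[i], fit$centers[i, "Runs"],
              fit$centers[i, "BF"], fit$centers[i, "Mins"]))
}
```

The cluster order is arbitrary, which is why the summaries for different batsmen list the short, medium and long innings in different orders.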
 

Average runs at ground and against opposition

A. Steven Smith

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./smith.csv","S Smith")
batsmanAvgRunsOpposition("./smith.csv","S Smith")

dev.off()
## null device
##           1

B. David Warner

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./warner.csv","D Warner")
batsmanAvgRunsOpposition("./warner.csv","D Warner")

dev.off()
## null device
##           1

C. Alastair Cook

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./cook.csv","A Cook")
batsmanAvgRunsOpposition("./cook.csv","A Cook")

dev.off()
## null device
##           1

D. J E Root

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./root.csv","JE Root")
batsmanAvgRunsOpposition("./root.csv","JE Root")

dev.off()
## null device
##           1
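
The averages behind the ground and opposition plots are simple group means. The base-R sketch below shows the computation on a tiny invented data frame. The column names (Runs, Ground, Opposition) and the placeholder ground and team names are assumptions for illustration.

```r
# Invented innings with placeholder grounds and teams (illustration only)
innings <- data.frame(
  Runs       = c(45, 102, 8, 60, 33, 71, 15, 90),
  Ground     = c("Ground A", "Ground B", "Ground A", "Ground C",
                 "Ground B", "Ground C", "Ground A", "Ground B"),
  Opposition = c("Team X", "Team Y", "Team X", "Team Y",
                 "Team X", "Team X", "Team X", "Team Y")
)

# Mean runs per ground, plus the number of innings played there
avgGround     <- tapply(innings$Runs, innings$Ground, mean)
inningsGround <- table(innings$Ground)
print(round(avgGround, 1))
print(inningsGround)

# Mean runs against each opposition
avgOpp <- tapply(innings$Runs, innings$Opposition, mean)
print(round(avgOpp, 1))
```

Keeping the innings count next to each average matters, since a high average from two or three innings says much less than the same average from twenty.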

Analysis of bowlers

1. Mitchell Johnson (Aus) – Innings-131, Wickets – 299, Econ Rate : 3.28
2. Peter Siddle (Aus) – Innings – 104 , Wickets- 192, Econ Rate : 2.95
3. James Anderson (Eng) – Innings – 199 , Wickets- 406, Econ Rate : 3.05
4. Stuart Broad (Eng) – Innings – 148 , Wickets- 296, Econ Rate : 3.08

Anderson has the highest number of innings and wickets, followed by Broad and Johnson who are in a neck and neck race with respect to wickets. Johnson is on the more expensive side though. Siddle has fewer innings but a good economy rate.

Wicket Frequency percentage

This plot gives the percentage frequency of each wicket count (1, 2, 3, etc.) for a bowler

par(mfrow=c(1,4))
par(mar=c(4,4,2,2))
bowlerWktsFreqPercent("./johnson.csv","Johnson")
bowlerWktsFreqPercent("./siddle.csv","Siddle")
bowlerWktsFreqPercent("./broad.csv","Broad")
bowlerWktsFreqPercent("./anderson.csv","Anderson")

dev.off()
## null device
##           1
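
A wicket frequency percentage is just a normalized count of how often each wicket haul occurs. Here is a small base-R sketch on invented per-innings wicket counts; it is meant only to show the arithmetic, not the package internals.

```r
# Invented wickets taken per innings (illustration only)
wkts <- c(0, 1, 1, 2, 2, 2, 3, 3, 4, 5)

# Percentage of innings for each wicket count
wktsPct <- prop.table(table(wkts)) * 100
print(wktsPct)
```

In this toy data the 2-wicket haul is the most common, accounting for 3 of the 10 innings.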

Wickets Runs plot

The plot below gives a boxplot of the runs ranges for each of the wickets taken by the bowlers

par(mfrow=c(1,4))
par(mar=c(4,4,2,2))
bowlerWktsRunsPlot("./johnson.csv","Johnson")
bowlerWktsRunsPlot("./siddle.csv","Siddle")
bowlerWktsRunsPlot("./broad.csv","Broad")
bowlerWktsRunsPlot("./anderson.csv","Anderson")

dev.off()
## null device
##           1

Average wickets in different grounds and opposition

A. Mitchell Johnson

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./johnson.csv","Johnson")
bowlerAvgWktsOpposition("./johnson.csv","Johnson")

dev.off()
## null device
##           1

B. Peter Siddle

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./siddle.csv","Siddle")
bowlerAvgWktsOpposition("./siddle.csv","Siddle")

dev.off()
## null device
##           1

C. Stuart Broad

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./broad.csv","Broad")
bowlerAvgWktsOpposition("./broad.csv","Broad")

dev.off()
## null device
##           1

D. James Anderson

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./anderson.csv","Anderson")
bowlerAvgWktsOpposition("./anderson.csv","Anderson")

dev.off()
## null device
##           1

Relative bowling performance

The plot below shows that Mitchell Johnson is the most effective bowler among the lot, with a higher wicket percentage in the 3-6 wicket range. Broad and Anderson seem to perform well at 2 wickets in comparison to Siddle, but at 3 wickets Siddle is better than Broad and Anderson.

frames <- list("./johnson.csv","./siddle.csv","broad.csv","anderson.csv")
names <- list("Johnson","Siddle","Broad","Anderson")
relativeBowlingPerf(frames,names)

Relative Economy Rate against wickets taken

Anderson, followed by Siddle, has the best economy rate. Johnson is fairly expensive in the 4-8 wicket range.

frames <- list("./johnson.csv","./siddle.csv","broad.csv","anderson.csv")
names <- list("Johnson","Siddle","Broad","Anderson")
relativeBowlingER(frames,names)
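
Economy rate is runs conceded per over, and the plot relates it to the number of wickets taken. The sketch below computes a mean economy rate per wicket count on invented bowling figures. The column names (Overs, Runs, Wkts) are assumptions, and whole overs are used to avoid the scorecard convention where 10.3 means 10 overs and 3 balls.

```r
# Invented bowling figures per innings (illustration only)
spells <- data.frame(
  Overs = c(10, 20, 15, 25, 18, 22),
  Runs  = c(40, 50, 45, 100, 54, 55),
  Wkts  = c(1, 1, 2, 2, 3, 3)
)

# Economy rate = runs conceded per over bowled
spells$ER <- spells$Runs / spells$Overs

# Mean economy rate for each number of wickets taken
meanER <- tapply(spells$ER, spells$Wkts, mean)
print(meanER)
```

A bowler whose line stays low as the wicket count grows is both penetrative and economical, which is the combination the comparison plot is meant to reveal.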

Moving average of wickets over career

Johnson is on his second peak while Siddle is on the decline with respect to bowling. Broad and Anderson show improving performance over the years.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
bowlerMovingAverage("./johnson.csv","Johnson")
bowlerMovingAverage("./siddle.csv","Siddle")
bowlerMovingAverage("./broad.csv","Broad")
bowlerMovingAverage("./anderson.csv","Anderson")

dev.off()
## null device
##           1

Wickets forecast

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
bowlerPerfForecast("./johnson.csv","Johnson")
bowlerPerfForecast("./siddle.csv","Siddle")
bowlerPerfForecast("./broad.csv","Broad")
bowlerPerfForecast("./anderson.csv","Anderson")

dev.off()
## null device
##           1

Key findings

Here are some key conclusions

1. Cook has the most innings and has been extremely consistent in his scores
2. Warner has the best strike rate among the lot followed by Smith and Root
3. The moving average shows a marked improvement over the years for Smith
4. Johnson is the most effective bowler but is fairly expensive
5. Anderson has the best economy rate followed by Siddle
6. Johnson is at his second peak with respect to bowling while Broad and Anderson maintain a steady line and length in their career bowling performance


Taking cricketr for a spin – Part 1

“Curiouser and curiouser!” cried Alice
“The time has come,” the walrus said, “to talk of many things: Of shoes and ships – and sealing wax – of cabbages and kings”
“Begin at the beginning,”the King said, very gravely,“and go on till you come to the end: then stop.”
“And what is the use of a book,” thought Alice, “without pictures or conversation?”

            Excerpts from Alice in Wonderland by Lewis Carroll

Introduction

This post is a continuation of my previous post “Introducing cricketr! A R package to analyze the performances of cricketers.” In this post I take my package cricketr for a spin. For this analysis I focus on the Indian batting legends

– Sachin Tendulkar (Master Blaster)
– Rahul Dravid (The Will)
– Sourav Ganguly (The Dada Prince)
– Sunil Gavaskar (Little Master)

This post is also hosted on RPubs – cricketr-1

(Do check out my interactive Shiny app implementation using the cricketr package – Sixer – R package cricketr’s new Shiny avatar)

library(devtools)
install_github("tvganesh/cricketr")
library(cricketr)


Box Histogram Plot

This plot shows a combined boxplot of the Runs ranges and a histogram of the Runs Frequency. The plots below indicate that Tendulkar’s average is the highest. He is followed by Dravid, Gavaskar and then Ganguly.

batsmanPerfBoxHist("./tendulkar.csv","Sachin Tendulkar")



batsmanPerfBoxHist("./dravid.csv","Rahul Dravid")



batsmanPerfBoxHist("./ganguly.csv","Sourav Ganguly")



batsmanPerfBoxHist("./gavaskar.csv","Sunil Gavaskar")

Relative Mean Strike Rate

In this first plot I plot the Mean Strike Rate of the batsmen. Tendulkar leads in the Mean Strike Rate for runs in the range 100-180. Ganguly has a very good Mean Strike Rate for the runs range 40-80.

frames <- list("./tendulkar.csv","./dravid.csv","ganguly.csv","gavaskar.csv")
names <- list("Tendulkar","Dravid","Ganguly","Gavaskar")
relativeBatsmanSR(frames,names)

Relative Runs Frequency Percentage

The plot below shows the percentage contribution in each 10-run bucket over the entire career. The percentage Runs Frequency is fairly close but Gavaskar seems to lead most of the way.

frames <- list("./tendulkar.csv","./dravid.csv","ganguly.csv","gavaskar.csv")
names <- list("Tendulkar","Dravid","Ganguly","Gavaskar")
relativeRunsFreqPerf(frames,names)

Moving Average of runs over career

The moving average for the 4 batsmen indicates the following
– Tendulkar and Ganguly’s careers have a downward trend and their retirement didn’t come too soon
– Dravid and Gavaskar’s careers definitely show an upswing. They probably had a year or two left.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanMovingAverage("./tendulkar.csv","Tendulkar")
batsmanMovingAverage("./dravid.csv","Dravid")
batsmanMovingAverage("./ganguly.csv","Ganguly")
batsmanMovingAverage("./gavaskar.csv","Gavaskar")

dev.off()
## null device
##           1

Runs forecast

The forecast for the batsman is shown below. The plots indicate that only Tendulkar seemed to maintain a consistency over the period while the rest seem to score less than their forecasted runs in the last 10% of the career

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfForecast("./tendulkar.csv","Sachin Tendulkar")
batsmanPerfForecast("./dravid.csv","Rahul Dravid")
batsmanPerfForecast("./ganguly.csv","Sourav Ganguly")
batsmanPerfForecast("./gavaskar.csv","Sunil Gavaskar")

dev.off()
## null device
##           1

Check for batsman in-form/out-of-form

The following snippet checks whether the batsman is in-form or out-of-form during the last 10% of innings of the career. This is done by choosing the null hypothesis (H0) to indicate that the batsmen are in-form. Ha is the alternative hypothesis that they are not in-form. The population is based on the first 90% of career runs. The last 10% is taken as the sample, and a check is made on the lower tail to see if the sample mean falls below the 95% confidence interval of the population mean. If the resulting p-value is less than 0.05, the batsman is considered out-of-form.
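
As a rough sketch of this kind of lower-tail check, the snippet below runs a one-sample t-test from summary statistics. The helper name formPValue is hypothetical, and using a t distribution with n - 1 degrees of freedom is my assumption about the test. The inputs are the rounded population and sample figures printed in the Tendulkar and Dravid outputs in this post.

```r
# Hypothetical helper: lower-tail p-value of a one-sample t-test from summary statistics
formPValue <- function(popMean, sampleMean, sampleSD, n) {
  tStat <- (sampleMean - popMean) / (sampleSD / sqrt(n))
  pt(tStat, df = n - 1)  # probability of a sample mean this low or lower under H0
}

alpha <- 0.05

# Rounded summary figures as printed in the outputs of this post
pTendulkar <- formPValue(popMean = 50.48, sampleMean = 32.42, sampleSD = 29.8,  n = 33)
pDravid    <- formPValue(popMean = 46.98, sampleMean = 43.48, sampleSD = 40.89, n = 29)

# A p-value below alpha rejects H0, i.e. the batsman is flagged as out-of-form
cat("Tendulkar:", if (pTendulkar < alpha) "Out-of-Form" else "In-Form", "\n")
cat("Dravid:   ", if (pDravid < alpha) "Out-of-Form" else "In-Form", "\n")
```

A small p-value means a sample mean this far below the population mean would be unlikely if the batsman were still in-form, so H0 is rejected.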

The computation shows that Tendulkar was out-of-form while the others weren’t. While Dravid and Gavaskar’s moving averages do show an upward trend, the surprise is Ganguly. This could be because Ganguly was able to keep his average in the last 10% within the 95% confidence interval. It has to be noted that Ganguly’s average was much lower than Tendulkar’s.

checkBatsmanInForm("./tendulkar.csv","Tendulkar")
## *******************************************************************************************
##
## Population size: 294 Mean of population: 50.48
## Sample size: 33 Mean of sample: 32.42 SD of sample: 29.8
##
## Null hypothesis H0 : Tendulkar 's sample average is within 95% confidence interval
## of population average
## Alternative hypothesis Ha : Tendulkar 's sample average is below the 95% confidence
## interval of population average
##
## [1] "Tendulkar 's Form Status: Out-of-Form because the p value: 0.000713 is less than alpha= 0.05"
## *******************************************************************************************

checkBatsmanInForm("./dravid.csv","Dravid")
## *******************************************************************************************
##
## Population size: 256 Mean of population: 46.98
## Sample size: 29 Mean of sample: 43.48 SD of sample: 40.89
##
## Null hypothesis H0 : Dravid 's sample average is within 95% confidence interval
## of population average
## Alternative hypothesis Ha : Dravid 's sample average is below the 95% confidence
## interval of population average
##
## [1] "Dravid 's Form Status: In-Form because the p value: 0.324138 is greater than alpha= 0.05"
## *******************************************************************************************

checkBatsmanInForm("./ganguly.csv","Ganguly")
## *******************************************************************************************
##
## Population size: 169 Mean of population: 38.94
## Sample size: 19 Mean of sample: 33.21 SD of sample: 32.97
##
## Null hypothesis H0 : Ganguly 's sample average is within 95% confidence interval
## of population average
## Alternative hypothesis Ha : Ganguly 's sample average is below the 95% confidence
## interval of population average
##
## [1] "Ganguly 's Form Status: In-Form because the p value: 0.229006 is greater than alpha= 0.05"
## *******************************************************************************************

checkBatsmanInForm("./gavaskar.csv","Gavaskar")
## *******************************************************************************************
##
## Population size: 125 Mean of population: 44.67
## Sample size: 14 Mean of sample: 57.86 SD of sample: 58.55
##
## Null hypothesis H0 : Gavaskar 's sample average is within 95% confidence interval
## of population average
## Alternative hypothesis Ha : Gavaskar 's sample average is below the 95% confidence
## interval of population average
##
## [1] "Gavaskar 's Form Status: In-Form because the p value: 0.793276 is greater than alpha= 0.05"
## *******************************************************************************************

dev.off()
## null device
##           1

3D plot of Runs vs Balls Faced and Minutes at Crease

The plot is a scatter plot of Runs vs Balls faced and Minutes at Crease. A prediction plane is fitted

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./tendulkar.csv","Tendulkar")
battingPerf3d("./dravid.csv","Dravid")

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./ganguly.csv","Ganguly")
battingPerf3d("./gavaskar.csv","Gavaskar")

dev.off()
## null device
##           1

Predicting Runs given Balls Faced and Minutes at Crease

A multi-variate regression plane is fitted between Runs and Balls faced +Minutes at crease.

BF <- seq( 10, 400,length=15)
Mins <- seq(30,600,length=15)
newDF <- data.frame(BF,Mins)
tendulkar <- batsmanRunsPredict("./tendulkar.csv","Tendulkar",newdataframe=newDF)
dravid <- batsmanRunsPredict("./dravid.csv","Dravid",newdataframe=newDF)
ganguly <- batsmanRunsPredict("./ganguly.csv","Ganguly",newdataframe=newDF)
gavaskar <- batsmanRunsPredict("./gavaskar.csv","Gavaskar",newdataframe=newDF)

The fitted model is then used to predict the runs that the batsmen will score for a given Balls faced and Minutes at crease. It can be seen that Tendulkar has much higher predicted runs than all of the others. Tendulkar is followed by Ganguly who, as we saw earlier, had a very good strike rate. However it must be noted that Dravid and Gavaskar have a better average.

batsmen <-cbind(round(tendulkar$Runs),round(dravid$Runs),round(ganguly$Runs),round(gavaskar$Runs))
colnames(batsmen) <- c("Tendulkar","Dravid","Ganguly","Gavaskar")
newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
colnames(newDF) <- c("BallsFaced","MinsAtCrease")
predictedRuns <- cbind(newDF,batsmen)
predictedRuns
##    BallsFaced MinsAtCrease Tendulkar Dravid Ganguly Gavaskar
## 1          10           30         7      1       7        4
## 2          38           71        23     14      21       17
## 3          66          111        39     27      35       30
## 4          94          152        54     40      50       43
## 5         121          193        70     54      64       56
## 6         149          234        86     67      78       69
## 7         177          274       102     80      93       82
## 8         205          315       118     94     107       95
## 9         233          356       134    107     121      108
## 10        261          396       150    120     136      121
## 11        289          437       165    134     150      134
## 12        316          478       181    147     165      147
## 13        344          519       197    160     179      160
## 14        372          559       213    173     193      173
## 15        400          600       229    187     208      186

Contribution to matches won and lost

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanContributionWonLost(35320,"Tendulkar")
batsmanContributionWonLost(28114,"Dravid")
batsmanContributionWonLost(28779,"Ganguly")
batsmanContributionWonLost(28794,"Gavaskar")

Home and overseas performance

From the plot below Tendulkar and Dravid have a lot more matches both home and abroad and their performance has been good both at home and overseas. Tendulkar has the best performance home and abroad and is consistent all across. Dravid is also consistent at all venues. Gavaskar played fewer matches than Tendulkar & Dravid. The range of runs at home is higher than overseas, however the average is consistent both at home and abroad. Finally we have Ganguly.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfHomeAway(35320,"Tendulkar")
batsmanPerfHomeAway(28114,"Dravid")
batsmanPerfHomeAway(28779,"Ganguly")
batsmanPerfHomeAway(28794,"Gavaskar")

Average runs at ground and against opposition

Tendulkar has above 50 runs average against Sri Lanka, Bangladesh, West Indies and Zimbabwe. His averages against Australia and England are very close to 50. Sydney, Port Elizabeth, Bloemfontein and Colombo are great hunting grounds for Tendulkar

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./tendulkar.csv","Tendulkar")
batsmanAvgRunsOpposition("./tendulkar.csv","Tendulkar")

dev.off()
## null device
##           1

Dravid plundered runs at Adelaide, Georgetown, Oval, Hamilton etc. Dravid has an above-50 average against England, Bangladesh, New Zealand, Pakistan, West Indies and Zimbabwe

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./dravid.csv","Dravid")
batsmanAvgRunsOpposition("./dravid.csv","Dravid")

dev.off()
## null device
##           1

Ganguly has good performance at the Oval, Rawalpindi, Johannesburg and Kandy. Ganguly averages 50 runs against England and Bangladesh.

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./ganguly.csv","Ganguly")
batsmanAvgRunsOpposition("./ganguly.csv","Ganguly")

dev.off()
## null device
##           1

The Oval, Sydney, Perth, Melbourne, Brisbane and Manchester are happy hunting grounds for Gavaskar. Gavaskar averages around 50 runs against Australia, Pakistan, Sri Lanka and West Indies.

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./gavaskar.csv","Gavaskar")
batsmanAvgRunsOpposition("./gavaskar.csv","Gavaskar")

dev.off()
## null device
##           1

Key findings

Here are some key conclusions

1. Tendulkar has the highest average among the 4. He is followed by Dravid, Gavaskar and Ganguly.
2. Tendulkar’s predicted performance for a given number of Balls Faced and Minutes at Crease is superior to the rest
3. Dravid averages above 50 against 6 countries
4. West Indies and Australia are Gavaskar’s favorite batting grounds
5. Ganguly has a very good Mean Strike Rate for the range 40-80 and Tendulkar from 100-180
6. In home and overseas performance, Tendulkar is the best. Dravid and Gavaskar also have good performance overseas.
7. Dravid and Gavaskar probably retired a year or two early while Tendulkar and Ganguly’s time was clearly up

Final thoughts

Tendulkar is clearly the greatest batsman India has produced as he leads in almost all aspects of batting – number of centuries, strike rate, predicted runs and home and overseas performance. Dravid follows Tendulkar with 48 centuries, consistent performance home and overseas and a career that was still green. Gavaskar has fewer matches than the rest but his performance overseas is very good in those helmetless times. Finally we have Ganguly.

Dravid and Gavaskar had a few more years of great batting while Tendulkar and Ganguly’s careers were on a decline.

Note: It is really not fair to include Gavaskar in the analysis as he played in a different era when helmets were not used, even against the fiery pace of Thomson, Lillee, Roberts, Holding etc. In addition Gavaskar did not play against some of the newer countries like Bangladesh and Zimbabwe where he could have amassed runs. Yet I wanted to include him and his performance is clearly excellent

Introducing cricketr! : An R package to analyze performances of cricketers

Published in R bloggers: Introducing cricketr: An R package to analyze the performances of cricketers

Yet all experience is an arch wherethro’
Gleams that untravell’d world whose margin fades
For ever and forever when I move.
How dull it is to pause, to make an end,
To rust unburnish’d, not to shine in use!

            Ulysses by Alfred Tennyson

Introduction

This is an initial post in which I introduce a cricketing package ‘cricketr’ which I have created. This package was a natural culmination of my earlier posts on cricket and my completing 9 modules of the Data Science Specialization from Johns Hopkins University at Coursera. The thought of creating this package struck me some time back, and I have finally been able to bring this to fruition.

So here it is. My R package ‘cricketr!!!’

This package uses the statistics info available in ESPN Cricinfo Statsguru. The current version of this package only uses data from test cricket. I plan to develop functionality for One-day and Twenty20 cricket later.

You should be able to install the package from GitHub and use many of the functions available in the package. Please be mindful of ESPN Cricinfo Terms of Use

(Note: This page is also hosted as a GitHub page at cricketr and also at RPubs as cricketr: A R package for analyzing performances of cricketers)

You can download this analysis as a PDF file from Introducing cricketr

The cricketr package

The cricketr package has several functions that perform several different analyses on both batsmen and bowlers. The package has functions that plot the percentage frequency of runs or wickets, the runs likelihood for a batsman, relative run/strike rates of batsmen and relative performance/economy rate for bowlers.

Do check out my interactive Shiny app implementation using the cricketr package – Sixer – R package cricketr’s new Shiny avatar

Other interesting functions include batting performance moving average, forecast and a function to check whether the batsman/bowler is in-form or out-of-form.

The data for a particular player can be obtained with the getPlayerData() function from the package. To do this you will need to go to ESPN CricInfo Player and type in the name of the player, e.g. Ricky Ponting, Sachin Tendulkar etc. This will bring up a page which has the profile number for the player, e.g. for Sachin Tendulkar this would be http://www.espncricinfo.com/india/content/player/35320.html. Hence, Sachin’s profile is 35320. This can be used to get the data for Tendulkar as shown below

The cricketr package can be installed from GitHub with

library(devtools)
install_github("tvganesh/cricketr")
library(cricketr)

tendulkar <- getPlayerData(35320,dir="..",file="tendulkar.csv",type="batting",homeOrAway=c(1,2),
                           result=c(1,2,4))

Important Note

This needs to be done only once for a player. This function stores the player’s data in a CSV file (e.g. tendulkar.csv as above) which can then be reused for all other functions. Once we have the data for the players many analyses can be done. This post will use the stored CSV file obtained with a prior getPlayerData for all subsequent analyses

Sachin Tendulkar’s performance – Basic Analyses

The 3 plots below provide the following for Tendulkar

1. Frequency percentage of runs in each run range over the whole career
2. Mean Strike Rate for runs scored in the given range
3. A histogram of runs frequency percentages in runs ranges

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsmanRunsFreqPerf("./tendulkar.csv","Sachin Tendulkar")
batsmanMeanStrikeRate("./tendulkar.csv","Sachin Tendulkar")
batsmanRunsRanges("./tendulkar.csv","Sachin Tendulkar")

dev.off()
## null device
##           1

More analyses

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./tendulkar.csv","Tendulkar")
batsman6s("./tendulkar.csv","Tendulkar")
batsmanDismissals("./tendulkar.csv","Tendulkar")

3D scatter plot and prediction plane

The plots below show the 3D scatter plot of Sachin’s Runs versus Balls Faced and Minutes at crease. A linear regression model is then fitted between Runs and Balls Faced + Minutes at crease

battingPerf3d("./tendulkar.csv","Sachin Tendulkar")

Average runs at different venues

The plot below gives the average runs scored by Tendulkar at different grounds. The plot also displays the number of innings at each ground as a label on the x-axis. It can be seen that Tendulkar did great in Colombo (SSC) and Melbourne for matches overseas and in Mumbai, Mohali and Bangalore at home

batsmanAvgRunsGround("./tendulkar.csv","Sachin Tendulkar")

Average runs against different opposing teams

This plot computes the average runs scored by Tendulkar against different countries. The x-axis also gives the number of innings against each team

batsmanAvgRunsOpposition("./tendulkar.csv","Tendulkar")

Highest Runs Likelihood

The plot below shows the Runs Likelihood for a batsman. For this the performance of Sachin is plotted as a 3D scatter plot with Runs versus Balls Faced + Minutes at crease using K-Means. The centroids of 3 clusters are computed and plotted. In this plot, Sachin Tendulkar’s highest tendencies are computed and plotted using K-Means

batsmanRunsLikelihood("./tendulkar.csv","Sachin Tendulkar")

## Summary of Sachin Tendulkar 's runs scoring likelihood
## **************************************************
##
## There is a 16.51 % likelihood that Sachin Tendulkar will make 139 Runs in 251 balls over 353 Minutes
## There is a 58.41 % likelihood that Sachin Tendulkar will make 16 Runs in 31 balls over 44 Minutes
## There is a 25.08 % likelihood that Sachin Tendulkar will make 66 Runs in 122 balls over 167 Minutes

A look at the Top 4 batsmen – Tendulkar, Kallis, Ponting and Sangakkara

The batsmen with the most hundreds in test cricket are

1. Sachin Tendulkar: Average: 53.78, 100’s – 51, 50’s – 68
2. Jacques Kallis: Average: 55.47, 100’s – 45, 50’s – 58
3. Ricky Ponting: Average: 51.85, 100’s – 41, 50’s – 62
4. Kumar Sangakkara: Average: 58.04, 100’s – 38, 50’s – 52

in that order.

The following plots take a closer look at their performances. The box plots show the mean (red line) and median (blue line). The two ends of the boxplot display the 25th and 75th percentile.

Box Histogram Plot

This plot shows a combined boxplot of the Runs ranges and a histogram of the Runs Frequency. The calculated means differ from the stated means, possibly because of data cleaning. I am also not sure how the means were arrived at in ESPN Cricinfo, e.g. when considering not outs.

batsmanPerfBoxHist("./tendulkar.csv","Sachin Tendulkar")

batsmanPerfBoxHist("./kallis.csv","Jacques Kallis")

batsmanPerfBoxHist("./ponting.csv","Ricky Ponting")

batsmanPerfBoxHist("./sangakkara.csv","K Sangakkara")

Contribution to won and lost matches

The plot below shows the contribution of Tendulkar, Kallis, Ponting and Sangakkara in matches won and lost. The plots show the range of runs scored as a boxplot (25th & 75th percentile) and the mean scored. The total matches won and lost are also printed in the plot.

All the players have scored more in the matches they won than in the matches they lost. Ricky Ponting seems to have more matches won to his credit than the others. This could also be because he was a member of a strong Australian team.
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanContributionWonLost("35320","Sachin Tendulkar")
batsmanContributionWonLost("45789","Jacques Kallis")
batsmanContributionWonLost("7133","Ricky Ponting")
batsmanContributionWonLost("50710","K Sangakkara")
dev.off()
## null device
##           1

Performance at home and overseas
From the plot below it can be seen that Tendulkar has played more matches overseas than at home and his performance is consistent in all venues, at home or abroad. Ponting has fewer innings than Tendulkar and has an equally good performance at home and overseas. Kallis and Sangakkara’s performance abroad is lower than their performance at home.
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfHomeAway("35320","Tendulkar")
batsmanPerfHomeAway("45789","Kallis")
batsmanPerfHomeAway("7133","Ponting")
batsmanPerfHomeAway("50710","Sangakkara")
dev.off()
## null device
##           1

Relative Mean Strike Rate plot
The plot below compares the Mean Strike Rate of the batsmen for each runs range of 10 and plots them. The plot indicates the following
Range 0 – 50 Runs – Ponting leads followed by Tendulkar
Range 50 – 100 Runs – Ponting followed by Sangakkara
Range 100 – 150 Runs – Ponting and then Tendulkar
frames <- list("./tendulkar.csv","./kallis.csv","./ponting.csv","./sangakkara.csv")
names <- list("Tendulkar","Kallis","Ponting","Sangakkara")
relativeBatsmanSR(frames,names)

Relative Runs Frequency plot
The plot below gives the relative Runs Frequency Percentages for each 10 run bucket.
It shows that Sangakkara leads, followed by Ponting.
frames <- list("./tendulkar.csv","./kallis.csv","./ponting.csv","./sangakkara.csv")
names <- list("Tendulkar","Kallis","Ponting","Sangakkara")
relativeRunsFreqPerf(frames,names)

Moving Average of runs in career
Take a look at the Moving Average across the career of the Top 4. Clearly
• Kallis and Sangakkara have a few more years of great batting ahead. They seem to average around 50.
• Tendulkar and Ponting definitely show a slump in the later years
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanMovingAverage("./tendulkar.csv","Sachin Tendulkar")
batsmanMovingAverage("./kallis.csv","Jacques Kallis")
batsmanMovingAverage("./ponting.csv","Ricky Ponting")
batsmanMovingAverage("./sangakkara.csv","K Sangakkara")
dev.off()
## null device
##           1

Future Runs forecast
Here are plots that forecast how the batsman will perform in future. In this case 90% of the career runs trend is used as the training set and the remaining 10% is the test set. A Holt-Winters forecasting model is used to forecast future performance based on the 90% training set. The forecasted runs trend is plotted. The test set is also plotted to see how closely the forecast and the actual values match.
Take a look at the runs forecasted for the batsmen below.
• Tendulkar’s forecasted performance seems to tally with his actual performance, with an average of 50
• For Kallis the forecasted runs are higher than the actual runs he scored
• Ponting seems to have a good run in the future
• Sangakkara has a decent run in the future, averaging 50 runs
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfForecast("./tendulkar.csv","Sachin Tendulkar")
batsmanPerfForecast("./kallis.csv","Jacques Kallis")
batsmanPerfForecast("./ponting.csv","Ricky Ponting")
batsmanPerfForecast("./sangakkara.csv","K Sangakkara")
dev.off()
## null device
##           1

Check Batsman In-Form or Out-of-Form
The computation below uses Null Hypothesis testing and the p-value to determine if the batsman is in-form or out-of-form.
For this, 90% of the career runs is chosen as the population and the mean is computed. The last 10% is chosen to be the sample set, and the sample Mean and the sample Standard Deviation are calculated.
The Null Hypothesis (H0) assumes that the batsman continues to stay in-form, i.e. the sample mean is within the 95% confidence interval of the population mean.
The Alternative (Ha) assumes that the batsman is out of form, i.e. the sample mean is below the 95% confidence interval of the population mean.
A significance value of 0.05 is chosen and the p-value is computed.
If p-value >= 0.05 – Batsman In-Form
If p-value < 0.05 – Batsman Out-of-Form
Note: Ideally the p-value should be computed for a population that follows the Normal Distribution. But the runs population is usually right skewed (many low scores and a long tail of big scores), so some correction may be needed. I will revisit this later.
This is done for the Top 4 batsmen.
checkBatsmanInForm("./tendulkar.csv","Sachin Tendulkar")
## *******************************************************************************************
##
## Population size: 294 Mean of population: 50.48
## Sample size: 33 Mean of sample: 32.42 SD of sample: 29.8
##
## Null hypothesis H0 : Sachin Tendulkar 's sample average is within 95% confidence interval
## of population average
## Alternative hypothesis Ha : Sachin Tendulkar 's sample average is below the 95% confidence
## interval of population average
##
## [1] "Sachin Tendulkar 's Form Status: Out-of-Form because the p value: 0.000713 is less than alpha= 0.05"
## *******************************************************************************************
checkBatsmanInForm("./kallis.csv","Jacques Kallis")
## *******************************************************************************************
##
## Population size: 240 Mean of population: 47.5
## Sample size: 27 Mean of sample: 47.11 SD of sample: 59.19
##
## Null hypothesis H0 : Jacques Kallis 's sample average is within 95% confidence interval
## of population average
## Alternative hypothesis Ha : Jacques Kallis 's sample average is below the 95% confidence
## interval of population average
##
## [1] "Jacques Kallis 's Form Status: In-Form because the p value: 0.48647 is greater than alpha= 0.05"
## *******************************************************************************************
checkBatsmanInForm("./ponting.csv","Ricky Ponting")
## *******************************************************************************************
##
## Population size: 251 Mean of population: 47.5
## Sample size: 28 Mean of sample: 36.25 SD of sample: 48.11
##
## Null hypothesis H0 : Ricky Ponting 's sample average is within 95% confidence interval
## of population average
## Alternative hypothesis Ha : Ricky Ponting 's sample average is below the 95% confidence
## interval of population average
##
## [1] "Ricky Ponting 's Form Status: In-Form because the p value: 0.113115 is greater than alpha= 0.05"
## *******************************************************************************************
checkBatsmanInForm("./sangakkara.csv","K Sangakkara")
## *******************************************************************************************
##
## Population size: 193 Mean of population: 51.92
## Sample size: 22 Mean of sample: 71.73 SD of sample: 82.87
##
## Null hypothesis H0 : K Sangakkara 's sample average is within 95% confidence interval
## of population average
## Alternative hypothesis Ha : K Sangakkara 's sample average is below the 95% confidence
## interval of population average
##
## [1] "K Sangakkara 's Form Status: In-Form because the p value: 0.862862 is greater than alpha= 0.05"
## *******************************************************************************************

3D plot of Runs vs Balls Faced and Minutes at Crease
The plot is a scatter plot of Runs vs Balls Faced and Minutes at Crease.
A prediction plane is fitted.
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./tendulkar.csv","Tendulkar")
battingPerf3d("./kallis.csv","Kallis")
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./ponting.csv","Ponting")
battingPerf3d("./sangakkara.csv","Sangakkara")
dev.off()
## null device
##           1

Predicting Runs given Balls Faced and Minutes at Crease
A multi-variate regression plane is fitted between Runs and Balls Faced + Minutes at crease. A sample sequence of Balls Faced (BF) and Minutes at crease (Mins) is set up as shown below. The fitted model is used to predict the runs for these values.
BF <- seq(10, 400, length=15)
Mins <- seq(30, 600, length=15)
newDF <- data.frame(BF,Mins)
tendulkar <- batsmanRunsPredict("./tendulkar.csv","Tendulkar",newdataframe=newDF)
kallis <- batsmanRunsPredict("./kallis.csv","Kallis",newdataframe=newDF)
ponting <- batsmanRunsPredict("./ponting.csv","Ponting",newdataframe=newDF)
sangakkara <- batsmanRunsPredict("./sangakkara.csv","Sangakkara",newdataframe=newDF)
The fitted model is then used to predict the runs that the batsmen will score for a given Balls Faced and Minutes at crease. It can be seen that Ponting will score the highest for a given Balls Faced and Minutes at crease. Ponting is followed by Tendulkar, who has Sangakkara close on his heels, and finally we have Kallis. This is intuitive as we have already seen that Ponting has the highest strike rate.
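The prediction step described above can also be sketched outside R. The following is a minimal Python sketch and not the cricketr implementation: it assumes the regression plane is an ordinary least-squares fit of Runs = b0 + b1*BF + b2*Mins (the post only says that a plane is fitted), and the innings data and the helper names solve3, fit_plane and predict_runs are made up for illustration.

```python
# Hypothetical innings as (balls_faced, minutes_at_crease, runs); not real data.
innings = [
    (10, 20, 4), (35, 50, 18), (60, 95, 33), (90, 120, 52),
    (120, 190, 61), (150, 200, 88), (200, 310, 104), (250, 330, 143),
]

def solve3(a, b):
    """Solve the 3x3 system a*x = b by Gaussian elimination with partial pivoting."""
    n = 3
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_plane(rows):
    """Least-squares fit of runs = b0 + b1*bf + b2*mins via the normal equations."""
    xs = [(1.0, bf, mins) for bf, mins, _ in rows]
    ys = [runs for _, _, runs in rows]
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(3)] for i in range(3)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(3)]
    return solve3(xtx, xty)

def predict_runs(coef, bf, mins):
    """Evaluate the fitted plane at one (balls faced, minutes) point."""
    b0, b1, b2 = coef
    return b0 + b1 * bf + b2 * mins

coef = fit_plane(innings)

# Same grid idea as seq(10, 400, length=15) and seq(30, 600, length=15) in the R code.
grid = [(10 + k * (400 - 10) / 14, 30 + k * (600 - 30) / 14) for k in range(15)]
for bf, mins in grid:
    print(round(bf), round(mins), round(predict_runs(coef, bf, mins)))
```

Each printed row corresponds to one row of a predicted-runs table for a single batsman. Fitting one plane per batsman and placing the predictions side by side gives a comparison like the one in the post.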
batsmen <- cbind(round(tendulkar$Runs),round(kallis$Runs),round(ponting$Runs),round(sangakkara$Runs))
colnames(batsmen) <- c("Tendulkar","Kallis","Ponting","Sangakkara")
newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
colnames(newDF) <- c("BallsFaced","MinsAtCrease")
predictedRuns <- cbind(newDF,batsmen)
predictedRuns
##    BallsFaced MinsAtCrease Tendulkar Kallis Ponting Sangakkara
## 1          10           30         7      6       9          2
## 2          38           71        23     20      25         18
## 3          66          111        39     34      42         34
## 4          94          152        54     48      59         50
## 5         121          193        70     62      76         66
## 6         149          234        86     76      93         82
## 7         177          274       102     90     110         98
## 8         205          315       118    104     127        114
## 9         233          356       134    118     144        130
## 10        261          396       150    132     161        146
## 11        289          437       165    146     178        162
## 12        316          478       181    159     194        178
## 13        344          519       197    173     211        194
## 14        372          559       213    187     228        210
## 15        400          600       229    201     245        226

Analysis of Top 3 wicket takers
The top 3 wicket takers in Test history are
1. M Muralitharan: Wickets: 800, Average = 22.72, Economy Rate – 2.47
2. Shane Warne: Wickets: 708, Average = 25.41, Economy Rate – 2.65
3. Anil Kumble: Wickets: 619, Average = 29.65, Economy Rate – 2.69
How do Anil Kumble, Shane Warne and M Muralitharan compare with one another with respect to wickets taken and the Economy Rate? The next set of plots compute and plot precisely these analyses.
Wicket Frequency Plot
The plot below computes the percentage frequency of the number of wickets taken, e.g. 1 wicket x%, 2 wickets y% etc., and plots them as a continuous line.
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
bowlerWktsFreqPercent("./kumble.csv","Anil Kumble")
bowlerWktsFreqPercent("./warne.csv","Shane Warne")
bowlerWktsFreqPercent("./murali.csv","M Muralitharan")
dev.off()
## null device
##           1

Wickets Runs plot
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
bowlerWktsRunsPlot("./kumble.csv","Kumble")
bowlerWktsRunsPlot("./warne.csv","Warne")
bowlerWktsRunsPlot("./murali.csv","Muralitharan")
dev.off()
## null device
##           1

Average wickets at different venues
The plot gives the average wickets taken by Muralitharan at different venues. Muralitharan has taken an average of 8 and 6 wickets at Oval & Wellington respectively in 2 different innings. His best performances are at Kandy and Colombo (SSC).
bowlerAvgWktsGround("./murali.csv","Muralitharan")

Average wickets against different opposition
The plot gives the average wickets taken by Muralitharan against different countries. The x-axis also includes the number of innings against each team.
bowlerAvgWktsOpposition("./murali.csv","Muralitharan")

Relative Wickets Frequency Percentage
The Relative Wickets Percentage plot shows that M Muralitharan has a large percentage of wickets in the 3-8 wicket range.
frames <- list("./kumble.csv","./murali.csv","./warne.csv")
names <- list("Anil Kumble","M Muralitharan","Shane Warne")
relativeBowlingPerf(frames,names)

Relative Economy Rate against wickets taken
Clearly, from the plot below it can be seen that Muralitharan has the best Economy Rate among the three.
frames <- list("./kumble.csv","./murali.csv","./warne.csv")
names <- list("Anil Kumble","M Muralitharan","Shane Warne")
relativeBowlingER(frames,names)

Wickets taken moving average
From the plot below it can be seen that
1. Shane Warne’s performance at the time of his retirement was still at a peak of 3 wickets
2. M Muralitharan seems to have become ineffective over time, with his peak years being 2004-2006
3. Anil Kumble also seems to slump down and become less effective.
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
bowlerMovingAverage("./kumble.csv","Anil Kumble")
bowlerMovingAverage("./warne.csv","Shane Warne")
bowlerMovingAverage("./murali.csv","M Muralitharan")
dev.off()
## null device
##           1

Future Wickets forecast
Here are plots that forecast how the bowler will perform in future. In this case 90% of the career wickets trend is used as the training set and the remaining 10% is the test set. A Holt-Winters forecasting model is used to forecast future performance based on the 90% training set. The forecasted wickets trend is plotted. The test set is also plotted to see how closely the forecast and the actual values match.
Take a look at the wickets forecasted for the bowlers below.
– Shane Warne and Muralitharan have a fairly consistent forecast
– Kumble’s forecast shows a small dip
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
bowlerPerfForecast("./kumble.csv","Anil Kumble")
bowlerPerfForecast("./warne.csv","Shane Warne")
bowlerPerfForecast("./murali.csv","M Muralitharan")
dev.off()
## null device
##           1

Contribution to matches won and lost
The plot below is extremely interesting
1. Kumble’s wickets range from 2 to 4 in matches won, with a mean of 3
2. Warne’s wickets in won matches range from 1 to 4, with more matches won. Clearly there are other bowlers contributing to the wins, possibly the pacers
3. Muralitharan’s wickets range in winning matches is higher than that of the other 2, from 3 to 5, and he clearly had a hand (pun unintended) in Sri Lanka’s wins
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
bowlerContributionWonLost(30176,"Anil Kumble")
bowlerContributionWonLost(8166,"Shane Warne")
bowlerContributionWonLost(49636,"M Muralitharan")
dev.off()
## null device
##           1

Performance home and overseas
From the plot below it can be seen that Kumble & Warne have played more matches overseas than Muralitharan.
Both Kumble and Warne show an average of 2 wickets overseas. Murali on the other hand has an average of 2.5 wickets overseas but slightly fewer matches than Kumble & Warne.
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
bowlerPerfHomeAway(30176,"Kumble")
bowlerPerfHomeAway(8166,"Warne")
bowlerPerfHomeAway(49636,"Murali")
dev.off()
## null device
##           1

Check for bowler in-form/out-of-form
The computation below uses Null Hypothesis testing and the p-value to determine if the bowler is in-form or out-of-form. For this, 90% of the career wickets is chosen as the population and the mean is computed. The last 10% is chosen to be the sample set, and the sample Mean and the sample Standard Deviation are calculated.
The Null Hypothesis (H0) assumes that the bowler continues to stay in-form, i.e. the sample mean is within the 95% confidence interval of the population mean.
The Alternative (Ha) assumes that the bowler is out of form, i.e. the sample mean is below the 95% confidence interval of the population mean.
A significance value of 0.05 is chosen and the p-value is computed.
If p-value >= 0.05 – Bowler In-Form
If p-value < 0.05 – Bowler Out-of-Form
Note: Ideally the p-value should be computed for a population that follows the Normal Distribution. But the wickets population is usually skewed, so some correction may be needed. I will revisit this later.
Note: The check for the form status of the bowlers indicates
1. That both Kumble and Muralitharan were out of form. This also shows in the moving average plot
2. Warne is still in great form and could have continued for a few more years.
Too bad we didn’t see the magic later.
checkBowlerInForm("./kumble.csv","Anil Kumble")
## *******************************************************************************************
##
## Population size: 212 Mean of population: 2.69
## Sample size: 24 Mean of sample: 2.04 SD of sample: 1.55
##
## Null hypothesis H0 : Anil Kumble 's sample average is within 95% confidence interval
## of population average
## Alternative hypothesis Ha : Anil Kumble 's sample average is below the 95% confidence
## interval of population average
##
## [1] "Anil Kumble 's Form Status: Out-of-Form because the p value: 0.02549 is less than alpha= 0.05"
## *******************************************************************************************
checkBowlerInForm("./warne.csv","Shane Warne")
## *******************************************************************************************
##
## Population size: 240 Mean of population: 2.55
## Sample size: 27 Mean of sample: 2.56 SD of sample: 1.8
##
## Null hypothesis H0 : Shane Warne 's sample average is within 95% confidence interval
## of population average
## Alternative hypothesis Ha : Shane Warne 's sample average is below the 95% confidence
## interval of population average
##
## [1] "Shane Warne 's Form Status: In-Form because the p value: 0.511409 is greater than alpha= 0.05"
## *******************************************************************************************
checkBowlerInForm("./murali.csv","M Muralitharan")
## *******************************************************************************************
##
## Population size: 207 Mean of population: 3.55
## Sample size: 23 Mean of sample: 2.87 SD of sample: 1.74
##
## Null hypothesis H0 : M Muralitharan 's sample average is within 95% confidence interval
## of population average
## Alternative hypothesis Ha : M Muralitharan 's sample average is below the 95% confidence
## interval of population average
##
## [1] "M Muralitharan 's Form Status: Out-of-Form because the p value: 0.036828 is less than alpha= 0.05"
## *******************************************************************************************
dev.off()
## null device
##           1

Key Findings
The plots above capture some of the capabilities and features of my cricketr package. Feel free to install the package and try it out. Please do keep in mind ESPN Cricinfo’s Terms of Use.
Here are the main findings from the analysis above.

Analysis of Top 4 batsmen
The analysis of the Top 4 Test batsmen Tendulkar, Kallis, Ponting and Sangakkara shows the following
1. Sangakkara has the highest average, followed by Tendulkar, Kallis and then Ponting.
2. Ponting has the highest strike rate, followed by Tendulkar, Sangakkara and then Kallis
3. The predicted runs for a given Balls Faced and Minutes at crease is highest for Ponting, followed by Tendulkar, Sangakkara and Kallis
4. The moving average for Tendulkar and Ponting shows a downward trend, while Kallis and Sangakkara retired too soon
5. Tendulkar was out of form about the time of retirement while the rest were in-form. But this result has to be taken along with the moving average plot. Ponting was clearly on the way out.
6. The home and overseas performance indicates that Tendulkar is the clear leader. He has the highest number of matches played overseas and his performance has been consistent. He is followed by Ponting, Kallis and finally Sangakkara

Analysis of Top 3 wicket takers
The analysis of Anil Kumble, Shane Warne and M Muralitharan shows the following
1. Muralitharan has the highest wickets and best economy rate, followed by Warne and Kumble
2. Muralitharan has a higher wickets frequency percentage between 3 and 8 wickets
3. Muralitharan has the best Economy Rate for wickets between 2 and 7
4. The moving average plot shows that the time was up for Kumble and Muralitharan but Warne had a few years ahead
5. The check for form status shows that Muralitharan’s and Kumble’s time was over while Warne was still in great form
6. Kumble has more matches abroad than the other 2, yet Kumble averages 3 wickets at home and 2 wickets overseas, like Warne. Murali has played fewer matches but has an average of 4 wickets at home and 3 wickets overseas.

Final thoughts
Here are my final thoughts

Batting
Among the 4 batsmen Tendulkar, Kallis, Ponting and Sangakkara, the clear leader is Tendulkar for the following reasons
1. Tendulkar has the highest number of Test centuries and runs of all time. Tendulkar’s average is 2nd to Sangakkara’s, and Tendulkar’s predicted runs for a given Balls Faced and Minutes at Crease is 2nd, behind Ponting. Also Tendulkar’s performance at home and overseas is consistent throughout despite the fact that he has the highest number of overseas matches
2. Ponting takes the 2nd spot with the 3rd highest number of centuries, 1st in Strike Rate and 2nd in home and away performance.
3. The 3rd spot goes to Sangakkara, with the highest average, the 4th highest number of centuries and a reasonable run frequency percentage in different run ranges. However he has fewer matches overseas and his performance overseas is significantly lower than at home
4. Kallis has the 2nd highest number of centuries but his performance overseas and strike rate are behind the others
5. Finally Kallis and Sangakkara had a few good years of batting still left in them (pity they retired!) while Tendulkar and Ponting’s time was up

Bowling
Muralitharan leads the way, followed closely by Warne and finally Kumble. The reasons are
1. Muralitharan has the highest number of Test wickets with the best Wickets percentage and the best Economy Rate. Murali on average has taken 4 wickets at home and 3 wickets overseas
2. Warne follows Murali in the highest wickets taken, however Warne has fewer matches overseas than Murali and averages 3 wickets at home and 2 wickets overseas
3. Kumble has the 3rd highest wickets, with 3 wickets on average at home and 2 wickets overseas.
However, Kumble has played more matches overseas than the other two. In that respect his performance is great. Also Kumble has played fewer matches at home, otherwise his numbers would have looked even better.
4. Also, while Kumble and Muralitharan’s careers were on the decline, Warne was going great and had a couple of years ahead.

You can download this analysis at Introducing cricketr
Hope you have as much fun using the cricketr package as I had in developing it. Do take a look at my follow up post Taking cricketr for a spin – Part 1

The common alphabet of programming languages

“All animals are equal, but some animals are more equal than others.”
“Four legs good, two legs bad.”
from Animal Farm by George Orwell

Note: This post is largely intended for those who are embarking on their journey into the world of programming. The article below highlights a set of constructs that recur in many imperative, dynamic and object-oriented languages. While these constructs cannot be applied directly to functional programming languages like Lisp, Haskell or Clojure, knowing them may still help. To some extent the programming language domain has been intentionally oversimplified to show that languages are not as daunting as they seem. Clearly there are a lot more subtle and complex differences among languages. Hope you have fun programming!

Introduction
Anybody who is about to venture into the deep waters of programming will be bewildered and awed by the almost limitless number of programming languages and the associated paradigms on which they are based. It is easy to feel apprehensive of programming when faced with this array of languages, not to mention the seemingly quirky syntax of each language. Opinions abound about what is the best programming language. In my opinion each language is best suited for a particular class of problems and is usually clunky if used outside of this.
As an aside, here is an interesting link provided by reader AKS to Rosetta Code, which is stated to be a programming chrestomathy (it presents solutions to the same task in as many different languages as possible, to demonstrate how languages are similar and different, and to aid a person with a grounding in one approach to a problem in learning another. Rosetta Code currently has 772 tasks, 165 draft tasks, and is aware of 582 languages).
You are likely to hear “All programming languages are equal, but some languages are more equal than others” from seasoned programmers who have their own pet language. There may also be others who swear that “procedural languages good, object oriented languages bad” or maybe “object oriented languages good, aspect oriented languages bad”.

Unity in diversity
Regardless of the language, this post discusses a thread that is common to all programming languages. In fact any programming language can be expressed as
Lx = C + Sx
where Lx is any programming language ‘x’. All programming languages have a set of core, common constructs which I have denoted as ‘C’ and a set of Specialized constructs, unique to each language ‘x’, which I have denoted as Sx.
I would like to look at these constructs that are common to most programming languages like C, C++, Perl, Python, Ruby, C#, R, Octave etc. In my opinion knowing these core, common constructs and a few of the more specialized constructs should allow you to get started off in the language of your choice. You can pick up the more unique constructs as you go along.
Here are the common constructs (C mentioned above) that you must familiarize yourself with when embarking on a new language
1. Reading user input and printing to screen
2. Reading and writing from a file
3. Conditional statement if-then-else if-else
4. Loops – For, while, repeat, do while etc.
Knowing these constructs and some of the basic concepts unique to each language, for example
– Structure, Pointers in C
– Classes, inheritance in C++
– Subsetting in Octave, R
– car, cdr in Lisp
will enable you to get started off in your chosen language. I show examples of these core constructs in many languages. Note the similarity between these constructs.

1. C
Read from and write to console
scanf("%d", &x);
printf("The value of x is %d", x);
Read from and write to file
fread(buffer, strlen(c)+1, 1, fp);
fwrite(c, strlen(c) + 1, 1, fp);
Conditional
if (x > 5) {
    printf("x is greater than 5");
}
else if (x < 5) {
    printf("x is less than 5");
}
else {
    printf("x is equal to 5");
}
Loops
I will only consider for loops, though one could use while, repeat etc.
for (i = 0; i < 100; i++) {
    money++;
}

2. C++
Read from and write to console
cin >> age;
cout << "The value is " << value << endl;
Read from and write to a file
// open a file in read mode.
ifstream infile;
infile.open("afile.dat");
cout << "Reading from the file" << endl;
infile >> data;
// open a file in write mode.
ofstream outfile;
outfile.open("afile.dat");
// write inputted data into the file.
outfile << data << endl;
Conditional (same as C)
if (x > 5) {
    printf("x is greater than 5");
}
else if (x < 5) {
    printf("x is less than 5");
}
else {
    printf("x is equal to 5");
}
Loops
for (i = 0; i < 100; i++) {
    money++;
}

3. Java
Reading from and writing to standard input
Console c = System.console();
String val = c.readLine("Enter a value: ");
System.out.println("Value is " + val);
Reading and writing from file
try {
    in = new FileInputStream("input.txt");
    out = new FileOutputStream("output.txt");
    int c;
    while ((c = in.read()) != -1) {
        out.write(c);
    }
} ...
Conditional (same as C, apart from the print call)
if (x > 5) {
    System.out.println("x is greater than 5");
}
else if (x < 5) {
    System.out.println("x is less than 5");
}
else {
    System.out.println("x is equal to 5");
}
Loops (same as C)
for (int i = 0; i < 100; i++) {
    money++;
}

4. Perl
Read from console
#!/usr/bin/perl
$userinput = <STDIN>;
chomp ($userinput);
Write to console
print "User typed $userinput\n";
Reading and write to a file
open(IN,"infile") || die "cannot open input file";
open(OUT,"outfile") || die "cannot open output file";
while (<IN>) {
    print OUT $_;  # echo line read
}
close(IN);
close(OUT);
Conditional
if ($a == 20) {
    # if condition is true then print the following
    printf "a has a value which is 20\n";
}
elsif ($a == 30) {
    # if condition is true then print the following
    printf "a has a value which is 30\n";
} else {
    # if none of the above conditions is true
    printf "a has a value which is $a\n";
}
Loops
for (my $i = 0; $i <= 9; $i++) {
    print "$i\n";
}

5. Lisp
The syntax for Lisp will be different from the others as it is a functional language. You need to familiarize yourself with these constructs to move ahead
Read and write to console
To read from standard input use
(let ((temp 0))
  (print '(Enter temp))
  (setf temp (read))
  (print (append '(the temp is) (list temp))))
Read from and write to file
(with-open-file (stream "C:\\acl82express\\lisp\\count.cl")
  (do ((line (read-line stream nil) (read-line stream nil)))
      ((null line))        ; stop at end of file
    (print line)))
(with-open-file (stream "C:\\acl82express\\lisp\\test.txt" :direction :output :if-exists :supersede)
  (write-line "test" stream) nil)
Conditional
(cond ((< x 5)
       (setf x (+ x 8))
       (setf y (* 2 y)))
      ((= x 10) (setf x (* x 2)))
      (t (setf x 8)))
Loops
(setf x 5)
(let ((i 0))
  (loop (setf y (* x i))
        (when (> i 10) (return))
        (print i) (prin1 y) (incf i)))

6. Python
Reading and writing from console
# Python 3 (in Python 2 use raw_input and the print statement)
var = input("Please enter something: ")
print("You entered:", var)
Reading and writing from files
with open(filename, 'r') as f:
    a = f.readline().strip()
with open(filename, 'w') as target:
    target.write(a)
Conditionals
if x > 5:
    print("x is greater than 5")
elif x < 5:
    print("x is less than 5")
else:
    print("x is equal to 5")
Loops
for i in range(0, 6):
    print("Value is: %d" % i)

7. R
Write to console
x = 5
paste('The value of x is =', x)
Reading and writing to a file
infile = read.csv("file")
write(x, file = "data", sep = " ")
Conditional
if (x > 5) {
    print("x is greater than 5")
} else if (x < 5) {
    print("x is less than 5")
} else {
    print("x is equal to 5")
}
Loops
for (i in 1:10) print(i)

Conclusion
As can be seen, the core constructs are very similar in different languages save for some minor variations. It is generally useful to get started by knowing just these constructs and a few other important features of the language that you are trying to learn. It is possible to code most programs with these Core constructs and a few of the Specialized constructs in the language. These Core constructs are the glue that holds your code together.
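
To make this concrete, here is a minimal sketch in Python 3 of a small program built only from the four core constructs. The task itself (comparing numbers with 5, echoing the conditional examples earlier in the post), the function names classify, process and main, and the temporary file name are all made up for illustration.

```python
import os
import tempfile

def classify(x):
    """Conditional: compare x with 5."""
    if x > 5:
        return "x is greater than 5"
    elif x < 5:
        return "x is less than 5"
    else:
        return "x is equal to 5"

def process(values, path):
    """Write one result per value to a file, then read the file back."""
    with open(path, "w") as out:            # writing to a file
        for x in values:                    # loop
            out.write(classify(x) + "\n")   # conditional used inside the loop
    with open(path) as src:                 # reading from a file
        return [line.strip() for line in src]

def main():
    # Reading user input and printing to screen
    raw = input("Enter integers separated by spaces: ")
    values = [int(token) for token in raw.split()]
    path = os.path.join(tempfile.gettempdir(), "core_constructs_demo.txt")
    for line in process(values, path):
        print(line)
```

Calling main() runs the program interactively. The same skeleton of input, conditional, loop and file access carries over to the other languages shown in this post with only syntax changes.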

You can learn the more compact and more powerful features of the language as you go along. The above core constructs are like the letters of the programming language alphabet. You need to construct words by stringing together these constructs and form sensible sentences which will be your program. Good luck with your adventure in your next new programming language!