cricketr and yorkr books – Paperback now on Amazon


My books
– Cricket Analytics with cricketr
– Beaten by sheer pace!: Cricket analytics with yorkr
are now available on Amazon in both Paperback and Kindle versions.

The cricketr and yorkr packages are written in R, and both are available on CRAN. The books explain how to use these R packages to analyze the performance of cricketers.

cricketr is based on data from ESPN Cricinfo Statsguru, and can analyze Test, ODI and T20 batsmen and bowlers. yorkr is based on data from Cricsheet, and can analyze batsmen, bowlers, matches and teams in ODI, T20 and IPL cricket.

Cricket Analytics with cricketr
You can access the paperback at Cricket analytics with cricketr

Beaten by sheer pace! Cricket Analytics with yorkr
You can buy the paperback from Amazon at Beaten by sheer pace: Cricket analytics with yorkr

Order your copy today! Hope you have a great time reading!

Inswinger: yorkr swings into International T20s


In this post I introduce ‘Inswinger’, an interactive Shiny app to analyze International T20 players, matches and teams. This app was a natural consequence of my earlier Shiny app ‘GooglyPlus’. Most of the structure for this app remained the same; I only had to work with a different dataset, so to speak.

The Googly Shiny app is based on my R package ‘yorkr’, which is now available on CRAN. The R package, and hence this Shiny app, is based on data from Cricsheet. Inswinger is based on the latest data dump from Cricsheet (Dec 2016) and includes all International T20s till then. There are a lot of new International teams like Oman, Hong Kong, UAE, etc. In total there are 22 different International T20 teams in my Inswinger app.

The countries are a) Afghanistan b) Australia c) Bangladesh d) Bermuda e) Canada f) England g) Hong Kong h) India i) Ireland j) Kenya k) Nepal l) Netherlands m) New Zealand n) Oman o) Pakistan p) Papua New Guinea q) Scotland r) South Africa s) Sri Lanka t) United Arab Emirates u) West Indies v) Zimbabwe

My R package ‘yorkr’, on which both these Shiny apps are based, can output either a dataframe or a plot, depending on the parameter plot=TRUE or FALSE. Hence in the Inswinger Shiny app results can be displayed either as a table or as a plot, depending on the choice of function.
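
The plot=TRUE/FALSE switch can be pictured with a minimal sketch. The function name runsByBatsman and the toy data frame are hypothetical and are not part of yorkr; the sketch only illustrates how one function can hand back either a chart or the numbers behind it.

```r
# Hypothetical helper (not a yorkr function): summarise runs per batsman and
# either draw a bar chart (plot = TRUE) or return the summary (plot = FALSE).
runsByBatsman <- function(df, plot = TRUE) {
  summary_df <- aggregate(runs ~ batsman, data = df, FUN = sum)
  summary_df <- summary_df[order(-summary_df$runs), ]
  rownames(summary_df) <- NULL
  if (plot) {
    barplot(summary_df$runs, names.arg = summary_df$batsman,
            ylab = "Runs", main = "Runs by batsman (toy data)")
    return(invisible(summary_df))
  }
  summary_df
}

# Toy ball-by-ball style data, made up for illustration
toy <- data.frame(
  batsman = c("A", "B", "A", "C", "B", "A"),
  runs    = c(4, 1, 6, 2, 0, 1)
)

# The data frame that a "Table" view in an app could display
runs_table <- runsByBatsman(toy, plot = FALSE)
print(runs_table)
```

Calling runsByBatsman(toy) with the default plot = TRUE would draw the bar chart instead, which is roughly how a Shiny tab could offer both a plot and a table from the same function.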

Inswinger can do detailed analyses of a) Individual T20 batsman b) Individual T20 bowler c) Any T20 match d) Head to head confrontation between 2 T20 teams e) All matches of a T20 team against all other teams.

The Shiny app can be accessed at Inswinger

The code for Inswinger is available at Github. Feel free to clone/download/fork  the code from Inswinger

Based on the 5 detailed analysis domains there are 5 tabs
A) T20 Batsman: This tab can be used to analyze any T20 batsman. If a batsman has played in more than 1 team, then the overall performance is considered. There are 10 functions for the T20 Batsman. They are shown below
– Batsman Runs vs. Deliveries
– Batsman’s Fours & Sixes
– Dismissals of batsman
– Batsman’s Runs vs Strike Rate
– Batsman’s Moving Average
– Batsman’s Cumulative Average Run
– Batsman’s Cumulative Strike Rate
– Batsman’s Runs against Opposition
– Batsman’s Runs at Venue
– Predict Runs of batsman
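
As a rough illustration of what the cumulative and moving-average functions compute, here is a minimal sketch in base R. The innings data are made up, and the definitions used (runs per innings, and total runs over total balls) are assumptions; yorkr's own calculations may differ in detail.

```r
# Made-up innings for one batsman, in chronological order
innings <- data.frame(
  runs  = c(10, 45, 0, 72, 23),
  balls = c(12, 30, 3, 48, 20)
)

# Cumulative average runs: running total of runs / number of innings so far
innings$cumAvgRuns <- cumsum(innings$runs) / seq_along(innings$runs)

# Cumulative strike rate: 100 * running total of runs / running total of balls
innings$cumStrikeRate <- 100 * cumsum(innings$runs) / cumsum(innings$balls)

# 3-innings moving average of runs (NA until three innings are available)
innings$movAvgRuns <- as.numeric(
  stats::filter(innings$runs, rep(1 / 3, 3), sides = 1)
)

print(innings)
```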

B) T20 Bowler: This tab can be used to analyze individual T20 bowlers. The functions handle T20 bowlers who have played in more than 1 T20 team.
– Mean Economy Rate of bowler
– Mean runs conceded by bowler
– Bowler’s Moving Average
– Bowler’s Cumulative Avg. Wickets
– Bowler’s Cumulative Avg. Economy Rate
– Bowler’s Wicket Plot
– Bowler’s Wickets against opposition
– Bowler’s Wickets at Venues
– Bowler’s wickets prediction
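
For readers new to the bowling measures, economy rate is runs conceded per over bowled. The sketch below uses made-up spells and groups them by the number of overs bowled, which is one plausible reading of a "mean economy rate vs overs" chart; it is not yorkr's code.

```r
# Made-up bowling spells for one bowler
spells <- data.frame(
  overs   = c(4, 4, 3, 2, 4),
  runs    = c(28, 36, 21, 20, 24),
  wickets = c(1, 0, 2, 0, 3)
)

# Economy rate of each spell: runs conceded per over
spells$economyRate <- spells$runs / spells$overs

# Mean economy rate for each number of overs bowled
meanER <- aggregate(economyRate ~ overs, data = spells, FUN = mean)
print(meanER)
```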

C) T20 match: This tab can be used for analyzing individual T20 matches. The available functions are
– Match Batting Scorecard – Table
– Batting Partnerships – Plot, Table
– Batsmen vs Bowlers – Plot, Table
– Match Bowling Scorecard   – Table
– Bowling Wicket Kind – Plot, Table
– Bowling Wicket Runs – Plot, Table
– Bowling Wicket Match – Plot, Table
– Bowler vs Batsmen – Plot, Table
– Match Worm Graph – Plot
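
A worm graph is simply the cumulative runs of each team plotted over by over. The following sketch builds the underlying table from made-up runs per over; the helper plotWorm is hypothetical and only draws the chart in an interactive session.

```r
# Made-up runs scored in each over by two teams
runsPerOver <- data.frame(
  over  = 1:5,
  teamA = c(6, 9, 4, 12, 7),
  teamB = c(8, 3, 10, 5, 11)
)

# Cumulative runs after each over: the data behind a worm graph
worm <- data.frame(
  over  = runsPerOver$over,
  teamA = cumsum(runsPerOver$teamA),
  teamB = cumsum(runsPerOver$teamB)
)

# Hypothetical plotting helper using base graphics
plotWorm <- function(w) {
  plot(w$over, w$teamA, type = "l", col = "blue",
       ylim = range(w$teamA, w$teamB),
       xlab = "Over", ylab = "Cumulative runs",
       main = "Match worm graph (toy data)")
  lines(w$over, w$teamB, col = "red")
  legend("topleft", legend = c("Team A", "Team B"),
         col = c("blue", "red"), lty = 1)
}

if (interactive()) plotWorm(worm)
print(worm)
```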

D) Head to head: This tab can be used for analyzing head-to-head confrontations between any 2 T20 teams, e.g. all matches between India and Australia or West Indies and Sri Lanka. The available functions are
– Team Batsmen Batting Partnerships All Matches – Plot, Table {Summary and Detailed}
– Team Batting Scorecard All Matches – Table
– Team Batsmen vs Bowlers all Matches – Plot, Table
– Team Wickets Opposition All Matches – Plot, Table
– Team Bowling Scorecard All Matches – Table
– Team Bowler vs Batsmen All Matches – Plot, Table
– Team Bowlers Wicket Kind All Matches – Plot, Table
– Team Bowler Wicket Runs All Matches – Plot, Table
– Win Loss All Matches – Plot

E) T20 team’s overall performance: This tab can be used to analyze the overall performance of any T20 team. For this analysis all matches played by the team are considered. The available functions are
– Team Batsmen Partnerships Overall – Plot, Table {Summary and Detailed}
– Team Batting Scorecard Overall – Table
– Team Batsmen vs Bowlers Overall – Plot, Table
– Team Bowler vs Batsmen Overall – Plot, Table
– Team Bowling Scorecard Overall – Table
– Team Bowler Wicket Kind Overall – Plot, Table

Below I include a random set of charts that are generated in each of the 5 tabs
A. T20 Batsman
a. Shakib Al Hasan (Bangladesh) – Runs vs Deliveries

b. Virat Kohli (India) – Cumulative Average

c. AB de Villiers (South Africa) – Runs at venues

d. Glenn Maxwell (Australia) – Predict runs vs deliveries faced

B. T20 Bowler
a. TG Southee (New Zealand) – Mean Economy Rate vs overs

b. DJ Bravo – Moving Average of wickets

c. AC Evans (Scotland) – Bowler Wickets Against Opposition

C. T20 Match
a. Match Score (Afghanistan vs Canada, 2012-03-18)

b. Match batting partnerships (Plot) – Hong Kong vs Oman (2015-11-21), Hong Kong partnerships

c. Match batting partnerships (Table) – Ireland vs Scotland (2012-03-18), Ireland
Batting partnerships can also be displayed as a table

d. Batsmen vs Bowlers (Plot) – India vs England (2012-12-22)

e. Match Worm Chart – Sri Lanka vs Pakistan (2015-08-01)

D. Head to head
a. Team Batsmen Partnership (Plot) – India vs Australia (all matches)
Virat Kohli has the highest total runs in partnerships against Australia

b. Team Batsmen Partnership (Summary – Table) – Kenya vs Bangladesh

c. Team Bowling Scorecard (Table only) – India vs South Africa all matches

d. Wins-Losses – New Zealand vs West Indies all matches

E. Overall performances
a. Batting Scorecard All Matches (Table only) – England’s overall batting performance
Eoin Morgan, Kevin Pietersen & SJ Taylor have the best performance

b. Batsman vs Bowlers all Matches (Plot)
India’s best performing batsman (Rank=1) is Virat Kohli

c. Batsman vs Bowlers all Matches (Table)
The plot above for Virat Kohli can also be displayed as a table. Kohli has scored the most runs against DJ Bravo, SR Watson & Shahid Afridi

The Inswinger Shiny app can be accessed at Inswinger. Give it a swing!

Also see my other Shiny apps
1. GooglyPlus
2. What would Shakespeare say?
3. Sixer
4. Revisiting crimes against women in India

You may also like
1. Neural Networks: The mechanics of backpropagation
2. A primer on Qubits, Quantum gates and Quantum Operation
3. Re-working the Lucy Richardson algorithm in OpenCV
4. Design Principles of Scalable, Distributed Systems
5. Spicing up a IBM Bluemix cloud app with MongoDB and NodeExpress
6. Programming languages in layman’s language
7. Re-introducing cricketr! : An R package to analyze performances of cricketers

To see all posts take a look at Index of Posts

GooglyPlus: yorkr analyzes IPL players, teams, matches with plots and tables


In this post I introduce my new Shiny app, “GooglyPlus”, which is a more evolved version of my earlier Shiny app “Googly”. My R package ‘yorkr’, on which both these Shiny apps are based, can output either a dataframe or a plot, depending on the parameter plot=TRUE or FALSE. My initial version of the app only included plots, and did not exercise the yorkr package fully. Moreover, I am certain there is a set of cricket aficionados who would prefer numbers to charts. Hence I have created this enhanced version of the Googly app and appropriately renamed it GooglyPlus. GooglyPlus is based on the yorkr package, which uses data from Cricsheet. The app is based on data from all IPL matches from 2008 up to 2016. Feel free to clone/fork or download the code from Github at GooglyPlus.

Click  GooglyPlus to access the Shiny app!

The changes in GooglyPlus over the earlier Googly app are only in the following 3 tab panels

  • IPL match
  • Head to head
  • Overall Performance

The IPL batsman and IPL bowler tabs are unchanged. These charts are as they were before.

In the other three tabs, new functionality has been added and existing functions now have the dual option of displaying either a plot or a table.

The changes are

A) IPL Match
The following additions/enhancements have been made

-Match Batting Scorecard – Table
-Batting Partnerships – Plot, Table (New)
-Batsmen vs Bowlers – Plot, Table (New)
-Match Bowling Scorecard   – Table (New)
-Bowling Wicket Kind – Plot, Table (New)
-Bowling Wicket Runs – Plot, Table (New)
-Bowling Wicket Match – Plot, Table (New)
-Bowler vs Batsmen – Plot, Table (New)
-Match Worm Graph – Plot

B) Head to head
The following functions have been added/enhanced

-Team Batsmen Batting Partnerships All Matches – Plot, Table {Summary (New) and Detailed (New)}
-Team Batting Scorecard All Matches – Table (New)
-Team Batsmen vs Bowlers all Matches – Plot, Table (New)
-Team Wickets Opposition All Matches – Plot, Table (New)
-Team Bowling Scorecard All Matches – Table (New)
-Team Bowler vs Batsmen All Matches – Plot, Table (New)
-Team Bowlers Wicket Kind All Matches – Plot, Table (New)
-Team Bowler Wicket Runs All Matches – Plot, Table (New)
-Win Loss All Matches – Plot

C) Overall Performance
The following additions/enhancements have been made in this tab

-Team Batsmen Partnerships Overall – Plot, Table {Summary (New) and Detailed (New)}
-Team Batting Scorecard Overall – Table (New)
-Team Batsmen vs Bowlers Overall – Plot, Table (New)
-Team Bowler vs Batsmen Overall – Plot, Table (New)
-Team Bowling Scorecard Overall – Table (New)
-Team Bowler Wicket Kind Overall – Plot, Table (New)

Included below are some random charts and tables. Feel free to explore the Shiny app further

A) IPL Match
a) Match Batting Scorecard (Table only)
This is the batting scorecard for Chennai Super Kings vs Deccan Chargers, 2011-05-11

b) Match batting partnerships (Plot)
Delhi Daredevils vs Kings XI Punjab – 2011-04-23

c) Match batting partnerships (Table)
The same batting partnerships, Delhi Daredevils vs Kings XI Punjab – 2011-04-23, as a table

d) Batsmen vs Bowlers (Plot)
Kolkata Knight Riders vs Mumbai Indians 2010-04-19

e) Match Bowling Scorecard (Table only)

B) Head to head

a) Team Batsmen Partnership (Plot)
Deccan Chargers vs Kolkata Knight Riders all matches

b) Team Batsmen Partnership (Summary – Table)
In the following tables it can be seen that MS Dhoni has performed better than SK Raina in CSK vs DD matches, whereas SK Raina performs better than Dhoni in CSK vs KKR matches

i) Chennai Super Kings vs Delhi Daredevils (Summary – Table)

ii) Chennai Super Kings vs Kolkata Knight Riders (Summary – Table)

iii) Rising Pune Supergiants vs Gujarat Lions (Detailed – Table)
This table provides the detailed partnerships for RPS vs GL in all matches

c) Team Bowling Scorecard (Table only)
This table gives the bowling scorecard of Pune Warriors vs Deccan Chargers in all matches

C) Overall performances
a) Batting Scorecard All Matches (Table only)

This is the batting scorecard of Royal Challengers Bangalore. The top 3 batsmen are V Kohli, C Gayle and AB de Villiers, in that order

b) Batsman vs Bowlers all Matches (Plot)
This gives the performance of the Mumbai Indians’ batsman of Rank=1, Rohit Sharma, against bowlers of all other teams

c) Batsman vs Bowlers all Matches (Table)
The above plot as a table. It can be seen that Rohit Sharma has scored the most runs against M Morkel, then Shakib Al Hasan and then UT Yadav.

d) Bowling scorecard (Table only)
The table below gives the bowling scorecard of CSK. R Ashwin leads with a tally of 98 wickets, followed by DJ Bravo with 88 wickets and then JA Morkel with 83 wickets in all matches against all teams

This is just a random selection of functions. Do play around with the app and check out how the different IPL batsmen, bowlers and teams stack up against each other. Do read my earlier post Googly: An interactive app for analyzing IPL players, matches and teams using R package yorkr for more details about the app and the other functions available.

Click GooglyPlus to access the Shiny app!

You can clone/fork/download the code from Github at GooglyPlus

Hope you have fun playing around with the Shiny app!

Note: In the tabs, for some of the functions, not all controls  are required. It is possible to enable the controls selectively but this has not been done in this current version. I may make the changes some time in the future.

Take a look at my other Shiny apps
a. Revisiting crimes against women in India
b. Natural language processing: What would Shakespeare say?

Check out some of my other posts
1. Analyzing World Bank data with WDI, googleVis Motion Charts
2. Video presentation on Machine Learning, Data Science, NLP and Big Data – Part 1
3. Singularity
4. Design principles of scalable, distributed systems
5. Simulating an Edge shape in Android
6. Dabbling with Wiener filter in OpenCV

To see all posts click Index of Posts

Googly: An interactive app for analyzing IPL players, matches and teams using R package yorkr


Presenting ‘Googly’, a cool Shiny app that I developed over the last couple of days. This interactive Shiny app was on my mind for quite some time, and I finally got down to implementing it. The Googly Shiny app is based on my R package ‘yorkr’, which is now available on CRAN. The R package, and hence this Shiny app, is based on data from Cricsheet.

Googly is based on R package yorkr, and uses the data of all IPL matches from 2008 up to 2016, available on Cricsheet.

Googly can do detailed analyses of a) Individual IPL batsman b) Individual IPL bowler c) Any IPL match d) Head to head confrontation between 2 IPL teams e) All matches of an IPL team against all other teams.

With respect to the individual IPL batsman and bowler performance, I was in a bit of a ‘bind’ literally (pun unintended), as any IPL player could have played in more than 1 IPL team. Fortunately ‘rbind’ came to my rescue. I just get the batsman’s/bowler’s performance in each IPL team, and then consolidate these into a single large dataframe for the analyses.
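
The rbind consolidation can be sketched as follows. The per-team data frames and their columns are made up for illustration; in the app the per-team records come from yorkr, and the exact code there may differ.

```r
# Made-up per-team records for one batsman (hypothetical columns).
# NULL stands for a team the batsman never played for.
teamRecords <- list(
  data.frame(team = "Team X", runs = c(34, 12), balls = c(25, 14)),
  data.frame(team = "Team Y", runs = 56, balls = 38),
  NULL
)

# Drop the teams with no records, then stack the rest into one data frame
played <- Filter(Negate(is.null), teamRecords)
allRecords <- do.call(rbind, played)
rownames(allRecords) <- NULL

print(allRecords)
```

The combined data frame can then be passed to a single analysis function, so the batsman's overall performance across teams is considered.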

The Shiny app can be accessed at Googly

The code for Googly is available at Github. Feel free to clone/download/fork  the code from Googly

Also see my post GooglyPlus: yorkr analyzes IPL players, teams, matches with plots and tables

Based on the 5 detailed analysis domains there are 5 tabs

IPL Batsman: This tab can be used to analyze any IPL batsman. If a batsman has played in more than 1 team, then the overall performance is considered. There are 10 functions for the IPL Batsman. They are shown below

  1. Batsman Runs vs. Deliveries
  2. Batsman’s Fours & Sixes
  3. Dismissals of batsman
  4. Batsman’s Runs vs Strike Rate
  5. Batsman’s Moving Average
  6. Batsman’s Cumulative Average Run
  7. Batsman’s Cumulative Strike Rate
  8. Batsman’s Runs against Opposition
  9. Batsman’s Runs at Venue
  10. Predict Runs of batsman

IPL Bowler: This tab can be used to analyze individual IPL bowlers. The functions handle IPL bowlers who have played in more than 1 IPL team.

  1. Mean Economy Rate of bowler
  2. Mean runs conceded by bowler
  3. Bowler’s Moving Average
  4. Bowler’s Cumulative Avg. Wickets
  5. Bowler’s Cumulative Avg. Economy Rate
  6. Bowler’s Wicket Plot
  7. Bowler’s Wickets against opposition
  8. Bowler’s Wickets at Venues
  9. Bowler’s wickets prediction

IPL match: This tab can be used for analyzing individual IPL matches. The available functions are

  1. Batting Partnerships
  2. Batsmen vs Bowlers
  3. Bowling Wicket Kind
  4. Bowling Wicket Runs
  5. Bowling Wicket Match
  6. Bowler vs Batsmen
  7. Match Worm Graph

Head to head: This tab can be used for analyzing head-to-head confrontations between any 2 IPL teams, e.g. all matches between Chennai Super Kings and Deccan Chargers or Kolkata Knight Riders and Delhi Daredevils. The available functions are

  1. Team Batsmen Batting Partnerships All Matches
  2. Team Batsmen vs Bowlers all Matches
  3. Team Wickets Opposition All Matches
  4. Team Bowler vs Batsmen All Matches
  5. Team Bowlers Wicket Kind All Matches
  6. Team Bowler Wicket Runs All Matches
  7. Win Loss All Matches

Overall performance: This tab can be used to analyze the overall performance of any IPL team. For this analysis all matches played by the team are considered. The available functions are

  1. Team Batsmen Partnerships Overall
  2. Team Batsmen vs Bowlers Overall
  3. Team Bowler vs Batsmen Overall
  4. Team Bowler Wicket Kind Overall

Below I include a random set of charts that are generated in each of the 5 tabs

A. IPL Batsman
a. A Symonds – Runs vs Deliveries

b. AB de Villiers – Cumulative Strike Rate

c. Gautam Gambhir – Runs at venues

d. CH Gayle – Predict runs

B. IPL Bowler
a. Ashish Nehra – Cumulative Average Wickets

b. DJ Bravo – Moving Average of wickets

c. R Ashwin – Mean Economy rate vs Overs

C. IPL Match
a. Chennai Super Kings vs Deccan Chargers (2008-05-06) – Batsmen Partnerships

Note: You can choose either team in the match from the drop down ‘Choose team’

b. Kolkata Knight Riders vs Delhi Daredevils (2013-04-02) – Bowling wicket runs

c. Mumbai Indians vs Kings XI Punjab (2010-03-30) – Match worm graph

D. Head to head confrontation
a. Rising Pune Supergiants vs Mumbai Indians in all matches – Team batsmen partnerships

Note: You can choose the partnership of either team in the drop down ‘Choose team’

b. Gujarat Lions vs Royal Challengers Bangalore all matches – Bowlers performance against batsmen

E. Overall Performance
a. Royal Challengers Bangalore overall performance – Batsman Partnership (Rank=1)
This is Virat Kohli for RCB. Try out other ranks

b. Rajasthan Royals overall Performance – Bowler vs batsman (Rank=2)
This is Vinay Kumar.

The Shiny app Googly can be accessed at Googly. Feel free to clone/fork the code from Github at Googly

For details on my R package yorkr, please see my blog Giga thoughts. There are more than 15 posts detailing the functions and their usage.

Do bowl a Googly!!!

You may like my other Shiny apps

Also see my other posts

  1. Introducing QCSimulator: A 5-qubit quantum computing simulator in R
  2. Deblurring with OpenCV: Weiner filter reloaded
  3. Rock N’ Roll with Bluemix, Cloudant & NodeExpress
  4. Introducing cricket package yorkr: Part 1- Beaten by sheer pace!
  5. Fun simulation of a Chain in Android
  6. Beaten by sheer pace! Cricket analytics with yorkr in paperback and Kindle versions
  7. Introducing cricketr! : An R package to analyze performances of cricketers
  8. Cricket analytics with cricketr!!!

For more posts see Index of posts

yorkr ranks IPL Players post 2016 season


Here is a short post which ranks IPL batsmen and bowlers after the 2016 IPL season. The rankings are based on match data from Cricsheet. I had already ranked IPL players in my post yorkr ranks IPL batsmen and bowlers, but that was in the middle of the 2016 IPL season. This post gives the final rankings after the 2016 season.

This post has also been published on RPubs at RankIPLPlayers2016. You can download this as a pdf file at RankIPLPlayers2016.pdf.

You can take a look at the code at rankIPLPlayers2016

Check out my interactive Shiny apps GooglyPlus (plots & tables) and Googly (only plots), which can be used to analyze IPL players, teams and matches.

# Start from a clean workspace
rm(list=ls())
library(yorkr)
library(dplyr)
# Load the ranking functions from local files
source('C:/software/cricket-package/cricsheet/ipl2016/final/R/rankIPLBatsmen.R', encoding = 'UTF-8')
source('C:/software/cricket-package/cricsheet/ipl2016/final/R/rankIPLBowlers.R', encoding = 'UTF-8')

Rank IPL batsmen post 2016

Chris Gayle, Shaun Marsh & David Warner are the top 3 IPL batsmen. Gayle towers over everybody, with Mean Runs of 38.28 and a Mean Strike Rate of 138.85. Virat Kohli comes in 4th, with 31.78 as his Average Runs per innings and a Mean Strike Rate of 117.51

iplBatsmanRank <- rankIPLBatsmen()
as.data.frame(iplBatsmanRank[1:30,])
##             batsman matches meanRuns    meanSR
## 1          CH Gayle      92 38.28261 138.85120
## 2          SE Marsh      60 36.40000 118.97783
## 3         DA Warner     104 34.51923 124.88798
## 4           V Kohli     136 31.77941 117.51000
## 5         AM Rahane      89 31.46067 104.62989
## 6    AB de Villiers     109 29.93578 136.48945
## 7      SR Tendulkar      78 29.62821 108.58962
## 8         G Gambhir     133 28.94737 109.61263
## 9         RG Sharma     140 28.68571 117.79057
## 10         SK Raina     143 28.41259 121.55713
## 11        SR Watson      90 28.21111 125.80122
## 12         S Dhawan     110 28.09091 111.97282
## 13         R Dravid      79 27.87342 109.14544
## 14         DR Smith      76 27.55263 120.22329
## 15        JP Duminy      70 27.28571 122.99243
## 16      BB McCullum      94 26.86170 118.55606
## 17        JH Kallis      97 26.83505  95.47866
## 18         V Sehwag     105 26.26667 137.11562
## 19       RV Uthappa     132 26.18182 123.16326
## 20     AC Gilchrist      81 25.77778 122.69074
## 21          M Vijay      99 25.69697 106.02010
## 22    KC Sangakkara      70 25.67143 112.97529
## 23         MS Dhoni     131 25.14504 131.62206
## 24        DA Miller      60 24.76667 133.80983
## 25        AT Rayudu      99 23.35354 121.59313
## 26 DPMD Jayawardene      80 23.05000 114.54712
## 27     Yuvraj Singh     103 22.46602 118.15000
## 28        DJ Hussey      63 22.26984        NA
## 29        YK Pathan     121 22.25620 132.58793
## 30      S Badrinath      66 22.22727 114.97061
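
The ranking functions themselves are sourced from local files and are not shown here. A minimal sketch of how such a table could be built is given below, using made-up innings. The function name rankBatsmen, the minimum-matches cutoff and the tie-break on strike rate are all assumptions; only the sort on mean runs is suggested by the output above.

```r
# Made-up innings for three batsmen
inningsDF <- data.frame(
  batsman    = c("P", "P", "P", "Q", "Q", "Q", "R", "R"),
  runs       = c(40, 20, 60, 70, 10, 55, 5, 15),
  strikeRate = c(120, 100, 140, 150, 80, 130, 90, 110)
)

# Hypothetical ranking helper: one row per batsman, filtered and sorted
rankBatsmen <- function(df, minMatches = 3) {
  perBatsman <- split(df, df$batsman)
  ranked <- do.call(rbind, lapply(perBatsman, function(d) {
    data.frame(batsman  = d$batsman[1],
               matches  = nrow(d),
               meanRuns = mean(d$runs),
               meanSR   = mean(d$strikeRate))
  }))
  # Keep batsmen with enough matches, then sort by mean runs and strike rate
  ranked <- ranked[ranked$matches >= minMatches, ]
  ranked <- ranked[order(-ranked$meanRuns, -ranked$meanSR), ]
  rownames(ranked) <- NULL
  ranked
}

toyRank <- rankBatsmen(inningsDF)
print(toyRank)
```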

Rank IPL bowlers

The top 3 IPL T20 bowlers are SL Malinga, DJ Bravo and SP Narine

Don’t get hung up on the decimals in the average wickets for the bowlers. If 2 bowlers have average wickets of 1.0 and 1.5, all it implies is that over 2 matches the 1st bowler can be expected to take 2 wickets and the 2nd bowler 3 wickets.
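
The arithmetic is just mean wickets multiplied by matches, as this small sketch shows. The second calculation assumes that meanWickets in the ranking is total wickets divided by matches, so it only gives an implied total.

```r
# Average wickets per match for two hypothetical bowlers
meanWickets <- c(first = 1.0, second = 1.5)
matches <- 2

# Expected wickets over 2 matches
expectedWickets <- meanWickets * matches
print(expectedWickets)

# Implied total for a bowler with 96 matches and a mean of 1.645833 wickets
impliedTotal <- round(96 * 1.645833)
print(impliedTotal)
```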

setwd("C:/software/cricket-package/cricsheet/ipl2016/details")
iplBowlersRank <- rankIPLBowlers()
as.data.frame(iplBowlersRank[1:30,])
##             bowler matches meanWickets   meanER
## 1       SL Malinga      96    1.645833 6.545208
## 2         DJ Bravo      58    1.517241 7.929310
## 3        SP Narine      65    1.492308 6.155077
## 4          B Kumar      45    1.422222 7.355556
## 5        YS Chahal      41    1.414634 8.057073
## 6         M Morkel      37    1.405405 7.626216
## 7        IK Pathan      40    1.400000 7.579250
## 8         RP Singh      42    1.357143 7.966429
## 9         MM Patel      31    1.354839 7.282581
## 10   R Vinay Kumar      63    1.317460 8.342540
## 11  Sandeep Sharma      38    1.315789 7.697368
## 12       MM Sharma      46    1.304348 7.740652
## 13         P Awana      33    1.303030 8.325758
## 14        MM Patel      30    1.300000 7.569667
## 15          Z Khan      41    1.292683 7.735854
## 16         PP Ojha      53    1.245283 7.268679
## 17     JP Faulkner      40    1.225000 8.502250
## 18 Shakib Al Hasan      41    1.170732 7.103659
## 19     DS Kulkarni      32    1.156250 8.372188
## 20        UT Yadav      46    1.152174 8.394783
## 21        A Kumble      41    1.146341 6.567073
## 22       JA Morkel      73    1.136986 8.131370
## 23        SK Warne      53    1.132075 7.277170
## 24        A Mishra      55    1.127273 7.319455
## 25        UT Yadav      33    1.090909 8.853636
## 26        L Balaji      34    1.088235 7.186176
## 27       PP Chawla      35    1.085714 8.162000
## 28        R Ashwin      92    1.065217 6.812391
## 29  M Muralitharan      39    1.051282 6.470256
## 30 Harbhajan Singh     120    1.050000 7.134833

Analyzing World Bank data with WDI, googleVis Motion Charts


Recently I was surfing the web when I came across a really cool post, New R package to access World Bank data, by Markus Gesmann, on using googleVis and motion charts with World Bank data. The post also introduced me to Hans Rosling, Professor at Sweden’s Karolinska Institute. Hans Rosling, the creator of the famous Gapminder chart “Health and Wealth of Nations”, displays global trends through animated charts (a must see!!!). As they say, in Hans Rosling’s hands, data dances and sings. Take a look at some of his TED talks, e.g. Hans Rosling: New insights on poverty. Prof Rosling developed the breakthrough software behind the visualizations at Gapminder. The free software, which can be loaded with any data, was purchased by Google in March 2007.

In this post, I recreate some of the Gapminder charts with the help of the R packages WDI and googleVis. The WDI package by Vincent Arel-Bundock provides a set of really useful functions to get at data based on the World Bank Data indicators. googleVis provides motion charts with which you can animate the data. Incidentally, Datacamp has a very nice, short course on googleVis, “Having fun with googleVis”.

You can clone/download the code from Github at worldBankAnalysis which is in the form of an Rmd file.

library(WDI)
library(ggplot2)
library(googleVis)
library(plyr)

1.Get the data from 1960 to 2016 for the following

  1. Population – SP.POP.TOTL
  2. GDP in US $ – NY.GDP.MKTP.CD
  3. Life Expectancy at birth (Years) – SP.DYN.LE00.IN
  4. GDP Per capita income – NY.GDP.PCAP.PP.CD
  5. Fertility rate (Births per woman) – SP.DYN.TFRT.IN
  6. Poverty headcount ratio – SI.POV.2DAY

# World population total
population = WDI(indicator='SP.POP.TOTL', country="all",start=1960, end=2016)
# GDP in US $
gdp= WDI(indicator='NY.GDP.MKTP.CD', country="all",start=1960, end=2016)
# Life expectancy at birth (Years)
lifeExpectancy= WDI(indicator='SP.DYN.LE00.IN', country="all",start=1960, end=2016)
# GDP Per capita
income = WDI(indicator='NY.GDP.PCAP.PP.CD', country="all",start=1960, end=2016)
# Fertility rate (births per woman)
fertility = WDI(indicator='SP.DYN.TFRT.IN', country="all",start=1960, end=2016)
# Poverty head count
poverty= WDI(indicator='SI.POV.2DAY', country="all",start=1960, end=2016)
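
The six near-identical calls could also be driven from a named vector of indicator codes. The sketch below shows the idea; fetchIndicators and stubFetch are hypothetical helpers, and the stub returns a made-up row so that the example runs without the WDI package or a network connection. The commented-out wdiFetch shows where the real WDI() call from the code above would go.

```r
# Indicator codes from the list above, keyed by the variable names used here
indicatorCodes <- c(
  population     = "SP.POP.TOTL",
  gdp            = "NY.GDP.MKTP.CD",
  lifeExpectancy = "SP.DYN.LE00.IN",
  income         = "NY.GDP.PCAP.PP.CD",
  fertility      = "SP.DYN.TFRT.IN",
  poverty        = "SI.POV.2DAY"
)

# Hypothetical helper: apply a fetch function to every code, keeping the names
fetchIndicators <- function(codes, fetch) {
  lapply(codes, fetch)
}

# Real fetcher, commented out because it needs the WDI package and a network:
# wdiFetch <- function(code) {
#   WDI::WDI(indicator = code, country = "all", start = 1960, end = 2016)
# }

# Offline stand-in that returns one made-up row per indicator
stubFetch <- function(code) {
  out <- data.frame(iso2c = "XX", country = "Example",
                    value = NA_real_, year = 2016)
  names(out)[3] <- code
  out
}

indicatorData <- fetchIndicators(indicatorCodes, stubFetch)
print(names(indicatorData))
```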

2.Rename the columns

names(population)[3]="Total population"
names(lifeExpectancy)[3]="Life Expectancy (Years)"
names(gdp)[3]="GDP (US$)"
names(income)[3]="GDP per capita income"
names(fertility)[3]="Fertility (Births per woman)"
names(poverty)[3]="Poverty headcount ratio"

3.Join the data frames

Join the individual data frames into one large wide data frame with all the indicators for the countries


j1 <- join(population, gdp)
j2 <- join(j1,lifeExpectancy)
j3 <- join(j2,income)
j4 <- join(j3,poverty)
wbData <- join(j4,fertility)
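
For readers without plyr, the same chain of joins can be sketched in base R. This is a swapped-in technique, not the code used in this post: plyr's join() defaults to a left join on the common columns, which merge() with all.x = TRUE mimics here. The countries and values are made up.

```r
# Key columns shared by every indicator data frame
keys <- c("iso2c", "country", "year")

# Made-up indicator data frames; life expectancy is missing for Country B
population <- data.frame(iso2c = c("AA", "BB"),
                         country = c("Country A", "Country B"),
                         year = 2014, population = c(100, 200))
gdp <- data.frame(iso2c = c("AA", "BB"),
                  country = c("Country A", "Country B"),
                  year = 2014, gdp = c(10, 40))
lifeExpectancy <- data.frame(iso2c = "AA", country = "Country A",
                             year = 2014, lifeExpectancy = 70)

# Left-join each data frame onto the running result
wide <- Reduce(function(x, y) merge(x, y, by = keys, all.x = TRUE),
               list(population, gdp, lifeExpectancy))
print(wide)
```

Country B keeps its row with an NA for life expectancy, which is the behaviour a left join gives.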

4.Use WDI_data

Use WDI_data to get the list of indicators and the countries. Join the countries and region

# WDI_data is a list of 2 matrices
wdi_data =WDI_data
# The 1st matrix in the list is the set of all World Bank Indicators
indicators=wdi_data[[1]]
# The 2nd matrix gives the set of countries and regions
countries=wdi_data[[2]]
df = as.data.frame(countries)
aa <- df$region != "Aggregates"
# Remove the aggregates
countries_df <- df[aa,]
# Subset from the development data only those rows corresponding to the countries
bb = subset(wbData, country %in% countries_df$country)
# Add the region and other country details
cc = join(bb,countries_df)
# Keep only the rows that have no missing values
dd = complete.cases(cc)
developmentDF = cc[dd,]
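
The effect of dropping the aggregates and then keeping only complete cases can be seen on a toy example. The data below are made up, and base R's merge() stands in for plyr's join().

```r
# Made-up country table: one entry is an aggregate, not a country
countriesDF <- data.frame(
  iso2c   = c("AA", "BB", "ZZ"),
  country = c("Country A", "Country B", "World"),
  region  = c("Region 1", "Region 2", "Aggregates")
)

# Made-up indicator data with one missing value
wb <- data.frame(
  iso2c   = c("AA", "BB", "ZZ", "AA"),
  country = c("Country A", "Country B", "World", "Country A"),
  year    = c(2014, 2014, 2014, 2015),
  gdp     = c(10, NA, 500, 12)
)

# Remove the aggregates, then keep only the rows for real countries
onlyCountries <- countriesDF[countriesDF$region != "Aggregates", ]
wbCountries <- wb[wb$country %in% onlyCountries$country, ]

# Attach the region, then drop rows with any missing value
withRegion <- merge(wbCountries, onlyCountries,
                    by = c("iso2c", "country"), all.x = TRUE)
completeRows <- withRegion[complete.cases(withRegion), ]

print(withRegion)
print(completeRows)
```

Here withRegion still holds the row with the missing GDP, while completeRows does not, which mirrors the difference between cc and developmentDF.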

5.Create and display the motion chart

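# xvar, yvar and sizevar use the column names set in step 2
gg <- gvisMotionChart(cc,
                      idvar = "country",
                      timevar = "year",
                      xvar = "GDP (US$)",
                      yvar = "Life Expectancy (Years)",
                      sizevar = "Total population",
                      colorvar = "region")
plot(gg)
# Save the chart's HTML so that it can be hosted as a web page
cat(gg$html$chart, file="chart1.html")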
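
Since the columns were renamed in step 2, the variable names passed to gvisMotionChart need to exist as column names in the data frame. A small check along the following lines can catch a mismatch early. checkChartVars is a hypothetical helper, not part of googleVis, and the data frame is made up.

```r
# Hypothetical helper: stop if any chart variable is not a column of df
checkChartVars <- function(df, vars) {
  missingVars <- setdiff(vars, names(df))
  if (length(missingVars) > 0) {
    stop("Columns not found: ", paste(missingVars, collapse = ", "))
  }
  invisible(TRUE)
}

# Made-up data frame with the renamed columns from step 2
toyDF <- data.frame(
  country = "Country A", year = 2014,
  `GDP (US$)` = 10, `Life Expectancy (Years)` = 70,
  `Total population` = 100, region = "Region 1",
  check.names = FALSE
)

# Names that match the columns pass the check
ok <- checkChartVars(toyDF, c("country", "year", "GDP (US$)",
                              "Life Expectancy (Years)",
                              "Total population", "region"))

# A shortened name such as "GDP" is reported as missing
bad <- tryCatch(checkChartVars(toyDF, "GDP"),
                error = function(e) conditionMessage(e))
print(bad)
```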

Note: Unfortunately it is not possible to embed the motion chart in WordPress. It has to be hosted on a server as a web page. After exploring several possibilities I came up with the following process to display the animated chart. The plot is saved as an html file using ‘cat’ as shown above. The chart1.html page is then hosted as a Github page (gh-pages) on Github.

Here is the gvisMotionChart

Do give World Bank Motion Chart1 a spin. Here is how the Motion Chart has to be used

You can select Life Expectancy, Population, Fertility etc. by clicking the black arrows. The blue arrow shows the ‘play’ button, which animates the motion chart. You can also select the countries and change the size of the circles. Do give it a try. Here are some quick analyses from playing around with the motion chart with different parameters chosen

The set of charts below are screenshots captured by running the motion chart World Bank Motion Chart1

a. Life Expectancy vs Fertility chart

This chart is used by Hans Rosling in his TED talk. The left chart shows low life expectancy and high fertility rates for several sub-Saharan and East Asia Pacific countries in the early 1960’s. Today fertility has dropped and life expectancy has increased overall. However, the sub-Saharan countries still have a high fertility rate

b. Population vs GDP

The chart below shows that India and China had roughly the same GDP from 1973-1994, with the US and Japan well ahead.

From 1998-2014 China really pulls away from India and Japan, as seen below

c. Per capita income vs Life Expectancy

In the 1990’s the per capita income and life expectancy of the sub-Saharan countries are low (life expectancy of 42-50 years). Japan and the US have a good life expectancy in the 1990’s. In 2014 the per capita income of the sub-Saharan countries is still low, though life expectancy has marginally improved.

d. Population vs Poverty headcount

In the early 1990’s China had a higher poverty headcount ratio than India. By 2004 China had this all figured out and its poverty headcount ratio had dropped significantly. This can also be seen in the chart below.

In the chart above China shows a drastic reduction in the poverty headcount ratio compared with India. Strangely, Zambia shows an increase in the poverty headcount ratio.

6.Get the data for the 2nd set of indicators

  1. Total population  – SP.POP.TOTL
  2. GDP in US$ – NY.GDP.MKTP.CD
  3. Access to electricity (% population) – EG.ELC.ACCS.ZS
  4. Electricity consumption KWh per capita -EG.USE.ELEC.KH.PC
  5. CO2 emissions -EN.ATM.CO2E.KT
  6. Sanitation Access – SH.STA.ACSN

# World population
population = WDI(indicator='SP.POP.TOTL', country="all",start=1960, end=2016)
# GDP in US $
gdp= WDI(indicator='NY.GDP.MKTP.CD', country="all",start=1960, end=2016)
# Access to electricity (% population)
elecAccess= WDI(indicator='EG.ELC.ACCS.ZS', country="all",start=1960, end=2016)
# Electric power consumption Kwh per capita
elecConsumption= WDI(indicator='EG.USE.ELEC.KH.PC', country="all",start=1960, end=2016)
#CO2 emissions
co2Emissions= WDI(indicator='EN.ATM.CO2E.KT', country="all",start=1960, end=2016)
# Access to sanitation (% population)
sanitationAccess= WDI(indicator='SH.STA.ACSN', country="all",start=1960, end=2016)

7.Rename the columns

names(population)[3]="Total population"
names(gdp)[3]="GDP US($)"
names(elecAccess)[3]="Access to Electricity (% popn)"
names(elecConsumption)[3]="Electric power consumption (KWH per capita)"
names(co2Emissions)[3]="CO2 emissions"
names(sanitationAccess)[3]="Access to sanitation(% popn)"

8.Join the individual data frames

Join the individual data frames to one large wide data frame with all the indicators for the countries


j1 <- join(population, gdp)
j2 <- join(j1,elecAccess)
j3 <- join(j2,elecConsumption)
j4 <- join(j3,co2Emissions)
# Join onto j4 (not j3) so that the CO2 emissions column is kept
wbData1 <- join(j4,sanitationAccess)
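
The chain of joins above can also be written as a single fold over a list of data frames. Here is a minimal sketch on made-up toy data. It uses base R's `merge()` rather than `join()`, and the short column names (`pop`, `gdp`, `elec`) and all values are invented purely for illustration.

```r
# Toy stand-ins for the WDI data frames (values are made up for illustration)
population <- data.frame(iso2c = c("IN", "CN"), country = c("India", "China"),
                         year = c(2010, 2010), pop = c(1.2e9, 1.3e9))
gdp <- data.frame(iso2c = c("IN", "CN"), country = c("India", "China"),
                  year = c(2010, 2010), gdp = c(1.7e12, 6.1e12))
elecAccess <- data.frame(iso2c = c("IN", "CN"), country = c("India", "China"),
                         year = c(2010, 2010), elec = c(76, 99))

# Fold the list of data frames into one wide data frame,
# matching rows on the shared key columns
keys <- c("iso2c", "country", "year")
wide <- Reduce(function(x, y) merge(x, y, by = keys, all.x = TRUE),
               list(population, gdp, elecAccess))
print(wide)
```

Folding over a list makes it harder to drop one of the intermediate results by accident, since there are no `j1`, `j2`, ... names to mix up.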

9.Use WDI_data

Use WDI_data to get the list of indicators and the countries. Join the countries and region

# WDI_data is a list of 2 matrices
wdi_data =WDI_data
# The 1st matrix in the list is the set of all World Bank indicators
indicators=wdi_data[[1]]
# The 2nd matrix gives the set of countries and regions
countries=wdi_data[[2]]
df = as.data.frame(countries)
aa <- df$region != "Aggregates"
# Remove the aggregates
countries_df <- df[aa,]
# Subset from the development data only those corresponding to the countries
ee = subset(wbData1, country %in% countries_df$country)
ff = join(ee,countries_df)
## Joining by: iso2c, country
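
The filtering step above can be tried out on a small hand-made table. This is only a sketch: the `countries_toy` and `dev_toy` data frames and their rows are invented for illustration and merely mimic the `region` column used above.

```r
# Hand-made stand-in for the country table (rows are illustrative only)
countries_toy <- data.frame(
  iso2c   = c("IN", "CN", "1W", "ZG"),
  country = c("India", "China", "World", "Sub-Saharan Africa"),
  region  = c("South Asia", "East Asia & Pacific", "Aggregates", "Aggregates")
)

# Keep only real countries by dropping rows whose region is "Aggregates"
real_countries <- countries_toy[countries_toy$region != "Aggregates", ]

# Toy development data that contains both countries and aggregates
dev_toy <- data.frame(country = c("India", "World", "China"),
                      year = 2010, value = c(1, 2, 3))
dev_countries <- subset(dev_toy, country %in% real_countries$country)
print(dev_countries)
```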

10.Create and display the motion chart

# xvar, yvar and sizevar must match the renamed column names exactly
gg1<- gvisMotionChart(ff,
                                idvar = "country",
                                timevar = "year",
                                xvar = "GDP US($)",
                                yvar = "Access to Electricity (% popn)",
                                sizevar ="Total population",
                                colorvar = "region")
plot(gg1)
cat(gg1$html$chart, file="chart2.html")

This is World Bank Motion Chart2  which has a different set of parameters like Access to Energy, CO2 emissions etc

The set of charts below are screenshots of the motion chart World Bank Motion Chart 2

a. Access to Electricity vs Population
pic6

The above chart shows that in China 100% of the population has access to electricity. India has made decent progress, from 50% in 1990 to 79% in 2012. However, Pakistan seems to have done much better in providing access to electricity, moving from 59% to close to 98%.

b. Power consumption vs population

powercon

The above chart shows power consumption vs population. China and India have proportionally much lower consumption than Norway, the US and Canada.

c. CO2 emissions vs Population

pic7

In 1963 the CO2 emissions were fairly low and about comparable for all countries. US, India have shown a steady increase while China shows a steep increase. Interestingly UK shows a drop in CO2 emissions

d.  Access to sanitation
san

India shows an improvement but has a long way to go, with only 40% of the population having access to sanitation. China has made much better strides, with 80% having access to sanitation in 2015. Strangely, Nigeria shows a drop in sanitation access of almost 20% of the population.

The code is available at Github at worldBankAnalysys

Conclusion: So there you have it. I have shown screenshots for some sample parameters of the World Bank indicators. Please try to play around with World Bank Motion Chart1 & World Bank Motion Chart 2 with your own set of parameters and countries. You can also create your own motion chart from the 100s of WDI indicators available at World Bank Data indicator.

Finally, I  would really like to thank Prof Hans Rosling, googleVis and  WDI (Vincent  Arel-Bundock) for making this visualization possible!

Also see
1.  Introducing QCSimulator: A 5-qubit quantum computing simulator in R
2. Dabbling with Wiener filter using OpenCV
3. Designing a Social Web Portal
4. Design Principles of Scalable, Distributed Systems
5. Re-introducing cricketr! : An R package to analyze performances of cricketers
6. Natural language processing: What would Shakespeare say?

To see all posts Index of posts

cricketr sizes up legendary All-rounders of yesteryear


Introduction

This is a post I have been wanting to write for several months, but had to put it off for one reason or another. In this post I use my R package cricketr to analyze the performance of All-rounder greats namely Kapil Dev, Ian Botham, Imran Khan and Richard Hadlee. All these players had talent that was natural and raw. They were good strikers of the ball and extremely lethal with their bowling. The ODI data for these players have been taken from ESPN Cricinfo.

Please be mindful of the ESPN Cricinfo Terms of Use

You can also read this post at Rpubs as cricketr-AR. Download this report as a PDF file from cricketr-AR

Note: If you would like to do a similar analysis for a different set of batsmen and bowlers, you can clone/download my skeleton cricketr template from Github (which is the R Markdown file I have used for the analysis below). You will only need to make appropriate changes for the players you are interested in. Only a familiarity with R and R Markdown is needed.

All Rounders

  1. Kapil Dev (Ind)
  2. Ian Botham (Eng)
  3. Imran Khan (Pak)
  4. Richard Hadlee (NZ)

I have sprinkled the plots with a few of my comments. Feel free to draw your conclusions! The analysis is included below

if (!require("cricketr")){
    install.packages("cricketr")
}

library(cricketr)

The data for any particular ODI player can be obtained with the getPlayerDataOD() function. To do this you will need to go to ESPN CricInfo Player and type in the name of the player, e.g. Kapil Dev. This will bring up a page which has the profile number for the player; for Kapil Dev this would be http://www.espncricinfo.com/india/content/player/30028.html. Hence, Kapil’s profile number is 30028. This can be used to get Kapil Dev’s data as shown below. I have already executed the 4 commands below, and I will use the saved files to run further commands.

#kapil1 <- getPlayerDataOD(30028,dir="..",file="kapil1.csv",type="batting")
#botham1 <- getPlayerDataOD(9163,dir="..",file="botham1.csv",type="batting")
#imran1 <- getPlayerDataOD(40560,dir="..",file="imran1.csv",type="batting")
#hadlee1 <- getPlayerDataOD(37224,dir="..",file="hadlee1.csv",type="batting")
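
If you collect several profile URLs, the profile number can be pulled out with a regular expression instead of by eye. A small sketch, assuming the URL has the `/player/<number>.html` shape shown above (the variable names `url` and `profile` are just for illustration):

```r
# Pull the numeric profile id out of a player profile URL
url <- "http://www.espncricinfo.com/india/content/player/30028.html"
profile <- as.integer(sub(".*/player/([0-9]+)\\.html$", "\\1", url))
print(profile)
```

The resulting number is what gets passed as the first argument of getPlayerDataOD() in the commands above.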

Analyses of batting performances of the All Rounders

The following plots give the analysis of the 4 ODI batsmen

  1. Kapil Dev (Ind) – Innings – 225, Runs = 3783, Average=23.79, Strike Rate= 95.07
  2. Ian Botham (Eng) – Innings – 116, Runs= 2113, Average=23.21, Strike Rate= 79.10
  3. Imran Khan (Pak) – Innings – 175, Runs= 3709, Average=33.41, Strike Rate= 72.65
  4. Richard Hadlee (NZ) – Innings – 115, Runs= 1751, Average=21.61, Strike Rate= 75.50

Plot of 4s, 6s and the scoring rate in ODIs

The 3 charts below give the number of

  1. 4s vs Runs scored
  2. 6s vs Runs scored
  3. Balls faced vs Runs scored

A regression line is fitted in each of these plots for each of the ODI batsmen
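
To make the idea of the fitted line concrete, here is a minimal sketch with `lm()` on made-up innings data. The numbers are invented, and the cricketr plotting functions may well fit and draw their lines differently; this only shows how a figure like "runs after 50 deliveries" can be read off a fitted line.

```r
# Made-up innings data: balls faced and runs scored
bf   <- c(10, 25, 40, 55, 70, 90)
runs <- c(8, 22, 30, 50, 58, 85)

# Fit a straight line Runs ~ Balls faced and read off the value at 50 balls
fit <- lm(runs ~ bf)
runs_at_50 <- predict(fit, newdata = data.frame(bf = 50))

plot(bf, runs, xlab = "Balls faced", ylab = "Runs")
abline(fit, col = "blue")
print(round(runs_at_50))
```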

A. Kapil Dev
It can be seen that Kapil scores four 4’s when he scores 50. Also after facing 50 deliveries he scores around 43

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./kapil1.csv","Kapil")
batsman6s("./kapil1.csv","Kapil")
batsmanScoringRateODTT("./kapil1.csv","Kapil")

kapil-4s6ssr-1

dev.off()
## null device 
##           1

B. Ian Botham
Botham scores around 39 runs after 50 deliveries

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./botham1.csv","Botham")
batsman6s("./botham1.csv","Botham")
batsmanScoringRateODTT("./botham1.csv","Botham")

botham-4s6sr-1

dev.off()
## null device 
##           1

C. Imran Khan
Imran scores around 36 runs for 50 deliveries

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./imran1.csv","Imran")
batsman6s("./imran1.csv","Imran")
batsmanScoringRateODTT("./imran1.csv","Imran")

imran-4s6ssr-1

dev.off()
## null device 
##           1

D. Richard Hadlee
Hadlee also scores around 30 runs facing 50 deliveries

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./hadlee1.csv","Hadlee")
batsman6s("./hadlee1.csv","Hadlee")
batsmanScoringRateODTT("./hadlee1.csv","Hadlee")

hadlee-4s6sout-1

dev.off()
## null device 
##           1

Cumulative Average runs of batsman in career

Kapil’s cumulative average runs drop towards the last 15 innings, whereas Botham had a good run towards the end of his career. Imran’s performance as a batsman really peaks towards the end, with a cumulative average of almost 25 runs. Hadlee has a steady performance.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanCumulativeAverageRuns("./kapil1.csv","Kapil")

kbih-car-1

batsmanCumulativeAverageRuns("./botham1.csv","Botham")

kbih-car-2

batsmanCumulativeAverageRuns("./imran1.csv","Imran")

kbih-car-3

batsmanCumulativeAverageRuns("./hadlee1.csv","Hadlee")

kbih-car-4

dev.off()
## null device 
##           1
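
For readers new to the term, a cumulative average is just a running mean: total runs so far divided by innings so far. A minimal sketch on made-up scores (this illustrates the concept only and is not necessarily how cricketr computes it internally):

```r
# Made-up runs for ten consecutive innings
runs <- c(12, 45, 0, 33, 78, 5, 21, 60, 9, 37)

# Running mean: total runs so far divided by innings so far
cumAvg <- cumsum(runs) / seq_along(runs)
plot(cumAvg, type = "l", xlab = "Innings", ylab = "Cumulative average runs")
```

The last value of `cumAvg` equals the plain career mean, which is why these curves flatten out as the number of innings grows.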

Cumulative Average strike rate of batsman in career

Kapil’s strike rate is superlative, touching the 90s steadily. Botham’s strike rate drops dramatically towards the latter part of his career. Imran averages a steady 75 and Hadlee averages around 85.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanCumulativeStrikeRate("./kapil1.csv","Kapil")

kbih-casr-1

batsmanCumulativeStrikeRate("./botham1.csv","Botham")

kbih-casr-2

batsmanCumulativeStrikeRate("./imran1.csv","Imran")

kbih-casr-3

batsmanCumulativeStrikeRate("./hadlee1.csv","Hadlee")

kbih-casr-4

dev.off()
## null device 
##           1

Relative Mean Strike Rate

Kapil tops the strike rate among all the all-rounders. This is really a revelation to me. This can also be seen in the original data, where Kapil’s strike rate is a whopping 95.07 in comparison to Botham, Imran and Hadlee, who are at 79.10, 72.65 and 75.50 respectively.

par(mar=c(4,4,2,2))
frames <- list("./kapil1.csv","./botham1.csv","imran1.csv","hadlee1.csv")
names <- list("Kapil","Botham","Imran","Hadlee")
relativeBatsmanSRODTT(frames,names)

plot-1-1

Relative Runs Frequency Percentage

This plot shows that Imran has a much better average of runs scored than the other all-rounders, followed by Kapil.

frames <- list("./kapil1.csv","./botham1.csv","imran1.csv","hadlee1.csv")
names <- list("Kapil","Botham","Imran","Hadlee")
relativeRunsFreqPerfODTT(frames,names)

plot-2-1

Relative cumulative average runs in career

It can be seen clearly that Imran Khan leads the pack in cumulative average runs followed by Kapil Dev and then Botham

frames <- list("./kapil1.csv","./botham1.csv","imran1.csv","hadlee1.csv")
names <- list("Kapil","Botham","Imran","Hadlee")
relativeBatsmanCumulativeAvgRuns(frames,names)

kbih-relcar-1

Relative cumulative average strike rate in career

In the cumulative strike rate Hadlee and Kapil run a close race.

frames <- list("./kapil1.csv","./botham1.csv","imran1.csv","hadlee1.csv")
names <- list("Kapil","Botham","Imran","Hadlee")
relativeBatsmanCumulativeStrikeRate(frames,names)

kbih-relcsr-1

Percent 4’s,6’s in total runs scored

The plot below shows the contribution of 4s, 6s and the remaining runs (1s, 2s, 3s) to the total runs scored.

frames <- list("./kapil1.csv","./botham1.csv","imran1.csv","hadlee1.csv")
names <- list("Kapil","Botham","Imran","Hadlee")
runs4s6s <-batsman4s6s(frames,names)

plot-46s-1

print(runs4s6s)
##                Kapil Botham Imran Hadlee
## Runs(1s,2s,3s) 72.08  66.53 77.53  73.27
## 4s             21.98  25.78 17.61  21.08
## 6s              5.94   7.68  4.86   5.65

Runs forecast

The forecast for the batsman is shown below.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfForecast("./kapil1.csv","Kapil")
batsmanPerfForecast("./botham1.csv","Botham")
batsmanPerfForecast("./imran1.csv","Imran")
batsmanPerfForecast("./hadlee1.csv","Hadlee")

plot-fcst-1

dev.off()
## null device 
##           1

3D plot of Runs vs Balls Faced and Minutes at Crease

The plot is a scatter plot of Runs vs Balls faced and Minutes at Crease. A prediction plane is fitted

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./kapil1.csv","Kapil")
battingPerf3d("./botham1.csv","Botham")

plot-3-1

dev.off()
## null device 
##           1
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./imran1.csv","Imran")
battingPerf3d("./hadlee1.csv","Hadlee")

plot-4-1

dev.off()
## null device 
##           1

Predicting Runs given Balls Faced and Minutes at Crease

A multi-variate regression plane is fitted between Runs and Balls faced +Minutes at crease.

BF <- seq( 10, 200,length=10)
Mins <- seq(30,220,length=10)
newDF <- data.frame(BF,Mins)

kapil <- batsmanRunsPredict("./kapil1.csv","Kapil",newdataframe=newDF)
botham <- batsmanRunsPredict("./botham1.csv","Botham",newdataframe=newDF)
imran <- batsmanRunsPredict("./imran1.csv","Imran",newdataframe=newDF)
hadlee <- batsmanRunsPredict("./hadlee1.csv","Hadlee",newdataframe=newDF)

The fitted model is then used to predict the runs that the batsmen will score for a hypothetical Balls Faced and Minutes at Crease. It can be seen that Kapil is the best bet for a given Balls Faced and Minutes at Crease, followed by Botham.

batsmen <-cbind(round(kapil$Runs),round(botham$Runs),round(imran$Runs),round(hadlee$Runs))
colnames(batsmen) <- c("Kapil","Botham","Imran","Hadlee")
newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
colnames(newDF) <- c("BallsFaced","MinsAtCrease")
predictedRuns <- cbind(newDF,batsmen)
predictedRuns
##    BallsFaced MinsAtCrease Kapil Botham Imran Hadlee
## 1          10           30    16      6    10     15
## 2          31           51    33     22    22     28
## 3          52           72    49     38    33     42
## 4          73           93    65     54    45     56
## 5          94          114    81     70    56     70
## 6         116          136    97     86    67     84
## 7         137          157   113    102    79     97
## 8         158          178   130    117    90    111
## 9         179          199   146    133   102    125
## 10        200          220   162    149   113    139
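
If you want to see what fitting such a plane looks like outside the package, here is a minimal sketch with `lm()` and `predict()`. The `innings` data frame is made up, and I am assuming a plain linear model of Runs on Balls Faced and Minutes; the package function may do more than this.

```r
# Made-up innings: balls faced, minutes at crease and runs scored
innings <- data.frame(
  BF   = c(12, 30, 45, 60, 85, 110),
  Mins = c(20, 42, 70, 85, 120, 150),
  Runs = c(9, 25, 34, 52, 70, 95)
)

# Fit the plane Runs ~ BF + Mins and predict for new combinations
fit <- lm(Runs ~ BF + Mins, data = innings)
newDF <- data.frame(BF = seq(10, 200, length.out = 10),
                    Mins = seq(30, 220, length.out = 10))
predicted <- cbind(newDF, Runs = round(predict(fit, newdata = newDF)))
print(predicted)
```

Note that predictions at 200 balls lie far outside the toy data, so, as with any regression, values at the extremes should be read with caution.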

Highest runs likelihood

The plots below show the runs likelihood of each batsman. This uses K-Means clustering.

A. Kapil Dev

batsmanRunsLikelihood("./kapil1.csv","Kapil")

kapil11-1

## Summary of  Kapil 's runs scoring likelihood
## **************************************************
## 
## There is a 34.57 % likelihood that Kapil  will make  22 Runs in  24 balls over 34  Minutes 
## There is a 17.28 % likelihood that Kapil  will make  46 Runs in  46 balls over  65  Minutes 
## There is a 48.15 % likelihood that Kapil  will make  5 Runs in  7 balls over 9  Minutes

B. Ian Botham

batsmanRunsLikelihood("./botham1.csv","Botham")

devilliers-1

## Summary of  Botham 's runs scoring likelihood
## **************************************************
## 
## There is a 47.95 % likelihood that Botham  will make  9 Runs in  12 balls over 15  Minutes 
## There is a 39.73 % likelihood that Botham  will make  23 Runs in  32 balls over  44  Minutes 
## There is a 12.33 % likelihood that Botham  will make  59 Runs in  74 balls over 101  Minutes

C. Imran Khan

batsmanRunsLikelihood("./imran1.csv","Imran")

gaylecache-true-1

## Summary of  Imran 's runs scoring likelihood
## **************************************************
## 
## There is a 23.33 % likelihood that Imran  will make  36 Runs in  54 balls over 74  Minutes 
## There is a 60 % likelihood that Imran  will make  14 Runs in  18 balls over  23  Minutes 
## There is a 16.67 % likelihood that Imran  will make  53 Runs in  90 balls over 115  Minutes

D. Richard Hadlee

batsmanRunsLikelihood("./hadlee1.csv","Hadlee")

maxwell-1

## Summary of  Hadlee 's runs scoring likelihood
## **************************************************
## 
## There is a 6.1 % likelihood that Hadlee  will make  64 Runs in  79 balls over 90  Minutes 
## There is a 42.68 % likelihood that Hadlee  will make  25 Runs in  33 balls over  44  Minutes 
## There is a 51.22 % likelihood that Hadlee  will make  9 Runs in  11 balls over 15  Minutes
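
The percentages in these summaries can be thought of as the share of innings that fall into each K-Means cluster. Here is a rough sketch of that idea using `kmeans()` from base R on made-up innings. The choice of 3 clusters mirrors the 3 lines in each summary above, but the data and the exact computation are my assumptions, not the package's code.

```r
set.seed(42)  # k-means starts from random centres, so fix the seed

# Made-up innings: three loose groups of short, medium and long stays
innings <- data.frame(
  Runs = c(3, 5, 8, 6, 20, 25, 22, 28, 55, 60),
  BF   = c(5, 8, 10, 9, 25, 30, 27, 33, 70, 75),
  Mins = c(7, 10, 14, 12, 35, 40, 38, 45, 95, 100)
)

# nstart = 10 reruns the algorithm from several starts and keeps the best fit
fit <- kmeans(innings, centers = 3, nstart = 10)

# Share of innings in each cluster, read as a rough "likelihood"
likelihood <- round(100 * fit$size / nrow(innings), 2)
summary_tbl <- cbind(round(fit$centers), likelihood)
print(summary_tbl)
```

Each row of `summary_tbl` is a cluster centre (typical Runs, Balls Faced and Minutes) together with the percentage of innings that landed in that cluster.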

Average runs at ground and against opposition

A. Kapil Dev

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./kapil1.csv","Kapil")
batsmanAvgRunsOpposition("./kapil1.csv","Kapil")

avgrg-1-1

dev.off()
## null device 
##           1

B. Ian Botham

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./botham1.csv","Botham")
batsmanAvgRunsOpposition("./botham1.csv","Botham")

avgrg-2-1

dev.off()
## null device 
##           1

C. Imran Khan

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./imran1.csv","Imran")
batsmanAvgRunsOpposition("./imran1.csv","Imran")

avgrg-3-1

dev.off()
## null device 
##           1

D. Richard Hadlee

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./hadlee1.csv","Hadlee")
batsmanAvgRunsOpposition("./hadlee1.csv","Hadlee")

avgrg-4-1

dev.off()
## null device 
##           1

Moving Average of runs over career

The moving average for the 4 batsmen indicates the following:

Kapil’s performance drops significantly while there is a slump in Botham’s performance. On the other hand, Imran’s and Hadlee’s performances were on the upswing.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanMovingAverage("./kapil1.csv","Kapil")
batsmanMovingAverage("./botham1.csv","Botham")
batsmanMovingAverage("./imran1.csv","Imran")
batsmanMovingAverage("./hadlee1.csv","Hadlee")

sdgm-ma-1

dev.off()
## null device 
##           1

Check batsmen in-form, out-of-form

## **************************** Form status of Kapil ****************************
## Population size: 72
## Mean of population: 19.38
## Sample size: 9 Mean of sample: 6.78 SD of sample: 6.14
## Null hypothesis H0 : Kapil's sample average is within 95% confidence interval of population average
## Alternative hypothesis Ha : Kapil's sample average is below the 95% confidence interval of population average
## Kapil's Form Status: Out-of-Form because the p value: 8.4e-05 is less than alpha= 0.05
##
## **************************** Form status of Botham ****************************
## Population size: 65
## Mean of population: 21.29
## Sample size: 8 Mean of sample: 15.38 SD of sample: 13.19
## Null hypothesis H0 : Botham's sample average is within 95% confidence interval of population average
## Alternative hypothesis Ha : Botham's sample average is below the 95% confidence interval of population average
## Botham's Form Status: In-Form because the p value: 0.120342 is greater than alpha= 0.05
##
## **************************** Form status of Imran ****************************
## Population size: 54
## Mean of population: 24.94
## Sample size: 6 Mean of sample: 30.83 SD of sample: 25.4
## Null hypothesis H0 : Imran's sample average is within 95% confidence interval of population average
## Alternative hypothesis Ha : Imran's sample average is below the 95% confidence interval of population average
## Imran's Form Status: In-Form because the p value: 0.704683 is greater than alpha= 0.05
##
## **************************** Form status of Hadlee ****************************
## Population size: 73
## Mean of population: 18
## Sample size: 9 Mean of sample: 27 SD of sample: 24.27
## Null hypothesis H0 : Hadlee's sample average is within 95% confidence interval of population average
## Alternative hypothesis Ha : Hadlee's sample average is below the 95% confidence interval of population average
## Hadlee's Form Status: In-Form because the p value: 0.85262 is greater than alpha= 0.05
## *******************************************************************************************
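
The hypotheses in the form check can be illustrated with a one-sided, one-sample t-test in base R. This is a sketch under my own assumptions: the scores are made up, the last 5 innings stand in for "recent form", and I use `t.test()` with `alternative = "less"`. The package's exact procedure and sample selection may differ, so the p values will not match those reported in this post.

```r
# Made-up career scores; the last few innings stand in for "recent form"
career <- c(25, 30, 12, 40, 18, 22, 35, 28, 15, 33, 5, 2, 10, 0, 7)
recent <- tail(career, 5)

# H0: recent mean equals the career mean; Ha: recent mean is lower
res <- t.test(recent, mu = mean(career), alternative = "less")
status <- if (res$p.value < 0.05) "Out-of-Form" else "In-Form"
cat("p value:", signif(res$p.value, 3), "->", status, "\n")
```

A small p value means the recent scores are unlikely to be a typical draw from the career as a whole, which is the sense in which a player is flagged as Out-of-Form.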

Analyses of bowling performances of the All Rounders

The following plots give the analysis of the 4 ODI bowlers

  1. Kapil Dev (Ind) – Innings – 225, Wickets = 253, Average=27.45, Economy Rate= 3.71
  2. Ian Botham (Eng) – Innings – 116, Wickets = 145, Average=28.54, Economy Rate= 3.96
  3. Imran Khan (Pak) – Innings – 175, Wickets = 182, Average=26.61, Economy Rate= 3.89
  4. Richard Hadlee (NZ) – Innings – 115, Wickets = 158, Average=21.56, Economy Rate= 3.30

Kapil has the highest number of innings and wickets, followed by Imran. Botham and Hadlee have relatively fewer innings.

To get the bowler’s data use

#kapil2 <- getPlayerDataOD(30028,dir="..",file="kapil2.csv",type="bowling")
#botham2 <- getPlayerDataOD(9163,dir="..",file="botham2.csv",type="bowling")
#imran2 <- getPlayerDataOD(40560,dir="..",file="imran2.csv",type="bowling")
#hadlee2 <- getPlayerDataOD(37224,dir="..",file="hadlee2.csv",type="bowling")

Wicket Frequency percentage

This plot gives the percentage of times each bowler takes a given number of wickets (1, 2, 3, etc.).

par(mfrow=c(1,4))
par(mar=c(4,4,2,2))
bowlerWktsFreqPercent("./kapil2.csv","Kapil")
bowlerWktsFreqPercent("./botham2.csv","Botham")
bowlerWktsFreqPercent("./imran2.csv","Imran")
bowlerWktsFreqPercent("./hadlee2.csv","Hadlee")

relbowlfp-1

dev.off()
## null device 
##           1

Wickets Runs plot

The plot below gives a boxplot of the runs ranges for each of the wickets taken by the bowlers.

par(mfrow=c(1,4))
par(mar=c(4,4,2,2))

bowlerWktsRunsPlot("./kapil2.csv","Kapil")
bowlerWktsRunsPlot("./botham2.csv","Botham")
bowlerWktsRunsPlot("./imran2.csv","Imran")
bowlerWktsRunsPlot("./hadlee2.csv","Hadlee")

wktsrun-1

dev.off()
## null device 
##           1

Cumulative average wicket plot

Botham has the best cumulative average wicket touching almost 1.6 wickets followed by Hadlee

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
bowlerCumulativeAvgWickets("./kapil2.csv","Kapil")

kwm-bowlcaw-1

bowlerCumulativeAvgWickets("./botham2.csv","Botham")

kwm-bowlcaw-2

bowlerCumulativeAvgWickets("./imran2.csv","Imran")

kwm-bowlcaw-3

bowlerCumulativeAvgWickets("./hadlee2.csv","Hadlee")

kwm-bowlcaw-4

dev.off()
## null device 
##           1

Cumulative average economy rate plot

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
bowlerCumulativeAvgEconRate("./kapil2.csv","Kapil")

kwm-bowlcer-1

bowlerCumulativeAvgEconRate("./botham2.csv","Botham")

kwm-bowlcer-2

bowlerCumulativeAvgEconRate("./imran2.csv","Imran")

kwm-bowlcer-3

bowlerCumulativeAvgEconRate("./hadlee2.csv","Hadlee")

kwm-bowlcer-4

dev.off()
## null device 
##           1

Average wickets in different grounds and opposition

A. Kapil Dev

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./kapil2.csv","Kapil")
bowlerAvgWktsOpposition("./kapil2.csv","Kapil")

gr-1-1

dev.off()
## null device 
##           1

B. Ian Botham

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./botham2.csv","Botham")
bowlerAvgWktsOpposition("./botham2.csv","Botham")

gr-2-1

dev.off()
## null device 
##           1

C. Imran Khan

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./imran2.csv","Imran")
bowlerAvgWktsOpposition("./imran2.csv","Imran")

gr-3-1

dev.off()
## null device 
##           1

D. Richard Hadlee

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./hadlee2.csv","Hadlee")
bowlerAvgWktsOpposition("./hadlee2.csv","Hadlee")

gr-4-1

dev.off()
## null device 
##           1

Relative bowling performance

It can be seen that Botham is the most effective wicket taker of the lot

frames <- list("./kapil2.csv","./botham2.csv","imran2.csv","hadlee2.csv")
names <- list("Kapil","Botham","Imran","Hadlee")
relativeBowlingPerf(frames,names)

relbowlperf-1

Relative Economy Rate against wickets taken

Hadlee has the best overall economy rate followed by Kapil Dev

frames <- list("./kapil2.csv","./botham2.csv","imran2.csv","hadlee2.csv")
names <- list("Kapil","Botham","Imran","Hadlee")
relativeBowlingERODTT(frames,names)

relbowler-1

Relative cumulative average wickets of bowlers in career

This plot confirms the wicket taking ability of Botham followed by Hadlee

frames <- list("./kapil2.csv","./botham2.csv","imran2.csv","hadlee2.csv")
names <- list("Kapil","Botham","Imran","Hadlee")
relativeBowlerCumulativeAvgWickets(frames,names)

rbcaw-1

Relative cumulative average economy rate of bowlers

This plot shows that Hadlee has the best economy rate, followed by Kapil.

frames <- list("./kapil2.csv","./botham2.csv","imran2.csv","hadlee2.csv")
names <- list("Kapil","Botham","Imran","Hadlee")
relativeBowlerCumulativeAvgEconRate(frames,names)

rbcer-1

Moving average of wickets over career

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
bowlerMovingAverage("./kapil2.csv","Kapil")
bowlerMovingAverage("./botham2.csv","Botham")
bowlerMovingAverage("./imran2.csv","Imran")
bowlerMovingAverage("./hadlee2.csv","Hadlee")

jmss-bowlma-1

dev.off()
## null device 
##           1

Wickets forecast

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
bowlerPerfForecast("./kapil2.csv","Kapil")
bowlerPerfForecast("./botham2.csv","Botham")
bowlerPerfForecast("./imran2.csv","Imran")
bowlerPerfForecast("./hadlee2.csv","Hadlee")

jjmss-pfcst-1

dev.off()
## null device 
##           1

Check bowler in-form, out-of-form

## **************************** Form status of Kapil ****************************
## Population size: 198
## Mean of population: 1.2
## Sample size: 23 Mean of sample: 0.65 SD of sample: 0.83
## Null hypothesis H0 : Kapil's sample average is within 95% confidence interval of population average
## Alternative hypothesis Ha : Kapil's sample average is below the 95% confidence interval of population average
## Kapil's Form Status: Out-of-Form because the p value: 0.002097 is less than alpha= 0.05
##
## **************************** Form status of Botham ****************************
## Population size: 166
## Mean of population: 1.58
## Sample size: 19 Mean of sample: 1.47 SD of sample: 1.12
## Null hypothesis H0 : Botham's sample average is within 95% confidence interval of population average
## Alternative hypothesis Ha : Botham's sample average is below the 95% confidence interval of population average
## Botham's Form Status: In-Form because the p value: 0.336694 is greater than alpha= 0.05
##
## **************************** Form status of Imran ****************************
## Population size: 137
## Mean of population: 1.23
## Sample size: 16 Mean of sample: 0.81 SD of sample: 0.91
## Null hypothesis H0 : Imran's sample average is within 95% confidence interval of population average
## Alternative hypothesis Ha : Imran's sample average is below the 95% confidence interval of population average
## Imran's Form Status: Out-of-Form because the p value: 0.041727 is less than alpha= 0.05
##
## **************************** Form status of Hadlee ****************************
## Population size: 100
## Mean of population: 1.38
## Sample size: 12 Mean of sample: 1.67 SD of sample: 1.37
## Null hypothesis H0 : Hadlee's sample average is within 95% confidence interval of population average
## Alternative hypothesis Ha : Hadlee's sample average is below the 95% confidence interval of population average
## Hadlee's Form Status: In-Form because the p value: 0.761265 is greater than alpha= 0.05
## *******************************************************************************************

Key findings

Here are some key conclusions on the ODI all-rounders

  1. Kapil Dev’s strike rate stands high above the other 3
  2. Imran Khan has the best cumulative average runs followed by Kapil
  3. Botham is the most effective wicket taker followed by Hadlee
  4. Hadlee is the most economical bowler and is followed by Kapil Dev
  5. For a hypothetical Balls Faced and Minutes at Crease, Kapil will score the most runs, followed by Botham
  6. The moving average indicates that the best was yet to come for Imran and Hadlee. Kapil and Botham were on the decline

Also see my other posts in R

  1. A primer on Qubits, Quantum gates abd Quantum operations
  2. Deblurring with OpenCV:Weiner filter reloaded
  3. Designing a Social Web Portal
  4. A crime map of India in R – Crimes against women
  5. Bend it like Bluemix, MongoDB with autoscaling – Part 2
  6. Mirror, mirror . the best batsman of them all?

For a full list of posts see Index of posts

IBM Data Science Experience:  First steps with yorkr


Fresh, and slightly dizzy, from my foray into Quantum Computing with IBM’s Quantum Experience, I now turn my attention to IBM’s Data Science Experience (DSE).

I am on the verge of completing a really great 3-module ‘Data Science and Engineering with Spark XSeries’ from the University of California, Berkeley, and I have been thinking of trying out some form of integrated delivery platform for performing analytics for quite some time. Coincidentally, IBM came out with its Data Science Experience a month back. There are a couple of other collaborative platforms available for playing around with Apache Spark or data analytics, namely Jupyter notebooks, Databricks and Data.world.

I decided to go ahead with IBM’s Data Science Experience as  the GUI is a lot cooler, includes shared data sets and integrates with Object Storage, Cloudant DB etc,  which seemed a lot closer to the cloud, literally!  IBM’s DSE is an interactive, collaborative, cloud-based environment for performing data analysis with Apache Spark. DSE is hosted on IBM’s PaaS environment, Bluemix. It should be possible to access in DSE the plethora of cloud services available on Bluemix. IBM’s DSE uses Jupyter notebooks for creating and analyzing data which can be easily shared and has access to a few hundred publicly available datasets

Disclaimer: This article represents the author’s viewpoint only and doesn’t necessarily represent IBM’s positions, strategies or opinions

In this post, I use IBM’s DSE and my R package yorkr, for analyzing the performance of 1 ODI match (Aus-Ind, 2 Feb 2012)  and the batting performance of Virat Kohli in IPL matches. These are my ‘first’ steps in DSE so, I use plain old “R language” for analysis together with my R package ‘yorkr’. I intend to  do more interesting stuff on Machine learning with SparkR, Sparklyr and PySpark in the weeks and months to come.

You can check out the Jupyter notebooks created with IBM’s DSE at Github – ‘Using R package yorkr – A quick overview’ and on NBviewer at ‘Using R package yorkr – A quick overview’.

Working with Jupyter notebooks is fairly straightforward, and they can handle code in R, Python and Scala. Each cell can contain code (R, Python or Scala), Markdown text, NBConvert or Heading. The code is written into the cells and can be executed sequentially. Here is a screenshot of the notebook.

Untitled

The ‘File’ menu can be used for ‘saving and checkpointing’ or ‘reverting’ to a checkpoint. The ‘kernel’ menu can be used to start, interrupt and restart the kernel, run all cells, etc. The ‘Data Sources’ icon can be used to load data sources into your code. The data is uploaded to Object Storage with appropriate credentials, and you will have to import this data from Object Storage using those credentials. In my notebook with yorkr I load the data directly from Github. You can use the ‘Sharing’ icon to share the notebook. The shared notebook has the extension ‘ipynb’. You can import this notebook directly into your environment and get started with the code available in the notebook.

You can import existing R, Python or Scala notebooks as shown below. My notebook ‘Using R package yorkr – A quick overview’ can be downloaded using the link ‘yorkrWithDSE’ and clicking the green download icon on top right corner.

Untitled2

I have also uploaded the file to Github and you can download from here too ‘yorkrWithDSE’. This notebook can be imported into your DSE as shown below

Untitled1

Jupyter notebooks have been integrated with Github and are rendered directly from Github. You can view my Jupyter notebook here – ‘Using R package yorkr – A quick overview’. You can also view it on NBviewer at ‘Using R package yorkr – A quick overview’.

So there it is. You can download my notebook, import it into IBM’s Data Science Experience and then use data from ‘yorkrData’ as shown. As already mentioned, yorkrData contains converted data for ODIs, T20 and IPL. For details on how to use my R package yorkr, please see my posts on yorkr at ‘Index of posts’.

Hope you have fun playing with IBM’s Data Science Experience and my package yorkr.

I will be exploring IBM’s DSE in weeks and months to come in the areas of Machine Learning with SparkR,SparklyR or pySpark.

Watch this space!!!

Disclaimer: This article represents the author’s viewpoint only and doesn’t necessarily represent IBM’s positions, strategies or opinions

Also see

1. Introducing QCSimulator: A 5-qubit quantum computing simulator in R
2. Natural Processing Language : What would Shakespeare say?
3. Introducing cricket package yorkr:Part 1- Beaten by sheer pace!
4. A closer look at “Robot horse on a Trot! in Android”
5. Re-introducing cricketr! : An R package to analyze performances of cricketers
6. What’s up Watson? Using IBM Watson’s QAAPI with Bluemix, NodeExpress – Part 1
7. Deblurring with OpenCV: Wiener filter reloaded

To see all my posts check
Index of posts

Introducing QCSimulator: A 5-qubit quantum computing simulator in R


Introduction

My 5-qubit Quantum Computing Simulator, QCSimulator, is finally ready, and here it is! I have been able to successfully complete this simulator by working through a fair amount of material. To a large extent, the simulator is easy, if one understands how to solve the quantum circuit. However, the theory behind quantum computing itself is quite formidable, and I hope to scale this mountain over a period of time.

QCSimulator is now on CRAN!!!

The code for the QCSimulator package is also available at Github QCSimulator. This post has also been published at Rpubs as QCSimulator and can be downloaded as a PDF document at QCSimulator.pdf

Disclaimer: This article represents the author’s viewpoint only and doesn’t necessarily represent IBM’s positions, strategies or opinions

install.packages("QCSimulator")
library(QCSimulator)
library(ggplot2)

1. Initialize the environment and set global variables

Here I initialize the environment with global variables and then display a few of them.

rm(list=ls())
#Call the init function to initialize the environment and create global variables
init()

# Display some of global variables in environment
ls()
##  [1] "I16"     "I2"      "I4"      "I8"      "q0_"     "q00_"    "q000_"  
##  [8] "q0000_"  "q00000_" "q00001_" "q0001_"  "q00010_" "q00011_" "q001_"  
## [15] "q0010_"  "q00100_" "q00101_" "q0011_"  "q00110_" "q00111_" "q01_"   
## [22] "q010_"   "q0100_"  "q01000_" "q01001_" "q0101_"  "q01010_" "q01011_"
## [29] "q011_"   "q0110_"  "q01100_" "q01101_" "q0111_"  "q01111_" "q1_"    
## [36] "q10_"    "q100_"   "q1000_"  "q10000_" "q10001_" "q1001_"  "q10010_"
## [43] "q10011_" "q101_"   "q1010_"  "q10100_" "q10101_" "q1011_"  "q10110_"
## [50] "q10111_" "q11_"    "q110_"   "q1100_"  "q11000_" "q11001_" "q1101_" 
## [57] "q11010_" "q11011_" "q111_"   "q1110_"  "q11100_" "q11101_" "q1111_" 
## [64] "q11110_" "q11111_"
#1. 2 x 2 Identity matrix 
I2
##      [,1] [,2]
## [1,]    1    0
## [2,]    0    1
#2. 8 x 8 Identity matrix 
I8
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,]    1    0    0    0    0    0    0    0
## [2,]    0    1    0    0    0    0    0    0
## [3,]    0    0    1    0    0    0    0    0
## [4,]    0    0    0    1    0    0    0    0
## [5,]    0    0    0    0    1    0    0    0
## [6,]    0    0    0    0    0    1    0    0
## [7,]    0    0    0    0    0    0    1    0
## [8,]    0    0    0    0    0    0    0    1
#3. Qubit |00>
q00_
##      [,1]
## [1,]    1
## [2,]    0
## [3,]    0
## [4,]    0
#4. Qubit |010>
q010_
##      [,1]
## [1,]    0
## [2,]    0
## [3,]    1
## [4,]    0
## [5,]    0
## [6,]    0
## [7,]    0
## [8,]    0
#5. Qubit |0100>
q0100_
##       [,1]
##  [1,]    0
##  [2,]    0
##  [3,]    0
##  [4,]    0
##  [5,]    1
##  [6,]    0
##  [7,]    0
##  [8,]    0
##  [9,]    0
## [10,]    0
## [11,]    0
## [12,]    0
## [13,]    0
## [14,]    0
## [15,]    0
## [16,]    0
#6. Qubit |10010>
q10010_
##       [,1]
##  [1,]    0
##  [2,]    0
##  [3,]    0
##  [4,]    0
##  [5,]    0
##  [6,]    0
##  [7,]    0
##  [8,]    0
##  [9,]    0
## [10,]    0
## [11,]    0
## [12,]    0
## [13,]    0
## [14,]    0
## [15,]    0
## [16,]    0
## [17,]    0
## [18,]    0
## [19,]    1
## [20,]    0
## [21,]    0
## [22,]    0
## [23,]    0
## [24,]    0
## [25,]    0
## [26,]    0
## [27,]    0
## [28,]    0
## [29,]    0
## [30,]    0
## [31,]    0
## [32,]    0
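
The position of the single 1 in each of these column vectors follows from reading the ket label as a binary number. The sketch below uses only base R (kronecker), not the QCSimulator API, and basisState is a hypothetical helper introduced purely for illustration.

```r
# Hypothetical helper (not part of QCSimulator): build a computational
# basis state from its bit string with base R's kronecker().
q0 <- matrix(c(1, 0), nrow = 2, ncol = 1)  # |0>
q1 <- matrix(c(0, 1), nrow = 2, ncol = 1)  # |1>

basisState <- function(bits) {
  state <- matrix(1, nrow = 1, ncol = 1)
  for (b in strsplit(bits, "")[[1]]) {
    # Append one qubit at a time, leftmost bit first
    state <- kronecker(state, if (b == "1") q1 else q0)
  }
  state
}

q10010 <- basisState("10010")
dim(q10010)                 # 32 rows, 1 column
which(q10010 == 1)          # row 19
strtoi("10010", base = 2)   # 18, so the 1 sits in row 18 + 1 (R is 1-indexed)
```

The same rule explains why |010> has its 1 in row 3 and |0100> in row 5.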

The QCSimulator implements the following gates

  1. Pauli X, Y, Z, S, S’, T, T’ gates
  2. Rotation, Hadamard, CSWAP, Toffoli gates
  3. 2, 3, 4 and 5 qubit CNOT gates, e.g. CNOT2_01, CNOT3_20, CNOT4_13 etc.
  4. Toffoli State, SWAPQ0Q1

2. To display the unitary matrix of gates

To check the unitary matrix of a gate, we need to pass the appropriate identity matrix as an argument. Hence, below, the single-qubit gates require a 2 x 2 identity matrix, and the 2 and 3 qubit CNOT gates require a 4 x 4 and an 8 x 8 identity matrix respectively.

PauliX(I2)
##      [,1] [,2]
## [1,]    0    1
## [2,]    1    0
Hadamard(I2)
##           [,1]       [,2]
## [1,] 0.7071068  0.7071068
## [2,] 0.7071068 -0.7071068
S1Gate(I2)
##      [,1] [,2]
## [1,] 1+0i 0+0i
## [2,] 0+0i 0-1i
CNOT2_10(I4)
##      [,1] [,2] [,3] [,4]
## [1,]    1    0    0    0
## [2,]    0    0    0    1
## [3,]    0    0    1    0
## [4,]    0    1    0    0
CNOT3_20(I8)
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,]    1    0    0    0    0    0    0    0
## [2,]    0    0    0    0    0    1    0    0
## [3,]    0    0    1    0    0    0    0    0
## [4,]    0    0    0    0    0    0    0    1
## [5,]    0    0    0    0    1    0    0    0
## [6,]    0    1    0    0    0    0    0    0
## [7,]    0    0    0    0    0    0    1    0
## [8,]    0    0    0    1    0    0    0    0
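
A gate matrix is unitary when multiplying it by its conjugate transpose gives the identity. The following is a minimal base R sketch of that check. The matrices are typed in by hand from the outputs above, and isUnitary and cnot10 are hypothetical names, not QCSimulator functions.

```r
# Hypothetical helper: TRUE when U %*% (conjugate transpose of U) is the identity
isUnitary <- function(U, tol = 1e-12) {
  n <- nrow(U)
  max(Mod(U %*% Conj(t(U)) - diag(n))) < tol
}

H  <- 1 / sqrt(2) * matrix(c(1, 1, 1, -1), nrow = 2, ncol = 2)  # Hadamard
S1 <- matrix(c(1, 0, 0, -1i), nrow = 2, ncol = 2)               # S dagger
cnot10 <- matrix(c(1, 0, 0, 0,
                   0, 0, 0, 1,
                   0, 0, 1, 0,
                   0, 1, 0, 0), nrow = 4, byrow = TRUE)         # as printed by CNOT2_10(I4)

isUnitary(H)        # TRUE
isUnitary(S1)       # TRUE
isUnitary(cnot10)   # TRUE
isUnitary(matrix(c(1, 1, 0, 1), nrow = 2))  # FALSE: a shear is not unitary
```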

3. Compute the inner product of vectors

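For example, if phi = 1/2|0> + sqrt(3)/2|1> and si = 1/sqrt(2)(|0> + |1>), then the inner product is the dot product of the vectors. As the output below shows, innerProduct reports the corresponding angle between the two vectors in degrees.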

phi = matrix(c(1/2,sqrt(3)/2),nrow=2,ncol=1)
si = matrix(c(1/sqrt(2),1/sqrt(2)),nrow=2,ncol=1)
angle= innerProduct(phi,si)
cat("Angle between vectors is:",angle)
## Angle between vectors is: 15
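
The 15 degrees can be reproduced by hand. This is a base R sketch of the arithmetic, assuming the angle is the arccosine of the inner product. It does not show how innerProduct is implemented.

```r
phi <- c(1 / 2, sqrt(3) / 2)
si  <- c(1 / sqrt(2), 1 / sqrt(2))

# <phi|si>: conjugate the first vector, multiply element-wise and sum
ip <- sum(Conj(phi) * si)
ip                                   # about 0.9659258, i.e. (1 + sqrt(3)) / (2 * sqrt(2))

# Angle between the two state vectors, converted from radians to degrees
angleDeg <- acos(Re(ip)) * 180 / pi
angleDeg                             # about 15
```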

4. Compute the dagger function for a gate

The gate dagger computes and displays the transpose of the complex conjugate of the matrix

TGate(I2)
##      [,1]                 [,2]
## [1,] 1+0i 0.0000000+0.0000000i
## [2,] 0+0i 0.7071068+0.7071068i
GateDagger(TGate(I2))
##      [,1]                 [,2]
## [1,] 1+0i 0.0000000+0.0000000i
## [2,] 0+0i 0.7071068-0.7071068i
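
For comparison, the same operation in base R is a conjugate followed by a transpose. In this sketch Tg and dagger are hypothetical names, and the T gate matrix is written out by hand to match the output above.

```r
# T gate: diag(1, exp(i*pi/4))
Tg <- matrix(c(1, 0, 0, exp(1i * pi / 4)), nrow = 2, ncol = 2)

# Dagger = transpose of the complex conjugate
dagger <- function(U) Conj(t(U))

Td <- dagger(Tg)
Td[2, 2]                        # 0.7071068-0.7071068i

# A unitary gate times its dagger is the identity
max(Mod(Tg %*% Td - diag(2)))   # effectively 0
```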

5. Invoking gates in series

Quantum gates can be chained by passing the output of each preceding gate as the argument to the next one. The innermost call, which is the first gate in the circuit, receives the qubit or the identity matrix.

# Call in reverse order
# Superposition states
# |+> state
Hadamard(q0_)
##           [,1]
## [1,] 0.7071068
## [2,] 0.7071068
# |-> ==> H x Z 
PauliZ(Hadamard(q0_))
##            [,1]
## [1,]  0.7071068
## [2,] -0.7071068
# (+i) Y ==> H x  S 
 SGate(Hadamard(q0_))
##                      [,1]
## [1,] 0.7071068+0.0000000i
## [2,] 0.0000000+0.7071068i
# (-i)Y ==> H x S1
 S1Gate(Hadamard(q0_))
##                      [,1]
## [1,] 0.7071068+0.0000000i
## [2,] 0.0000000-0.7071068i
# Q1 -- TGate- Hadamard
Q1 = Hadamard(TGate(I2))
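
Nesting the calls this way corresponds to multiplying the gate matrices in reverse circuit order. The base R sketch below assumes that correspondence, with hand-written matrices rather than package functions, and reproduces the |-> state.

```r
H  <- 1 / sqrt(2) * matrix(c(1, 1, 1, -1), nrow = 2, ncol = 2)  # Hadamard
Z  <- matrix(c(1, 0, 0, -1), nrow = 2, ncol = 2)                # Pauli Z
q0 <- matrix(c(1, 0), nrow = 2, ncol = 1)                       # |0>

# Circuit order: Hadamard first, then Pauli Z, so Z is the leftmost factor
minus <- Z %*% H %*% q0
minus                    # (0.7071068, -0.7071068)

# Swapping the order gives a different state: Z leaves |0> unchanged
plus <- H %*% Z %*% q0
plus                     # (0.7071068, 0.7071068)
```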

6. More gates in series

TGate of depth 2

The Quantum circuit for a TGate of Depth 2 is

Q0 — Hadamard-TGate-Hadamard-TGate-SGate-Measurement as shown in IBM’s Quantum Experience Composer

Untitled

Implementing the quantum gates in series in reverse order we have

# Invoking this in reverse order we get
a = SGate(TGate(Hadamard(TGate(Hadamard(q0_)))))
result=measurement(a)

plotMeasurement(result)

fig0-1

7. Invoking gates in parallel

To obtain the result of gates in parallel we have to take the tensor product.

Note: In the TensorProd invocation the identity matrix is passed as an argument to get the unitary matrix of the gate.

Q0 – Hadamard – Measurement
Q1 – Identity – Measurement

# Q0 – Hadamard, Q1 – Identity
a = TensorProd(Hadamard(I2),I2)
b = DotProduct(a,q00_)
plotMeasurement(measurement(b))

fig1-1

a = TensorProd(PauliZ(I2),Hadamard(I2))
b = DotProduct(a,q00_)
plotMeasurement(measurement(b))

fig1-2
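
The two parallel circuits can also be worked through with base R's kronecker(). The sketch assumes that TensorProd computes the Kronecker product and that Q0 is the leftmost qubit of the ket. Both are my assumptions and are not confirmed by anything shown here.

```r
I2  <- diag(2)
H   <- 1 / sqrt(2) * matrix(c(1, 1, 1, -1), nrow = 2, ncol = 2)
Z   <- matrix(c(1, 0, 0, -1), nrow = 2, ncol = 2)
q00 <- matrix(c(1, 0, 0, 0), nrow = 4, ncol = 1)   # |00>

# Hadamard on Q0, identity on Q1
s1 <- kronecker(H, I2) %*% q00
s1    # (0.7071068, 0, 0.7071068, 0), i.e. (|00> + |10>)/sqrt(2)

# Pauli Z on Q0, Hadamard on Q1
s2 <- kronecker(Z, H) %*% q00
s2    # (0.7071068, 0.7071068, 0, 0), i.e. (|00> + |01>)/sqrt(2)
```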

8. Measurement

The measurement of a Quantum circuit can be obtained using the measurement function. Consider the following Quantum circuit
Q0 – H-T-H-T-S-H-T-H-T-H-T-H-S-Measurement where H – Hadamard gate, T – T Gate and S- S Gate

a = SGate(Hadamard(TGate(Hadamard(TGate(Hadamard(TGate(Hadamard(SGate(TGate(Hadamard(TGate(Hadamard(I2)))))))))))))
measurement(a)
##          0        1
## v 0.890165 0.109835
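
Measurement probabilities are the squared magnitudes of the amplitudes of the final state. The sketch below shows this on a shorter circuit, H-T-H applied to |0>, in base R. It illustrates the rule only and does not reproduce how the measurement function works internally.

```r
H  <- 1 / sqrt(2) * matrix(c(1, 1, 1, -1), nrow = 2, ncol = 2)
Tg <- diag(c(1, exp(1i * pi / 4)))        # T gate
q0 <- matrix(c(1, 0), nrow = 2, ncol = 1)

# Apply the gates in circuit order: each gate multiplies the current state
gates <- list(H, Tg, H)
state <- Reduce(function(v, G) G %*% v, gates, q0)

# Probability of each outcome = squared modulus of its amplitude
probs <- as.vector(Mod(state)^2)
probs        # about 0.8535534 and 0.1464466
sum(probs)   # 1
```

The two values are cos^2(pi/8) and sin^2(pi/8), which is what the algebra for this three-gate circuit gives.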

9. Plot measurement

Using the same example as above Q0 – H-T-H-T-S-H-T-H-T-H-T-H-S-Measurement where H – Hadamard gate, T – T Gate and S- S Gate we can plot the measurement

a = SGate(Hadamard(TGate(Hadamard(TGate(Hadamard(TGate(Hadamard(SGate(TGate(Hadamard(TGate(Hadamard(I2)))))))))))))
result = measurement(a)
plotMeasurement(result)

fig2-1

10. Evaluating a Quantum Circuit

The above procedures for evaluating quantum gates in series and in parallel can be used to evaluate more complex quantum circuits where the quantum gates are both in series and in parallel.

Here is an evaluation of one such circuit, the Bell ZQ state using the QCSimulator (from IBM’s Quantum Experience)

pic3

# 1st composite
a = TensorProd(Hadamard(I2),I2)
# Output of CNOT
b = CNOT2_01(a)
# 2nd series
c=Hadamard(TGate(Hadamard(SGate(I2))))
#3rd composite
d= TensorProd(I2,c)
# Output of 2nd composite
e = DotProduct(b,d)
#Action of quantum circuit on |00>
f = DotProduct(e,q00_)
result= measurement(f)
plotMeasurement(result)

fig3-1
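
To see series and parallel composition working together, here is a base R sketch of just the entangling part of such a circuit: a Hadamard on Q0 followed by a CNOT with Q0 as control. It leaves out the extra single-qubit gates applied to Q1 before measurement in the circuit above, and the CNOT matrix is written by hand, assuming Q0 is the leftmost qubit.

```r
I2   <- diag(2)
H    <- 1 / sqrt(2) * matrix(c(1, 1, 1, -1), nrow = 2, ncol = 2)
cnot <- matrix(c(1, 0, 0, 0,
                 0, 1, 0, 0,
                 0, 0, 0, 1,
                 0, 0, 1, 0), nrow = 4, byrow = TRUE)  # control Q0, target Q1
q00  <- matrix(c(1, 0, 0, 0), nrow = 4, ncol = 1)

# Parallel step (H on Q0, nothing on Q1) followed by the CNOT in series
bell <- cnot %*% kronecker(H, I2) %*% q00
bell                       # (0.7071068, 0, 0, 0.7071068)

# Only |00> and |11> are ever observed, each with probability 1/2
as.vector(Mod(bell)^2)     # 0.5 0.0 0.0 0.5
```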

11. Toffoli State

The circuit for the Toffoli state comes from IBM’s Quantum Experience and is available in the package. The circuit is shown below, followed by how the state was constructed.

pic2

The implementation of the above circuit in QCSimulator is as below

  # Computation of the Toffoli State
    H=1/sqrt(2) * matrix(c(1,1,1,-1),nrow=2,ncol=2)
    I=matrix(c(1,0,0,1),nrow=2,ncol=2)

    # 1st composite
    # H x H x H
    a = TensorProd(TensorProd(H,H),H)
    # 1st CNOT
    a1= CNOT3_12(a)

    # 2nd composite
    # I x I x T1Gate
    b = TensorProd(TensorProd(I,I),T1Gate(I))
    b1 = DotProduct(b,a1)
    c = CNOT3_02(b1)

    # 3rd composite
    # I x I x TGate
    d = TensorProd(TensorProd(I,I),TGate(I))
    d1 = DotProduct(d,c)
    e = CNOT3_12(d1)

    # 4th composite
    # I x I x T1Gate
    f = TensorProd(TensorProd(I,I),T1Gate(I))
    f1 = DotProduct(f,e)
    g = CNOT3_02(f1)

    #5th composite
    # I x T x T
    h = TensorProd(TensorProd(I,TGate(I)),TGate(I))
    h1 = DotProduct(h,g)
    i = CNOT3_12(h1)

    #6th composite
    # I x H x H
    j = TensorProd(TensorProd(I,Hadamard(I)),Hadamard(I))
    j1 = DotProduct(j,i)
    k = CNOT3_12(j1)

    # 7th composite
    # I x H x H
    l = TensorProd(TensorProd(I,Hadamard(I)),Hadamard(I))
    l1 = DotProduct(l,k)
    m = CNOT3_12(l1)
    n = CNOT3_02(m)

    #8th composite
    # T x H x T1
    o = TensorProd(TensorProd(TGate(I),Hadamard(I)),T1Gate(I))
    o1 = DotProduct(o,n)
    p = CNOT3_02(o1)
    result = measurement(p)
    plotMeasurement(result)

fig4-1

12. GHZ YYX measurement

Here is another Quantum circuit, namely the entangled GHZ YYX state. This is

pic1

and is implemented in QCSimulator as

# Composite 1
a = TensorProd(TensorProd(Hadamard(I2),Hadamard(I2)),PauliX(I2))
b= CNOT3_12(a)
c= CNOT3_02(b)
# Composite 2
d= TensorProd(TensorProd(Hadamard(I2),Hadamard(I2)),Hadamard(I2))
e= DotProduct(d,c)
#Composite 3
f= TensorProd(TensorProd(S1Gate(I2),S1Gate(I2)),Hadamard(I2))
g= DotProduct(f,e)
#Composite 4
i= TensorProd(TensorProd(Hadamard(I2),Hadamard(I2)),I2)
j = DotProduct(i,g)
result=measurement(j)
plotMeasurement(result)

fig5-1

Conclusion

The 5 qubit Quantum Computing Simulator is now fully functional. I hope to add more gates and functionality in the months to come.

Feel free to install the package from Github and give it a try.

References

  1. IBM’s Quantum Experience
  2. Quantum Computing in Python by Dr. Christine Corbett Moran
  3. Lecture notes-1
  4. Lecture notes-2
  5. Quantum Mechanics and Quantum Computation at edX – UC Berkeley

My other posts on Quantum Computing

  1. Venturing into IBM’s Quantum Experience
  2. Going deeper into IBM’s Quantum Experience!
  3. A primer on Qubits, Quantum gates and Quantum Operations
  4. Exploring Quantum Gate operations with QCSimulator
  5. Taking a closer look at Quantum gates and their operations

You may also like
For more posts on other topics like Cloud Computing, IBM Bluemix, Distributed Computing, OpenCV, R, cricket please check my Index of posts

Taking a closer look at Quantum gates and their operations


This post is a continuation of my earlier post ‘Exploring Quantum gate operations with QCSimulator’. Here I take a closer look at more quantum gates and their operations, besides implementing these new gates in my Quantum Computing simulator, the QCSimulator in R.

Disclaimer: This article represents the author’s viewpoint only and doesn’t necessarily represent IBM’s positions, strategies or opinions

In quantum circuits, gates are unitary matrices which operate on 1, 2 or 3 qubit systems. These systems are represented as below.

1 qubit
|0> = \begin{pmatrix}1\\0\end{pmatrix} and |1> = \begin{pmatrix}0\\1\end{pmatrix}

2 qubits
|0> \otimes |0> = \begin{pmatrix}1\\ 0\\ 0\\0\end{pmatrix}
|0> \otimes |1> = \begin{pmatrix}0\\ 1\\ 0\\0\end{pmatrix}
|1> \otimes |0> = \begin{pmatrix}0\\ 0\\ 1\\0\end{pmatrix}
|1> \otimes |1> = \begin{pmatrix}0\\ 0\\ 0\\1\end{pmatrix}

3 qubits
|0> \otimes |0> \otimes |0> = \begin{pmatrix}1\\ 0\\0\\ 0\\ 0\\0\\ 0\\0\end{pmatrix}
|0> \otimes |0> \otimes |1> = \begin{pmatrix}0\\ 1\\0\\ 0\\ 0\\0\\ 0\\0\end{pmatrix}
|0> \otimes |1> \otimes |0> = \begin{pmatrix}0\\ 0\\1\\ 0\\ 0\\0\\ 0\\0\end{pmatrix}
…
|1> \otimes |1> \otimes |1> = \begin{pmatrix}0\\ 0\\0\\ 0\\ 0\\0\\ 0\\1\end{pmatrix}
Hence a single qubit is represented as a 2 x 1 matrix, 2 qubits as a 4 x 1 matrix and 3 qubits as an 8 x 1 matrix.
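
As a worked instance of how these column vectors arise, each entry of the left vector scales a full copy of the right vector:

```latex
|0> \otimes |1>
  = \begin{pmatrix}1\\0\end{pmatrix} \otimes \begin{pmatrix}0\\1\end{pmatrix}
  = \begin{pmatrix}1 \cdot \begin{pmatrix}0\\1\end{pmatrix}\\ 0 \cdot \begin{pmatrix}0\\1\end{pmatrix}\end{pmatrix}
  = \begin{pmatrix}0\\ 1\\ 0\\ 0\end{pmatrix}
```

Repeating the step once more with a third qubit doubles the length again, which gives the 8 x 1 vectors.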

1) Composing Quantum gates in series
When quantum gates are connected in series, the overall effect of these gates is obtained by taking the dot product of the unitary gates in reverse order. For example:
Untitled

In the above picture the effect of the quantum gates A, B, C is the dot product of the gates taken in reverse order
result = C . B . A

This overall action of the 3 quantum gates can be represented by a single ‘transfer’ matrix which is the dot product of the gates
Untitled

If we had a Pauli X followed by a Hadamard gate the combined effect of these gates on the inputs can be deduced by constructing a truth table

Input    Pauli X – Output A’    Hadamard – Output B
|0>      |1>                    1/√2(|0> – |1>)
|1>      |0>                    1/√2(|0> + |1>)

Or

|0> -> 1/√2(|0>  – |1>)
|1> -> 1/√2(|0>  + |1>)
which is
\begin{pmatrix}1\\0\end{pmatrix} -> 1/√2 (\begin{pmatrix}1\\0\end{pmatrix} - \begin{pmatrix}0\\1\end{pmatrix}) = 1/√2 \begin{pmatrix}1\\-1\end{pmatrix}
\begin{pmatrix}0\\1\end{pmatrix} -> 1/√2 (\begin{pmatrix}1\\0\end{pmatrix} + \begin{pmatrix}0\\1\end{pmatrix}) = 1/√2 \begin{pmatrix}1\\1\end{pmatrix}
Therefore the ‘transfer’ matrix can be written as
T = 1/√2 \begin{pmatrix}1 & 1\\ -1 & 1\end{pmatrix}
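
This transfer matrix can be checked numerically. Since the Pauli X acts first, the product is taken in reverse order as H . X. The sketch uses plain base R matrices rather than the simulator.

```r
X <- matrix(c(0, 1, 1, 0), nrow = 2, ncol = 2)                  # Pauli X
H <- 1 / sqrt(2) * matrix(c(1, 1, 1, -1), nrow = 2, ncol = 2)   # Hadamard

# Pauli X first, then Hadamard: multiply in reverse order
Tm <- H %*% X
Tm * sqrt(2)
#      [,1] [,2]
# [1,]    1    1
# [2,]   -1    1

# The columns are the images of |0> and |1>
Tm %*% c(1, 0)   # 1/sqrt(2) * (1, -1)
Tm %*% c(0, 1)   # 1/sqrt(2) * (1,  1)
```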

2)Quantum gates in parallel
When quantum gates are in parallel then the composite effect of the gates can be obtained by taking the tensor product of the quantum gates.
Untitled

If we consider the combined action of a Pauli X gate and a Hadamard gate in parallel
Untitled

A      B      A’     B’
|0>    |0>    |1>    1/√2(|0> + |1>)
|0>    |1>    |1>    1/√2(|0> – |1>)
|1>    |0>    |0>    1/√2(|0> + |1>)
|1>    |1>    |0>    1/√2(|0> – |1>)

Or

|00> => |1> \otimes 1/√2(|0>  + |1>) = 1/√2 (|10> + |11>)
|01> => |1> \otimes 1/√2(|0>  – |1>) = 1/√2 (|10> – |11>)
|10> => |0> \otimes 1/√2(|0>  + |1>) = 1/√2 (|00> + |01>)
|11> => |0> \otimes 1/√2(|0>  – |1>) = 1/√2 (|00> – |01>)

|00> = \begin{pmatrix}1\\ 0\\ 0\\0\end{pmatrix} =>1/√2\begin{pmatrix} 0\\ 0\\ 1\\ 1\end{pmatrix}
|01> = \begin{pmatrix}0\\ 1\\ 0\\0\end{pmatrix} =>1/√2\begin{pmatrix} 0\\ 0\\ 1\\ -1\end{pmatrix}
|10> = \begin{pmatrix}0\\ 0\\ 1\\0\end{pmatrix} =>1/√2\begin{pmatrix} 1\\ 1\\ 0\\ 0\end{pmatrix}
|11> = \begin{pmatrix}0\\ 0\\ 0\\1\end{pmatrix} =>1/√2\begin{pmatrix} 1\\ -1\\ 0\\ 0\end{pmatrix}
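
One way to check the four mappings is to write out the tensor product of the two gates, with X acting on A and H acting on B. Each column of the resulting 4 x 4 matrix is the image of the corresponding basis state |00>, |01>, |10>, |11>:

```latex
X \otimes H
  = \begin{pmatrix}0 \cdot H & 1 \cdot H\\ 1 \cdot H & 0 \cdot H\end{pmatrix}
  = \frac{1}{\sqrt{2}} \begin{pmatrix}0 & 0 & 1 & 1\\ 0 & 0 & 1 & -1\\ 1 & 1 & 0 & 0\\ 1 & -1 & 0 & 0\end{pmatrix}
```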

Here are more Quantum gates
a) Rotation gate
U = \begin{pmatrix}cos\theta & -sin\theta\\ sin\theta & cos\theta\end{pmatrix}
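
A small base R sketch of this matrix follows. Here rotation is a hypothetical helper, not the package's rotation gate, whose signature is not shown in this post.

```r
# Hypothetical helper: the 2 x 2 rotation matrix for angle theta (radians)
rotation <- function(theta) {
  matrix(c(cos(theta), sin(theta),     # first column
           -sin(theta), cos(theta)),   # second column
         nrow = 2, ncol = 2)
}

q0 <- matrix(c(1, 0), nrow = 2, ncol = 1)

# Rotating |0> by theta gives cos(theta)|0> + sin(theta)|1>
rotation(pi / 3) %*% q0        # (0.5, 0.8660254)

# Two rotations in series add their angles
all.equal(rotation(pi / 6) %*% rotation(pi / 3), rotation(pi / 2))   # TRUE
```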

b) Toffoli gate
The Toffoli gate flips the 3rd qubit if the 1st and 2nd qubit are |1>

Toffoli gate
Input Output
|000> |000>
|001> |001>
|010> |010>
|011> |011>
|100> |100>
|101> |101>
|110> |111>
|111> |110>

c) Fredkin gate
The Fredkin gate swaps the 2nd and 3rd qubits if the 1st qubit is |1>

Fredkin gate
Input Output
|000> |000>
|001> |001>
|010> |010>
|011> |011>
|100> |100>
|101> |110>
|110> |101>
|111> |111>
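
Both the Toffoli and the Fredkin truth tables only permute basis states, so each gate is an 8 x 8 permutation matrix. The base R sketch below builds the matrices straight from the tables. permGate is a hypothetical helper for illustration and not how QCSimulator defines these gates.

```r
# Hypothetical helper: turn a truth table of bit strings into a permutation matrix
permGate <- function(inputs, outputs) {
  n <- length(inputs)
  U <- matrix(0, nrow = n, ncol = n)
  for (k in seq_len(n)) {
    col <- strtoi(inputs[k], base = 2) + 1    # column = input basis state
    row <- strtoi(outputs[k], base = 2) + 1   # row    = output basis state
    U[row, col] <- 1
  }
  U
}

inputs  <- c("000", "001", "010", "011", "100", "101", "110", "111")
toffoli <- permGate(inputs, c("000", "001", "010", "011", "100", "101", "111", "110"))
fredkin <- permGate(inputs, c("000", "001", "010", "011", "100", "110", "101", "111"))

# Apply both gates to |110>
q110 <- matrix(0, nrow = 8, ncol = 1)
q110[strtoi("110", base = 2) + 1] <- 1
which(toffoli %*% q110 == 1) - 1   # 7, i.e. |111>
which(fredkin %*% q110 == 1) - 1   # 5, i.e. |101>
```

Applying either gate twice returns the original state, so each matrix is its own inverse.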

d) Controlled U gate
A controlled U gate can be represented as
controlled U = \begin{pmatrix}1 & 0 & 0 & 0\\ 0 &1  &0  & 0\\ 0 &0  &u11  &u12 \\ 0 & 0 &u21  &u22 \end{pmatrix}   – (A)
where U =  \begin{pmatrix}u11 &u12 \\ u21 & u22\end{pmatrix}
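
The block structure of (A) is easy to reproduce in base R. controlledU below is a hypothetical helper, assuming the control is the first qubit as in (A).

```r
# Hypothetical helper: identity in the top-left block, U in the bottom-right
controlledU <- function(U) {
  CU <- diag(4)
  CU[3:4, 3:4] <- U   # a complex U promotes the whole matrix to complex
  CU
}

X <- matrix(c(0, 1, 1, 0), nrow = 2, ncol = 2)      # Pauli X
Y <- matrix(c(0, 1i, -1i, 0), nrow = 2, ncol = 2)   # Pauli Y

controlledU(X)   # the CNOT matrix: rows 3 and 4 are swapped
controlledU(Y)   # bottom-right block is (0, -i; i, 0)
```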

e) Controlled Pauli gates
Controlled Pauli gates are created based on the following identities. The CNOT gate is a controlled Pauli X gate, i.e. the controlled U of (A) with U = Pauli X matches the CNOT unitary matrix. Pauli gates can be constructed using

a) H x X x H = Z and H x H = I

b) S x X x S1 = Y and S x S1 = I

The controlled Pauli X, Y and Z gates are constructed by using the CNOT for the X in the above identities.
In general a controlled Pauli gate can be created as below
Untitled
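
The identities, and the controlled gates they lead to, can be verified with plain matrices. This is a base R sketch assuming the control is the first qubit and S1 is the conjugate transpose of S. It does not use the simulator's functions.

```r
I2 <- diag(2)
H  <- 1 / sqrt(2) * matrix(c(1, 1, 1, -1), nrow = 2, ncol = 2)
X  <- matrix(c(0, 1, 1, 0), nrow = 2, ncol = 2)
Z  <- matrix(c(1, 0, 0, -1), nrow = 2, ncol = 2)
Y  <- matrix(c(0, 1i, -1i, 0), nrow = 2, ncol = 2)
S  <- diag(c(1, 1i))
S1 <- Conj(t(S))                                   # S dagger
cnot <- matrix(c(1, 0, 0, 0,
                 0, 1, 0, 0,
                 0, 0, 0, 1,
                 0, 0, 1, 0), nrow = 4, byrow = TRUE)

# Single-qubit identities
all.equal(H %*% X %*% H, Z)             # TRUE
max(Mod(S %*% X %*% S1 - Y)) < 1e-12    # TRUE

# Sandwich the CNOT between the same gates acting on the target qubit
CZ <- kronecker(I2, H) %*% cnot %*% kronecker(I2, H)
CY <- kronecker(I2, S) %*% cnot %*% kronecker(I2, S1)
round(CZ, 10)    # diag(1, 1, 1, -1)
CY               # identity block, then (0, -i; i, 0)
```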

f) CPauliX
Here C is the 2 x 2 Identity matrix. Simulating this in my QCSimulator
CPauliX <- function(q){
    # q is the qubit (or identity matrix) the gate acts on
    I=matrix(c(1,0,0,1),nrow=2,ncol=2)
    # Compute 1st composite
    a = TensorProd(I,I)
    b = CNOT2_01(a)
    # Compute 2nd composite
    c = TensorProd(I,I)
    #Take dot product
    d = DotProduct(c,b)
    #Take dot product with qubit
    e = DotProduct(d,q)
    e
}

Implementing the above with I, S and H gives the controlled Pauli X, Y and Z gates as seen below

library(QCSimulator)
I4=matrix(c(1,0,0,0,
            0,1,0,0,
            0,0,1,0,
            0,0,0,1),nrow=4,ncol=4)

#Controlled Pauli X
CPauliX(I4)
##      [,1] [,2] [,3] [,4]
## [1,]    1    0    0    0
## [2,]    0    1    0    0
## [3,]    0    0    0    1
## [4,]    0    0    1    0
#Controlled Pauli Y
CPauliY(I4)
##      [,1] [,2] [,3] [,4]
## [1,] 1+0i 0+0i 0+0i 0+0i
## [2,] 0+0i 1+0i 0+0i 0+0i
## [3,] 0+0i 0+0i 0+0i 0-1i
## [4,] 0+0i 0+0i 0+1i 0+0i
#Controlled Pauli Z
CPauliZ(I4)
##      [,1] [,2] [,3] [,4]
## [1,]    1    0    0    0
## [2,]    0    1    0    0
## [3,]    0    0    1    0
## [4,]    0    0    0   -1

g) CSWAP gate

Untitled

q00=matrix(c(1,0,0,0),nrow=4,ncol=1)
q01=matrix(c(0,1,0,0),nrow=4,ncol=1)
q10=matrix(c(0,0,1,0),nrow=4,ncol=1)
q11=matrix(c(0,0,0,1),nrow=4,ncol=1)
CSWAP(q00)
##      [,1]
## [1,]    1
## [2,]    0
## [3,]    0
## [4,]    0
#Swap qubits 
CSWAP(q01)
##      [,1]
## [1,]    0
## [2,]    0
## [3,]    1
## [4,]    0
#Swap qubits 
CSWAP(q10)
##      [,1]
## [1,]    0
## [2,]    1
## [3,]    0
## [4,]    0
CSWAP(q11)
##      [,1]
## [1,]    0
## [2,]    0
## [3,]    0
## [4,]    1
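
The outputs above show the gate exchanging |01> and |10> while leaving |00> and |11> unchanged. A standard way to get this behaviour is three alternating CNOTs. Whether the package builds the gate this way is an assumption on my part, so the base R sketch below only demonstrates the identity itself.

```r
# CNOT with Q0 as control and Q1 as target
cnot01 <- matrix(c(1, 0, 0, 0,
                   0, 1, 0, 0,
                   0, 0, 0, 1,
                   0, 0, 1, 0), nrow = 4, byrow = TRUE)
# CNOT with Q1 as control and Q0 as target
cnot10 <- matrix(c(1, 0, 0, 0,
                   0, 0, 0, 1,
                   0, 0, 1, 0,
                   0, 1, 0, 0), nrow = 4, byrow = TRUE)

# Three alternating CNOTs exchange the two qubits
swap <- cnot01 %*% cnot10 %*% cnot01
swap
#      [,1] [,2] [,3] [,4]
# [1,]    1    0    0    0
# [2,]    0    0    1    0
# [3,]    0    1    0    0
# [4,]    0    0    0    1

q01 <- matrix(c(0, 1, 0, 0), nrow = 4, ncol = 1)
swap %*% q01     # (0, 0, 1, 0), i.e. |10>
```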

h) Toffoli state
The Toffoli state creates a 3 qubit entangled state 1/2(|000> + |001> + |100> + |111>)
Untitled

Simulating the Toffoli state in IBM Quantum Experience we get
Untitled

i) Implementation of Toffoli state in QCSimulator

#ToffoliState 
    # Computation of the Toffoli State
    H=1/sqrt(2) * matrix(c(1,1,1,-1),nrow=2,ncol=2)
    I=matrix(c(1,0,0,1),nrow=2,ncol=2)

    # 1st composite
    # H x H x H
    a = TensorProd(TensorProd(H,H),H)
    # 1st CNOT
    a1= CNOT3_12(a)

    # 2nd composite
    # I x I x T1Gate
    b = TensorProd(TensorProd(I,I),T1Gate(I))
    b1 = DotProduct(b,a1)
    c = CNOT3_02(b1)

    # 3rd composite
    # I x I x TGate
    d = TensorProd(TensorProd(I,I),TGate(I))
    d1 = DotProduct(d,c)
    e = CNOT3_12(d1)

    # 4th composite
    # I x I x T1Gate
    f = TensorProd(TensorProd(I,I),T1Gate(I))
    f1 = DotProduct(f,e)
    g = CNOT3_02(f1)

    #5th composite
    # I x T x T
    h = TensorProd(TensorProd(I,TGate(I)),TGate(I))
    h1 = DotProduct(h,g)
    i = CNOT3_12(h1)

    #6th composite
    # I x H x H
    j = TensorProd(TensorProd(I,Hadamard(I)),Hadamard(I))
    j1 = DotProduct(j,i)
    k = CNOT3_12(j1)

    # 7th composite
    # I x H x H
    l = TensorProd(TensorProd(I,Hadamard(I)),Hadamard(I))
    l1 = DotProduct(l,k)
    m = CNOT3_12(l1)
    n = CNOT3_02(m)

    #8th composite
    # T x H x T1
    o = TensorProd(TensorProd(TGate(I),Hadamard(I)),T1Gate(I))
    o1 = DotProduct(o,n)
    p = CNOT3_02(o1)
    result = measurement(p)
    plotMeasurement(result)

a-1
The measurement is identical to that of IBM Quantum Experience

Conclusion:  This post looked at more Quantum gates. I have implemented all the gates in my QCSimulator which I hope to release in a couple of months.

References
1. http://www1.gantep.edu.tr/~koc/qc/chapter4.pdf
2. http://iontrap.umd.edu/wp-content/uploads/2016/01/Quantum-Gates-c2.pdf
3. https://quantumexperience.ng.bluemix.net/

Also see
1. Venturing into IBM’s Quantum Experience
2. Going deeper into IBM’s Quantum Experience!
3. A primer on Qubits, Quantum gates and Quantum Operations
4. Exploring Quantum gate operations with QCSimulator

Take a look at my other posts at
1. Index of posts