In this age of Amazon, Netflix and app stores, where products, movies and apps are purchased online, up-selling and cross-selling are done through recommender systems.

When you visit a site like Amazon or Flipkart, or purchase apps on the App Store or Google Play, you often see things like “People who bought this book/app also bought X, Y, Z”. These recommendations are recommender system algorithms in action.

Recently, Netflix ran a competition in which participants had to come up with the best algorithm to recommend films that a user would also like. The prize money was of the order of $1 million. That is how critical recommender systems are to today's organizations, where most transactions happen on the web.

Typically users are asked to give a rating from 1 to 5, with 1 being the lowest and 5 the highest. For example, if we had classics like Moby Dick and Great Expectations, current best sellers like The Client and The Da Vinci Code, and a science fiction title like 2001: A Space Odyssey, we can expect that different people will rate the books differently. Obviously not everybody will have read every book in the list, so some entries will be blank.

Recommender systems are based on machine learning algorithms. The goal of these algorithms is to predict the score a user would give to books they did not rate, in other words, the rating buyers would give to books or apps they did not buy. If the algorithm predicts a high rating, we can recommend the item on the assumption that the user would also ‘like’ it. Alternatively, we could recommend books/apps bought by other users who bought the same books/apps as this user.

The notation is

$n_u$ = number of users

$n_b$ = number of books

$r^{(i,j)}$ = 1 if user $j$ has rated book $i$, 0 otherwise

$y^{(i,j)}$ = the rating user $j$ gave book $i$

$m_j$ = the number of books that user $j$ rated
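
To make the notation concrete, here is a minimal sketch in plain Python of how the ratings could be stored. The user names and every rating value are made up for illustration, and using `None` for a missing rating is just one possible convention.

```python
# Hypothetical data for illustration only; None marks a book the user has not rated.
books = ["Moby Dick", "Great Expectations", "The Client",
         "The Da Vinci Code", "2001: A Space Odyssey"]
users = ["user_1", "user_2", "user_3"]

# ratings[i][j] plays the role of y^(i,j): the rating user j gave book i (1 to 5).
ratings = [
    [5, None, 1],
    [4, 5, None],
    [None, 2, 5],
    [1, None, 4],
    [None, 3, 2],
]

n_b = len(ratings)     # number of books
n_u = len(ratings[0])  # number of users

# r[i][j] plays the role of r^(i,j): 1 if user j rated book i, 0 otherwise.
r = [[0 if y is None else 1 for y in row] for row in ratings]

# m[j] plays the role of m_j: the number of books user j rated.
m = [sum(r[i][j] for i in range(n_b)) for j in range(n_u)]

print(n_u, n_b, m)
```

The blanks in `ratings` are exactly the entries the recommender is meant to predict.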

**Content based recommendation**

In a typical content-based recommendation algorithm we assume that we have data describing the items whose ratings we want to predict, e.g. books/products/apps. In the online bookstore example, we assume each book is described by some features, in our case ‘classic’, ‘fiction’, etc.

So each book has its own feature vector, where $x^{(1)}$ is the feature vector of the first book, $x^{(2)}$ the feature vector of the second book, and so on.

The ratings can be predicted with linear regression, by minimizing a cost function given by the sum of squared errors between the predicted and the actual ratings.

So for a parameter vector $\theta^{(j)}$ and a feature vector $x^{(i)}$, the recommender system tries to predict the rating that user $j$ will give book $i$.

This can be written as

$$\text{Number of stars (rating)} = (\theta^{(j)})^T x^{(i)}$$
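
As a small sketch of this prediction, the inner product can be computed directly in Python. The function name `predict_rating`, the two features (‘classic’, ‘fiction’) and all the numbers below are assumptions chosen for illustration, not values from any real dataset.

```python
def predict_rating(theta_j, x_i):
    """Return (theta^(j))^T x^(i): the predicted rating of book i by user j."""
    return sum(t * x for t, x in zip(theta_j, x_i))

# Hypothetical feature vector for one book: [how 'classic', how 'fiction'].
x_book = [0.9, 0.1]

# Hypothetical learnt parameters for a user who strongly favours classics.
theta_user = [5.0, 1.0]

predicted = predict_rating(theta_user, x_book)
print(predicted)
```

A user whose parameters put a large weight on ‘classic’ gets a high predicted rating for a book whose ‘classic’ feature is close to 1.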

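This reduces to a minimization problem over $\theta^{(j)}$, summing only over the books that user $j$ has rated, i.e. those with $r^{(i,j)} = 1$:

$$\min_{\theta^{(j)}} \frac{1}{2m_j} \sum_{i:\,r^{(i,j)}=1} \left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right)^2$$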

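Adding the regularization term, this becomes

$$\min_{\theta^{(j)}} \frac{1}{2m_j} \sum_{i:\,r^{(i,j)}=1} \left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2m_j} \sum_{k=1}^{n} \left(\theta_k^{(j)}\right)^2$$

where $n$ is the number of features and $\theta_k^{(j)}$ is the $k$-th component of $\theta^{(j)}$.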
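
The regularized cost for a single user can be sketched as follows. This is an illustrative implementation only: the function name `user_cost`, the use of `None` for unrated books and the sample numbers are all assumptions, and the user is assumed to have rated at least one book.

```python
def user_cost(theta_j, features, ratings_j, lam):
    """Regularized squared-error cost for one user j.

    features[i] is x^(i); ratings_j[i] is y^(i,j), or None if user j
    has not rated book i. Assumes the user rated at least one book.
    """
    # Keep only the books with r^(i,j) = 1.
    rated = [(x, y) for x, y in zip(features, ratings_j) if y is not None]
    m_j = len(rated)
    squared_error = sum(
        (sum(t * xk for t, xk in zip(theta_j, x)) - y) ** 2 for x, y in rated
    )
    # Sum of squared parameters, not the square of their sum.
    penalty = lam * sum(t ** 2 for t in theta_j)
    return (squared_error + penalty) / (2 * m_j)

# Hypothetical example: three books with features [classic, fiction].
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
ratings_j = [5, 4, None]  # the third book is unrated and is ignored

print(user_cost([4.0, 0.0], features, ratings_j, lam=0.0))
print(user_cost([4.0, 0.0], features, ratings_j, lam=1.0))
```

With a larger `lam` the same parameters cost more, which is how regularization discourages large parameter values.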

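The recommender algorithm in essence tries to learn the parameters $\theta^{(j)}$ of a user, given the feature vectors $x^{(i)}$ of the chosen items, e.g. books in this case.

The recommender tries to learn the parameters for all the users, summing over all $n_u$ users and all $n$ feature components:

$$\min_{\theta^{(1)},\dots,\theta^{(n_u)}} \sum_{j=1}^{n_u} \frac{1}{2m_j} \sum_{i:\,r^{(i,j)}=1} \left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right)^2 + \sum_{j=1}^{n_u} \frac{\lambda}{2m_j} \sum_{k=1}^{n} \left(\theta_k^{(j)}\right)^2$$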

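The minimization is performed by gradient descent, repeatedly updating each component $k$ of each $\theta^{(j)}$ as

$$\theta_k^{(j)} := \theta_k^{(j)} - \frac{\alpha}{m_j} \left( \sum_{i:\,r^{(i,j)}=1} \left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right) x_k^{(i)} + \lambda\, \theta_k^{(j)} \right)$$

where $\alpha$ is the learning rate.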
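
A minimal sketch of this update for one user is shown below, in plain Python. The function names `gradient_step` and `fit_user`, the learning rate, the iteration count and the toy data are all assumptions made for illustration. The update is scaled by $1/m_j$ so that it matches the gradient of a cost carrying a $1/2m_j$ factor.

```python
def gradient_step(theta_j, features, ratings_j, alpha, lam):
    """Apply one gradient descent update to all components of theta^(j)."""
    rated = [(x, y) for x, y in zip(features, ratings_j) if y is not None]
    m_j = len(rated)
    # Prediction errors use the current theta, before any component changes.
    errors = [sum(t * xk for t, xk in zip(theta_j, x)) - y for x, y in rated]
    new_theta = []
    for k, theta_k in enumerate(theta_j):
        grad_k = sum(e * x[k] for e, (x, _) in zip(errors, rated)) + lam * theta_k
        new_theta.append(theta_k - alpha / m_j * grad_k)
    return new_theta

def fit_user(features, ratings_j, alpha=0.1, lam=0.0, iterations=500):
    """Learn theta^(j) for one user, starting from all zeros."""
    theta_j = [0.0] * len(features[0])
    for _ in range(iterations):
        theta_j = gradient_step(theta_j, features, ratings_j, alpha, lam)
    return theta_j

# Hypothetical user who rates classics 5 and fiction 1; the last book is unrated.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
ratings_j = [5, 1, 5, None]

theta = fit_user(features, ratings_j)
print(theta)
```

All components are updated simultaneously from the same set of errors, and learning the parameters for every user amounts to repeating this procedure for each user's own ratings.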

A recommender system tries to learn the parameters for a set of chosen features over all users. Based on the learnt parameters, it then predicts the rating a user would give to books/apps they have yet to purchase, and promotes those items the user is likely to rate highly, based on the ratings already given.
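
The final recommendation step can be sketched as ranking the unrated items by predicted rating. The function name `recommend`, the feature values and the parameters below are hypothetical and chosen only to illustrate the idea.

```python
def recommend(theta_j, features, ratings_j, titles, top_n=2):
    """Rank the books user j has not rated by predicted rating, highest first."""
    candidates = []
    for title, x, y in zip(titles, features, ratings_j):
        if y is None:  # only consider books the user has not rated yet
            score = sum(t * xk for t, xk in zip(theta_j, x))
            candidates.append((score, title))
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in candidates[:top_n]]

titles = ["Moby Dick", "Great Expectations", "The Client",
          "The Da Vinci Code", "2001: A Space Odyssey"]
# Hypothetical features per book: [classic, fiction].
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]
ratings_j = [5, None, None, 1, None]
theta_j = [5.0, 1.0]  # hypothetical learnt parameters for this user

top_books = recommend(theta_j, features, ratings_j, titles)
print(top_books)
```

Books the user has already rated are skipped, so only unseen titles are pushed up.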

Recommender systems contribute substantially to the revenues of e-commerce sites like Amazon, Flipkart, Netflix, etc.

Note: This post, like previous posts on Machine Learning, is based on the Coursera course on Machine Learning by Professor Andrew Ng.
