In both linear and logistic regression the choice of the degree of the polynomial for the hypothesis function is extremely critical. Too low a degree can underfit the data, while too high a degree can overfit it, as shown below

In the figure on the left the data is underfit: we try to fit the data with a first-order polynomial, which is a straight line. This is a case of high 'bias'

In the rightmost figure a polynomial of much higher degree is used. The curve passes through all the data points, yet it is not effective at predicting new values. This is a case of overfitting, or high variance.

The middle figure is just right, as it fits the data points in the best possible way.
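To see this numerically, here is a small NumPy sketch (the data and the choice of degrees are hypothetical): as the degree of the fitted polynomial grows, the training error keeps shrinking, even though the high-degree fit would generalize worst.

```python
import numpy as np

# Hypothetical noisy quadratic data
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0.0, 0.05, x.size)

# Fit polynomials of degree 1 (underfit), 2 (just right) and 15 (overfit)
mse = {}
for degree in (1, 2, 15):
    coeffs = np.polyfit(x, y, degree)           # least-squares polynomial fit
    residuals = y - np.polyval(coeffs, x)
    mse[degree] = float(np.mean(residuals**2))  # training error only

print(mse)  # training error falls as the degree rises
```

Low training error alone says nothing about how well the hypothesis predicts unseen values, which is exactly the overfitting trap described above.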

A similar problem exists with logistic regression as shown below

There are two ways to handle overfitting

a) Reducing the number of features selected

b) Using regularization

In regularization the magnitude of the parameters θ is decreased to reduce the effect of overfitting

Hence if we choose a hypothesis function

h_θ(x) = θ_0 + θ_1 x + θ_2 x^2 + θ_3 x^3 + θ_4 x^4

The cost function for this without regularization, as mentioned in earlier posts, is

J(θ) = 1/2m Σ (h_θ(x^(i)) - y^(i))^2

where the key is to minimize the above function for the least error
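The unregularized cost can be computed in a few lines of NumPy. The data below is a hypothetical toy set where y = 2x exactly, so the matching parameters give zero cost:

```python
import numpy as np

def cost(theta, X, y):
    """Squared-error cost J(theta) = 1/(2m) * sum_i (h_theta(x_i) - y_i)^2."""
    m = y.size
    errors = X @ theta - y
    return float(errors @ errors) / (2 * m)

X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])   # first column of ones is the intercept term
y = np.array([2.0, 4.0, 6.0])

print(cost(np.array([0.0, 2.0]), X, y))  # 0.0 - a perfect fit
```

Gradient descent (or any other optimizer) then searches for the θ that drives this value as low as possible.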

The cost function with regularization becomes

J(θ) = 1/2m [ Σ (h_θ(x^(i)) - y^(i))^2 + λ Σ θ_j^2 ]

As can be seen, regularization adds a term θ_j^2 for each parameter to the cost function that needs to be minimized, which penalizes large parameter values.
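Extending the earlier cost function with the penalty term is a one-line change. Note that by convention θ_0, the intercept, is not penalized (the toy numbers are hypothetical):

```python
import numpy as np

def cost_reg(theta, X, y, lam):
    """Regularized cost: 1/(2m) * (sum of squared errors + lam * sum_j theta_j^2)."""
    m = y.size
    errors = X @ theta - y
    penalty = lam * np.sum(theta[1:] ** 2)  # theta_0 is conventionally left unpenalized
    return float(errors @ errors + penalty) / (2 * m)

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # toy data, y = 2x exactly
y = np.array([2.0, 4.0, 6.0])
theta = np.array([0.0, 2.0])  # fits the data perfectly

print(cost_reg(theta, X, y, 0.0))  # 0.0 - reduces to the unregularized cost
print(cost_reg(theta, X, y, 6.0))  # 4.0 - the penalty 6 * 2^2 / (2*3)
```

With λ = 0 the regularized cost reduces to the original one; a positive λ makes large parameters expensive even when they fit the training data perfectly.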

Hence with the regularization term the problem of overfitting can be controlled

However, the trick is to determine the value of λ. If λ is too big, it will result in underfitting, i.e. high bias.

Similarly the regularized equation for logistic regression is as shown below

J(θ) = -1/m Σ [ y log(h_θ(x)) + (1 - y) log(1 - h_θ(x)) ] + λ/2m Σ θ_j^2
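The regularized logistic cost translates directly to NumPy as well. The toy data below is hypothetical; with θ = 0 every prediction is 0.5, so the cross-entropy term equals log(2) regardless of the labels:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost_reg(theta, X, y, lam):
    """-1/m * sum[y*log(h) + (1-y)*log(1-h)] + lam/(2m) * sum_j theta_j^2."""
    m = y.size
    h = sigmoid(X @ theta)
    cross_entropy = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    penalty = lam / (2 * m) * np.sum(theta[1:] ** 2)  # theta_0 not penalized
    return float(cross_entropy + penalty)

X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5]])  # hypothetical toy data
y = np.array([0.0, 1.0, 1.0])

print(logistic_cost_reg(np.zeros(2), X, y, 1.0))  # ~0.6931 = log(2)
```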

Some tips suggested by Prof Andrew Ng for determining the parameters and features for regression

a) Get as many training examples as possible. It is worth spending more effort in collecting additional examples

b) Add additional features

c) Observe changes to the learning algorithm with different values of λ
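Tip (c) can be sketched numerically. Regularized linear regression has a closed-form solution (the normal equation with a ridge term), so we can solve it for several values of λ and watch the fitted parameters shrink; the data and the λ values below are hypothetical:

```python
import numpy as np

# Hypothetical data: a gently curving trend, with intercept, x and x^2 features
X = np.column_stack([np.ones(5),
                     np.linspace(0.0, 1.0, 5),
                     np.linspace(0.0, 1.0, 5) ** 2])
y = np.array([1.0, 1.4, 1.7, 1.9, 2.0])

norms = []
for lam in (0.0, 1.0, 100.0):
    L = lam * np.eye(X.shape[1])
    L[0, 0] = 0.0  # leave the intercept unpenalized
    # Normal equation with regularization: (X'X + lam*L) theta = X'y
    theta = np.linalg.solve(X.T @ X + L, X.T @ y)
    norms.append(float(np.linalg.norm(theta[1:])))

print(norms)  # magnitude of the non-intercept parameters falls as lambda grows
```

As λ grows the non-intercept parameters are pushed toward zero, moving the model from a possible overfit toward an underfit, which is why sweeping λ and observing the learning algorithm is so informative.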

This post is continued in my next post – Simplifying ML: Impact of degree of polynomial on bias, variance and other insights

Note: This post, in line with my previous posts on Machine Learning, is based on the Coursera course on Machine Learning by Professor Andrew Ng

