Polynomial Regression

Polynomial regression is used to find the coefficients of an m-degree polynomial that best fits the given set of points.

Click on the canvas to introduce a point
Click on a point to remove it

Proper fitting with a polynomial of degree m requires atleast m+1 points.

Description

Consider a polynomial of degree m: \[ p_m (x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \dots + \theta_m x^m \] Polynomial regression allows us to find the coefficients of this polynomial that best fit the given set of \(n\) points. The cost function \( J(\theta) \) expresses the error between the polynomial and the given set of points. It is given by: \[ J(\theta) = \sum_{i = 1}^{n} (y_i - p_m (x_i))^2 \] where \( (x_i, y_i) \) are the given points. A lower value of this function indicates a better fit. Hence, we need to find the minimum of this function. Calculus dictates that at the minimum of this function: \[ \frac{d(J(\theta))}{d\theta} = 0 \] The above equation is split into \( m+1 \) equations in \(\theta\), where each equation corresponds to a particular power of \( x \) in the polynomial. This system of equations can be better expressed (using linear algebra) in the form \( A \vec{\theta} = B \). Solving this system of equations (using Gaussain elimination or inversion of \(A\)) gives us the value of \(\theta\)'s for which the cost function is minimized. The corresponding polynomial is the best fit to the given set of points.

Please refer to this website for viewing the exact mathematics involved in polynomial regression.

Note:

A set of n points can be exactly fitted with a polynomial of degree n-1. This is the general analogue of the fact that there is always a line (which is a 1 dimensional polynomial) joining 2 points.
A polynomial of degree m requires atleast m+1 points for a proper fit. If this condition is not satisfied, then the extra coefficients will be extremely low or zero, which badly skewes the curve.
Gradient Descent can be used for polynomial regression too, and adopts an iterative approach to find the best fit. It turns out that this method does not scale well, as the number of points and the degree of polynomial increases.
Related: Gradient Descent, K-means clustering, Genetic Algorithm.

Developed by ChanRT | Fork me at GitHub