Linear Regression#

Intro#

  • Linear regression is a very simple approach to supervised learning. In particular, it is a useful tool for predicting a quantitative response.

  • In statistics, linear regression is a linear approach to modelling the relationship between a scalar response (the dependent variable) and one or more explanatory variables (the independent variables).

  • The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression.

                training set
                    |
                    V
            Learning algorithm
                    |
                    V
            +-------------+
test -->    | hypothesis  | --> estimation
            |    / Model  |
            +-------------+

\begin{align*} \text{Hypothesis } &: h_\theta(x) = \theta_0 + \theta_1 x\\ \text{Parameters } &: \theta_0, \theta_1 \end{align*}

\begin{align*} \text{Cost Function } &: J(\theta_0,\theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2\\ \\ \text{Goal} &: \underset{\theta_0,\theta_1}{\text{minimize}}\ J(\theta_0,\theta_1) \end{align*}
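The factor of \(\frac{1}{2}\) is a convention: it cancels the 2 that appears when the squared error is differentiated. As a quick worked example with hand-picked numbers (not from the dataset generated below): for the two points \((1, 2)\) and \((2, 4)\) with \(\theta_0 = 0, \theta_1 = 1\),

\begin{align*} J(0, 1) &= \frac{1}{2 \cdot 2}\left[(1 - 2)^2 + (2 - 4)^2\right] = \frac{1 + 4}{4} = 1.25 \end{align*}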

Generate Data#

[1]:
import pandas as pd
import numpy as np
from sklearn.datasets import make_regression
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
import seaborn as sns
[2]:
sample_size = 300
train_size = 0.7  # 70%

X, y = make_regression(n_samples=sample_size, n_features=1, noise=50, random_state=0)
y = y.reshape(-1, 1)


np.random.seed(10)
# pick 70% of the row indices at random for the training split
random_idxs = np.random.permutation(np.arange(0, sample_size))[
    : int(np.ceil(sample_size * train_size))
]

X_train, y_train = X[random_idxs], y[random_idxs]
# np.delete without an axis flattens the array first; that is fine here because
# there is only one feature, but it would not generalize to more features
X_test, y_test = np.delete(X, random_idxs).reshape(-1, 1), np.delete(
    y, random_idxs
).reshape(-1, 1)

plt.plot(X_train, y_train, "o", label="train")
plt.plot(X_test, y_test, "o", label="test")
plt.legend()
plt.show()
../_images/LinearRegression_Explore_7_0.png
  • X = (m, n), where
    m = number of samples
    n = number of features
  • Add a bias column \(X_0\) (column 0 filled with 1s) so the intercept can be absorbed into the matrix product (see the matrix form below) \begin{align} h_\theta(x) &= \theta_0 X_0 + \theta_1 X_1 \\ \because X_0 &= 1 \\ h_\theta(x) &= \theta_0 + \theta_1 X_1 \end{align}
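With the bias column in place, predictions for all \(m\) samples can be written as one matrix product (this is the form the predict function below relies on):

\begin{align*} h_\theta(X) = X\theta = \begin{bmatrix} 1 & x^{(1)} \\ 1 & x^{(2)} \\ \vdots & \vdots \\ 1 & x^{(m)} \end{bmatrix} \begin{bmatrix} \theta_0 \\ \theta_1 \end{bmatrix} \end{align*}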

[3]:
print(X.shape, y.shape)
(300, 1) (300, 1)
[4]:
m, n = X.shape
print("number of columns (features) :", n)
print("number of samples (rows) :", m)
number of columns (features) : 1
number of samples (rows) : 300
[5]:
def add_axis_for_bias(X_i):
    """Prepend a column of 1s (the bias/intercept column) if it is not already there."""
    X_i = X_i.copy()

    # ensure a 2-D (m, n) shape
    if len(X_i.shape) == 1:
        X_i = X_i.reshape(-1, 1)

    # add the bias column only if column 0 is not already all 1s
    if not np.all(X_i[..., 0] == 1):
        return np.hstack(tup=(np.ones(shape=(X_i.shape[0], 1)), X_i))
    else:
        return X_i
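A quick sanity check on illustrative values (the expected result is shown as a comment):

add_axis_for_bias(np.array([[2.0], [3.0]]))
# expected: array([[1., 2.],
#                  [1., 3.]])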

Check for bias column (column 0)#

[6]:
arr = np.array(
    [
        [1, 2, 3, 4],
        [1, 6, 7, 8],
        [1, 11, 12, 13],
        [1, 4, 2, 4],
        [1, 5, 2, 1],
        [1, 7, 54, 23],
    ]
)

arr.shape
[6]:
(6, 4)
  • this means 6 rows and 4 columns

[7]:
False not in (arr[..., 0] == 1)
[7]:
True
[8]:
np.sum(arr), np.sum(arr, axis=0), np.sum(arr, axis=1)
[8]:
(174, array([ 6, 35, 80, 53]), array([10, 22, 37, 11,  9, 85]))
[9]:
np.mean(arr), np.mean(arr, axis=0), np.mean(arr, axis=1)
[9]:
(7.25,
 array([ 1.        ,  5.83333333, 13.33333333,  8.83333333]),
 array([ 2.5 ,  5.5 ,  9.25,  2.75,  2.25, 21.25]))

In our case the sums and means are taken column-wise (across samples), so axis=0 is used everywhere.

Regression Models Error Evaluation Functions#

Mean Squared Error#
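For reference, the standard definition implemented below:

\begin{align*} \text{MSE} &= \frac{1}{m}\sum_{i=1}^{m}\left(\hat{y}^{(i)} - y^{(i)}\right)^2 \end{align*}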

[10]:
def calculate_mse(y_pred, y):
    return np.square(y_pred - y).mean()

Root Mean Squared Error#
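The square root brings the error back to the units of the response:

\begin{align*} \text{RMSE} &= \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(\hat{y}^{(i)} - y^{(i)}\right)^2} \end{align*}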

[11]:
def calculate_rmse(y_pred, y):
    return np.sqrt(np.square(y_pred - y).mean())

Mean Absolute Error#
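The mean absolute error, which is less sensitive to outliers than MSE:

\begin{align*} \text{MAE} &= \frac{1}{m}\sum_{i=1}^{m}\left|\hat{y}^{(i)} - y^{(i)}\right| \end{align*}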

[12]:
def calculate_mae(y_pred, y):
    return np.abs(y_pred - y).mean()

Algorithm#

Cost Function#

\begin{align} J(\theta_0,\theta_1) & =\frac{1}{2m}\sum_{i=1}^{m}({h_\theta{(x^{(i)}) - y^{(i)}}})^2 \end{align}

[13]:
def calculate_cost(y_pred, y):
    # J = MSE / 2; the 1/2 factor simplifies the gradient
    return np.mean(np.square(y_pred - y)) / 2

Derivative#

\begin{align*} \frac{\partial J(\theta_0,\theta_1)}{\partial \theta_j} &= \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}(x^{(i)}) - y^{(i)}\right)\cdot x_j^{(i)} \end{align*}

[14]:
def derivative(X, y, y_pred):
    # column-wise mean gives one partial derivative per parameter, shape (n,)
    return np.mean((y_pred - y) * X, axis=0)
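Equivalently, the whole gradient vector is a single matrix product, which is exactly what the element-wise product and column-wise mean above compute:

\begin{align*} \nabla_\theta J(\theta) &= \frac{1}{m} X^\top \left(h_\theta(X) - y\right) \end{align*}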

Gradient Descent Algorithm#

\begin{align*} \text{repeat until convergence } \{ \\ \theta_j &:= \theta_j - \alpha \frac{\partial}{\partial\theta_j}{J(\theta_0,\theta_1)}\\ \}\text{ for j=0 and j=1 } \\\\ \text{and simultaneously update }\\ temp_0 &:= \theta_0 - \alpha\frac{\partial}{\partial\theta_0}{J(\theta_0,\theta_1)}\\ temp_1 &:= \theta_1 - \alpha\frac{\partial}{\partial\theta_1}{J(\theta_0,\theta_1)}\\ \theta_0 &:= temp_0\\ \theta_1 &:= temp_1 \end{align*}

Final algorithm#

\begin{align*} \text{repeat until convergence }\{\\ \theta_0 &:= \theta_0 - \alpha \frac{1}{m}{\sum_{i=1}^{m}{(h_\theta(x^{(i)}) - y^{(i)})}}\\ \theta_1 &:= \theta_1 - \alpha \frac{1}{m}{\sum_{i=1}^{m}{(h_\theta(x^{(i)}) - y^{(i)})\cdot x^{(i)}}}\\ \} \end{align*}

Generate Prediction#

[15]:
def predict(theta, X):
    format_X = add_axis_for_bias(X)

    # theta passed as a column vector (n, 1)
    if format_X.shape[1] == theta.shape[0]:
        y_pred = format_X @ theta  # (m,1) = (m,n) @ (n,1)
        return y_pred
    # theta passed as a row vector (1, n)
    elif format_X.shape[1] == theta.shape[1]:
        y_pred = format_X @ theta.T  # (m,1) = (m,n) @ (n,1)
        return y_pred
    else:
        raise ValueError("theta and X have incompatible shapes.")
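A quick sanity check with hand-picked parameters (illustrative values, not fitted ones): with \(\theta_0 = 5\) and \(\theta_1 = 2\), the inputs 1 and 2 should map to 7 and 9.

predict(np.array([[5.0, 2.0]]), np.array([[1.0], [2.0]]))
# expected: array([[7.],
#                  [9.]])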

Batch Gradient Descent#

[16]:
def linear_regression_bgd(
    X, y, verbose=True, theta_precision=0.001, alpha=0.01, iterations=10000
):
    X = add_axis_for_bias(X)

    # n already counts the bias column, so there is one theta per column
    m, n = X.shape

    theta_history = []
    cost_history = []

    # small random initialization, scaled by theta_precision
    theta = np.random.rand(1, n) * theta_precision

    for _ in range(iterations):
        # record current theta, then compute predictions with it
        theta_history.append(theta[0])

        y_pred = predict(theta, X)

        # gradient of J w.r.t. every theta_j (simultaneous update)
        gradient = derivative(X, y, y_pred)

        theta = theta - (alpha * gradient)  # override with new θ

        # a diverging run (e.g. learning rate too high) produces inf/nan
        if np.isnan(np.sum(theta)) or np.isinf(np.sum(theta)):
            print("breaking. found inf or nan.")
            break

        # calculate cost to put in history
        cost = calculate_cost(predict(theta, X), y)

        cost_history.append(cost)

    return theta, np.array(theta_history), np.array(cost_history)

Cost function vs weights \(J(\theta_0, \theta_1)\)#

[17]:
# illustrative convex bowl (theta0^2 + theta1^2), not the cost computed from the
# data; the true linear-regression cost surface is also a convex paraboloid
theta0, theta1 = np.meshgrid(np.linspace(-10, 10, 100), np.linspace(-10, 10, 100))
j = theta0**2 + theta1**2

fig = plt.figure(figsize=(7, 7))
ax = plt.axes(projection="3d")

ax.plot_surface(theta0, theta1, j, cmap="viridis")
ax.set_xlabel("$\\theta_0$")
ax.set_ylabel("$\\theta_1$")
ax.set_zlabel("$J(\\theta_0,\\theta_1)$")

plt.show()
../_images/LinearRegression_Explore_39_0.png

Debugging gradient descent#

  1. If the learning rate is too high, the theta values grow without bound and eventually shoot out of the plot; see an example below.

  2. If the learning rate is appropriate, the cost/loss decreases each iteration and eventually flattens out (the per-iteration decrease becomes negligible); see an example below.

  3. We can try different learning rates such as 0.001, 0.003, 0.01, 0.03, 0.1, etc., and compare the resulting cost curves (see the sketch after this list).
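A minimal sketch of such a comparison, reusing linear_regression_bgd and the training split from above (the learning-rate values are illustrative):

for lr in [0.001, 0.003, 0.01, 0.03, 0.1]:
    _, _, history = linear_regression_bgd(
        X_train, y_train, alpha=lr, iterations=300
    )
    plt.plot(history, label=f"alpha={lr}")

plt.xlabel("iteration")
plt.ylabel("cost")
plt.legend()
plt.show()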

Animation Function#

[18]:
from utility import regression_animation

Training with high learning rate#

[19]:
iterations = 600
learning_rate = 2.03

theta, theta_history, cost_history = linear_regression_bgd(
    X_train,
    y_train,
    verbose=True,
    theta_precision=0.001,
    alpha=learning_rate,
    iterations=iterations,
)

regression_animation(
    X_train, y_train, cost_history, theta_history, iterations, interval=10
)
[19]:
../_images/LinearRegression_Explore_45_1.png

The learning rate is too high, so the model could not converge (find the minimum of the loss-theta curve); hence \(J(\theta_0, \theta_1)\) increased.

Training with learning rate = 0.01#

[20]:
iterations = 300
learning_rate = 0.01

theta, theta_history, cost_history = linear_regression_bgd(
    X_train,
    y_train,
    verbose=True,
    theta_precision=0.001,
    alpha=learning_rate,
    iterations=iterations,
)

regression_animation(
    X_train, y_train, cost_history, theta_history, iterations, interval=10
)
[20]:
../_images/LinearRegression_Explore_48_1.png

Testing#

[21]:
regression_animation(
    X_test, y_test, cost_history, theta_history, iterations, interval=10
)
[21]:
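Beyond the animation, the fitted parameters can be checked numerically on the held-out split with the error helpers defined earlier (the exact values depend on the run):

y_pred_test = predict(theta, X_test)

print("test MSE :", calculate_mse(y_pred_test, y_test))
print("test RMSE:", calculate_rmse(y_pred_test, y_test))
print("test MAE :", calculate_mae(y_pred_test, y_test))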