This website works better with desktop in both themes, for mobile devices please change to light theme.

Perceptron#

References-

[1]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.datasets import make_classification

plt.style.use('seaborn')

%matplotlib inline

Algorithm Assumptions#

• Binary Classification y = {-1, 1}

• Data is linearly Separable

[150]:
X, y = make_classification(n_samples=50, n_features=2, n_redundant=0, n_classes=2, class_sep=2,
n_clusters_per_class=1, random_state=666)

y = np.where(y == 0, 1, -1) # to change classes from 0,1 to -1,1

fig = plt.figure()
sns.scatterplot(x=X[:,0], y=X[:,1], hue=y, style=y, ax=ax, palette='deep')
plt.show()

looks like data is linearly separable.

Classifier#

$$h(x_i) = sign(w^T x_i + b)$$

\begin{align*} x_i \rightarrow \begin{bmatrix} x_i \\ 1 \end{bmatrix}\\ w \rightarrow \begin{bmatrix} w \\ b \end{bmatrix} \end{align*}

if the data is shifted to center then b = 0

$$h(x_i) = sign(w^T x_i)$$

Algorithm#

• Initialize $$\vec w$$ = 0

• While True do

• m = 0

• for $$(x_i, y_i) \in D$$ do

• if $$y_i \times (w^T x_i) \le 0$$ then *$$\vec{w} \leftarrow \vec{w} + y \times \vec{x}$$

• $$m \leftarrow m + 1$$

• end if

• end for

• if m = 0 then

• break

• end if

• end while

$$y_i \times (w^T x_i) \gt 0$$ means correctly classified sample

Implementation#

[151]:
n_samples, n_features = X.shape
w = np.zeros((n_features))

while True:
m = 0
for i in range(n_samples):
if (y[i] * (w.T @ X[i])) <= 0: # misclassification
w = w + (y[i] * X[i])
m = m + 1

## if this loop was while(infinite) then when the classifier perfectly fits
## 0 misclassifications
## then m = 0 and it will break the loop
if m == 0:
break
[152]:
y_pred = np.int32(np.sign(w @ X.T))
[153]:
fig = plt.figure(figsize=(10,5))
sns.scatterplot(x=X[:,0], y=X[:,1], hue=y, style=y, ax=ax, palette='deep')
ax.set_title("y true")

sns.scatterplot(x=X[:,0], y=X[:,1], hue=y_pred, style=y_pred, ax=ax, palette='deep')
ax.set_title("y pred")

plt.show()

It matched most of the examples except one that is misclassified.

Hyperplane#

w = weight vector that defines the hyperplane

hyperplane \begin{align} \mathcal{H} = \{ x: (w^T x + b) > 0 \} \end{align} perpendicular to the weight w

[159]:
xx, yy = np.meshgrid(np.linspace(X.min() - 1, X.max() + 1, 100),np.linspace(X.min() - 1 , X.max() + 1, 100))

mesh = np.c_[xx.ravel(), yy.ravel()]
mesh_pred = np.int32(np.sign(w @ mesh.T))

zz = mesh_pred.reshape(xx.shape)
[160]:
fig = plt.figure(figsize=(5, 5))

sns.scatterplot(x=X[:,0], y=X[:,1], hue=y, style=y, ax=ax, palette='deep')
ax.quiver(0, 0, w[0], w[1],  scale=1, units='xy',
label='weight vector') # weight vector will always go through zero
ax.contourf(xx, yy, zz, cmap=plt.cm.coolwarm, alpha=0.3)

plt.legend()
plt.show()

Intuition#

[161]:
x = X[42,:]
origin = np.zeros_like(w)
sum_w_x = w + x

print(f"""
Origin : {origin}
x      : {x}
w      : {w}
w + x  : {sum_w_x}
""")

fig = plt.figure(figsize=(5, 5))

ax.quiver(*origin, *x, color='b',
scale=1, units='xy', label='data vector') # weight vector will always go through zero
ax.quiver(*w, *x, scale=1, units='xy', alpha=0.7, color='b', label='data vector')

ax.quiver(*origin, *w, color='k', scale=1,
units='xy', label='weight vector') # weight vector will always go through zero
ax.quiver(*origin, *sum_w_x, scale=1, units='xy', color='r')
ax.contourf(xx, yy, zz, cmap=plt.cm.coolwarm, alpha=0.4)

ax.text(*x, 'x', color='b')
ax.text(*w, 'w', color='k')
ax.text(*sum_w_x, 'w + x', color='r')

plt.legend()
plt.show()

Origin : [0. 0.]
x      : [-2.73210556  1.34370036]
w      : [2.95108238 0.53659625]
w + x  : [0.21897682 1.8802966 ]

\begin{align} \vec{w} x \le 0\\ \text{ after k updates/iterations } (w + kx)^T x \le 0 \\ \text{ and still getting it wrong}\\ w^Tx + k x^Tx \le 0\\ k \le \frac{- w^Tx}{x^Tx} \end{align}

this explains that k exists as a solution less than infinite.

\begin{align} \text{there exists } \exists{w^*} && s.t. \forall(\vec{x},\vec{y}) \in D && yw^Tx \gt 0\\ ||w^*|| = 1\\ ||\vec{x}|| \lt 1 && \forall (\vec{x},\vec{y}) \in D\\ \\ yw^Tx \gt 0 \begin{cases} y = +1 \Rightarrow w^Tx \gt 0\\ y = -1 \Rightarrow w^Tx \lt 0\\ \end{cases}\\ \\ w \leftarrow w + yx \begin{cases} y = +1 \Rightarrow w \leftarrow w + x\\ y = -1 \Rightarrow w \leftarrow w - x\\ \end{cases} \end{align}

$$w^*$$ is the optimum solution.

Margin#

Minimum distance from closest data poin and hyperplane.

\begin{align} \gamma = {min\atop{x,y \in D}} | x^T w^* | \gt 0 \end{align}

[165]:
distances = np.abs(w @ X.T)
min_dist_idx = np.argmin(distances)

print(f"""
minimum distance index : {min_dist_idx}
gamma                  : {distances[min_dist_idx]}
""")

fig = plt.figure(figsize=(10, 5))

sns.scatterplot(x=X[:,0], y=X[:,1], hue=y, style=y, ax=ax, palette='deep')
ax.contourf(xx, yy, zz, cmap=plt.cm.coolwarm, alpha=0.3)

sns.scatterplot(x=X[:,0], y=X[:,1], hue=y, style=y, ax=ax, palette='deep')
ax.contourf(xx, yy, zz, cmap=plt.cm.coolwarm, alpha=0.3)
sns.scatterplot(x=X[min_dist_idx,[0]], y=X[min_dist_idx,[1]], color='k',
marker='o', ax=ax, label='min distant point')

plt.legend()
plt.show()

minimum distance index : 49
gamma                  : 1.94312938664885

typically we want a large margin classifier.(we will go back to svm for this)

with each iteration atleast a gamma is added.

\begin{align} w^Tw^* + y(x^T w^*) \ge w^Tw^* + \gamma \end{align}