Principal Component Analysis (PCA) From Scratch

  1. Normalizing the Data
# Importing necessary modules
import numpy as np
import matplotlib.pyplot as plt

# x1 and x2 are the raw data arrays loaded earlier
x1_norm = x1 - x1.mean()  # X1': X1 shifted to zero mean
x2_norm = x2 - x2.mean()  # X2': X2 shifted to zero mean
Raw data converted into zero-mean normalized data
Visualization of the raw data and the zero-mean normalized data
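The zero-mean step above can be sketched end to end with a small stand-in dataset (the sample values below are hypothetical, since the article's raw inputs are not shown; any two equal-length 1-D arrays work):

```python
import numpy as np

# Hypothetical sample data standing in for the article's x1 and x2
x1 = np.array([2.5, 0.5, 2.2, 1.9, 3.1])
x2 = np.array([2.4, 0.7, 2.9, 2.2, 3.0])

x1_norm = x1 - x1.mean()  # subtract the mean so x1_norm has zero mean
x2_norm = x2 - x2.mean()  # likewise for x2

# Both normalized arrays now have (numerically) zero mean
assert np.isclose(x1_norm.mean(), 0.0)
assert np.isclose(x2_norm.mean(), 0.0)
```

Centering the data this way is what lets the covariance matrix in the next step be computed as a simple product of the data matrix with its transpose.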
Covariance matrix of the normalized data, where X_matrix stacks X1' and X2' as rows, so each observation is a column of dimension (2*1)
X_matrix = np.array((x1_norm, x2_norm))  # shape (2, m)
Sx = (1 / (X_matrix.shape[1] - 1)) * np.matmul(X_matrix, np.transpose(X_matrix))
Output
Sx = array([[  3.51818182, -13.52454545],
            [-13.52454545,  55.58878788]])
(The off-diagonal elements of Sx are negative, so X1 and X2 are negatively correlated: as one variable increases, the other tends to decrease.)
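As a sanity check, the hand-rolled covariance above should agree with NumPy's np.cov, which uses the same (m - 1) denominator. A minimal sketch, using hypothetical zero-mean data in place of the article's:

```python
import numpy as np

# Hypothetical zero-mean data (stand-ins for x1_norm, x2_norm)
rng = np.random.default_rng(0)
x1_norm = rng.normal(size=50)
x1_norm -= x1_norm.mean()
x2_norm = -2.0 * x1_norm + rng.normal(scale=0.5, size=50)
x2_norm -= x2_norm.mean()

X_matrix = np.array((x1_norm, x2_norm))  # shape (2, m)
Sx = (1 / (X_matrix.shape[1] - 1)) * np.matmul(X_matrix, np.transpose(X_matrix))

# np.cov on the stacked rows computes the same unbiased covariance matrix
assert np.allclose(Sx, np.cov(X_matrix))
```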
# Eigenvalue calculation (closed form for a 2x2 symmetric matrix)
m = (Sx[0][0] + Sx[1][1]) / 2                    # half the trace
p = Sx[0][0] * Sx[1][1] - (Sx[1][0] * Sx[0][1])  # determinant
lambda1 = m + (m**2 - p) ** (1 / 2)
lambda2 = m - (m**2 - p) ** (1 / 2)
Output
lambda1 = 58.89203173874124
lambda2 = 0.21493795822846806
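The trace/determinant formula used above can be checked against NumPy's eigensolver on the article's covariance matrix:

```python
import numpy as np

# The article's covariance matrix, reused to verify the closed-form eigenvalues
Sx = np.array([[  3.51818182, -13.52454545],
               [-13.52454545,  55.58878788]])

m = (Sx[0, 0] + Sx[1, 1]) / 2                  # half the trace
p = Sx[0, 0] * Sx[1, 1] - Sx[1, 0] * Sx[0, 1]  # determinant
lambda1 = m + (m**2 - p) ** 0.5
lambda2 = m - (m**2 - p) ** 0.5

# eigvalsh returns eigenvalues of a symmetric matrix in ascending order
lo, hi = np.linalg.eigvalsh(Sx)
assert np.isclose(lambda2, lo)
assert np.isclose(lambda1, hi)
```

For a 2x2 symmetric matrix the two roots of the characteristic polynomial are exactly trace/2 ± sqrt((trace/2)² − det), which is what the closed form computes.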
# Eigenvector calculation (second component fixed to 1)
V1 = np.array((Sx[0][1] / (lambda1 - Sx[0][0]), 1))
V2 = np.array((Sx[0][1] / (lambda2 - Sx[0][0]), 1))
Output
eigen_vector V1 = [-0.24424066  1.]
eigen_vector V2 = [ 4.09432244  1.]
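A quick way to confirm these really are eigenvectors is to check the defining identity Sx·v = λ·v directly:

```python
import numpy as np

Sx = np.array([[  3.51818182, -13.52454545],
               [-13.52454545,  55.58878788]])
lambda1 = 58.89203173874124
lambda2 = 0.21493795822846806

# Eigenvectors built exactly as in the article, second component fixed to 1
V1 = np.array((Sx[0, 1] / (lambda1 - Sx[0, 0]), 1))
V2 = np.array((Sx[0, 1] / (lambda2 - Sx[0, 0]), 1))

# An eigenvector v of Sx must satisfy Sx @ v == lambda * v
assert np.allclose(Sx @ V1, lambda1 * V1)
assert np.allclose(Sx @ V2, lambda2 * V2)
```

Note these vectors are not unit length; dividing each by its norm (np.linalg.norm) would make them orthonormal, which matters later when interpreting the projected variance.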
# Fraction of the total variance captured by each eigenvector
k = lambda1 / (lambda1 + lambda2)
print(str(round(k * 100, 2)) + '% variance is explained by v1')
print(str(round((1 - k) * 100, 2)) + '% variance is explained by v2')
Output
99.64% variance is explained by v1
0.36% variance is explained by v2
The variance explained by the two principal components
P = np.array((V1, V2))      # projection matrix holding both eigenvectors
Y = np.matmul(P, X_matrix)  # project the data onto V1 and V2
Y1 and Y2 are the two principal components, derived from the eigenvectors V1 and V2 respectively
Visualization of keeping both components and computing Y (no dimensionality reduction)
P = np.array((V1))  # projecting onto V1 only (note: V1 is not unit length)
Y = np.matmul(P, X_matrix)
Sy = (1 / (Y.shape[0] - 1)) * np.matmul(Y, np.transpose(Y))  # variance of the 1-D projection
Output
Sy = 62.40514745166816
One-dimensional data plot with 99.64% of the variance explained. (Sy is slightly larger than lambda1 because V1 was not scaled to unit length; dividing V1 by its norm before projecting would give Sy equal to lambda1 exactly.)
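The relationship between the projected variance and the top eigenvalue can be demonstrated with a unit-length eigenvector. A minimal sketch on hypothetical zero-mean data (the identity, not the particular numbers, is the point):

```python
import numpy as np

# Hypothetical zero-mean, strongly correlated 2-D data
rng = np.random.default_rng(1)
x1_norm = rng.normal(size=100)
x1_norm -= x1_norm.mean()
x2_norm = 3.0 * x1_norm + rng.normal(size=100)
x2_norm -= x2_norm.mean()
X_matrix = np.array((x1_norm, x2_norm))

Sx = np.cov(X_matrix)
eigvals, eigvecs = np.linalg.eigh(Sx)  # ascending eigenvalues, unit eigenvectors
v1 = eigvecs[:, -1]                    # unit-length top eigenvector

Y = v1 @ X_matrix                      # 1-D projection onto v1
Sy = (Y @ Y) / (Y.shape[0] - 1)        # variance of the projection

# With a unit-length eigenvector, the projected variance equals lambda1
assert np.isclose(Sy, eigvals[-1])
```

This is why PCA implementations normalize eigenvectors: the variance along each principal axis then reads off directly as the corresponding eigenvalue.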

Aayushma Pant

Aayushma Pant

AI enthusiast

More from Medium

Random Forest

Different Regression Analysis Models in Machine Learning

Introduction to PCA (Principal Component Analysis)

Can Machine Learning help in detecting the Credit Card Frauds?