We'll consider the data on the radial velocity of Galaxy NGC7531
http://statweb.stanford.edu/~tibs/ElemStatLearn/datasets/galaxy.info.txt
Radial Velocity of Galaxy NGC7531
SUMMARY:
The galaxy data frame records the radial velocity of a
spiral galaxy measured at 323 points in the area of sky
which it covers. All the measurements lie within seven
slots crossing at the origin. The positions of the meas-
urements given by four variables (columns).
DATA DESCRIPTION:
east.west: the east-west coordinate. The origin, (0,0), is
near the center of the galaxy, east is negative, west is
positive.
north.south: the north-south coordinate. The origin, (0,0), is
near the center of the galaxy, south is negative, north is
positive.
angle: degrees of counter-clockwise rotation from the horizon-
tal of the slot within which the observation lies.
radial.position: signed distance from origin; negative if
east-west coordinate is negative.
velocity: radial velocity measured in km/sec.
SOURCE:
Buta, R. (1987) The Structure and Dynamics of Ringed
Galaxies, III: Surface Photometry and Kinematics of the
Ringed Nonbarred Spiral NGC7531. The Astrophysical J.
Supplement Ser. Vol. 64, pp. 1--37.
John M. Chambers and Trevor J. Hastie, (eds.) Statistical
Models in S, Wadsworth and Brooks, Pacific Grove, CA 1992,
pg. 352.
import pandas as pd
from numpy import *
from numpy.linalg import norm
dat = pd.read_csv("galaxy.data")
x1 = dat.loc[:,"east.west"].as_matrix()
x2 = dat.loc[:, "north.south"].as_matrix()
y = dat.loc[:, "velocity"].as_matrix()
Now, let's write the cost function and the gradient of the cost function.
def f(x, y, theta):
x = vstack( (ones((1, x.shape[1])), x))
return sum( (y - dot(theta.T,x)) ** 2)
def df(x, y, theta):
x = vstack( (ones((1, x.shape[1])), x))
return -2*sum((y-dot(theta.T, x))*x, 1)
The gradient descent algorithm is the same:
def grad_descent(f, df, x, y, init_t, alpha):
EPS = 1e-5 #EPS = 10**(-5)
prev_t = init_t-10*EPS
t = init_t.copy()
max_iter = 30000
iter = 0
while norm(t - prev_t) > EPS and iter < max_iter:
prev_t = t.copy()
t -= alpha*df(x, y, t)
if iter % 500 == 0:
print "Iter", iter
print "x = (%.2f, %.2f, %.2f), f(x) = %.2f" % (t[0], t[1], t[2], f(x, y, t))
print "Gradient: ", df(x, y, t), "\n"
iter += 1
return t
Let's check the gradient to make sure it works:
x = vstack((x1, x2))
theta = array([-3, 2, 1])
h = 0.000001
print (f(x, y, theta+array([0, h, 0])) - f(x, y, theta-array([0, h, 0])))/(2*h)
print df(x, y, theta)
x = vstack((x1, x2))
theta0 = array([0., 0., 0.])
theta = grad_descent(f, df, x, y, theta0, 0.0000010)
There is an exact solution:
x = vstack((x1, x2))
x = vstack( (ones((1, x.shape[1])), x))
dot(dot(linalg.inv(dot(x, x.T)),x), y)