“The gradient tells you how to step. The Hessian tells you how to curve.”
Let $f : \mathbb{R}^n \to \mathbb{R}^m$. The Jacobian is the $m \times n$ matrix whose $(i, j)$ entry is $\partial f_i / \partial x_j$.
If $m = 1$ (scalar output), the Hessian is the $n \times n$ matrix whose $(i, j)$ entry is $\partial^2 f / \partial x_i \, \partial x_j$.
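Before the neural-network example, here is a minimal numeric check of these definitions. The two toy functions below are my own illustrative choices (not from the repo scripts); `torch.autograd.functional` does the differentiation for us.

```python
# Hypothetical warm-up: verify the Jacobian/Hessian definitions numerically.
import torch
torch.set_default_dtype(torch.float64)

def f(x):                        # R^2 -> R^2, so the Jacobian is 2x2
    return torch.stack([x[0] * x[1], x[0] ** 2])

def g(x):                        # R^2 -> R (scalar), so the Hessian is 2x2
    return x[0] ** 2 * x[1]

x = torch.tensor([2.0, 3.0])
print(torch.autograd.functional.jacobian(f, x))  # [[x1, x0], [2*x0, 0]] = [[3, 2], [4, 0]]
print(torch.autograd.functional.hessian(g, x))   # [[2*x1, 2*x0], [2*x0, 0]] = [[6, 4], [4, 0]]
```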
| Concept | Why it matters |
|---|---|
| Jacobian | How a layer/output responds to changes in all its inputs. Used in sensitivity analysis, invertibility, and normalizing flows. |
| Hessian | Curvature of the loss: used in Newton's method, second-order optimization, and understanding saddle points. |
| Spectrum | Eigenvalues reveal flatness, sharpness, or ill-conditioning, all of which affect convergence speed (see the sketch after this table). |
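To see why the spectrum matters for optimization, here is a small sketch with a toy quadratic of my own choosing (not from the repo): for $f(x) = \tfrac{1}{2} x^\top A x$ the Hessian is exactly $A$, and gradient descent's stable step size is capped by the largest eigenvalue while progress along the smallest one crawls.

```python
# Hypothetical illustration: ill-conditioned quadratic, whose Hessian is A itself.
import torch
torch.set_default_dtype(torch.float64)

A = torch.diag(torch.tensor([100.0, 1.0]))   # condition number 100
x = torch.tensor([1.0, 1.0])

lr = 0.019                                   # must stay below 2 / lambda_max = 0.02 to avoid divergence
for step in range(200):
    x = x - lr * (A @ x)                     # gradient of 0.5 * x^T A x is A x

print(torch.linalg.eigvalsh(A))              # eigenvalues 1 and 100
print(x)  # stiff direction (lambda=100) has essentially converged; flat one (lambda=1) decays slowly
```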
Suppose our “neural network” is

$$f(x) = \tanh(Wx + b)$$

with $x \in \mathbb{R}^2$, $W$ of shape $2 \times 2$, and $b$ of shape $2$.
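For this particular map the Jacobian also has a closed form (a one-line chain-rule computation), which gives us something to check autograd against:

$$J(x) = \operatorname{diag}\!\big(1 - \tanh^2(Wx + b)\big)\, W.$$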
Let’s compute the Jacobian at each input in a small batch with PyTorch.
```python
# calc-07-jacobian-hessian/mlp_jacobian.py
import torch

torch.set_default_dtype(torch.float64)

def mlp(x, W, b):
    return torch.tanh(W @ x + b)  # shape [2]

# fixed weights, biases, and a small batch of points
W = torch.tensor([[1.0, -0.7],
                  [0.4, 0.9]], requires_grad=True)
b = torch.tensor([0.5, -0.3], requires_grad=True)
xs = [torch.tensor([1.2, -0.8], requires_grad=True),
      torch.tensor([0.3, 2.0], requires_grad=True)]

for i, x in enumerate(xs):
    y = mlp(x, W, b)
    J = torch.zeros((2, 2))
    for j in range(2):  # one autograd call per output component
        grad = torch.autograd.grad(y[j], x, retain_graph=True)[0]
        J[j] = grad
    print(f"\nJacobian at input {i}:")
    print(J.detach().numpy())
```
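As a quick sanity check (my addition, meant to be appended to `mlp_jacobian.py` since it reuses `mlp`, `W`, `b`, and `xs`), both `torch.autograd.functional.jacobian` and the closed form above should reproduce the loop's result to machine precision:

```python
# Cross-check: assumes mlp, W, b, xs from mlp_jacobian.py are in scope.
for i, x in enumerate(xs):
    J_auto = torch.autograd.functional.jacobian(lambda z: mlp(z, W, b), x)
    h = torch.tanh(W @ x + b)
    J_closed = torch.diag(1 - h**2) @ W        # diag(1 - tanh^2(Wx + b)) W
    print(i, torch.allclose(J_auto, J_closed))  # expect True for both inputs
```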
Let’s make a toy two-layer MLP with a scalar output and plot the eigenvalues of its Hessian (taken with respect to all parameters and the input) at a random point. You’ll see how “curved” this scalar output, standing in for a loss, is in different directions.
```python
# calc-07-jacobian-hessian/mlp_hessian_spectrum.py
import torch
import numpy as np
import matplotlib.pyplot as plt

torch.set_default_dtype(torch.float64)

def net(x, params):
    W1, b1, W2, b2 = params
    h = torch.tanh(W1 @ x + b1)
    return W2 @ h + b2

# Small network (2-2-1)
W1 = torch.randn(2, 2, requires_grad=True)
b1 = torch.randn(2, requires_grad=True)
W2 = torch.randn(1, 2, requires_grad=True)
b2 = torch.randn(1, requires_grad=True)
params = [W1, b1, W2, b2]
x = torch.randn(2, requires_grad=True)

all_vars = params + [x]           # Hessian w.r.t. parameters *and* input
y = net(x, params).squeeze()      # scalar output

# First derivatives, kept differentiable so we can differentiate them again
grads = torch.autograd.grad(y, all_vars, create_graph=True)
flat_grad = torch.cat([g.reshape(-1) for g in grads])
n = flat_grad.numel()

# Second derivatives: one row of the Hessian per gradient entry
hessian = torch.zeros(n, n)
for i in range(n):
    row = torch.autograd.grad(flat_grad[i], all_vars,
                              retain_graph=True, allow_unused=True)
    row = [torch.zeros_like(v) if r is None else r
           for r, v in zip(row, all_vars)]      # unused variables contribute zeros
    hessian[i] = torch.cat([r.reshape(-1) for r in row])

# Eigen spectrum
eigvals = np.linalg.eigvalsh(hessian.detach().numpy())
print("Hessian eigenvalues:", np.round(eigvals, 5))

plt.stem(eigvals)
plt.title("Hessian eigenvalues (spectrum)")
plt.xlabel("direction")
plt.ylabel("curvature")
plt.tight_layout()
plt.show()
```
You may see positive, negative, and near‑zero eigenvalues — corresponding to convex, concave, and flat directions (“saddles”).
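An aside of mine, not part of the repo scripts: for anything larger than this toy model the full Hessian is too big to materialize, but the same spectral information can be probed with Hessian-vector products via double backward. A minimal power-iteration sketch for the eigenvalue of largest magnitude:

```python
# Hypothetical aside: estimate the dominant Hessian eigenvalue with
# Hessian-vector products, never forming the full matrix.
import torch
torch.set_default_dtype(torch.float64)

W1 = torch.randn(2, 2, requires_grad=True)
b1 = torch.randn(2, requires_grad=True)
W2 = torch.randn(1, 2, requires_grad=True)
b2 = torch.randn(1, requires_grad=True)
params = [W1, b1, W2, b2]
x = torch.randn(2)

y = torch.squeeze(W2 @ torch.tanh(W1 @ x + b1) + b2)   # scalar output
grads = torch.autograd.grad(y, params, create_graph=True)
flat_grad = torch.cat([g.reshape(-1) for g in grads])

def hvp(v):
    """Hessian-vector product: gradient of (grad . v) w.r.t. the parameters."""
    hv = torch.autograd.grad(flat_grad @ v, params,
                             retain_graph=True, allow_unused=True)
    hv = [torch.zeros_like(p) if h is None else h for h, p in zip(hv, params)]
    return torch.cat([h.reshape(-1) for h in hv])

v = torch.randn(flat_grad.numel())
for _ in range(50):                 # power iteration converges to the top |eigenvalue| direction
    v = hvp(v)
    v = v / v.norm()
print("dominant eigenvalue (Rayleigh quotient):", (v @ hvp(v)).item())
```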
Put solutions in `calc-07-jacobian-hessian/` and tag `v0.1`.
Next: Calculus 8 — Chain Rule & Backpropagation: Generalizing Gradients in Neural Nets.