“A derivative is a microscopic zoom on how a function changes; in machine learning we zoom millions of times per second.”
For a real‑valued function $f:\mathbb{R}\to\mathbb{R}$, the derivative at a point $x$ is

$$f'(x) \;=\; \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}.$$
Geometrically it’s the slope of the tangent line. In optimization, that slope tells us which way to tweak parameters to shrink loss.
Every back‑prop step is the chain rule applied repeatedly through a network’s layers.
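The chain rule is easy to check on a tiny composition before trusting it on a whole network. Here is a minimal sketch (the function sin(x²) and the point 1.3 are my own illustrative choices, not part of the demos below): autograd's answer matches the hand‑derived 2x·cos(x²).

# chain-rule sanity check (illustrative)
import torch
x = torch.tensor(1.3, requires_grad=True)
u = x ** 2                      # inner function: u = x^2
y = torch.sin(u)                # outer function: y = sin(u) = sin(x^2)
y.backward()                    # autograd applies the chain rule: dy/dx = cos(x^2) * 2x
manual = torch.cos(x.detach() ** 2) * 2 * x.detach()
print(x.grad.item(), manual.item())   # both print the same number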
Demo ① differentiates sin x three ways: autograd, a central finite difference, and the analytic answer cos x.
# calc-02-derivatives/compare_finite_vs_autograd.py
import torch, math, matplotlib.pyplot as plt
torch.set_default_dtype(torch.float64)
# sample many x for a smooth curve
x = torch.linspace(-math.pi, math.pi, 400, requires_grad=True)
y = torch.sin(x)
# autograd derivative (should be cos(x))
grads_auto, = torch.autograd.grad(y, x, grad_outputs=torch.ones_like(y))
# finite difference derivative
h = 1e-4
with torch.no_grad():
    y_plus = torch.sin(x + h)
    y_minus = torch.sin(x - h)
    grads_fd = (y_plus - y_minus) / (2*h)
# ground truth
grads_true = torch.cos(x).detach()  # detached so matplotlib can plot it
# --- Plot ----------------------------------------------------------
plt.figure(figsize=(6,4))
# x is detached below so matplotlib can convert it to numpy
plt.plot(x.detach(), grads_true, label="cos(x) (true)", linewidth=2)
plt.plot(x.detach(), grads_auto, "--", label="autograd", linewidth=1)
plt.plot(x.detach(), grads_fd, ":", label="finite diff", linewidth=1)
plt.legend()
plt.xlabel("x")
plt.ylabel("derivative")
plt.title("sin'(x) via three methods")
plt.tight_layout()
plt.show()
You’ll see all three curves overlap; discrepancies appear only near machine precision.
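To put numbers on that overlap, a couple of extra lines of my own can be appended to the script above; they report each method's maximum deviation from cos(x).

# optional: quantify the agreement (append to compare_finite_vs_autograd.py)
err_auto = (grads_auto - grads_true).abs().max().item()
err_fd = (grads_fd - grads_true).abs().max().item()
print(f"max |autograd - cos|    = {err_auto:.2e}")   # essentially zero in float64
print(f"max |finite diff - cos| = {err_fd:.2e}")     # roughly 1e-9 for h = 1e-4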
A slope field is a vector plot of the derivative $L'(x)$ at sampled points. Even in 1‑D we can render arrows indicating slope direction.
# calc-02-derivatives/slope_field.py
import numpy as np, matplotlib.pyplot as plt
def loss(x):                      # toy non-convex loss
    return 0.3*np.sin(3*x) + 0.5*(x**2)
xs = np.linspace(-3, 3, 41)
ys = loss(xs)
grads = np.gradient(ys, xs) # finite diff for visualization
plt.figure(figsize=(6,3))
plt.plot(xs, ys, color="black")
plt.quiver(xs, ys, np.ones_like(grads), grads, angles='xy',
           scale_units='xy', scale=10, width=0.004, alpha=0.7)
plt.title("Loss curve with slope arrows (direction of steepest ascent)")
plt.xlabel("x"); plt.ylabel("L(x)")
plt.tight_layout(); plt.show()
Gradient descent moves x against the slope: where an arrow tilts upward, step left; where it tilts downward, step right. The short sketch below makes that picture concrete.
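Here is a minimal gradient descent sketch on the same toy loss (the starting point 2.5, step size 0.1, and step count are arbitrary choices of mine):

# gradient descent on the toy loss (illustrative follow-up to slope_field.py)
import numpy as np

def loss(x):
    return 0.3*np.sin(3*x) + 0.5*(x**2)

def grad(x):                          # analytic derivative of the toy loss
    return 0.9*np.cos(3*x) + x

x, lr = 2.5, 0.1                      # arbitrary start and learning rate
for _ in range(200):
    x -= lr * grad(x)                 # step opposite the slope
print(x, loss(x))                     # ends near a local minimum of the loss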
For $f:\mathbb{R}^n \to \mathbb{R}$, the gradient is

$$\nabla f(\mathbf{x}) = \left(\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n}\right).$$

`torch.autograd.grad` generalizes seamlessly, and PyTorch stores the accumulated gradient in each tensor's `.grad` field during `backward()`.
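A two‑variable toy function (my own example, chosen so the algebra stays trivial) shows both interfaces side by side:

# gradient of a toy multivariate function (illustrative)
import torch
x = torch.tensor([1.0, 2.0], requires_grad=True)
f = x[0] ** 2 + 3 * x[1]              # f(x) = x0^2 + 3*x1, so grad f = (2*x0, 3)
(g,) = torch.autograd.grad(f, x, retain_graph=True)
print(g)                              # tensor([2., 3.])
f.backward()                          # fills x.grad with the same values
print(x.grad)                         # tensor([2., 3.])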
Exercises:
1. Vary `h` in Demo ① (`1e-2` … `1e-8`) and plot the max error vs. `h`. Explain floating‑point round‑off vs. truncation error. (A starting sketch follows this list.)
2. Use `plt.quiver` on a grid to visualize the vector field.
3. Push your code to `calc-02-derivatives/` and tag `v0.1`.
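A starting sketch for exercise 1 (the sweep and plot only; the explanation is yours to write):

# exercise 1 starter: sweep h and plot the max central-difference error
import torch, math, matplotlib.pyplot as plt
torch.set_default_dtype(torch.float64)
x = torch.linspace(-math.pi, math.pi, 400)
true = torch.cos(x)
hs = [10.0 ** (-k) for k in range(2, 9)]          # 1e-2 ... 1e-8
errs = [((torch.sin(x + h) - torch.sin(x - h)) / (2*h) - true).abs().max().item() for h in hs]
plt.loglog(hs, errs, "o-")
plt.xlabel("h"); plt.ylabel("max abs error")
plt.show()

The U‑shape of the resulting curve is the clue: truncation error shrinks as h shrinks, while round‑off error grows.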
Up next: Calculus 3 – Fundamental Theorem & Numerical Integration — see you soon!