“Calculus doesn’t begin with derivatives or integrals—it begins with asking what happens as we get arbitrarily close to something.”
Training a neural network is nothing more than finding limits—the loss approaches its minimum as parameter updates approach zero. If the loss function were not continuous, gradient‑based methods would fail spectacularly. So before we race to back‑propagation, we need rock‑solid intuition for limits and continuity.
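To make that picture concrete, here is a toy sketch (a single-parameter quadratic loss standing in for a real network, an illustrative assumption on my part) in which both the loss and the size of the updates shrink toward zero:

```python
# Toy stand-in for training: gradient descent on the quadratic loss L(w) = (w - 3)^2.
# (Illustrative only -- a real network has many parameters, not one.)
w, lr = 0.0, 0.1
for step in range(1, 41):
    grad = 2 * (w - 3)            # dL/dw
    update = -lr * grad           # the parameter update
    w += update
    if step % 10 == 0:
        print(f"step {step:2d}   loss = {(w - 3) ** 2:.2e}   |update| = {abs(update):.2e}")
```

Both columns head toward zero together: the loss approaches its minimum exactly as the updates approach zero.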
For a function $f$ we say

$$\lim_{x \to c} f(x) = L$$

means: for every tolerance $\varepsilon > 0$ you pick around $L$, I can pick a distance $\delta > 0$ around $c$ such that whenever $0 < |x - c| < \delta$, we have $|f(x) - L| < \varepsilon$.
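As a quick warm-up before the interesting case below: for $f(x) = 3x + 1$, the claim $\lim_{x \to 2} f(x) = 7$ holds because, given any $\varepsilon > 0$, choosing $\delta = \varepsilon/3$ guarantees that $0 < |x - 2| < \delta$ implies $|f(x) - 7| = 3\,|x - 2| < 3\delta = \varepsilon$.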
This ε‑δ definition is the formal backbone of all later calculus. Continuous functions are those whose limits agree with their function values:

$$\lim_{x \to c} f(x) = f(c).$$
At $x = 0$ the formula $\sin(x)/x$ looks undefined ($0/0$), yet the limit exists and equals 1. Graphing a zoom makes the idea visceral.
# demo_limits.py
import numpy as np
import matplotlib.pyplot as plt
# 1. sample points around 0
x = np.linspace(-1e-1, 1e-1, 2001)
with np.errstate(invalid="ignore"):        # silence the harmless 0/0 warning at x = 0
    y = np.where(x != 0, np.sin(x)/x, 1.0) # define f(0)=1 by continuity
# 2. plot
plt.figure(figsize=(5,4))
plt.plot(x, y, label=r'$\,\sin x / x\,$')
plt.scatter([0], [1], color='black', zorder=3) # the limiting point
plt.axhline(1, linestyle='--', linewidth=0.7)
plt.xlabel("x")
plt.ylabel("f(x)")
plt.title("Zoom on sin(x)/x near 0")
plt.legend()
plt.tight_layout()
plt.show()
Run it:
python demo_limits.py
You’ll see the curve flatten toward $y = 1$. Try reducing the range to `1e-3`, `1e-4`, … to watch the limit converge numerically.
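If you prefer raw numbers to pictures, a quick sweep (a sketch along the same lines) shows the values crowding in on 1:

```python
import numpy as np

# Evaluate sin(x)/x at shrinking x to watch the values approach 1.
for exp in range(1, 6):                     # x = 1e-1, 1e-2, ..., 1e-5
    x = 10.0 ** -exp
    print(f"x = {x:.0e}   sin(x)/x = {np.sin(x) / x:.12f}")
```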
A small helper turns the ε‑δ game into a numerical experiment: given a tolerance ε, it sweeps powers of ten for δ and returns the first (largest) one that keeps f within ε of L everywhere in a punctured interval around c.

import numpy as np

def limit_tester(f, c, L, eps=1e-4):
    # naive sweep for a suitable δ: try 0.1, 0.01, ..., 1e-7
    for delta_exp in range(-1, -8, -1):
        δ = 10.0 ** delta_exp
        xs = np.linspace(c - δ, c + δ, 1001)
        xs = xs[xs != c]                      # respect 0 < |x - c|: skip the point itself
        if np.all(np.abs(f(xs) - L) < eps):   # every sampled value stays within ε of L
            return δ
    return None
f = lambda x: np.sin(x)/x
δ_found = limit_tester(f, 0.0, 1.0, eps=1e-5)
print(f"Found δ = {δ_found} for ε = 1e-5")
Use it to check experimentally that the limit really is 1. Since $\sin x / x \approx 1 - x^2/6$ near 0, you should expect the sweep to settle on $\delta = 10^{-3}$ for $\varepsilon = 10^{-5}$.
A function $f$ is continuous at $x = c$ if:

1. $f(c)$ is defined,
2. $\lim_{x \to c} f(x)$ exists, and
3. $\lim_{x \to c} f(x) = f(c)$.
For ML, the common activations (ReLU, GELU, sigmoid, …) satisfy all three conditions at every input, which is part of why gradient-based training behaves predictably. Note that continuity is weaker than differentiability: ReLU is continuous at 0 even though its slope jumps there.
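As a quick numerical spot check (a sketch using plain NumPy stand-ins for the framework ops; GELU is omitted for brevity), sample each activation just left and right of 0 and compare with its value at 0:

```python
import numpy as np

# NumPy stand-ins for two common activations (illustrative definitions).
relu    = lambda x: np.maximum(x, 0.0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

h = 1e-7  # how far to step to either side of the point of interest
for name, f in [("ReLU", relu), ("sigmoid", sigmoid)]:
    left, at, right = f(-h), f(0.0), f(h)
    print(f"{name:8s} f(0-h) = {left:.8f}   f(0) = {at:.8f}   f(0+h) = {right:.8f}")
```

Both one-sided values crowd in on $f(0)$, which is condition 3 above in numerical form.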
If $f$ is continuous and differentiable, then its derivative is itself a limit:

$$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}.$$
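To watch that limit emerge numerically (a small preview sketch; $f = \sin$ and $x = 0.5$ are arbitrary choices of mine), shrink $h$ and see the difference quotient settle on the true slope $\cos(0.5)$:

```python
import numpy as np

x = 0.5                      # arbitrary point at which to estimate the slope
for h in [1e-1, 1e-2, 1e-4, 1e-6]:
    quotient = (np.sin(x + h) - np.sin(x)) / h   # the difference quotient
    print(f"h = {h:.0e}   quotient = {quotient:.8f}")
print(f"exact slope: cos(0.5) = {np.cos(x):.8f}")
```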
Everything we do in back‑prop is disguised limit‑taking! In the next post we’ll zoom from “approaching $c$” to “the slope at $c$.”
Exercise: compute `torch.autograd.grad` of `torch.sin` at several points and verify numerically with finite differences (hint: `torch.autograd.functional.jacobian`). Commit your notebooks to `calc-01-limits/` and tag it `v0.1`.
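If you want a starting point, here is one possible sketch of that comparison (the layout and the step size `h` are my own choices; the `jacobian` hint is left for you to explore):

```python
import torch

# Differentiate sin at a few points with autograd, then compare against
# a central finite-difference estimate of the same slopes.
xs = torch.tensor([0.0, 0.5, 1.0, 2.0], requires_grad=True)
y = torch.sin(xs).sum()
(autograd_grad,) = torch.autograd.grad(y, xs)        # should agree with cos(xs)

h = 1e-4                                             # finite-difference step (assumed)
with torch.no_grad():
    fd_grad = (torch.sin(xs + h) - torch.sin(xs - h)) / (2 * h)

print("autograd:          ", autograd_grad)
print("finite differences:", fd_grad)
print("max abs difference:", (autograd_grad - fd_grad).abs().max().item())
```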
Up next: Calculus 2 – Derivatives & Gradient Descent From Scratch — see you there!