Welcome! In this installment of our RL-with-PyTorch series, we dive into the heart of learning: loss functions. A model “learns” by minimizing a loss—a quantitative measure of how wrong its predictions are. Understanding loss functions and their surfaces is your first step toward training not just neural networks, but also policies and value functions in RL.
This post will help you:
- Understand what a loss function is and why minimizing it drives learning
- Compute MSE by hand and with PyTorch built-ins
- Visualize the loss surface of a simple linear model
- Plot binary cross-entropy as a function of the predicted probability
- Compare how MSE and MAE respond to outliers
Let’s turn error into learning!
At its core, a loss function quantifies the difference between a prediction $\hat{y}$ and the true value $y$. Minimizing this quantity is what drives training and optimization.
For regression or continuous outputs, the standard choice is mean squared error:
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}(\hat{y}_i - y_i)^2$$
For binary classification ($y \in \{0, 1\}$, $\hat{y} \in (0, 1)$), we use binary cross-entropy:
$$\mathrm{BCE} = -\left[\, y \log \hat{y} + (1 - y)\log(1 - \hat{y}) \,\right]$$
The shape of the loss surface determines how easily a model can be optimized!
import torch
import torch.nn.functional as F
y_true = torch.tensor([2.0, 3.5, 5.0])
y_pred = torch.tensor([2.5, 2.8, 4.6])
# Manual MSE
mse_manual = ((y_true - y_pred) ** 2).mean()
print("Manual MSE:", mse_manual.item())
# PyTorch MSE
mse_builtin = F.mse_loss(y_pred, y_true)
print("PyTorch MSE:", mse_builtin.item())
Let’s visualize the loss as a function of the weight $w$ for a fixed dataset and see how it changes as $w$ varies.
import numpy as np
import matplotlib.pyplot as plt
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.1]) # Linear with small noise
w_vals = np.linspace(0, 3, 100)
loss_vals = [np.mean((w * x - y)**2) for w in w_vals]
plt.plot(w_vals, loss_vals)
plt.xlabel("w")
plt.ylabel("MSE Loss")
plt.title("Loss Surface for Linear Model y = w*x")
plt.grid(True)
plt.show()
Notice the bowl shape—this is characteristic of quadratic losses.
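To see why that matters, here’s a minimal gradient-descent sketch on the same one-parameter model (the learning rate and step count are arbitrary choices for illustration): because the surface is a convex bowl, the gradient $dL/dw = 2\,\mathrm{mean}(x(wx - y))$ always points toward the unique minimum.

# Minimal gradient descent on L(w) = mean((w*x - y)^2), reusing x and y above
w, lr = 0.0, 0.05
for _ in range(50):
    grad = 2 * np.mean(x * (w * x - y))  # dL/dw
    w -= lr * grad
print("Learned w:", w)  # settles near the least-squares optimum (~2.02)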
Let’s plot the BCE loss as a function of the predicted probability $\hat{y}$ for both possible labels, $y = 1$ and $y = 0$.
p = np.linspace(1e-6, 1 - 1e-6, 200)
bce_y1 = -np.log(p) # when y = 1
bce_y0 = -np.log(1 - p) # when y = 0
plt.plot(p, bce_y1, label='y=1')
plt.plot(p, bce_y0, label='y=0')
plt.xlabel(r'Predicted probability ($\hat{y}$)')
plt.ylabel('BCE Loss')
plt.title(r'Binary Cross Entropy as a Function of $\hat{y}$')
plt.ylim(0, 6)
plt.legend()
plt.grid(True)
plt.show()
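As a sanity check on the formula itself, we can compare a hand-computed BCE against PyTorch’s built-in; the labels and probabilities below are made up for illustration:

# Manual BCE vs F.binary_cross_entropy on toy values
y = torch.tensor([1.0, 0.0, 1.0])
y_hat = torch.tensor([0.9, 0.2, 0.6])
bce_manual = -(y * torch.log(y_hat) + (1 - y) * torch.log(1 - y_hat)).mean()
print("Manual BCE:", bce_manual.item())
print("PyTorch BCE:", F.binary_cross_entropy(y_hat, y).item())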
Let’s compare the effect of MSE and MAE (mean absolute error) on outliers.
y_true = torch.tensor([1.0, 1.0, 1.0, 1.0, 10.0])
errs = np.linspace(-5, 10, 200)  # Error added to the last (potential outlier) element
# Perturb only the last element by e, leaving the others exact
preds = [torch.cat([y_true[:-1], y_true[-1:] + float(e)]) for e in errs]
mse_vals = [F.mse_loss(p, y_true).item() for p in preds]
mae_vals = [F.l1_loss(p, y_true).item() for p in preds]
plt.plot(errs, mse_vals, label="MSE")
plt.plot(errs, mae_vals, label="MAE")
plt.xlabel("Outlier error")
plt.ylabel("Loss")
plt.title("Losses vs Outlier Error")
plt.legend()
plt.grid(True)
plt.show()
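The divergence in the plot traces back to gradients: MSE’s gradient grows linearly with the error, while MAE’s is a constant ±1, so a single outlier can dominate MSE training. A short autograd check (the numbers are arbitrary) makes this concrete:

# Gradient w.r.t. one bad prediction: MSE scales with the error, MAE does not
pred = torch.tensor([15.0], requires_grad=True)  # target 10.0, so error = 5
target = torch.tensor([10.0])
F.mse_loss(pred, target).backward()
print("MSE grad:", pred.grad.item())  # 2 * error = 10.0
pred.grad = None  # reset before the second backward pass
F.l1_loss(pred, target).backward()
print("MAE grad:", pred.grad.item())  # sign(error) = 1.0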
Here are solutions to this part’s exercises, leaning on torch.nn.functional.mse_loss and its relatives.
import torch
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt
# EXERCISE 1
y_true = torch.tensor([1.5, 3.0, 4.0])
y_pred = torch.tensor([2.0, 2.5, 3.5])
mse_manual = ((y_true - y_pred) ** 2).mean()
mse_torch = F.mse_loss(y_pred, y_true)
print("Manual MSE:", mse_manual.item())
print("PyTorch MSE:", mse_torch.item())
# EXERCISE 2
x = np.array([0, 1, 2, 3])
y = np.array([1, 2, 2, 4])
ws = np.linspace(-1, 3, 100)
loss_curve = [np.mean((w*x - y)**2) for w in ws]
plt.plot(ws, loss_curve)
plt.xlabel("w"); plt.ylabel("MSE Loss")
plt.title("Loss Surface for Linear Model"); plt.grid(True); plt.show()
# EXERCISE 3
p = np.linspace(0.01, 0.99, 100)
bce_0 = -np.log(1 - p)
bce_1 = -np.log(p)
plt.plot(p, bce_0, label="y=0")
plt.plot(p, bce_1, label="y=1")
plt.xlabel("Predicted probability ($\hat{y}$)")
plt.ylabel("BCE Loss")
plt.legend(); plt.grid(True); plt.title("BCE Loss as Function of Prediction"); plt.show()
# EXERCISE 4
y_true = torch.tensor([1, 1, 1, 10], dtype=torch.float32)
x_pred = np.linspace(0, 20, 120)
mse_vals = [F.mse_loss(torch.tensor([1.0, 1.0, 1.0, float(val)]), y_true).item() for val in x_pred]
mae_vals = [F.l1_loss(torch.tensor([1.0, 1.0, 1.0, float(val)]), y_true).item() for val in x_pred]
plt.plot(x_pred, mse_vals, label="MSE")
plt.plot(x_pred, mae_vals, label="MAE")
plt.xlabel("Predicted outlier value")
plt.ylabel("Loss")
plt.legend(); plt.grid(True)
plt.title("Effect of Outlier on MSE vs MAE"); plt.show()
Loss functions shape how models learn and what they prioritize. You’ve now:
- Defined loss functions and computed MSE both by hand and with PyTorch
- Visualized the quadratic loss surface of a one-parameter linear model
- Plotted binary cross-entropy as a function of the predicted probability
- Seen how MSE amplifies outliers while MAE stays robust
Next: You’ll fit your first linear regression model and visualize how loss minimization leads to learning from data. Get ready to move from error measurement to model training!
See you in Part 2.4!