Welcome back to our series on linear algebra for machine learning! In this post, we’re diving into Tensors and Higher-Order Generalizations, extending the concepts of vectors and matrices to multi-dimensional arrays that power deep learning, natural language processing (NLP), and computer vision. Tensors are the fundamental data structures in frameworks like PyTorch, enabling efficient computation through vectorization and broadcasting. Whether you’re processing images, text embeddings, or time series data, understanding tensors is essential. Let’s explore the math, intuition, and implementation with Python code using PyTorch, visualizations, and hands-on exercises.
A tensor is a multi-dimensional array that generalizes scalars (0D), vectors (1D), and matrices (2D) to higher dimensions. In machine learning, tensors represent data and model parameters in a unified way. For instance:
- A color image is naturally a 3D tensor (height × width × channels), and a batch of images a 4D tensor.
- A batch of token embeddings in NLP is a 3D tensor (batch × sequence length × embedding dimension).
- The weights of a fully connected layer form a 2D tensor (a matrix), while convolutional filters are typically 4D tensors.
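To make these shapes concrete, here is a minimal sketch; the specific sizes (a batch of 32 images, 768-dimensional embeddings, and so on) are illustrative assumptions rather than fixed conventions.
import torch
# Hypothetical shapes, chosen only to illustrate common tensor layouts in ML.
image_batch = torch.zeros(32, 3, 224, 224)    # 4D: (batch, channels, height, width)
token_embeddings = torch.zeros(16, 128, 768)  # 3D: (batch, sequence length, embedding dim)
time_series = torch.zeros(64, 100, 8)         # 3D: (batch, time steps, features)
print(image_batch.shape, token_embeddings.shape, time_series.shape)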
Mathematically, a tensor is an object in a multi-linear space, but in practice, we often think of it as a container for numerical data with multiple axes. Operations on tensors—like addition, multiplication, and reshaping—extend the linear algebra operations we’ve covered for vectors and matrices.
The shape of a tensor specifies the size along each dimension (e.g., (3, 4, 5) for a 3D tensor), and the rank is the number of dimensions (e.g., rank 3). Tensors are the backbone of data representation in deep learning frameworks, where efficiency is achieved through vectorized operations on multi-dimensional arrays.
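As a quick sanity check on these definitions, here is a small sketch showing how PyTorch reports shape and rank; the (3, 4, 5) tensor simply mirrors the example above.
import torch
x = torch.randn(3, 4, 5)        # a 3D tensor matching the example shape
print(x.shape)                  # torch.Size([3, 4, 5]) -- size along each axis
print(x.ndim)                   # 3 -- the rank (number of dimensions)
print(x.numel() == 3 * 4 * 5)   # True -- total elements equal the product of the shape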
Tensors are indispensable in machine learning for several reasons:
- They provide a unified representation for data (images, text embeddings, time series) and for model parameters.
- Frameworks like PyTorch run tensor operations as vectorized, hardware-accelerated kernels, which is far faster than looping over individual elements.
- Broadcasting lets operations combine tensors of different shapes without manually replicating data.
Understanding tensor operations, shapes, and broadcasting is critical for designing and debugging modern ML models.
Let’s explore tensor creation, manipulation, and operations using PyTorch. We’ll cover shape tricks, broadcasting, and visualization of tensor operations for intuition.
import torch
import numpy as np
# Set random seed for reproducibility
torch.manual_seed(42)
# Create tensors of different ranks and shapes
scalar = torch.tensor(5.0) # 0D tensor (scalar)
vector = torch.tensor([1.0, 2.0]) # 1D tensor (vector) of length 2, matching the matrix's column length
matrix = torch.tensor([[1.0, 2.0], [3.0, 4.0]]) # 2D tensor (matrix)
tensor_3d = torch.randn(2, 3, 4) # 3D tensor (random values)
print("Scalar (0D):", scalar, "Shape:", scalar.shape)
print("Vector (1D):", vector, "Shape:", vector.shape)
print("Matrix (2D):", matrix, "Shape:", matrix.shape)
print("3D Tensor:", tensor_3d, "Shape:", tensor_3d.shape)
# Basic operations
sum_tensor = vector + 2.0 # Element-wise addition with scalar
product_tensor = vector * matrix[:, 0] # Element-wise multiplication (now compatible shapes)
matmul_result = torch.matmul(matrix, matrix) # Matrix multiplication
print("\nElement-wise addition (vector + scalar):", sum_tensor)
print("Element-wise multiplication (vector * matrix column):", product_tensor)
print("Matrix multiplication (matrix @ matrix):", matmul_result)
Output (abbreviated):
Scalar (0D): tensor(5.) Shape: torch.Size([])
Vector (1D): tensor([1., 2.]) Shape: torch.Size([2])
Matrix (2D): tensor([[1., 2.],
[3., 4.]]) Shape: torch.Size([2, 2])
3D Tensor: tensor([[[ 1.9269, 1.4873, 0.9007, -2.1055],
[ 0.6784, -1.2345, -0.0431, -1.6047],
[-0.7521, 1.6487, -0.3925, -1.4036]],
[[-0.7279, -0.5593, -0.7688, 0.7624],
[ 1.6423, -0.1596, -0.4974, 0.4396],
[-0.7581, 1.0783, 0.8008, 1.6806]]]) Shape: torch.Size([2, 3, 4])
Element-wise addition (vector + scalar): tensor([3., 4.])
Element-wise multiplication (vector * matrix column): tensor([1., 6.])
Matrix multiplication (matrix @ matrix): tensor([[ 7., 10.],
[15., 22.]])
This example demonstrates creating tensors of different ranks (0D to 3D) in PyTorch and performing basic operations like element-wise addition, multiplication, and matrix multiplication. The shapes are printed to show the dimensions of each tensor.
# Broadcasting example
tensor_a = torch.tensor([1.0, 2.0, 3.0]) # Shape: (3,)
tensor_b = torch.tensor([[1.0], [2.0]]) # Shape: (2, 1)
result_broadcast = tensor_a + tensor_b # Broadcasting: (2, 3)
print("\nBroadcasting result (tensor_a + tensor_b):", result_broadcast)
print("Result shape:", result_broadcast.shape)
# Shape tricks: Reshaping and unsqueezing
tensor_c = torch.randn(4, 3) # Shape: (4, 3)
tensor_reshaped = tensor_c.reshape(2, 6) # Reshape to (2, 6)
tensor_unsqueezed = tensor_c.unsqueeze(0) # Add dimension at index 0, Shape: (1, 4, 3)
print("\nOriginal tensor shape:", tensor_c.shape)
print("Reshaped tensor shape:", tensor_reshaped.shape)
print("Unsqueezed tensor shape:", tensor_unsqueezed.shape)
Output (abbreviated):
Broadcasting result (tensor_a + tensor_b): tensor([[2., 3., 4.],
[3., 4., 5.]])
Result shape: torch.Size([2, 3])
Original tensor shape: torch.Size([4, 3])
Reshaped tensor shape: torch.Size([2, 6])
Unsqueezed tensor shape: torch.Size([1, 4, 3])
This example illustrates broadcasting, where PyTorch combines tensors of different shapes by automatically expanding dimensions of size 1. It also shows shape tricks like reshaping and unsqueezing, which adjust tensor dimensions so they are compatible in downstream operations.
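If you want a rule of thumb for when broadcasting applies, here is a rough sketch (the shapes are arbitrary examples): PyTorch compares shapes from the trailing dimension, and two sizes are compatible when they are equal or one of them is 1; otherwise the operation raises a RuntimeError.
import torch
a = torch.randn(2, 1, 4)   # example shapes chosen only for illustration
b = torch.randn(3, 1)
# Align shapes from the right:
#   a: (2, 1, 4)
#   b: (   3, 1)
# Each pair of sizes matches or one of them is 1, so the result is (2, 3, 4).
print((a + b).shape)       # torch.Size([2, 3, 4])
# Incompatible trailing sizes do not broadcast and raise a RuntimeError.
c = torch.randn(2, 3)
try:
    a + c                  # trailing sizes 4 and 3 do not match
except RuntimeError as e:
    print("Broadcasting failed:", e)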
Here are exercises to deepen your understanding of tensors and higher-order generalizations. Each exercise asks you to write Python code in PyTorch to explore the concepts and their applications in machine learning.
1. Create tensors of different ranks and shapes, then use their shape attributes (and numel()) to confirm the dimensions.
2. Use torch.bmm to perform batch matrix multiplication and print the resulting shape. Explain the operation in a comment.
3. Use permute to reorder the dimensions of a tensor and print the shapes before and after to confirm the change. (A starter sketch for exercises 2 and 3 follows this list.)
Tensors and Higher-Order Generalizations extend the power of linear algebra to multi-dimensional data, forming the foundation of deep learning, NLP, and computer vision. By mastering tensor creation, broadcasting, and shape manipulation in PyTorch, we've seen how to handle complex data structures efficiently. These concepts are critical for scaling machine learning models to real-world problems involving images, text, and beyond.
In the next post, we’ll explore Spectral Methods in ML (Graph Laplacians, etc.), diving into how linear algebra powers graph-based algorithms and clustering techniques. Stay tuned, and happy learning!