
Vectors, Matrices and Tensors

The language every ML algorithm speaks. From a single number to multi-dimensional arrays — visual intuition first, formula second.

35–40 min · March 2026
Why you need this

Every ML algorithm is secretly just matrix math.

When you call model.fit(X_train, y_train) in scikit-learn, what happens inside? The algorithm multiplies matrices, adds vectors, and adjusts numbers. When a neural network processes an image, it converts the pixels into a matrix and multiplies it through dozens of layers. When an LLM reads your prompt, it converts every word into a vector of 768 or 4096 numbers.

You don't need to do this math by hand. NumPy and PyTorch do it for you. But if you don't understand what a matrix multiplication produces and why it's useful, you'll hit errors you can't debug, make architecture decisions you can't explain, and run into a ceiling you can't break through.

This page teaches three things: what scalars, vectors, matrices, and tensors are (with pictures), what the key operations do and why ML needs them, and how to work with them in NumPy — the library all of ML is built on.

🎯 Pro Tip
Read each visual before reading the explanation. The picture does most of the work. The words just confirm what you already understood.
The data hierarchy

From one number to billions — four levels

There are four levels of data structure in ML. Each one is just the previous one extended into another dimension. Once you understand this hierarchy, tensors stop being scary.

The four levels — visualised
Scalar — 0D · one number (e.g. 7)
Vector — 1D · a list of numbers (e.g. [3, 1, 4, 1])
Matrix — 2D · a grid of numbers
Tensor — 3D+ · stacked matrices

Scalar — just one number

A scalar is a single value. No direction, no structure. In ML, scalars show up as: a loss value (0.34), a learning rate (0.001), a single prediction (37.8 minutes), a model accuracy (0.92).

python
import numpy as np

# A scalar is just a regular Python number
loss = 0.34
learning_rate = 0.001
accuracy = 0.92

# Or a 0-dimensional NumPy array
scalar = np.array(7)
print(scalar.shape)   # ()  ← empty shape, no dimensions
print(scalar.ndim)    # 0

Vector — a list of numbers with meaning

A vector is a 1D array of numbers. In ML, every single data point is a vector. A Swiggy order with 4 features — distance, time of day, restaurant prep time, traffic score — is a vector of 4 numbers: [3.2, 2, 15, 7].

A word in an LLM is also a vector — called an embedding. The word "king" might be represented as a vector of 768 numbers. Those numbers encode meaning — words with similar meanings have vectors that point in similar directions in space.

One Swiggy order as a vector
3.2 → distance_km · 2 → time_of_day · 15 → prep_time · 7 → traffic
order = np.array([3.2, 2, 15, 7]) ← shape: (4,)
python
# One order = one vector
order = np.array([3.2, 2, 15, 7])
print(order.shape)   # (4,)  ← 4 features, 1 dimension
print(order.ndim)    # 1
print(order[0])      # 3.2   ← first feature (distance)

# Vector arithmetic — works element-by-element
order_a = np.array([3.2, 2, 15, 7])
order_b = np.array([1.1, 3, 10, 5])

diff = order_a - order_b
print(diff)   # [2.1, -1, 5, 2]  ← difference for each feature

# Scalar × vector — every element multiplied by the scalar
scaled = 2 * order_a
print(scaled)  # [6.4, 4, 30, 14]

Matrix — your entire dataset in one object

A matrix is a 2D array — rows and columns. In ML, your training data is a matrix. Each row is one example (one order). Each column is one feature (distance, time, etc.). 1,000 orders with 4 features → a matrix with shape (1000, 4).

1000 Swiggy orders → a (1000, 4) matrix
         distance_km   time_of_day   prep_time   traffic
row 0        3.2            2            15          7
row 1        1.1            3            10          5
row 2        5.8            1            20          9
 ...         ...           ...          ...         ...
row 999      2.4            4            12          6

X.shape = (1000, 4) ← 1000 rows × 4 columns
python
# Build the feature matrix
np.random.seed(42)
n = 1000

X = np.column_stack([
    np.random.uniform(0.5, 8.0, n),   # distance_km
    np.random.randint(1, 5, n),        # time_of_day
    np.random.uniform(5, 25, n),       # prep_time
    np.random.uniform(1, 10, n),       # traffic
])

print(X.shape)    # (1000, 4)  ← rows × columns
print(X.ndim)     # 2

# Indexing
print(X[0])       # first order — all 4 features (note: column_stack casts everything to float)
print(X[0, 0])    # first order, first feature (distance)
print(X[:, 0])    # ALL orders, first feature (distance column) → shape (1000,)
print(X[:5, :])   # first 5 orders, all features → shape (5, 4)

# Shape tells you: (rows, columns)
rows, cols = X.shape
print(f"{rows} orders, {cols} features each")

Tensor — matrices stacked into higher dimensions

A tensor is the general term for arrays with any number of dimensions. A scalar is a 0D tensor. A vector is a 1D tensor. A matrix is a 2D tensor. Beyond 2D, people usually just say "tensor."

The most common 3D tensor in ML is a batch of images. A single colour image is a 2D grid of pixels — but it has 3 colour channels (Red, Green, Blue). So one image is a 3D tensor: height × width × channels. A batch of 32 images adds another dimension: (32, 224, 224, 3).

Image as a tensor — shape (height, width, channels)
Three stacked 224×224 grids — the R, G and B colour channels.
image.shape = (224, 224, 3) ← height × width × colour channels
batch.shape = (32, 224, 224, 3) ← 32 images in one training batch
python
# 3D tensor — one RGB image (224×224 pixels, 3 colour channels)
image = np.random.randint(0, 256, size=(224, 224, 3))
print(image.shape)   # (224, 224, 3)
print(image.ndim)    # 3

# 4D tensor — batch of 32 images
batch = np.random.randint(0, 256, size=(32, 224, 224, 3))
print(batch.shape)   # (32, 224, 224, 3)

# The shape tells the whole story:
# (batch_size, height, width, channels)
#  32          224     224    3

# In PyTorch, channels come SECOND: (batch, channels, height, width)
# batch_pytorch = np.random.randn(32, 3, 224, 224)
# This is a common source of shape errors when switching between libraries.
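Converting between the two layouts is a single call. A minimal sketch using np.moveaxis, which relocates one axis and leaves the rest in order:

```python
import numpy as np

# NumPy/TensorFlow convention: (batch, height, width, channels)
batch = np.random.randint(0, 256, size=(32, 224, 224, 3))

# Move the channels axis (last) to position 1 → PyTorch layout
batch_chw = np.moveaxis(batch, -1, 1)
print(batch_chw.shape)   # (32, 3, 224, 224)

# And back again
batch_hwc = np.moveaxis(batch_chw, 1, -1)
print(batch_hwc.shape)   # (32, 224, 224, 3)
```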
The operations

Four operations ML uses constantly

You don't need all of linear algebra. ML uses the same four operations over and over. Understanding these four deeply is enough to follow any ML paper, debug any shape error, and understand how data flows through a neural network.

1. Dot product — the similarity measure

The dot product takes two vectors of the same length and returns a single number. Multiply the matching elements, then add everything up. That number measures how much the two vectors "point in the same direction."

This is exactly how a neural network neuron works — it takes all your input features, multiplies each one by a learned weight, and adds everything up. One dot product = one neuron's output.

Dot product — step by step
features = [3,   1,   4,   2  ]
weights  = [0.5, 0.3, 0.8, 0.2]

3 × 0.5 = 1.5
1 × 0.3 = 0.3
4 × 0.8 = 3.2
2 × 0.2 = 0.4
──────────
sum     = 5.4
python
features = np.array([3, 1, 4, 2])
weights  = np.array([0.5, 0.3, 0.8, 0.2])

# Method 1: np.dot()
result = np.dot(features, weights)
print(result)   # 5.4

# Method 2: @ operator (preferred in modern code)
result = features @ weights
print(result)   # 5.4

# This is literally what one neuron does:
# multiply each input by its weight, sum everything up
# then add a bias term: output = dot(features, weights) + bias
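The "same direction" intuition becomes concrete with cosine similarity — the dot product divided by the product of the vectors' lengths. A sketch with toy 3-number "embeddings" (the values are invented for illustration; real embeddings have hundreds of dimensions):

```python
import numpy as np

# Toy 3-dimensional "embeddings" — invented values for illustration
king  = np.array([0.9, 0.8, 0.1])
queen = np.array([0.8, 0.9, 0.2])
pizza = np.array([0.1, 0.2, 0.9])

def cosine_similarity(a, b):
    # dot product, normalised by the two vector lengths → value in [-1, 1]
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(king, queen))  # close to 1 → similar direction
print(cosine_similarity(king, pizza))  # much smaller → different direction
```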

2. Matrix multiplication — the core of neural networks

When you multiply two matrices, you're doing many dot products at once. Each row of the first matrix dots with each column of the second. This is how a neural network layer transforms all your training examples simultaneously, in a single operation.

The shape rule is the most important thing to memorise here: to multiply matrix A (shape m×n) by matrix B (shape n×p), the inner dimensions must match (both n), and the result has shape (m×p).

Shape rule for matrix multiplication
X — training data (1000 × 4)  @  W — weight matrix (4 × 64)  =  output (1000 × 64)
Inner dimensions must match (both 4). Result takes the outer dimensions (1000 × 64). In plain English: 1000 orders each processed by 64 neurons = 1000 outputs of size 64.
python
# 1000 training examples, 4 features each
X = np.random.randn(1000, 4)   # shape (1000, 4)

# Weight matrix for a layer with 64 neurons
W = np.random.randn(4, 64)     # shape (4, 64)
b = np.random.randn(64)        # bias for each neuron

# One matrix multiply = entire layer's output for ALL examples at once
output = X @ W + b
print(output.shape)   # (1000, 64)

# Shape rule check: (1000, 4) @ (4, 64) → inner dims match → (1000, 64) ✓

# The most common error in all of deep learning:
# X.shape = (1000, 4), W.shape = (64, 4)  ← WRONG ORDER
# output = X @ W  → ValueError: matmul dimensions don't match
# Fix: W = W.T   ← transpose W to (4, 64) then it works
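The "many dot products at once" claim can be checked directly — entry (i, j) of a matrix product is the dot product of row i of the first matrix with column j of the second:

```python
import numpy as np

A = np.random.randn(3, 4)
B = np.random.randn(4, 2)
C = A @ B                            # shape (3, 2)

# Entry (1, 0) of C is row 1 of A dotted with column 0 of B
manual = A[1] @ B[:, 0]
print(np.isclose(C[1, 0], manual))   # True
```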

3. Transpose — flip rows and columns

The transpose of a matrix flips it diagonally — rows become columns, columns become rows. Shape (m×n) becomes (n×m). You'll use this constantly to fix shape errors.

python
A = np.array([[1, 2, 3],
              [4, 5, 6]])
print(A.shape)    # (2, 3)

AT = A.T           # or np.transpose(A)
print(AT.shape)   # (3, 2)

print(AT)
# [[1, 4],
#  [2, 5],
#  [3, 6]]

# Real use: fixing shape mismatches
# If model expects (batch, features) but you have (features, batch):
X_fixed = X.T    # flip it

4. Broadcasting — apply an operation without repeating yourself

Broadcasting lets NumPy apply an operation between arrays of different shapes without creating copies. The smaller array is conceptually "stretched" to match the larger one.

In ML, you use this every time you normalise a dataset — you subtract the mean and divide by the standard deviation across 1,000 rows using just one line.

python
# Normalise 1000 orders: subtract mean, divide by std
X = np.random.randn(1000, 4)   # 1000 orders, 4 features

mean = X.mean(axis=0)   # shape (4,)  ← one mean per feature column
std  = X.std(axis=0)    # shape (4,)

# Broadcasting: (1000,4) - (4,) works automatically
# NumPy treats (4,) as if it were (1,4) then tiles it 1000 times
X_norm = (X - mean) / std
print(X_norm.shape)        # (1000, 4)  ← same shape, now normalised
print(X_norm.mean(axis=0)) # [~0, ~0, ~0, ~0] ← means now near zero
print(X_norm.std(axis=0))  # [~1, ~1, ~1, ~1] ← stds now near one

# Without broadcasting you'd need:
# mean_matrix = np.tile(mean, (1000, 1))  ← wasteful copy
⚠️ Important
Broadcasting is powerful but causes silent bugs when shapes are accidentally compatible. Always print shapes when something looks wrong — print(array.shape) is the most useful debugging line in all of NumPy.
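A classic instance of such a silent bug: subtracting a (1000, 1) array from a (1000,) array does not raise an error — broadcasting quietly produces a 1000×1000 matrix. A sketch (the variable names are illustrative):

```python
import numpy as np

preds   = np.random.randn(1000)      # shape (1000,)  — a 1D array of predictions
targets = np.random.randn(1000, 1)   # shape (1000, 1) — e.g. a model that returns 2D output

bad = preds - targets
print(bad.shape)      # (1000, 1000) — not (1000,)! broadcasting built a huge matrix

# Fix: flatten to matching shapes before subtracting
errors = preds - targets.ravel()
print(errors.shape)   # (1000,)
```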
Shapes in practice

The shape cheatsheet — what every array means

In ML, shapes carry meaning. Once you know the convention, reading a shape like (32, 3, 224, 224) immediately tells you: 32 images in the batch, 3 colour channels, 224×224 pixels. Here's the full reference.

Shape          What it represents                  Real example
()             Scalar — single number              loss = 0.34
(n,)           1D vector — n values                one_order = (4,)
(m, n)         2D matrix — m rows, n cols          X_train = (800, 4)
(batch, n)     Batch of vectors                    X_batch = (32, 128)
(H, W, C)      Image — height × width × channels   image = (224, 224, 3)
(B, H, W, C)   Batch of images                     batch = (32, 224, 224, 3)
(seq, d)       Sequence of token embeddings        sentence = (128, 768)
(B, seq, d)    Batch of token sequences            batch = (32, 128, 768)

The most important debugging habit: whenever you get a shape error or unexpected result, immediately print the shape of every array involved. 90% of ML bugs come down to a wrong shape somewhere in the pipeline.
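That habit can be packed into a tiny helper — shapes is a hypothetical name, not a library function:

```python
import numpy as np

def shapes(**arrays):
    # hypothetical debugging helper — print name, shape and dtype of each array
    for name, arr in arrays.items():
        arr = np.asarray(arr)
        print(f"{name}: shape={arr.shape}, dtype={arr.dtype}")

X = np.random.randn(1000, 4)
W = np.random.randn(4, 64)
shapes(X=X, W=W, output=X @ W)
# X: shape=(1000, 4), dtype=float64
# W: shape=(4, 64), dtype=float64
# output: shape=(1000, 64), dtype=float64
```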

What comes next

You can now read ML code without getting lost.

Every time you see an ML model being built — a layer in PyTorch, a fit call in sklearn, an attention mechanism in a Transformer — you now understand the underlying structure. Data is stored in arrays. Arrays have shapes. Layers multiply matrices. Outputs are also matrices.

The next module takes this one step further — matrix multiplication and linear transformations. You'll see exactly how data changes shape as it flows through a neural network layer, and why the choice of matrix dimensions determines what a layer can learn.

Next — Module 04
Matrix Multiplication and Linear Transformations

How a neural network layer transforms your data — visualised step by step.

coming soon

🎯 Key Takeaways

  • Scalar = one number. Vector = list of numbers. Matrix = 2D grid. Tensor = any N-dimensional array. They are nested — each is the previous extended by one dimension.
  • In ML: one data point = one vector. Your entire dataset = a matrix. A batch of images = a 4D tensor (batch, height, width, channels).
  • Dot product: multiply matching elements and sum — this is exactly what one neuron does. np.dot(features, weights) or features @ weights.
  • Matrix multiply shape rule: (m × n) @ (n × p) → (m × p). Inner dimensions must match. If they don't, you have a shape error — transpose one of the matrices.
  • Broadcasting lets you subtract a mean vector of shape (4,) from a matrix of shape (1000, 4) without a loop. NumPy stretches the smaller array automatically.
  • The one debugging habit: print(array.shape) immediately when something is wrong. 90% of ML bugs are shape mismatches.