Vectors, Matrices and Tensors
The language every ML algorithm speaks. From a single number to multi-dimensional arrays — visual intuition first, formula second.
Every ML algorithm is secretly just matrix math.
When you call model.fit(X_train, y_train) in scikit-learn, what happens inside? The algorithm multiplies matrices, adds vectors, and adjusts numbers. When a neural network processes an image, it converts the pixels into a matrix and multiplies it through dozens of layers. When an LLM reads your prompt, it converts every word into a vector of 768 or 4096 numbers.
You don't need to do this math by hand. NumPy and PyTorch do it for you. But if you don't understand what a matrix multiplication produces and why it's useful, you'll run into errors you can't debug, make architecture decisions you can't explain, and hit a ceiling you can't break through.
This page teaches three things: what scalars, vectors, matrices, and tensors are (with pictures), what the key operations do and why ML needs them, and how to work with them in NumPy — the library all of ML is built on.
From one number to billions — four levels
There are four levels of data structure in ML. Each one is just the previous one extended into another dimension. Once you understand this hierarchy, tensors stop being scary.
Scalar — just one number
A scalar is a single value. No direction, no structure. In ML, scalars show up as: a loss value (0.34), a learning rate (0.001), a single prediction (37.8 minutes), a model accuracy (0.92).
Vector — a list of numbers with meaning
A vector is a 1D array of numbers. In ML, every single data point is a vector. A Swiggy order with 4 features — distance, time of day, restaurant prep time, traffic score — is a vector of 4 numbers: [3.2, 2, 15, 7].
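In NumPy, that order becomes a 1D array. A quick sketch (the feature values are the made-up ones from the example above):

```python
import numpy as np

# Hypothetical Swiggy order: [distance_km, time_of_day, prep_time_min, traffic_score]
order = np.array([3.2, 2, 15, 7])

print(order.shape)  # (4,) — a 1D array holding 4 numbers
print(order.ndim)   # 1 — one dimension, i.e. a vector
```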
A word in an LLM is also a vector — called an embedding. The word "king" might be represented as a vector of 768 numbers. Those numbers encode meaning — words with similar meanings have vectors that point in similar directions in space.
Matrix — your entire dataset in one object
A matrix is a 2D array — rows and columns. In ML, your training data is a matrix. Each row is one example (one order). Each column is one feature (distance, time, etc.). 1,000 orders with 4 features → a matrix with shape (1000, 4).
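As a sketch with random stand-in data, the rows-are-examples, columns-are-features convention looks like this:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((1000, 4))  # synthetic stand-in: 1,000 orders, 4 features each

print(X.shape)   # (1000, 4)
print(X[0])      # first row — one order (a vector of 4 features)
print(X[:, 0])   # first column — one feature (e.g. distance) for all 1,000 orders
```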
Tensor — matrices stacked into higher dimensions
A tensor is the general term for arrays with any number of dimensions. A scalar is a 0D tensor. A vector is a 1D tensor. A matrix is a 2D tensor. Beyond 2D, people usually just say "tensor."
The most common 3D tensor in ML is a batch of images. A single colour image is a 2D grid of pixels — but it has 3 colour channels (Red, Green, Blue). So one image is a 3D tensor: height × width × channels. A batch of 32 images adds another dimension: (32, 224, 224, 3).
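A minimal sketch of that batch, using a zero-filled array in the channels-last convention described above:

```python
import numpy as np

# A batch of 32 RGB images, 224×224 pixels: (batch, height, width, channels)
batch = np.zeros((32, 224, 224, 3), dtype=np.uint8)

print(batch.ndim)               # 4 — a 4D tensor
print(batch[0].shape)           # (224, 224, 3) — one image
print(batch[0, :, :, 0].shape)  # (224, 224) — one image's red channel
```

Note that some frameworks (PyTorch, for instance) default to channels-first instead: (32, 3, 224, 224). The dimensions carry the same meaning; only the order differs.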
Four operations ML uses constantly
You don't need all of linear algebra. ML uses the same four operations over and over. Understanding these four deeply is enough to follow any ML paper, debug any shape error, and understand how data flows through a neural network.
1. Dot product — the similarity measure
The dot product takes two vectors of the same length and returns a single number. Multiply the matching elements, then add everything up. That number measures how much the two vectors "point in the same direction."
This is exactly how a neural network neuron works — it takes all your input features, multiplies each one by a learned weight, and adds everything up. One dot product = one neuron's output.
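A sketch of one neuron as a dot product, with made-up weights and bias (real weights would be learned during training):

```python
import numpy as np

features = np.array([3.2, 2.0, 15.0, 7.0])  # one order's features
weights  = np.array([0.8, 0.1, 0.5, 0.3])   # hypothetical learned weights
bias = 1.5

# Multiply matching elements, sum, add a bias — one neuron's output
output = features @ weights + bias  # same as np.dot(features, weights) + bias
print(output)  # ≈ 13.86
```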
2. Matrix multiplication — the core of neural networks
When you multiply two matrices, you're doing many dot products at once. Each row of the first matrix dots with each column of the second. This is how a neural network layer transforms all your training examples in a single operation.
The shape rule is the most important thing to memorise here: to multiply matrix A (shape m×n) by matrix B (shape n×p), the inner dimensions must match (both n), and the result has shape (m×p).
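A sketch of the shape rule in action, using ones as placeholder data and a hypothetical layer that maps 4 features to 8 outputs:

```python
import numpy as np

A = np.ones((1000, 4))  # 1,000 examples, 4 features
W = np.ones((4, 8))     # a layer's weights: 4 inputs → 8 outputs

out = A @ W             # (1000, 4) @ (4, 8) → inner 4s match → (1000, 8)
print(out.shape)        # (1000, 8)

# Mismatched inner dimensions raise an error:
# A @ np.ones((8, 4))  → ValueError
```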
3. Transpose — flip rows and columns
The transpose of a matrix flips it diagonally — rows become columns, columns become rows. Shape (m×n) becomes (n×m). You'll use this constantly to fix shape errors.
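A small sketch: `.T` flips the shape, and it often makes an otherwise-invalid multiplication work.

```python
import numpy as np

M = np.arange(6).reshape(2, 3)  # shape (2, 3)
print(M.T.shape)                # (3, 2) — rows and columns swapped

# (2, 3) @ (2, 3) would fail — inner dims 3 and 2 don't match.
# Transposing the second matrix fixes it:
print((M @ M.T).shape)          # (2, 3) @ (3, 2) → (2, 2)
```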
4. Broadcasting — apply an operation without repeating yourself
Broadcasting lets NumPy apply an operation between arrays of different shapes without creating copies. The smaller array is conceptually "stretched" to match the larger one.
In ML, you use this every time you normalise a dataset — you subtract the mean and divide by the standard deviation across 1,000 rows using just one line.
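A sketch of that normalisation on random stand-in data: the per-feature mean and std each have shape (4,), and broadcasting applies them to every one of the 1,000 rows.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.random((1000, 4))   # dataset: 1,000 rows, 4 features

mean = X.mean(axis=0)       # shape (4,) — one mean per feature
std = X.std(axis=0)         # shape (4,) — one std per feature

# (1000, 4) minus (4,): the vector is "stretched" across all rows, no loop needed
X_norm = (X - mean) / std
print(X_norm.mean(axis=0))  # all ≈ 0 after normalisation
```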
The shape cheatsheet — what every array means
In ML, shapes carry meaning. Once you know the convention, reading a shape like (32, 3, 224, 224) immediately tells you: 32 images in the batch, 3 colour channels, 224×224 pixels. Here's the full reference.
The most important debugging habit: whenever you get a shape error or unexpected result, immediately print the shape of every array involved. 90% of ML bugs come down to a wrong shape somewhere in the pipeline.
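A sketch of the habit, with a deliberately wrong weight vector:

```python
import numpy as np

X = np.ones((1000, 4))
w = np.ones((1000,))  # bug: should be shape (4,) to match X's columns

try:
    X @ w
except ValueError:
    # First reflex on any shape error: print every shape involved
    print(X.shape, w.shape)  # (1000, 4) (1000,) — inner dims 4 and 1000 don't match
```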
You can now read ML code without getting lost.
Every time you see an ML model being built — a layer in PyTorch, a fit call in sklearn, an attention mechanism in a Transformer — you now understand the underlying structure. Data is stored in arrays. Arrays have shapes. Layers multiply matrices. Outputs are also matrices.
The next module takes this one step further — matrix multiplication and linear transformations. You'll see exactly how data changes shape as it flows through a neural network layer, and why the choice of matrix dimensions determines what a layer can learn.
How a neural network layer transforms your data — visualised step by step.
🎯 Key Takeaways
- ✓Scalar = one number. Vector = list of numbers. Matrix = 2D grid. Tensor = any N-dimensional array. They are nested — each is the previous extended by one dimension.
- ✓In ML: one data point = one vector. Your entire dataset = a matrix. A batch of images = a 4D tensor (batch, height, width, channels).
- ✓Dot product: multiply matching elements and sum — this is exactly what one neuron does. np.dot(features, weights) or features @ weights.
- ✓Matrix multiply shape rule: (m × n) @ (n × p) → (m × p). Inner dimensions must match. If they don't, you get a shape error; transposing one of the matrices often fixes it.
- ✓Broadcasting lets you subtract a mean vector of shape (4,) from a matrix of shape (1000, 4) without a loop. NumPy stretches the smaller array automatically.
- ✓The one debugging habit: print(array.shape) immediately when something is wrong. 90% of ML bugs are shape mismatches.