Dot Product and Similarity
The operation behind every recommendation engine, embedding search, and attention mechanism — built from plain English first, then intuition, then math, then code.
You use the dot product every day without knowing it. Every time Spotify recommends a song, it is running a dot product.
You open Spotify. It shows you a song you've never heard — and you love it. How did it know? Spotify represents every song as a list of numbers. Not random numbers — each number means something. Something like: how much bass does this song have? How fast is the tempo? Is it energetic or calm? Is it acoustic or electronic?
A song might look like this internally:
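A minimal sketch of what such vectors could look like. The feature names and values here are invented for illustration — Spotify's real features and scales are not public:

```python
# Each position has a fixed meaning, the same for every song:
# [bass, tempo, energy, acoustic, classical_feel] -- illustrative 0-1 scale
hip_hop_track   = [0.9, 0.8, 0.9, 0.1, 0.0]
similar_track   = [0.8, 0.7, 0.8, 0.2, 0.1]
classical_piano = [0.1, 0.3, 0.2, 0.9, 0.9]
```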
Now Spotify knows you like the hip-hop track. It wants to find other songs you might like. The question becomes: which other song is most similar to the hip-hop track? To answer this, it needs a way to measure similarity between two lists of numbers. That measurement is the dot product.
The hip-hop track and the "Similar track" have high numbers in the same places (bass, tempo, energy) and low numbers in the same places (acoustic, classical feel). The classical piano is the opposite — low where the hip-hop is high and high where the hip-hop is low. The dot product gives you a single number that captures exactly this relationship. High number = similar. Low number = different.
A vector is just a list of numbers — each number has a meaning
From Module 03 you know that a vector is an ordered list of numbers. But the key insight people miss is that in ML, position matters. The first number always means the same thing. The second number always means the same thing. This is what makes comparison meaningful — we are always comparing apples to apples.
In ML, a vector is a fixed-length list where every position has a consistent meaning. Position 0 might mean "bass intensity" for every song in the database. Position 1 might mean "tempo" for every song. If you swap the positions, comparison breaks completely.
This is called a feature vector. In ML, you will represent customers, songs, words, images, and loan applications all as feature vectors. The dot product measures similarity between any two of them.
The dot product — multiply matching positions, then add everything up
Here is the full operation in plain English before any symbols: take two vectors of the same length. Multiply the numbers at each matching position together. Add all those products up. That final sum is the dot product.
Think of it like a compatibility score between two people on a dating app. Both people rate 5 things on a scale of 1–10: outdoors, movies, cooking, travel, fitness. Person A: [8, 3, 7, 9, 6]. Person B: [7, 2, 8, 10, 5].
To find compatibility: multiply each matching pair (8×7, 3×2, 7×8, 9×10, 6×5) and add them up (56 + 6 + 56 + 90 + 30 = 238). A high number means they care about the same things. This IS the dot product.
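The same calculation in a few lines of Python — multiply matching positions, then add everything up:

```python
person_a = [8, 3, 7, 9, 6]    # ratings: outdoors, movies, cooking, travel, fitness
person_b = [7, 2, 8, 10, 5]

# Multiply each matching pair, then sum: this IS the dot product.
compatibility = sum(a * b for a, b in zip(person_a, person_b))
print(compatibility)  # 238
```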
The dot product rewards agreement. When both vectors are high at the same position, that position contributes a large product. When one is high and the other is low, the contribution is small.
Now the formula — after you understand the idea
The formula is just a compact way of writing what you saw above:

a · b = Σᵢ aᵢbᵢ = a₁b₁ + a₂b₂ + … + aₙbₙ

The Greek letter Σ (sigma) means "add everything up," and the subscript i means "for each position i."
The raw dot product has a flaw: bigger vectors always win
Imagine two users on Swiggy. User A orders 50 times a month and rates most orders highly — they are an extremely active user with large numbers everywhere in their feature vector. User B orders 5 times a month but has exactly the same taste preferences, just scaled down.
If you compute the raw dot product between User A and a restaurant, it will be much higher than the dot product between User B and the same restaurant — not because User A is more similar, but simply because User A has bigger numbers. The raw dot product confuses magnitude (how big the numbers are) with direction (what kind of things you like).
Think of two arrows. Arrow A is 10cm long pointing northeast. Arrow B is 2cm long also pointing northeast. They point in exactly the same direction — they represent the same preferences. But if you measure their dot product, Arrow A wins simply because it is longer.
Cosine similarity fixes this by dividing the dot product by the length of both vectors — it strips away magnitude and measures only direction. Two arrows pointing northeast give cosine similarity = 1.0 regardless of how long they are.
Cosine similarity — direction only, magnitude ignored
Cosine similarity divides the raw dot product by the product of the two vector lengths (called their norms). The result is always between −1 and +1, regardless of how large or small the original numbers were.
||a|| is the "length" (norm) of vector a: the square root of the sum of all values squared. It is never negative, and it is positive for any vector that is not all zeros.
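A small sketch of both pieces — the norm and the cosine similarity built from it. The two "arrows" reproduce the northeast example above: same direction, very different lengths, cosine similarity still 1.0:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))  # ||a||
    norm_b = math.sqrt(sum(x * x for x in b))  # ||b||
    return dot / (norm_a * norm_b)

long_arrow  = [10.0, 10.0]   # 14.1 units long, pointing "northeast"
short_arrow = [2.0, 2.0]     # 2.8 units long, same direction
print(cosine_similarity(long_arrow, short_arrow))  # ≈ 1.0, length ignored
```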
The dot product appears in three of the most important ML operations
This is not just a mathematical curiosity. The dot product is the computational engine inside neural networks, transformers, and recommendation systems. Once you understand it here, you will recognise it immediately in those later modules.
When data passes through a neural network layer, the computation is: output = input · weights + bias. This is a dot product between the input vector and the weight vector. A layer with 512 neurons runs 512 dot products in parallel. Training a neural network is literally adjusting the values in those weight vectors to make the dot products produce useful outputs.
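A hedged NumPy sketch of that forward pass — the 128-feature input size is an arbitrary choice for illustration; the 512 columns match the 512-neuron layer described above:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(128)           # input vector: 128 features
W = rng.standard_normal((128, 512))    # one 128-dim weight vector per neuron
b = np.zeros(512)                      # one bias per neuron

# x @ W runs 512 dot products at once: one per column (neuron) of W.
output = x @ W + b
print(output.shape)  # (512,)
```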
In GPT, BERT, and every modern LLM, the attention mechanism computes Query · Key — a dot product between what a word is "looking for" (Query) and what other words "offer" (Key). The result tells the model how much each word should pay attention to every other word in the sentence. Module 47 covers this in full.
When Flipkart searches for products similar to what you clicked, it stores every product as a vector (embedding) in a vector database. Finding the most similar products is finding the vectors with the highest cosine similarity to your query vector. This is called approximate nearest neighbour search and it runs billions of dot products per second. Module 51 (RAG) is built entirely on this.
Euclidean distance — measuring actual physical distance between points
Cosine similarity measures angle — how similar is the direction. Euclidean distance measures a completely different thing: how far apart are two points in space. Think of it as the straight-line distance you would measure with a ruler.
Two cities on a map: Mumbai (18.97°N, 72.83°E) and Pune (18.52°N, 73.85°E). The Euclidean distance between them as points on the map is roughly √((18.97−18.52)² + (72.83−73.85)²) — just like the Pythagorean theorem applied to their coordinate difference. This is Euclidean distance.
In ML, "distance" works the same way but in higher dimensions — instead of 2 coordinates you might have 512 coordinates (features). The formula is identical, just with more terms under the square root.
Day-one task at Flipkart — build a product similarity engine
Your lead sends you a task: "Users who view a product should see similar products below it. Build a similarity engine for the electronics catalogue." This is exactly the dot product / cosine similarity problem. Here is how you would actually implement it.
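A minimal sketch of one way to implement it. The product names, the four features, and every value below are invented for illustration — a real catalogue would use learned embeddings with hundreds of dimensions and an approximate-nearest-neighbour index rather than a full scan:

```python
import numpy as np

# Invented catalogue: each row is one product's feature vector.
# Columns (illustrative): [screen_size, battery, camera, price_tier]
products = {
    "phone_a":  np.array([0.9, 0.7, 0.9, 0.8]),
    "phone_b":  np.array([0.8, 0.8, 0.8, 0.7]),
    "laptop_a": np.array([0.2, 0.9, 0.1, 0.9]),
    "earbuds":  np.array([0.0, 0.6, 0.0, 0.2]),
}

def top_similar(query_name, k=2):
    """Rank every other product by cosine similarity to the query."""
    q = products[query_name]
    scores = {}
    for name, vec in products.items():
        if name == query_name:
            continue
        # Cosine similarity: dot product divided by both norms.
        scores[name] = float(q @ vec / (np.linalg.norm(q) * np.linalg.norm(vec)))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

print(top_similar("phone_a"))  # phone_b ranks first: same highs, same lows
```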
Every common dot product and similarity error — explained and fixed
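One error worth showing concretely, because it silently produces plausible-looking garbage: computing cosine similarity on features with wildly different scales, so the biggest feature dominates the direction. The numbers below are invented — price in rupees sitting next to a 1–5 rating:

```python
import numpy as np

def cos_sim(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# [price_in_rupees, rating_1_to_5] -- invented values, very different scales
budget_good = np.array([500.0, 5.0])   # cheap, loved
budget_bad  = np.array([550.0, 1.0])   # cheap, hated

# Raw features: price dominates the direction, so opposite ratings
# still look nearly identical (similarity > 0.999).
print(cos_sim(budget_good, budget_bad))

def scale(v):
    # Divide each feature by an assumed maximum so both live on a 0-1 scale.
    return v / np.array([1000.0, 5.0])

# After scaling, the rating difference becomes visible again.
print(cos_sim(scale(budget_good), scale(budget_bad)))
```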
You can measure similarity. Now you need to understand the structure hidden inside data.
The dot product tells you how similar two vectors are. But what if you want to understand the overall structure of a dataset — which directions explain the most variation? Which features are really independent and which are just reflections of each other? That requires Module 06: Eigenvalues and Eigenvectors — the mathematical foundation of PCA, the most common dimensionality reduction technique in production ML.
🎯 Key Takeaways
- The dot product multiplies matching positions in two vectors and sums the results. It gives a single number — high when the vectors agree on what is important, low when they disagree. This is the foundation of similarity measurement in ML.
- The raw dot product is sensitive to magnitude — bigger vectors always produce higher scores. Two people with identical taste but different activity levels look very different to the raw dot product.
- Cosine similarity fixes the magnitude problem by dividing the dot product by the lengths of both vectors. The result is always between −1 and +1, measuring direction only. Two vectors pointing the same direction give cosine similarity of +1.0 regardless of length.
- The dot product is the core computation in three major ML operations: the forward pass in every neural network layer (inputs · weights), the attention mechanism in Transformers (Query · Key), and embedding similarity search in recommendation systems.
- Euclidean distance measures actual spatial distance between points — good for KNN and K-Means. Cosine similarity measures angle — good for text, embeddings, and recommendation systems where magnitude should not affect the comparison.
- Always normalise feature vectors before computing cosine similarity. Features on different scales (price in rupees vs distance in km) distort the direction of the vector, making similarity scores meaningless.