Dot Product and Similarity
The operation behind every recommendation engine, embedding search, and attention mechanism — built from plain English first, then intuition, then math, then code.
You use the dot product every day without knowing it. Every time Spotify recommends a song, it is running a dot product.
You open Spotify. It shows you a song you've never heard — and you love it. How did it know? Spotify represents every song as a list of numbers. Not random numbers — each number means something. Something like: how much bass does this song have? How fast is the tempo? Is it energetic or calm? Is it acoustic or electronic?
A song might look like this internally:
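A minimal sketch of what such vectors could look like. The feature names and values here are invented for illustration — Spotify's real features and scales are not public:

```python
# Each position has a fixed meaning, the same for every song:
# [bass, tempo, energy, acoustic, classical_feel] -- illustrative 0-1 scale
hip_hop_track   = [0.9, 0.8, 0.9, 0.1, 0.0]
similar_track   = [0.8, 0.7, 0.8, 0.2, 0.1]
classical_piano = [0.1, 0.3, 0.2, 0.9, 0.9]
```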
Now Spotify knows you like the hip-hop track. It wants to find other songs you might like. The question becomes: which other song is most similar to the hip-hop track? To answer this, it needs a way to measure similarity between two lists of numbers. That measurement is the dot product.
The hip-hop track and the "Similar track" have high numbers in the same places (bass, tempo, energy) and low numbers in the same places (acoustic, classical feel). The classical piano is the opposite — low where the hip-hop is high and high where the hip-hop is low. The dot product gives you a single number that captures exactly this relationship. High number = similar. Low number = different.
A vector is just a list of numbers — each number has a meaning
From Module 03 you know that a vector is an ordered list of numbers. But the key insight people miss is that in ML, position matters. The first number always means the same thing. The second number always means the same thing. This is what makes comparison meaningful — we are always comparing apples to apples.
In ML, a vector is a fixed-length list where every position has a consistent meaning. Position 0 might mean "bass intensity" for every song in the database. Position 1 might mean "tempo" for every song. If you swap the positions, comparison breaks completely.
This is called a feature vector. In ML, you will represent customers, songs, words, images, and loan applications all as feature vectors. The dot product measures similarity between any two of them.
The dot product — multiply matching positions, then add everything up
Here is the full operation in plain English before any symbols: take two vectors of the same length. Multiply the numbers at each matching position together. Add all those products up. That final sum is the dot product.
Think of it like a compatibility score between two people on a dating app. Both people rate 5 things on a scale of 1–10: outdoors, movies, cooking, travel, fitness. Person A: [8, 3, 7, 9, 6]. Person B: [7, 2, 8, 10, 5].
To find compatibility: multiply each matching pair (8×7, 3×2, 7×8, 9×10, 6×5) and add them up (56 + 6 + 56 + 90 + 30 = 238). A high number means they care about the same things. This IS the dot product.
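The same calculation in a few lines of Python — multiply matching positions, then add everything up:

```python
person_a = [8, 3, 7, 9, 6]    # ratings: outdoors, movies, cooking, travel, fitness
person_b = [7, 2, 8, 10, 5]

# Multiply each matching pair, then sum: this IS the dot product.
compatibility = sum(a * b for a, b in zip(person_a, person_b))
print(compatibility)  # 238
```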
The dot product rewards agreement. When both vectors are high at the same position, that position contributes a large product. When one is high and the other is low, the contribution is small.
Now the formula — after you understand the idea
The formula is just a compact way of writing what you saw above:

a · b = Σᵢ aᵢbᵢ = a₁b₁ + a₂b₂ + … + aₙbₙ

The Greek letter Σ (sigma) means "add everything up," and the subscript i means "for each position i."
The raw dot product has a flaw: bigger vectors always win
Imagine two users on Swiggy. User A orders 50 times a month and rates most orders highly — they are an extremely active user with large numbers everywhere in their feature vector. User B orders 5 times a month but has exactly the same taste preferences, just scaled down.
If you compute the raw dot product between User A and a restaurant, it will be much higher than the dot product between User B and the same restaurant — not because User A is more similar, but simply because User A has bigger numbers. The raw dot product confuses magnitude (how big the numbers are) with direction (what kind of things you like).
Think of two arrows. Arrow A is 10cm long pointing northeast. Arrow B is 2cm long also pointing northeast. They point in exactly the same direction — they represent the same preferences. But if you measure their dot product, Arrow A wins simply because it is longer.
Cosine similarity fixes this by dividing the dot product by the length of both vectors — it strips away magnitude and measures only direction. Two arrows pointing northeast give cosine similarity = 1.0 regardless of how long they are.
Cosine similarity — direction only, magnitude ignored
Cosine similarity divides the raw dot product by the product of the two vector lengths (called their norms). The result is always between −1 and +1, regardless of how large or small the original numbers were.
||a|| is the "length" (norm) of vector a: the square root of the sum of all values squared. It is never negative, and it is positive for any vector that is not all zeros.
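A small sketch of both pieces — the norm and the cosine similarity built from it. The two "arrows" reproduce the northeast example above: same direction, very different lengths, cosine similarity still 1.0:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))  # ||a||
    norm_b = math.sqrt(sum(x * x for x in b))  # ||b||
    return dot / (norm_a * norm_b)

long_arrow  = [10.0, 10.0]   # 14.1 units long, pointing "northeast"
short_arrow = [2.0, 2.0]     # 2.8 units long, same direction
print(cosine_similarity(long_arrow, short_arrow))  # ≈ 1.0, length ignored
```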
The dot product appears in three of the most important ML operations
This is not just a mathematical curiosity. The dot product is the computational engine inside neural networks, transformers, and recommendation systems. Once you understand it here, you will recognise it immediately in those later modules.
When data passes through a neural network layer, the computation is: output = input · weights + bias. This is a dot product between the input vector and the weight vector. A layer with 512 neurons runs 512 dot products in parallel. Training a neural network is literally adjusting the values in those weight vectors to make the dot products produce useful outputs.
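A hedged NumPy sketch of that forward pass — the 128-feature input size is an arbitrary choice for illustration; the 512 columns match the 512-neuron layer described above:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(128)           # input vector: 128 features
W = rng.standard_normal((128, 512))    # one 128-dim weight vector per neuron
b = np.zeros(512)                      # one bias per neuron

# x @ W runs 512 dot products at once: one per column (neuron) of W.
output = x @ W + b
print(output.shape)  # (512,)
```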
In GPT, BERT, and every modern LLM, the attention mechanism computes Query · Key — a dot product between what a word is "looking for" (Query) and what other words "offer" (Key). The result tells the model how much each word should pay attention to every other word in the sentence. Module 47 covers this in full.
When Flipkart searches for products similar to what you clicked, it stores every product as a vector (embedding) in a vector database. Finding the most similar products is finding the vectors with the highest cosine similarity to your query vector. This is called approximate nearest neighbour search and it runs billions of dot products per second. Module 51 (RAG) is built entirely on this.
Euclidean distance — measuring actual physical distance between points
Cosine similarity measures angle — how similar is the direction. Euclidean distance measures a completely different thing: how far apart are two points in space. Think of it as the straight-line distance you would measure with a ruler.
Two cities on a map: Mumbai (18.97°N, 72.83°E) and Pune (18.52°N, 73.85°E). The Euclidean distance between them as points on the map is roughly √((18.97−18.52)² + (72.83−73.85)²) — just like the Pythagorean theorem applied to their coordinate difference. This is Euclidean distance.
In ML, "distance" works the same way but in higher dimensions — instead of 2 coordinates you might have 512 coordinates (features). The formula is identical, just with more terms under the square root.
Day-one task at Flipkart — build a product similarity engine
Your lead sends you a task: "Users who view a product should see similar products below it. Build a similarity engine for the electronics catalogue." This is exactly the dot product / cosine similarity problem. Here is how you would actually implement it.
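A minimal sketch of one way to implement it. The product names, the four features, and every value below are invented for illustration — a real catalogue would use learned embeddings with hundreds of dimensions and an approximate-nearest-neighbour index rather than a full scan:

```python
import numpy as np

# Invented catalogue: each row is one product's feature vector.
# Columns (illustrative): [screen_size, battery, camera, price_tier]
products = {
    "phone_a":  np.array([0.9, 0.7, 0.9, 0.8]),
    "phone_b":  np.array([0.8, 0.8, 0.8, 0.7]),
    "laptop_a": np.array([0.2, 0.9, 0.1, 0.9]),
    "earbuds":  np.array([0.0, 0.6, 0.0, 0.2]),
}

def top_similar(query_name, k=2):
    """Rank every other product by cosine similarity to the query."""
    q = products[query_name]
    scores = {}
    for name, vec in products.items():
        if name == query_name:
            continue
        # Cosine similarity: dot product divided by both norms.
        scores[name] = float(q @ vec / (np.linalg.norm(q) * np.linalg.norm(vec)))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

print(top_similar("phone_a"))  # phone_b ranks first: same highs, same lows
```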
Every common dot product and similarity error — explained and fixed
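One error worth showing concretely, because it silently produces plausible-looking garbage: computing cosine similarity on features with wildly different scales, so the biggest feature dominates the direction. The numbers below are invented — price in rupees sitting next to a 1–5 rating:

```python
import numpy as np

def cos_sim(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# [price_in_rupees, rating_1_to_5] -- invented values, very different scales
budget_good = np.array([500.0, 5.0])   # cheap, loved
budget_bad  = np.array([550.0, 1.0])   # cheap, hated

# Raw features: price dominates the direction, so opposite ratings
# still look nearly identical (similarity > 0.999).
print(cos_sim(budget_good, budget_bad))

def scale(v):
    # Divide each feature by an assumed maximum so both live on a 0-1 scale.
    return v / np.array([1000.0, 5.0])

# After scaling, the rating difference becomes visible again.
print(cos_sim(scale(budget_good), scale(budget_bad)))
```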
You can measure similarity. Now you need to understand the structure hidden inside data.
The dot product tells you how similar two vectors are. But what if you want to understand the overall structure of a dataset — which directions explain the most variation? Which features are really independent and which are just reflections of each other? That requires Module 06: Eigenvalues and Eigenvectors — the mathematical foundation of PCA, the most common dimensionality reduction technique in production ML.
🎯 Key Takeaways
- The dot product multiplies matching positions in two vectors and sums the results. It gives a single number — high when the vectors agree on what is important, low when they disagree. This is the foundation of similarity measurement in ML.
- The raw dot product is sensitive to magnitude — bigger vectors always produce higher scores. Two people with identical taste but different activity levels look very different to the raw dot product.
- Cosine similarity fixes the magnitude problem by dividing the dot product by the lengths of both vectors. The result is always between −1 and +1, measuring direction only. Two vectors pointing the same direction give cosine similarity of +1.0 regardless of length.
- The dot product is the core computation in three major ML operations: the forward pass in every neural network layer (inputs · weights), the attention mechanism in Transformers (Query · Key), and embedding similarity search in recommendation systems.
- Euclidean distance measures actual spatial distance between points — good for KNN and K-Means. Cosine similarity measures angle — good for text, embeddings, and recommendation systems where magnitude should not affect the comparison.
- Always normalise feature vectors before computing cosine similarity. Features on different scales (price in rupees vs distance in km) distort the direction of the vector, making similarity scores meaningless.