K-Nearest Neighbours
The simplest possible ML algorithm — predict based on what your neighbours look like. Distance metrics, the curse of dimensionality, and when KNN actually works.
KNN has no training phase. No weights. No boundary. Just one idea: similar inputs should produce similar outputs.
A new customer joins Flipkart. They are 24 years old, live in Bangalore, buy mostly electronics, and spend ₹3,000 per order on average. What products should you recommend to them?
The simplest possible answer: find the 5 existing customers who are most similar to this new customer. Look at what they bought and liked. Recommend those. You do not need a trained model, learned weights, or a decision boundary. You just need a way to measure similarity and enough historical data to look up neighbours from.
That is the entire KNN algorithm. For a new query point, find the k training points nearest to it (by some distance measure), and predict based on what those neighbours say. For classification: majority vote among the k neighbours. For regression: average of the k neighbours' values. No training. No parameters learned from data. Every prediction looks up the training set from scratch.
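The whole algorithm fits in a few lines. A minimal from-scratch sketch for classification, using plain NumPy and made-up toy data (two clusters in 2D):

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=5):
    """Classify x_query by majority vote among its k nearest training points."""
    # Compute the Euclidean distance from the query to every training point.
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Take the indices of the k smallest distances.
    nearest = np.argsort(dists)[:k]
    # Majority vote among those neighbours' labels.
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy data: two well-separated clusters.
X = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X, y, np.array([1.5, 1.5]), k=3))  # → 0
```

For regression, the only change is the aggregation step: replace the majority vote with `y_train[nearest].mean()`.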
You move to a new city and want to know if a neighbourhood is safe. You do not build a statistical model of crime rates. You ask the 5 people who live closest to that neighbourhood what they think. If 4 out of 5 say it is safe, you conclude it is safe. That is KNN — ask your nearest neighbours and take a vote.
The key questions are: how do you measure "closeness"? How many neighbours k should you ask? And what happens when the neighbourhood is crowded in some dimensions but empty in others — the curse of dimensionality. This module answers all three.
How KNN makes a prediction — four steps, no hidden magic
KNN's prediction process is completely transparent. Every step can be inspected and understood. There are no learned parameters — the algorithm memorises the entire training set and consults it at prediction time.
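The steps (presumably: compute distances, select the k closest points, look up their labels, aggregate) can be inspected directly in scikit-learn: `kneighbors` returns the exact distances and indices a prediction is based on. A small sketch with toy data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# "Training" just stores the data (optionally in a tree structure for fast lookup).
model = KNeighborsClassifier(n_neighbors=3).fit(X, y)

query = np.array([[2, 2]])
# Steps 1-2: compute distances and find the k nearest training points.
dist, idx = model.kneighbors(query)
# Step 3: look up the neighbours' labels.  Step 4: majority vote.
print("neighbour indices:", idx[0], "labels:", y[idx[0]])
print("prediction:", model.predict(query))
```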
Distance metrics — which one to use and why it matters
KNN's entire behaviour depends on how you measure distance. The same dataset can produce completely different predictions depending on which distance metric you choose. The default — Euclidean distance — works well in most cases. But understanding the alternatives lets you make better choices for specific problem types.
Choosing k — the bias-variance trade-off made visual
The value of k directly controls how much the model generalises versus memorises. Small k means the prediction is based on very few neighbours — highly sensitive to noise. Large k means the prediction is based on many neighbours — smoother but potentially missing local structure. This is the classic bias-variance trade-off, and KNN makes it unusually visible.
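One way to see the trade-off, sketched here with scikit-learn's `make_moons` toy dataset: sweep k and compare cross-validated accuracy. The exact numbers depend on the noise and the random seed.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Noisy two-moons data: non-linear boundary, some label overlap.
X, y = make_moons(n_samples=300, noise=0.3, random_state=0)

scores = {}
for k in (1, 5, 15, 50, 150):
    model = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(model, X, y, cv=5).mean()
    print(f"k={k:>3}: CV accuracy = {scores[k]:.3f}")
# Accuracy typically peaks at a moderate k: k=1 chases individual noisy
# points, very large k smooths away the moon shapes entirely.
```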
The curse of dimensionality — KNN's fundamental limitation
KNN works beautifully in 2 or 3 dimensions. It falls apart in 100 dimensions. The reason is counterintuitive but important — in high-dimensional space, distances lose their meaning. All points become approximately equidistant from each other. When every point is roughly the same distance from every other point, "nearest neighbours" is a meaningless concept.
Imagine searching for the nearest person to you on a street (1D). Easy — look left and right. Now on a field (2D). Harder, but doable. Now in a building (3D). Now in a 100-dimensional space where each dimension is one feature of a customer profile.
In 100 dimensions, the "nearest" customer might actually be almost as far away as the furthest customer. The ratio of nearest-to-furthest distance approaches 1 as dimensions grow. All distances become equally large and equally meaningless.
In d dimensions, maintaining the same neighbourhood density requires exponentially more data. To have 10 neighbours within a distance of 0.1 in 1D, you might need ~100 points. In 10D, a ball of radius 0.1 covers such a tiny fraction of the space that the same density requires on the order of 10 billion points. In practice, high-dimensional data is hopelessly sparse.
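The equidistance effect is easy to verify empirically: sample random points in the unit hypercube and watch the nearest-to-farthest distance ratio climb toward 1 as the dimension grows.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
ratios = {}
for d in (2, 10, 100, 1000):
    X = rng.random((n, d))   # n uniform points in the d-dimensional unit cube
    q = rng.random(d)        # one query point
    dists = np.linalg.norm(X - q, axis=1)
    ratios[d] = dists.min() / dists.max()
    print(f"d={d:>4}: nearest/farthest ratio = {ratios[d]:.3f}")
# The ratio rises steadily with d: in high dimensions the "nearest"
# point is barely closer than the farthest one.
```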
KNN for classification — majority vote with probabilities
KNN classification works identically to regression — find k nearest neighbours and aggregate. For classification the aggregation is a majority vote: whichever class appears most among the k neighbours wins. Probabilities come naturally — the fraction of neighbours belonging to each class.
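A minimal sketch of the vote and its probabilities, again on made-up toy data: with k=5 and uniform weights, `predict_proba` is just the fraction of neighbours in each class.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1, 1], [2, 1], [1, 2], [8, 8], [9, 8], [5, 5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

model = KNeighborsClassifier(n_neighbors=5).fit(X, y)
query = np.array([[2, 2]])

print(model.predict(query))        # majority vote among the 5 neighbours
print(model.predict_proba(query))  # class fractions: here 3 of 5 neighbours
                                   # are class 0, so → [[0.6 0.4]]
```

With `weights="distance"` the vote and the probabilities become distance-weighted instead of simple fractions.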
When KNN wins — and when it does not
KNN is not a general-purpose algorithm in modern ML. It is slow at prediction time, sensitive to irrelevant features, and breaks down in high dimensions. But it has genuine use cases where it outperforms more complex algorithms.
Speed up KNN for production: approximate nearest neighbours
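scikit-learn itself offers exact but sub-linear search via space-partitioning trees (`kd_tree`, `ball_tree`); truly approximate search at web scale comes from dedicated libraries such as Annoy, FAISS, or hnswlib, whose APIs are not shown here. A sketch of the exact tree-based speedup, assuming moderate dimensionality where trees still help:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.random((10_000, 8))

# Brute force: O(n) distance computations per query.
brute = NearestNeighbors(n_neighbors=5, algorithm="brute").fit(X)
# KD-tree: prunes most of the search space in low-to-moderate dimensions.
tree = NearestNeighbors(n_neighbors=5, algorithm="kd_tree").fit(X)

q = rng.random((1, 8))
_, idx_brute = brute.kneighbors(q)
_, idx_tree = tree.kneighbors(q)
print(np.array_equal(idx_brute, idx_tree))  # same neighbours, faster lookup
```

Note that tree-based search degrades back toward brute force as dimensionality rises, which is the curse of dimensionality showing up again on the engineering side.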
Every common KNN error — explained and fixed
KNN asks its neighbours. Naive Bayes asks Bayes' theorem.
KNN is a distance-based algorithm — it makes no assumptions about the distribution of data, it just measures proximity. Naive Bayes is the opposite — it is a probabilistic algorithm that makes explicit assumptions about how features are distributed, then uses Bayes' theorem to compute the probability of each class. Despite the "naive" independence assumption that is almost always wrong, it works surprisingly well for text classification and spam detection.
Bayes' theorem applied to classification. Why the naive independence assumption works surprisingly well for spam and document classification.
🎯 Key Takeaways
- ✓ KNN has no training phase — "training" means storing the data. All computation happens at prediction time: find k nearest training points, aggregate their labels. This makes training instant but prediction slow on large datasets.
- ✓ KNN is one of the most scaling-sensitive algorithms in all of sklearn. Unscaled features completely break distance calculations. Always put StandardScaler in a Pipeline before any KNN model — this is the single most impactful thing you can do.
- ✓ k controls the bias-variance trade-off directly. k=1 memorises every training point (zero bias, maximum variance, 100% training accuracy). k=n predicts the global average (maximum bias, zero variance). Start with k=sqrt(n_train) and tune with cross-validation.
- ✓ Use weights="distance" instead of the default weights="uniform". Closer neighbours are more informative than distant ones — distance weighting consistently improves KNN performance at almost no cost.
- ✓ The curse of dimensionality is KNN's fundamental limit. In high dimensions all points become approximately equidistant, making nearest neighbours meaningless. For datasets with more than ~20 features, apply PCA to reduce dimensions before KNN.
- ✓ KNN genuinely wins at recommendation (collaborative filtering), anomaly detection (far from all neighbours = anomaly), and low-dimensional non-linear problems. For large datasets, tabular ML, or high-dimensional data, XGBoost or Random Forest will almost always outperform it.
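The scaling and weighting takeaways combine into one pipeline. A sketch on scikit-learn's built-in breast-cancer dataset, where raw features span wildly different ranges:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Unscaled: large-range features like "mean area" dominate the distance.
raw = KNeighborsClassifier(n_neighbors=5)
# Scaled + distance weighting, as recommended above.
scaled = make_pipeline(
    StandardScaler(),
    KNeighborsClassifier(n_neighbors=5, weights="distance"),
)

raw_score = cross_val_score(raw, X, y, cv=5).mean()
scaled_score = cross_val_score(scaled, X, y, cv=5).mean()
print(f"raw:    {raw_score:.3f}")
print(f"scaled: {scaled_score:.3f}")
# Scaling typically lifts accuracy by several points on this dataset.
```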