K-Means Clustering — Customer Segmentation
Finding hidden groups in data without labels. Inertia, elbow method, silhouette scores, and when clustering is and is not the right approach.
Every algorithm so far required labels. K-Means does not. It finds hidden groups in your data that you never told it to look for.
Flipkart has 300 million registered customers. Nobody has manually labelled them as "budget buyer", "premium shopper", "deal hunter", or "occasional visitor." Those labels do not exist anywhere in the database. But the patterns that define those groups are absolutely there — in the purchase history, order frequency, average spend, and browsing behaviour.
Every algorithm we have covered so far was supervised — you provided the correct labels during training and the algorithm learned to reproduce them. K-Means is unsupervised — you provide only the features, no labels, and the algorithm discovers structure on its own. The "structure" it finds is groups of customers who are more similar to each other than to customers in other groups.
Once you have those groups, you can give them meaningful names. The group with high spend and high frequency becomes "premium". The group with many browsing sessions but few purchases becomes "window shoppers". Now every customer has a segment — a label — without anyone ever manually labelling a single customer.
A new teacher receives 30 students on the first day with no prior information. She watches them for a week. Without anyone telling her, she notices three natural groups forming: students who always sit up front and answer questions, students who work quietly at the back, and students who are social and work in groups. She did not impose these categories — she discovered them from the students' natural behaviour.
That is clustering. No labels given. Groups discovered from the data itself. K-Means is the most widely used algorithm for doing this at scale.
How K-Means works — four steps, repeated until convergence
K-Means is one of the simplest ML algorithms to understand. There are no weights to learn, no gradients to compute. Just four steps repeated until nothing changes:

1. Pick k initial centroids (for example, k random points from the data).
2. Assign each point to its nearest centroid.
3. Move each centroid to the mean of the points assigned to it.
4. Repeat steps 2 and 3 until the assignments stop changing.
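The four steps can be sketched in a few lines of NumPy. This is an illustrative toy version only — scikit-learn's `KMeans` adds k-means++ initialisation, multiple restarts, and empty-cluster handling, none of which appear here:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Toy Lloyd's algorithm: assign, move, repeat until convergence.
    No k-means++ init and no empty-cluster handling -- a sketch, not
    a replacement for sklearn.cluster.KMeans."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k random data points as initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each centroid to the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 4: stop when the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```

On two well-separated blobs this converges in a handful of iterations; the whole algorithm really is just distance computations and averaging.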
Inertia — the objective K-Means minimises
K-Means minimises inertia — the sum of squared distances from each point to its assigned centroid. Low inertia means points are close to their cluster centres — tight, compact clusters. High inertia means clusters are loose and spread out.
The problem: inertia always decreases as k increases. With k = n (one cluster per point), inertia is exactly zero — every point is its own centroid. But k = n is a useless clustering. You need a way to find the right k — where adding more clusters stops meaningfully improving the structure. That is the elbow method.
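A quick sketch of the elbow method on synthetic data (four well-separated blobs, so the "true" k is 4 by construction — the blob positions are made up for illustration):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Four well-separated synthetic blobs -- the "true" k is 4 by construction
X, _ = make_blobs(n_samples=500, centers=[[0, 0], [8, 0], [0, 8], [8, 8]],
                  cluster_std=0.8, random_state=42)

# Fit K-Means for each candidate k and record the inertia
inertias = {}
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = km.inertia_

# Inertia always falls as k grows; the elbow is where the drop flattens
for k in range(1, 8):
    print(f"k={k} -> k={k+1}: inertia drops by {inertias[k] - inertias[k+1]:,.0f}")
# Large drops up to k=4, tiny drops after -- the elbow is at k=4.
```

In practice you would plot `inertias` and look for the bend; printing the successive drops makes the same pattern visible without a chart.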
Silhouette score — measures cluster quality without labels
The elbow method is visual and subjective — different people see the elbow at different places. The silhouette score is a quantitative metric that measures clustering quality for each individual point and averages them.
For each point, the silhouette score measures: how close is it to points in its own cluster (a) versus how close is it to points in the nearest other cluster (b)? A score near +1 means the point is well inside its cluster and far from all others — perfect assignment. A score near 0 means the point is on the boundary between two clusters. A score near −1 means the point was probably assigned to the wrong cluster.
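The mean silhouette across all points gives one number per candidate k, which makes the choice less subjective than eyeballing an elbow. A sketch on the same kind of synthetic four-blob data (blob positions are illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=500, centers=[[0, 0], [8, 0], [0, 8], [8, 8]],
                  cluster_std=0.8, random_state=42)

scores = {}
for k in range(2, 7):  # silhouette is undefined for a single cluster
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
    print(f"k={k}: mean silhouette = {scores[k]:.3f}")
# The score peaks at k=4 -- the true number of blobs.
```

Unlike inertia, the silhouette score does not automatically improve as k grows, so you can simply pick the k with the highest score (and sanity-check it against domain knowledge).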
Flipkart customer segmentation — end to end
Customer segmentation is the most common application of K-Means in Indian e-commerce. The output is not just cluster numbers — it is actionable customer groups that the marketing, product, and growth teams can use. Budget buyers get different promotions than premium buyers. Churning customers get re-engagement campaigns. Window shoppers get conversion-focused nudges.
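A compact end-to-end sketch of that workflow on synthetic data. The feature names, the three behaviour profiles baked into the data, and the segment names are all illustrative assumptions, not real Flipkart data:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 300  # customers per synthetic behaviour profile
df = pd.DataFrame({
    # three rough profiles baked in: premium, budget, window shopper
    "orders_per_month": np.concatenate([rng.normal(8, 1, n), rng.normal(2, 0.5, n), rng.normal(0.3, 0.1, n)]),
    "avg_order_value":  np.concatenate([rng.normal(3000, 300, n), rng.normal(600, 100, n), rng.normal(400, 100, n)]),
    "sessions_per_week": np.concatenate([rng.normal(10, 2, n), rng.normal(4, 1, n), rng.normal(9, 2, n)]),
})

# Scale first -- avg_order_value would otherwise dominate the distances
X = StandardScaler().fit_transform(df)
df["segment"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Profile each cluster, then name it from its behaviour
print(df.groupby("segment").mean().round(1))
# High spend + high frequency -> "premium"; many sessions, few orders -> "window shoppers"
```

The naming step at the end is the part no algorithm does for you: K-Means outputs cluster numbers, and a human reads the per-cluster profiles and attaches the business labels.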
Three situations where K-Means gives wrong answers
K-Means is simple and fast but has three hard limitations. Knowing them saves you from deploying a clustering that looks plausible but is mathematically wrong for your data.
K-Means assumes clusters are round blobs of equal size. It fails completely on ring-shaped clusters (Module 26 showed this for Spectral Clustering), elongated ellipses, or interleaved crescents. K-Means draws Voronoi boundaries (straight lines equidistant between centroids) — these cannot capture curved cluster shapes.
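You can see this failure directly with scikit-learn's `make_moons`, which generates two interleaved crescents. Comparing K-Means labels against the true crescent membership (via adjusted Rand score, where 1.0 is perfect agreement) shows how badly the straight Voronoi cuts miss the curved structure:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaved crescents -- a shape K-Means cannot represent
X, y_true = make_moons(n_samples=400, noise=0.05, random_state=0)
y_km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Agreement with the true crescents is poor -- far below a perfect 1.0,
# because K-Means slices straight across both crescents
score = adjusted_rand_score(y_true, y_km)
print(f"Adjusted Rand score: {score:.2f}")
```

Algorithms that work with pairwise similarity rather than centroid distance (Spectral Clustering, DBSCAN) recover the crescents essentially perfectly on this data.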
K-Means tends to split large clusters into multiple pieces while merging small adjacent clusters. Because every cluster competes for points under the same distance rule, the centroid of a very large cluster sits far from its edges — so edge points can end up closer to a small neighbouring cluster's centroid and get misassigned.
The centroid is the mean — outliers pull it toward themselves. One transaction worth ₹50 lakh in a dataset of ₹500 average transactions will pull the "high-value" centroid toward it, making the cluster definition unstable and unrepresentative.
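A small demonstration of that effect with made-up 1-D transaction amounts (all rupee values are illustrative). With k = 2, the single extreme transaction is so far from everything else that it hijacks an entire centroid, and the real distinction between normal and premium orders is lost:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
normal_txns = rng.normal(500, 100, (200, 1))    # typical ~Rs. 500 orders
premium_txns = rng.normal(5000, 500, (30, 1))   # genuine high-value orders
outlier = np.array([[5_000_000.0]])             # one Rs. 50 lakh transaction

X = np.vstack([normal_txns, premium_txns, outlier])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
centers = np.sort(km.cluster_centers_.ravel())
print(centers)
# One centroid sits near the outlier; normal and premium orders collapse
# into a single cluster, so the "high-value" segment disappears.
```

This is why K-Medoids (which uses the median-like medoid instead of the mean) or simply removing or capping outliers before clustering is the standard fix.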
Day-one task — build Swiggy restaurant delivery zones
Swiggy wants to cluster restaurant locations into delivery zones so each delivery partner is assigned to a compact geographic area. This is a geographic K-Means problem — the features are latitude and longitude. Each cluster becomes one delivery zone.
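A starter sketch for this task. The restaurant coordinates are random points scattered around Bengaluru's city centre (12.97° N, 77.59° E) — purely synthetic stand-ins, not real Swiggy data:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# ~300 synthetic restaurants scattered around the city centre
restaurants = np.column_stack([
    12.97 + rng.normal(0, 0.05, 300),  # latitude
    77.59 + rng.normal(0, 0.05, 300),  # longitude
])

# Each cluster becomes one delivery zone; its centroid is the zone centre
km = KMeans(n_clusters=6, n_init=10, random_state=0).fit(restaurants)
zones = km.labels_
print("Restaurants per zone:", np.bincount(zones))
print("Zone centres (lat, lon):\n", km.cluster_centers_.round(3))
```

One caveat: plain Euclidean distance on latitude/longitude is a reasonable approximation at city scale, but for zones spanning larger areas you would want a proper geographic distance (e.g. haversine) or projected coordinates.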
K-Means groups data into clusters. PCA compresses data into fewer dimensions.
K-Means answers: which group does this point belong to? PCA (Principal Component Analysis) answers a different question: can I represent this data with fewer features while preserving most of the information? You have a customer with 50 features — PCA finds the 5 most informative directions in that 50-dimensional space and projects the customer onto them. The result: 5 numbers instead of 50, capturing 95% of the original information. PCA and K-Means are often used together — PCA first to reduce dimensions, K-Means after to find groups in the reduced space.
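A sketch of that PCA-then-K-Means pipeline on synthetic data. The 1,000 "customers", 50 features, and the choice of 5 components are illustrative assumptions mirroring the numbers in the text:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# 1,000 synthetic "customers" with 50 features and 4 underlying groups
X, _ = make_blobs(n_samples=1000, n_features=50, centers=4, random_state=0)

# Scale, then project onto the 5 most informative directions
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=5)
X_reduced = pca.fit_transform(X_scaled)

# Cluster in the reduced 5-dimensional space
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_reduced)
print(f"{X.shape[1]} features -> {X_reduced.shape[1]} components, "
      f"{pca.explained_variance_ratio_.sum():.0%} of variance kept")
```

`explained_variance_ratio_` tells you how much information survived the projection; on real data you would choose the number of components by looking at that curve rather than fixing it at 5 up front.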
Turn 100 features into 10 without losing most of the information. Explained variance, scree plots, and when PCA helps and when it hurts.
🎯 Key Takeaways
- ✓K-Means is unsupervised — no labels required. It discovers hidden groups by iterating: assign each point to the nearest centroid, move centroids to the mean of their assigned points, repeat until convergence. The objective minimised is inertia — sum of squared distances to cluster centres.
- ✓Always scale features before K-Means. It is distance-based — an unscaled feature with large values completely dominates the clustering. StandardScaler before KMeans is not optional.
- ✓You must choose k in advance. Use the elbow method (plot inertia vs k, find the bend) combined with silhouette score (ranges from -1 to +1, higher is better) to choose k. There is no single correct answer — use domain knowledge to validate.
- ✓Silhouette score measures how well each point fits its cluster vs the nearest other cluster. A mean score above 0.5 indicates good clustering. Scores below 0.2 suggest overlapping or poorly separated clusters.
- ✓K-Means fails on non-spherical clusters (rings, crescents), clusters of very different sizes, and data with significant outliers. Alternatives: DBSCAN for arbitrary shapes, Gaussian Mixture Models for different sizes, K-Medoids for outlier robustness.
- ✓For datasets above 100,000 rows, use MiniBatchKMeans instead of KMeans. It processes random mini-batches rather than the full dataset at each iteration — typically 10× faster with nearly identical clustering quality.
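The MiniBatchKMeans swap mentioned in the last takeaway is a one-line change — the API matches KMeans. A sketch comparing the two on 100,000 synthetic rows (the exact speed-up is workload-dependent, so this only checks that the clusterings agree):

```python
from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# 100,000 rows -- the scale where mini-batch updates start to pay off
X, _ = make_blobs(n_samples=100_000, n_features=10, centers=8, random_state=0)

full = KMeans(n_clusters=8, n_init=3, random_state=0).fit(X)
mini = MiniBatchKMeans(n_clusters=8, batch_size=1024, n_init=3,
                       random_state=0).fit(X)

# Mini-batch trades a little precision for speed; labels largely agree
print(f"Label agreement (ARI): {adjusted_rand_score(full.labels_, mini.labels_):.3f}")
```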