What is Machine Learning?
Not the Wikipedia definition. The actual idea — what it means, how it works, and why it changed everything.
It's 2015. You're a new engineer at Swiggy.
Orders are coming in faster than anyone expected. Customers open the app, see a restaurant they want, and before they place the order they ask the same question: how long will this take?
Your job is to show a delivery time estimate. You sit down and start writing rules.
```python
if distance_km < 2:        # distance in km ("2km" is not valid Python)
    estimated_time = 20
elif distance_km < 5:
    estimated_time = 30
else:
    estimated_time = 45

if current_hour in rush_hours:
    estimated_time += 10
if is_raining:
    estimated_time += 8
if restaurant == "popular_restaurant":
    estimated_time += 5

# Ship it.
```

You ship it. The results are terrible. A 1.5 km delivery from a slow kitchen during peak hours takes 55 minutes. The same route on a Tuesday afternoon takes 14. Your rules are off by 20 minutes on a third of all orders. Users are complaining. The product team is not happy.
The problem is not that you wrote bad rules. The problem is that the real relationship between inputs and delivery time involves dozens of interacting variables — kitchen load, rider availability, traffic by street segment, weather severity, order complexity, time since last order from that restaurant — and the combinations are too complex for any human to enumerate.
What Machine Learning actually means
In 1959, Arthur Samuel defined machine learning as: "the field of study that gives computers the ability to learn without being explicitly programmed."
That definition is technically accurate and practically useless. "Without being explicitly programmed" tells you almost nothing about how it works or what you actually do as a practitioner.
But what does "learning" actually mean mechanically? It means this loop, run millions of times:
1. The model takes an input and produces a guess. The first prediction is random or near-zero.
2. Compare the guess to the actual answer and quantify how wrong it was. This is the loss function.
3. Change the model's internal numbers slightly in the direction that reduces the error. This is gradient descent.
4. Do this for every example in your training data. Then do it again, thousands of times, until the model converges.
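The loop above can be sketched in a few lines of plain Python. Here a single weight `w` is fitted so that `prediction = w * distance`; the data points and learning rate are illustrative, not from the text:

```python
# Toy training loop: learn minutes-per-km from made-up (distance, time) pairs.
data = [(1.0, 8.0), (2.0, 16.0), (3.0, 24.0), (5.0, 40.0)]  # true relation: time = 8 * distance

w = 0.0              # the model's single parameter, starting near zero
learning_rate = 0.01

for epoch in range(1000):             # step 4: repeat over the data many times
    for x, y in data:
        guess = w * x                 # step 1: predict
        error = guess - y             # step 2: how wrong? (loss = error ** 2)
        gradient = 2 * error * x      # derivative of the squared error w.r.t. w
        w -= learning_rate * gradient # step 3: nudge w to reduce the error

print(round(w, 2))  # converges to roughly 8.0
```

Each pass nudges `w` a little; no single update is smart, but thousands of them converge on the pattern hidden in the data.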
The 3 types of Machine Learning
Not all ML problems look the same. The type of data you have — specifically whether you have labelled outputs or not — determines which category of ML you are working in. There are three.
Supervised learning
You provide labelled training examples — each input is paired with the correct output. The model learns the mapping from inputs to outputs by seeing thousands of these pairs.
Teaching a child to identify animals by showing them 1,000 photos, each labelled with the animal's name. The child learns from your labels.
Unsupervised learning
You have data but no labels — no correct answers to learn from. The model looks for patterns, groupings, or structure that exists in the data on its own terms.
A librarian given 10,000 books with no categories. They group them by content similarity — biography, fiction, technical — without being told what the categories should be.
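The librarian's grouping can be sketched as a minimal 1-D k-means, assuming two clusters and made-up numbers in place of books:

```python
# Minimal 1-D k-means: group numbers into 2 clusters with no labels given.
points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]   # illustrative data: two obvious groups
centroids = [points[0], points[3]]          # crude initialisation

for _ in range(10):
    # Assignment step: each point joins its nearest centroid.
    clusters = [[], []]
    for p in points:
        nearest = min(range(2), key=lambda i: abs(p - centroids[i]))
        clusters[nearest].append(p)
    # Update step: move each centroid to the mean of its cluster.
    centroids = [sum(c) / len(c) for c in clusters]

print(sorted(round(c, 1) for c in centroids))  # → [1.0, 9.5]
```

Nobody told the algorithm what the groups are or how many points belong in each; the structure emerged from the data alone.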
Reinforcement learning
An agent takes actions in an environment, receives a reward or penalty after each action, and learns over time which sequence of actions maximises total reward. No labelled data — just feedback from consequences.
Teaching a dog to fetch. You do not explain fetching. You give treats when the dog picks up the ball and brings it back. The dog learns the behaviour through trial, error, and rewards.
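The trial-and-reward idea can be sketched as a two-armed bandit: the agent never sees the hidden reward probabilities (the `true_reward_prob` values are an illustrative assumption), only the rewards its own actions produce:

```python
import random

# Two-armed bandit: the agent learns which action pays off, purely from rewards.
random.seed(0)
true_reward_prob = [0.2, 0.8]   # hidden from the agent: arm 1 is better
value_estimate = [0.0, 0.0]     # agent's running estimate of each arm's payoff
pulls = [0, 0]

for step in range(2000):
    # Explore 10% of the time; otherwise exploit the best-looking arm.
    if random.random() < 0.1:
        arm = random.randrange(2)
    else:
        arm = max(range(2), key=lambda a: value_estimate[a])
    reward = 1.0 if random.random() < true_reward_prob[arm] else 0.0
    pulls[arm] += 1
    # Incremental mean: the estimate drifts toward the observed average reward.
    value_estimate[arm] += (reward - value_estimate[arm]) / pulls[arm]

print(value_estimate[1] > value_estimate[0])  # the agent prefers the better arm
```

Like the dog, the agent is never told the rule; it discovers "pull arm 1" by acting, observing consequences, and updating.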
The ML workflow — start to finish
Every ML project at every company — from a two-person startup to Flipkart's 400-person data team — follows the same seven steps. The tools change. The algorithms change. The steps do not.
Step 1: Define the problem. Before touching data, be precise: what are you predicting? What inputs will you have at prediction time? What does "good enough" look like in numbers?
Step 2: Collect and explore data. Pull your historical data and look at it. What are the distributions? Are there missing values? Outliers? Surprising correlations? You cannot build a good model on data you do not understand.
Step 3: Prepare the data. Handle missing values. Encode categorical variables. Scale numerical features. Split into training and test sets. The model will only be as good as the data you feed it.
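Two of those preparation steps, splitting and scaling, can be sketched in plain Python on made-up rows (the column meanings are illustrative):

```python
import random

# Illustrative data prep: scale a feature and hold out a test set.
random.seed(42)
rows = [[random.uniform(0, 10), random.uniform(10, 60)] for _ in range(100)]  # [distance_km, minutes]

random.shuffle(rows)
split = int(len(rows) * 0.8)
train, test = rows[:split], rows[split:]   # 80/20 split; the model never sees `test`

# Min-max scale the feature using *training* statistics only,
# so no information from the test set leaks into training.
lo = min(r[0] for r in train)
hi = max(r[0] for r in train)
train_scaled = [[(r[0] - lo) / (hi - lo), r[1]] for r in train]
test_scaled = [[(r[0] - lo) / (hi - lo), r[1]] for r in test]

print(len(train), len(test))  # → 80 20
```

Computing the scaling statistics from the training set alone is the detail that trips people up: reuse them on the test set, never recompute them there.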
Step 4: Train the model. Pick an algorithm appropriate for your problem type and data. Feed it your training data. The algorithm adjusts its internal parameters until it fits the training patterns.
Step 5: Evaluate. Run your trained model on the 20% of data it has never seen. Measure performance metrics. This is your honest estimate of how it will behave in production.
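Evaluation is just arithmetic over the held-out examples. A common metric for delivery times is mean absolute error; the numbers below are illustrative:

```python
# Evaluation sketch: mean absolute error on held-out examples.
actual    = [30, 45, 25, 50, 35]   # true delivery times in minutes
predicted = [28, 50, 24, 42, 38]   # the model's guesses on the same orders

mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
print(mae)  # → 3.8
```

"Off by 3.8 minutes on average" is a number the product team can reason about, which is why MAE is often preferred over more abstract metrics for reporting.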
Step 6: Improve. Add more or better features. Try a more powerful algorithm. Tune hyperparameters. Each iteration goes back to the training data — the test set must stay untouched until you think you are done.
Step 7: Deploy and monitor. Wrap your model in an API. Serve predictions in production. Monitor performance over time — data distributions shift, and a model that was accurate in January may degrade by July.
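"Wrap your model in an API" can be as small as this sketch using only the standard library. The trained model here is reduced to two hypothetical learned numbers; real serving stacks add batching, logging, and monitoring on top:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# A trained model, reduced to its learned parameters (illustrative values).
WEIGHT, BIAS = 8.0, 5.0

def predict(distance_km):
    """Inference: apply the learned parameters to a new input."""
    return WEIGHT * distance_km + BIAS

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        result = {"estimated_minutes": predict(body["distance_km"])}
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps(result).encode())

# To serve: HTTPServer(("localhost", 8000), PredictHandler).serve_forever()
```

Note how little of the deployed artifact is "the model": after training, it is just numbers plus the code to apply them.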
Terms you will see on every ML page — defined once, clearly
ML has jargon. There is no avoiding it. But the jargon is not complicated — it is just precise language for specific ideas. Learn these 12 terms here and you will never need to pause on any later page.
Feature: An input variable used to make a prediction. One column in your data table. Also called a predictor or independent variable.
Label: The thing you are trying to predict. The correct answer in your training data. Also called the dependent variable or output.
Model: A mathematical function that maps input features to a predicted output. After training, it is a set of numbers (parameters) that encode the learned patterns.
Training data: The labelled examples you feed to the algorithm during learning. The model sees these inputs and their correct outputs.
Test data: Held-out labelled examples the model never sees during training. Used only to evaluate final performance. Must not influence any training decision.
Parameters: The internal numbers of a model that are adjusted during training. They are what the model "learns." A linear regression has one weight per feature.
Loss: A number measuring how wrong the model's predictions are. Training aims to minimise this. Different problems use different loss functions.
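Here are two common loss calculations side by side on the same made-up predictions, to show that the choice of loss function changes what "wrong" means:

```python
# Two common loss functions over the same predictions.
actual    = [30.0, 45.0, 25.0]
predicted = [32.0, 40.0, 25.0]

errors = [p - a for a, p in zip(actual, predicted)]
mae = sum(abs(e) for e in errors) / len(errors)   # mean absolute error
mse = sum(e * e for e in errors) / len(errors)    # mean squared error: punishes big misses harder

print(round(mae, 2), round(mse, 2))  # → 2.33 9.67
```

The 5-minute miss dominates the MSE but not the MAE, which is why MSE-trained models work harder to avoid occasional large errors.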
Overfitting: The model memorises the training data so well that it fails on new data. It learned noise instead of signal. Performs great on the training set, poorly on the test set.
Underfitting: The model is too simple to capture the real patterns. Performs poorly on both training and test data. Usually means the model or features need more complexity.
Hyperparameter: Settings you choose before training that control how the model learns — not learned from data. Tuning these is an optimisation problem of its own.
Inference: Using a trained model to make predictions on new data. Also called prediction or scoring. Inference is what happens in production.
Baseline: The simplest possible benchmark — often just predicting the mean. Your model must beat this to be worth deploying. The bar you need to clear.
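The baseline check is cheap to write. This sketch (with made-up numbers; the model outputs are hypothetical) compares a model against "always predict the training mean":

```python
# Baseline check: does the model beat "always predict the training mean"?
train_times = [20, 30, 45, 35, 25, 40, 30, 35]   # historical delivery times
test_actual = [30, 50, 25]
model_predicted = [28, 44, 27]                    # hypothetical model outputs

baseline = sum(train_times) / len(train_times)    # 32.5 minutes, every time

def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

baseline_mae = mae(test_actual, [baseline] * len(test_actual))
model_mae = mae(test_actual, model_predicted)
print(model_mae < baseline_mae)  # the model is only worth deploying if this is True
```

If this prints False, stop: a constant number is beating your model, and no amount of deployment engineering fixes that.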
What ML engineers actually do at Indian companies
Machine Learning is not a single job title. Three roles work with ML in different ways. Understanding the differences will help you decide which path you are on.
You're ready for the first algorithm
You now have the foundation. You know what Machine Learning is, how it differs from traditional programming, what the three types are, what the seven-step workflow looks like, and what the key vocabulary means.
The next page introduces the simplest possible supervised learning algorithm — Linear Regression — and uses it to build an actual delivery time predictor for Swiggy. You will see every concept from this page in working code.
🎯 Key Takeaways
- ✓ ML = examples in, rules out. You provide labelled data; the algorithm finds the patterns and encodes them as a model.
- ✓ Training = predict → measure error → adjust → repeat. The model iterates over the training data, nudging its parameters toward lower loss on each pass.
- ✓ Three types: Supervised (labelled data, most common), Unsupervised (no labels, find structure), Reinforcement (learn from rewards). This section covers Supervised.
- ✓ Every ML project follows the same 7-step workflow: define problem → collect data → prepare data → train → evaluate → improve → deploy.
- ✓ Overfitting means memorising training data (good train score, bad test score). Underfitting means too simple (bad both). Both are diagnosable and fixable.
- ✓ 12 key vocabulary terms — feature, label, model, training/test data, parameters, loss, overfitting, underfitting, hyperparameter, inference, baseline — are defined and will not be re-explained.