
What is Machine Learning?

Not the Wikipedia definition. The actual idea — what it means, how it works, and why it changed everything.

18–22 min read · March 2026
The problem that started everything

It's 2015. You're a new engineer at Swiggy.

Orders are coming in faster than anyone expected. Customers open the app, see a restaurant they want, and before they place the order they ask the same question: how long will this take?

Your job is to show a delivery time estimate. You sit down and start writing rules.

# First attempt: hand-written rules for the delivery time estimate (minutes)
if distance_km < 2:
    estimated_time = 20
elif distance_km < 5:
    estimated_time = 30
else:
    estimated_time = 45

if current_hour in rush_hours:
    estimated_time += 10

if is_raining:
    estimated_time += 8

if restaurant in popular_restaurants:
    estimated_time += 5

# Ship it.

You ship it. The results are terrible. A 1.5 km delivery from a slow kitchen during peak hours takes 55 minutes. The same route on a Tuesday afternoon takes 14. Your rules are off by 20 minutes on a third of all orders. Users are complaining. The product team is not happy.

The problem is not that you wrote bad rules. The problem is that the real relationship between inputs and delivery time involves dozens of interacting variables — kitchen load, rider availability, traffic by street segment, weather severity, order complexity, time since last order from that restaurant — and the combinations are too complex for any human to enumerate.

Machine Learning is the answer to this problem. Instead of writing rules, you take the last 500,000 completed orders — each one a record of what the inputs were and what the actual delivery time turned out to be — and you feed them to a learning algorithm. The algorithm finds the patterns. It discovers that kitchen prep time is 40% of variance. That 6–8 PM Friday adds 12 minutes on average. That rain below 5mm matters less than rain above 20mm. You never wrote those rules. The data wrote them for you.
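The shift from writing rules to learning them can be sketched in a few lines. This is a toy illustration, not Swiggy's actual pipeline: the orders below are made up, and we fit a single-feature model (minutes per km of distance) using the closed-form least-squares formula. The point is that the coefficient comes out of the data, not out of our heads.

```python
# Toy example: learn "minutes per km" from past orders instead of hard-coding it.
# The order data below is invented for illustration.
orders = [  # (distance_km, actual_delivery_minutes)
    (1.0, 18), (2.5, 27), (4.0, 35), (5.5, 44), (3.0, 30),
]

# Closed-form least-squares fit of: time = w * distance + b
n = len(orders)
sum_x = sum(d for d, _ in orders)
sum_y = sum(t for _, t in orders)
sum_xy = sum(d * t for d, t in orders)
sum_xx = sum(d * d for d, _ in orders)

w = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
b = (sum_y - w * sum_x) / n

print(f"learned: {w:.1f} min/km + {b:.1f} min base")
```

The "rule" (how many minutes each kilometre costs) was discovered from the examples. With 500,000 real orders and dozens of features, the same idea scales into the models covered later in this section.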
🎯 Pro Tip
This Swiggy delivery time problem is the running example for the entire Classical ML section. Every algorithm — Linear Regression, Decision Trees, XGBoost — will be explained using this same scenario. By the end of the section you will have built a complete delivery time predictor from scratch.
The actual definition

What Machine Learning actually means

In 1959, Arthur Samuel defined machine learning as: "the field of study that gives computers the ability to learn without being explicitly programmed."

That definition is technically accurate and practically useless. "Without being explicitly programmed" tells you almost nothing about how it works or what you actually do as a practitioner.

The real meaning: in traditional programming, you write the logic and the computer follows it. You are the one who figures out the rules. In Machine Learning, you provide the examples — inputs paired with correct outputs — and the computer writes the logic. The algorithm figures out the rules. Your job shifts from writing rules to curating data.

But what does "learning" actually mean mechanically? It means this loop, run millions of times:

01
Predict

The model takes an input and produces a guess. First prediction: random or near-zero.

02
Measure error

Compare the guess to the actual answer. Quantify how wrong it was. This is the loss function.

03
Adjust

Change the model's internal numbers slightly in the direction that reduces the error. This is gradient descent.

04
Repeat

Do this for every example in your training data. Then do it again. Thousands of times. The model converges.

💡 Note
Gradient descent is explained in full in the Linear Regression topic. For now just hold the mental model: the model makes guesses, measures how wrong they are, and nudges its numbers in the direction that makes the next guess less wrong. Repeat until it stops getting better.
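The four-step loop can be written out directly. A minimal sketch, assuming a single weight `w` for one feature (minutes per km) and squared error as the loss — the data and learning rate are invented for illustration:

```python
# Minimal predict -> measure -> adjust -> repeat loop: learn w in time = w * distance.
data = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)]  # (distance_km, actual_minutes)

w = 0.0              # start with a useless model
learning_rate = 0.01

for epoch in range(1000):           # 4. Repeat
    for distance, actual in data:
        predicted = w * distance    # 1. Predict
        error = predicted - actual  # 2. Measure error
        # 3. Adjust: gradient of error**2 with respect to w is 2 * error * distance,
        # so step w slightly in the direction that shrinks the error
        w -= learning_rate * 2 * error * distance

print(f"w converged to {w:.2f}")  # close to the true 10 min/km in this toy data
```

Every supervised algorithm in this section is a more sophisticated version of exactly this loop.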
The landscape

The 3 types of Machine Learning

Not all ML problems look the same. The type of data you have — specifically whether you have labelled outputs or not — determines which category of ML you are working in. There are three.

Supervised Learning
You have the answers. The model learns from them.

You provide labelled training examples — each input is paired with the correct output. The model learns the mapping from inputs to outputs by seeing thousands of these pairs.

Analogy

Teaching a child to identify animals by showing them 1,000 photos, each labelled with the animal's name. The child learns from your labels.

Real examples
Swiggy delivery time prediction — input: order details, label: actual delivery time (regression)
Razorpay fraud detection — input: transaction features, label: fraud / not fraud (classification)
Gmail spam filter — input: email text + metadata, label: spam / not spam (classification)
HDFC loan approval — input: applicant financials, label: approved / rejected (classification)
Unsupervised Learning
No labels. Find the hidden structure yourself.

You have data but no labels — no correct answers to learn from. The model looks for patterns, groupings, or structure that exists in the data on its own terms.

Analogy

A librarian given 10,000 books with no categories. They group them by content similarity — biography, fiction, technical — without being told what the categories should be.

Real examples
Flipkart customer segmentation — group 300M users by behaviour without predefined segments
Anomaly detection in payment networks — find unusual patterns without labelling what fraud looks like
Product catalogue clustering — group similar products without human-defined category trees
User journey analysis — discover common navigation paths without labelling intent
Reinforcement Learning
Learn by trying. Get rewarded for good decisions.

An agent takes actions in an environment, receives a reward or penalty after each action, and learns over time which sequence of actions maximises total reward. No labelled data — just feedback from consequences.

Analogy

Teaching a dog to fetch. You do not explain fetching. You give treats when the dog picks up the ball and brings it back. The dog learns the behaviour through trial, error, and rewards.

Real examples
Google DeepMind cooling data centres — RL agent reduced cooling energy by 40%
Zepto delivery route optimisation — agent learns which routes minimise time across all riders simultaneously
Algorithmic trading — agent learns when to buy and sell by receiving profit/loss as reward signal
AlphaGo — agent learned to play Go by playing millions of games against itself
💡 Note
This section — Classical ML — focuses entirely on Supervised Learning. It is the most common type in production, the foundation for everything else, and what you will encounter most in your first few years as an ML practitioner. Unsupervised and Reinforcement Learning are covered later in the track.
How it actually works

The ML workflow — start to finish

Every ML project at every company — from a two-person startup to Flipkart's 400-person data team — follows the same seven steps. The tools change. The algorithms change. The steps do not.

01
Define the problem

Before touching data, be precise: what are you predicting? What inputs will you have at prediction time? What does "good enough" look like in numbers?

SWIGGY · Predicting: delivery_time_min. Inputs available at order time: distance, restaurant_id, time_of_day, day_of_week, weather_code, rider_count_nearby. Good enough: mean absolute error ≤ 5 minutes on 85% of orders.
02
Collect and understand your data

Pull your historical data and look at it. What are the distributions? Are there missing values? Outliers? Surprising correlations? You cannot build a good model on data you do not understand.

SWIGGY · Pull 12 months of completed orders: 500,000 rows. Find: 2% have missing restaurant_prep_time. Outliers: 0.3% with delivery_time > 120 min (likely cancelled/reordered). Correlation check: distance is strong but not dominant — prep time is equally predictive.
03
Prepare the data

Handle missing values. Encode categorical variables. Scale numerical features. Split into training and test sets. The model will only be as good as the data you feed it.

SWIGGY · Fill missing prep times with restaurant median. Encode time_of_day as 4 buckets (morning/lunch/afternoon/evening). Scale distance to 0–1 range. Split: 80% training (400K orders), 20% test (100K orders, never touched during training).
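A sketch of what those preparation steps might look like in plain Python. The column layout and values are illustrative, not Swiggy's real schema:

```python
import random

random.seed(0)

# Illustrative rows: (prep_time_min or None, hour_of_day, distance_km, delivery_min)
rows = [(12, 13, 2.0, 28), (None, 19, 4.5, 41), (15, 9, 1.2, 19), (10, 20, 3.3, 36)]

# 1. Fill missing prep times with the median of the known ones
known = sorted(p for p, _, _, _ in rows if p is not None)
median_prep = known[len(known) // 2]
rows = [(p if p is not None else median_prep, h, d, y) for p, h, d, y in rows]

# 2. Bucket hour_of_day into 4 coarse categories
def hour_bucket(h):
    if h < 11:
        return "morning"
    if h < 15:
        return "lunch"
    if h < 18:
        return "afternoon"
    return "evening"

# 3. Min-max scale distance into the 0-1 range
dists = [d for _, _, d, _ in rows]
lo, hi = min(dists), max(dists)
prepared = [(p, hour_bucket(h), (d - lo) / (hi - lo), y) for p, h, d, y in rows]

# 4. Shuffle, then split 80/20 into train and test
random.shuffle(prepared)
split = int(0.8 * len(prepared))
train, test = prepared[:split], prepared[split:]
print(len(train), "train rows,", len(test), "test rows")
```

In practice you would do this with pandas or scikit-learn, but the logic is the same: impute, encode, scale, split.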
04
Choose and train a model

Pick an algorithm appropriate for your problem type and data. Feed it your training data. The algorithm adjusts its internal parameters until it fits the training patterns.

SWIGGY · Start simple: Linear Regression. Feed 400K training orders. Training takes under 1 second. The model learns coefficients for each feature — distance contributes +8.3 min/km, rush hour adds 9.7 min, and so on.
05
Evaluate on the test set

Run your trained model on the 20% of data it has never seen. Measure performance metrics. This is your honest estimate of how it will behave in production.

SWIGGY · Run on 100K test orders. Mean Absolute Error: 4.2 minutes. 79% of predictions within ±5 minutes. Not quite the 85% target. Time to improve.
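The evaluation step reduces to a few lines once you have predictions and actuals side by side. The numbers here are made up:

```python
# Evaluate predictions against held-out actual delivery times (illustrative values).
predicted = [28.4, 35.1, 19.8, 42.0, 30.5]
actual    = [30,   33,   26,   40,   31]

errors = [abs(p - a) for p, a in zip(predicted, actual)]
mae = sum(errors) / len(errors)                  # mean absolute error
within_5 = sum(e <= 5 for e in errors) / len(errors)  # fraction within ±5 min

print(f"MAE: {mae:.1f} min, {within_5:.0%} of predictions within ±5 minutes")
```

Both metrics come straight from the problem definition in step 01 — which is exactly why step 01 insisted on "good enough" being a number.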
06
Improve and iterate

Add more or better features. Try a more powerful algorithm. Tune hyperparameters. Each iteration goes back to the training data — the test set must stay untouched until you think you are done.

SWIGGY · Switch to XGBoost. Add 3 new features: restaurant_avg_prep_last_7d, rider_avg_speed_last_hour, order_item_count. MAE drops to 2.8 minutes. 91% within ±5 minutes. Target exceeded.
07
Deploy and monitor

Wrap your model in an API. Serve predictions in production. Monitor performance over time — data distributions shift, and a model that was accurate in January may degrade by July.

SWIGGY · 3 million predictions per day. Real-time MAE monitoring dashboard. Alert triggers if 1-hour rolling MAE exceeds 5 minutes. Automated weekly retraining on the latest 30 days of data.
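The monitoring half of this step can start as simply as a rolling-window check. A sketch, with a made-up window size and threshold — production systems would feed this from a metrics pipeline rather than in-process:

```python
from collections import deque

class MAEMonitor:
    """Track absolute error over a rolling window; flag when MAE drifts too high."""

    def __init__(self, window_size=1000, threshold_min=5.0):
        self.errors = deque(maxlen=window_size)  # old errors fall off automatically
        self.threshold = threshold_min

    def record(self, predicted, actual):
        self.errors.append(abs(predicted - actual))

    def should_alert(self):
        if not self.errors:
            return False
        return sum(self.errors) / len(self.errors) > self.threshold

# Toy usage: the last prediction is badly wrong, dragging rolling MAE above 5 min
monitor = MAEMonitor(window_size=3, threshold_min=5.0)
for pred, act in [(30, 32), (25, 26), (40, 55)]:
    monitor.record(pred, act)
print("alert:", monitor.should_alert())
```

The alert is what triggers investigation and, eventually, retraining on fresher data.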
This workflow is the backbone of this entire track. Steps 1–3 are what the Data Engineering section covers in depth. Step 4 is every algorithm in this Classical ML section. Step 5 is the Evaluation & Optimisation section. Steps 6–7 are Hyperparameter Tuning and MLOps. Every section of this track maps to a step in this workflow.
The vocabulary

Terms you will see on every ML page — defined once, clearly

ML has jargon. There is no avoiding it. But the jargon is not complicated — it is just precise language for specific ideas. Learn these 12 terms here and you will never need to pause on any later page.

Feature

An input variable used to make a prediction. One column in your data table. Also called a predictor or independent variable.

distance_km, time_of_day, restaurant_id, weather_code — each is one feature in the delivery time model.
Label / Target

The thing you are trying to predict. The correct answer in your training data. Also called the dependent variable or output.

delivery_time_min — the actual number of minutes each order took, recorded after delivery.
Model

A mathematical function that maps input features to a predicted output. After training, it is a set of numbers (parameters) that encode the learned patterns.

The trained delivery time predictor. Given features for a new order, it outputs a number like 28.4 minutes.
Training data

The labelled examples you feed to the algorithm during learning. The model sees these inputs and their correct outputs.

400,000 historical Swiggy orders with their actual delivery times — the 80% split used to train the model.
Test data

Held-out labelled examples the model never sees during training. Used only to evaluate final performance. Must not influence any training decision.

100,000 historical orders kept aside. Run through the trained model after training is complete to get an honest performance estimate.
Parameters / Weights

The internal numbers of a model that are adjusted during training. They are what the model "learns." A linear regression has one weight per feature.

The coefficient +8.3 (min/km) on distance, +9.7 (min) for rush hour — these are learned parameters.
Loss / Error

A number measuring how wrong the model's predictions are. Training aims to minimise this. Different problems use different loss functions.

Mean Absolute Error = average of |predicted_time − actual_time| across all predictions. Lower is better.
Overfitting

The model memorises the training data so well that it fails on new data. It learned noise instead of signal. Performs great on training set, poorly on test set.

A model that learns that one specific restaurant always takes 47 minutes because that was true in training data — but it's a coincidence, not a pattern.
Underfitting

The model is too simple to capture the real patterns. Performs poorly on both training and test data. Usually means the model or features need more complexity.

A model that always predicts 28 minutes regardless of inputs. It learned the average but nothing else.
Hyperparameter

Settings you choose before training that control how the model learns — not learned from data. Tuning these is an optimisation problem of its own.

In XGBoost: max_depth (how deep each tree grows), learning_rate (how fast parameters update), n_estimators (how many trees to build).
Inference

Using a trained model to make predictions on new data. Also called prediction or scoring. Inference is what happens in production.

A new order comes in at 7:43 PM on a Friday, 3.2 km away. The trained model runs inference and outputs 34.1 minutes.
Baseline

The simplest possible benchmark — often just predicting the mean. Your model must beat this to be worth deploying. The bar you need to clear.

Baseline: always predict 31 minutes (the training set mean). MAE = 8.3 min. If your model can't beat 8.3 MAE, it has learned nothing useful.
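The baseline check is worth writing out, because it is the first thing you compute on any new problem. A sketch with invented numbers, including pretend model predictions:

```python
# Baseline: always predict the training-set mean, then compare MAEs.
train_times = [25, 31, 28, 45, 22, 35]  # delivery minutes from training data
test_actual = [30, 40, 20]
model_preds = [29, 36, 23]  # pretend these came from a trained model

baseline_pred = sum(train_times) / len(train_times)  # one constant prediction

baseline_mae = sum(abs(baseline_pred - a) for a in test_actual) / len(test_actual)
model_mae = sum(abs(p - a) for p, a in zip(model_preds, test_actual)) / len(test_actual)

print(f"baseline MAE: {baseline_mae:.1f}, model MAE: {model_mae:.1f}")
# The model is only worth deploying if model_mae clearly beats baseline_mae.
```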
What this looks like at work

What ML engineers actually do at Indian companies

Machine Learning is not a single job title. Three roles work with ML in different ways. Understanding the differences will help you decide which path you are on.

ML Engineer
Build and ship models into production
Write training pipelines that run on a schedule
Build and maintain the feature engineering code
Wrap models in FastAPI services, deploy to Kubernetes
Monitor prediction quality and trigger retraining
Debug why a model that worked in dev fails in prod
₹18–28 LPA
Data Scientist
Find insights and answer business questions with data
Explore data to find patterns and test hypotheses
Build models to answer specific business questions
Run A/B experiments and interpret results statistically
Communicate findings to non-technical stakeholders
Prototype quickly; hand production code to ML engineers
₹16–24 LPA
Applied Scientist
Research and apply advanced techniques at scale
Read and implement current ML research papers
Design novel model architectures for company-specific problems
Run large-scale offline experiments before production decisions
Collaborate with ML engineers on production deployment
Publish internally or externally on methods that work
₹22–35 LPA
Your first week ML task — what it really looks like: Your lead sends you a Slack message: "We're seeing high return rates on electronics. Can you build something that flags orders likely to be returned before we ship them?" You now know what this means: Supervised Learning classification problem. Features: product category, order value, customer history, payment method. Label: returned / not returned. Workflow: collect historical orders with return outcomes → engineer features → train a classifier → evaluate precision and recall → deploy if it beats baseline. That's the job.
What comes next

You're ready for the first algorithm

You now have the foundation. You know what Machine Learning is, how it differs from traditional programming, what the three types are, what the seven-step workflow looks like, and what the key vocabulary means.

The next page introduces the simplest possible supervised learning algorithm — Linear Regression — and uses it to build an actual delivery time predictor for Swiggy. You will see every concept from this page in working code.

Next up in Classical ML
Linear Regression — predicting Swiggy delivery time
coming soon

🎯 Key Takeaways

  • ML = examples in, rules out. You provide labelled data; the algorithm finds the patterns and encodes them as a model.
  • Training = predict → measure error → adjust → repeat. The model iterates over the training data, nudging its parameters toward lower loss on each pass.
  • Three types: Supervised (labelled data, most common), Unsupervised (no labels, find structure), Reinforcement (learn from rewards). This section covers Supervised.
  • Every ML project follows the same 7-step workflow: define problem → collect data → prepare data → train → evaluate → improve → deploy.
  • Overfitting means memorising training data (good train score, bad test score). Underfitting means too simple (bad both). Both are diagnosable and fixable.
  • 12 key vocabulary terms — feature, label, model, training/test data, parameters, loss, overfitting, underfitting, hyperparameter, inference, baseline — are defined and will not be re-explained.