ML Pipelines and Feature Stores
Feature pipelines, training pipelines, inference pipelines. Feast for feature stores. Airflow and Prefect for orchestration. How production ML actually runs.
A Jupyter notebook trains a model once. A pipeline trains it every day, on fresh data, reproducibly, without anyone running it manually. That is the difference between a prototype and a production ML system.
Every ML model at Swiggy, Razorpay, and Flipkart runs on a pipeline. The delivery time prediction model retrains every night on the day's orders. The fraud detection model retrains weekly as new fraud patterns emerge. The product recommendation model retrains daily as inventory changes. None of these happens because someone ran a notebook — they are automated, scheduled, and monitored pipelines that alert on failure.
Three types of pipelines work together in every production ML system. The feature pipeline extracts raw events from databases and streams, transforms them into model-ready features, and writes them to a feature store. The training pipeline reads features from the store, trains the model, evaluates it, and registers it if it passes quality gates. The inference pipeline reads features for a specific prediction request, loads the registered model, and serves a prediction in milliseconds. These three must stay in sync — if the feature pipeline changes how it computes a feature, the training and inference pipelines must change together or the model silently degrades.
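The division of labour can be sketched in plain Python. This is an illustrative mock, not a real feature store API: `feature_store`, the pipeline functions, and the threshold "model" are all hypothetical stand-ins.

```python
# A toy in-memory "feature store" shared by all three pipelines.
feature_store = {}

def feature_pipeline(raw_orders):
    """Raw events -> model-ready features -> feature store."""
    for customer, amounts in raw_orders.items():
        feature_store[customer] = {
            "order_count": len(amounts),
            "avg_order_value": sum(amounts) / len(amounts),
        }

def training_pipeline():
    """Feature store -> trained 'model' (here, a trivial threshold rule)."""
    avgs = [f["avg_order_value"] for f in feature_store.values()]
    threshold = sum(avgs) / len(avgs)
    return {"high_value_threshold": threshold}  # stand-in for a model artifact

def inference_pipeline(model, customer):
    """Prediction request -> read features -> serve a prediction."""
    features = feature_store[customer]  # the same values training saw
    return features["avg_order_value"] >= model["high_value_threshold"]

feature_pipeline({"alice": [200, 400], "bob": [50, 70]})
model = training_pipeline()
print(inference_pipeline(model, "alice"))  # alice's avg (300) clears the threshold (180)
```

All three functions read the same `feature_store` entry, which is the whole point: change the feature computation in one place and every pipeline sees the change together.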
A restaurant kitchen has three pipelines: the supply pipeline (ingredients arrive, are prepped, and stored in the walk-in fridge), the recipe pipeline (chefs create and test new dishes using stored ingredients), and the serving pipeline (orders come in, ingredients are pulled from storage, dishes are prepared and served). The walk-in fridge is the feature store. If the supply pipeline changes how the vegetables are cut (feature engineering), the recipes (training) and serving (inference) must use the same cut — or the dish comes out wrong.
Training-serving skew — when features are computed differently at training time vs inference time — is the number one silent failure mode in production ML. Feature stores exist specifically to prevent it by computing features once and serving the same values to both pipelines.
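Here is skew in miniature, using a hypothetical `avg_order_value` feature: the notebook version drops refunded orders before averaging, the serving version does not, and the model quietly receives different numbers at inference time than it was trained on.

```python
orders = [250, 0, 150]  # one refunded order recorded as 0

def avg_order_value_training(amounts):
    # Notebook version: drops refunds before averaging.
    paid = [a for a in amounts if a > 0]
    return sum(paid) / len(paid)

def avg_order_value_serving(amounts):
    # API version: averages everything, refunds included.
    return sum(amounts) / len(amounts)

print(avg_order_value_training(orders))  # 200.0
print(avg_order_value_serving(orders))   # 133.33... -> silently different
```

Neither function errors, no alert fires, and offline evaluation looks fine; the degradation only shows up in production metrics weeks later.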
Feature pipeline, training pipeline, inference pipeline — how they connect
Feature stores — one definition, consistent values, training and serving
A feature store solves the most common production ML problem: the feature computed in the training notebook is not the same feature computed in the serving API. The feature store is a central registry where features are defined once, computed once, and read by both the training pipeline and the inference service. Training and serving are always consistent.
Feature stores have two components. The offline store (typically S3, BigQuery, or Parquet files) holds the full historical feature values — used for training. It supports time-travel queries: give me the value of this feature for this entity as of a specific past timestamp. This is critical for preventing data leakage. The online store (typically Redis or DynamoDB) holds only the latest feature values — used for real-time inference. Writes to both are handled by the feature pipeline's materialisation job.
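A point-in-time (time-travel) lookup can be sketched in a few lines of plain Python. The entity name, timestamps, and in-memory stores here are toy stand-ins, not a real feature store API.

```python
from bisect import bisect_right

# Offline store: full (timestamp, value) history per entity, sorted by time.
offline_store = {
    "rider_42": [(1, 3.5), (5, 4.1), (9, 2.8)],  # avg_delivery_delay over time
}

def point_in_time_lookup(entity, as_of):
    """Latest feature value with timestamp <= as_of: no future leakage."""
    history = offline_store[entity]
    idx = bisect_right([t for t, _ in history], as_of)
    if idx == 0:
        return None  # no feature value existed yet at that time
    return history[idx - 1][1]

# Online store holds only the latest value per entity, for real-time serving.
online_store = {e: h[-1][1] for e, h in offline_store.items()}

print(point_in_time_lookup("rider_42", as_of=6))  # 4.1 (the value as of t=5)
print(online_store["rider_42"])                   # 2.8 (latest, for inference)
```

A training example with event time 6 must see 4.1, not the later 2.8 — fetching the latest value for historical training rows is exactly the leakage the offline store's time-travel query prevents.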
Prefect — define pipelines as Python, schedule and monitor from the UI
Orchestrators schedule pipeline runs, handle failures, retry failed steps, send alerts, and provide a dashboard of what ran, when, and what failed. Airflow is the industry standard but requires significant infrastructure. Prefect offers the same capabilities with far simpler setup — decorate Python functions with @task and @flow, run them locally or in the cloud, and get a full observability dashboard.
Airflow DAGs — the pattern used at Swiggy, Flipkart, and Razorpay
Apache Airflow is the most widely deployed ML orchestrator in India. Every major Indian tech company runs Airflow for data and ML pipelines. An Airflow DAG (Directed Acyclic Graph) defines the tasks and their dependencies as Python code. Airflow schedules DAG runs, retries failures, sends email alerts, and provides a rich UI showing every run's status.
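A sketch of such a DAG, assuming Airflow 2.x; the DAG id, task names, schedule, and the hard-coded row count are hypothetical. It includes a data quality gate that branches to an alert instead of training when the data looks wrong.

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import BranchPythonOperator, PythonOperator

def check_data_quality(**context):
    row_count = 1_000_000  # in production: query yesterday's feature table
    # Branch away from training when the data fails the quality gate.
    return "train_model" if row_count > 100_000 else "alert_on_bad_data"

def train(**context):
    print("training on validated data")

def alert(**context):
    print("data quality check failed — paging the on-call")

with DAG(
    dag_id="nightly_retraining",
    start_date=datetime(2024, 1, 1),
    schedule_interval="0 2 * * *",  # every night at 02:00
    catchup=False,
) as dag:
    quality_gate = BranchPythonOperator(
        task_id="check_data_quality", python_callable=check_data_quality
    )
    train_model = PythonOperator(task_id="train_model", python_callable=train)
    alert_on_bad_data = PythonOperator(task_id="alert_on_bad_data", python_callable=alert)

    quality_gate >> [train_model, alert_on_bad_data]
```

The `>>` operator wires the dependency edges, and `BranchPythonOperator` returns the task_id of the branch to follow, so a failed quality check skips training entirely rather than letting a degraded model slip through.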
Feast — define features once, serve consistently to training and inference
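With Feast, a feature is declared once and both pipelines read the same definition. A sketch of a feature definition file, assuming a recent Feast release; the entity, parquet path, and field names are hypothetical.

```python
from datetime import timedelta
from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

# The entity a feature is keyed on (hypothetical example).
rider = Entity(name="rider", join_keys=["rider_id"])

# Offline source: historical feature values with an event timestamp column.
rider_stats_source = FileSource(
    path="data/rider_stats.parquet",  # hypothetical offline store path
    timestamp_field="event_timestamp",
)

# One definition, read by both training and serving.
rider_stats = FeatureView(
    name="rider_stats",
    entities=[rider],
    ttl=timedelta(days=1),
    schema=[
        Field(name="avg_delivery_delay", dtype=Float32),
        Field(name="orders_last_7d", dtype=Int64),
    ],
    source=rider_stats_source,
)
```

Training code then calls `store.get_historical_features(...)` for point-in-time correct joins against the offline store, while the inference service calls `store.get_online_features(...)` after a `feast materialize` run has synced the latest values to the online store.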
Every common ML pipeline mistake — explained and fixed
You can build ML pipelines. Next: track every experiment so you never lose a good model again.
Pipelines produce models automatically. But which of the 50 models trained over the past month is the best? What hyperparameters, what data version, what feature set produced it? Without experiment tracking you cannot answer these questions. Module 70 covers MLflow and Weights & Biases — log every run, compare experiments on a dashboard, version models, and register the best ones for deployment.
Log every run, compare experiments, version models, register artifacts. Never lose a good experiment again.
🎯 Key Takeaways
- ✓ Three pipelines power every production ML system: the feature pipeline (raw events → features → feature store, runs hourly/daily), the training pipeline (feature store → model → model registry, runs daily/weekly), and the inference pipeline (request → feature store → model → prediction, runs in real time). All three must use the same feature definitions or training-serving skew silently degrades model quality.
- ✓ A feature store has two layers: the offline store (full history in S3/BigQuery/Parquet, used for training with point-in-time queries) and the online store (latest values in Redis, used for inference at ~1ms latency). Materialisation jobs sync the offline store to the online store, typically daily via an Airflow DAG.
- ✓ Point-in-time correct feature retrieval is mandatory for training. For each training event at time T, only use feature values computed from data with timestamp ≤ T. Fetching the latest features regardless of event time causes data leakage — the model appears excellent in evaluation but fails in production because future data is not available at inference time.
- ✓ Prefect turns Python functions into pipeline tasks with @task and @flow decorators. Tasks get retries, caching, and logging automatically. Flows define the DAG structure. Run locally for development, deploy to Prefect Cloud or self-hosted server for production scheduling.
- ✓ Airflow DAGs define ML pipelines as Python — tasks are PythonOperator/BranchPythonOperator/EmailOperator nodes connected by >> dependencies. BranchPythonOperator enables conditional logic (pass data validation → train, fail → alert). XCom passes data between tasks. schedule_interval sets the cron schedule. Used by Swiggy, Flipkart, Razorpay, and most Indian unicorns.
- ✓ The most dangerous ML pipeline failure is silent: feature extraction succeeds but returns 0 or wrong rows, training proceeds on bad data, a degraded model is promoted. Always add explicit data quality gate tasks that check row counts, null rates, and value distributions before training. Make quality checks raise exceptions on failure — Airflow marks tasks as failed only on unhandled exceptions.
Discussion
Have a better approach? Found something outdated? Share it — your knowledge helps everyone learning here.