Hyperparameter Tuning with Optuna
Bayesian optimisation over GridSearch. Define a search space, let Optuna find the best hyperparameters with far fewer trials.
GridSearchCV evaluates 200 combinations blindly. Optuna evaluates 30 combinations intelligently — and usually finds a better answer.
Your gradient boosting model has 6 hyperparameters to tune: n_estimators, learning_rate, max_depth, subsample, colsample_bytree, reg_alpha. If you try 4 values per parameter with GridSearchCV, that is 4⁶ = 4,096 combinations. With 5-fold CV each combination trains 5 models — 20,480 model fits. At 30 seconds each, that is 7 days of compute.
RandomizedSearchCV cuts this to 100 random combinations — still blind, just fewer. It does not learn from previous trials. If learning_rate=0.01 consistently underperforms learning_rate=0.1, random search keeps wasting trials on learning_rate=0.01 anyway.
Optuna uses Bayesian optimisation. After each trial it builds a probabilistic model of which hyperparameter regions are likely to produce good scores. It uses this model to choose the next trial — focusing on promising regions and skipping areas already known to be bad. With 30–50 trials it typically matches or beats GridSearchCV on 200+ combinations.
Finding the best hyperparameters is like prospecting for gold in a mountain range. GridSearch digs at every point on a fixed grid — systematic but wasteful. Random search digs at random spots — faster but still uninformed. Optuna is a geologist who studies the rock formations after each dig. If gold appeared near a granite outcrop, they dig near other granite outcrops first. They learn from each result to make the next dig smarter.
Optuna's probabilistic model of the search space is called a surrogate model. The strategy for choosing the next trial from the surrogate is called an acquisition function. Together they make Optuna far more sample-efficient than any exhaustive or random search.
Grid vs Random vs Bayesian — what each one does and when it wins
Optuna in three steps — study, objective, optimize
Optuna has a simple API built around three concepts. A study is the optimisation session — it stores all trials and their results. An objective function is the function Optuna calls for each trial — it receives a trial object, samples hyperparameters from it, trains the model, and returns a score. study.optimize() runs the objective n_trials times, using previous results to guide each new trial.
Inside the objective function, you use the trial object to suggest hyperparameter values. Optuna chooses values based on its surrogate model — not randomly and not from a fixed grid.
Pruning, callbacks, and persistence — Optuna at scale
For expensive models, Optuna's pruning feature terminates unpromising trials early — after seeing partial results. If a trial looks bad after fold 2 of 5-fold CV, Optuna stops it and moves on. This can cut total compute by 30–50% with no loss in final quality.
Tuning XGBoost and LightGBM with Optuna — the full workflow
In production you will tune XGBoost or LightGBM far more often than sklearn's GradientBoostingClassifier. Both have native Optuna integration. The search spaces for these models are well-established and the code below gives you a production-ready starting template.
A systematic tuning workflow — what to tune first, how many trials
Tuning all hyperparameters simultaneously with a flat search space is inefficient. Some parameters matter far more than others. A systematic order dramatically reduces the trials needed.
1. Learning rate and tree count first. Find the right learning_rate range — a low rate needs many trees, a high rate needs few. Fix this relationship before tuning anything else.
2. Tree structure next. Control model complexity: deeper trees add capacity but also overfitting risk, and min_child_samples stops individual leaves from overfitting.
3. Sampling and regularisation. These parameters have diminishing impact, so fine-tune generalisation only after the structure is fixed.
4. Joint refinement last. The search space is now small and well-targeted, and Optuna converges on a near-optimal configuration quickly.
Every common tuning mistake — explained and fixed
You can tune any model. Next: explain any prediction.
You have built, evaluated, calibrated, and tuned models. The final module in the Evaluation section answers the question stakeholders always ask after seeing the model performance: why did the model make this specific prediction? Module 39 covers SHAP and LIME — the two most widely used techniques for explaining individual predictions from any model. SHAP was introduced briefly in Module 30 for XGBoost. Module 39 covers it comprehensively across all model types including black-box models with no direct feature importance.
Explain any individual prediction. Global feature importance, local explanations, and how to present model decisions to regulators.
🎯 Key Takeaways
- ✓ GridSearchCV evaluates every combination exhaustively — combinatorial explosion makes it unusable beyond 3 parameters. RandomizedSearchCV samples n_iter random combinations — better but still learns nothing between trials. Optuna uses Bayesian optimisation (TPE) to focus each new trial on promising regions based on all previous results.
- ✓ The Optuna API has three pieces: create_study (the session), an objective function (trains and evaluates one hyperparameter combination, returns a score), and study.optimize (runs the objective n_trials times). Everything else — sampling, pruning, persistence — builds on this core.
- ✓ Use trial.suggest_float with log=True for parameters that span orders of magnitude: learning_rate (0.001 to 0.3), reg_alpha (1e-8 to 10). Log-uniform sampling ensures equal exploration at each magnitude. Use trial.suggest_int for discrete parameters like n_estimators, max_depth, num_leaves.
- ✓ Pruning stops unpromising trials early — report intermediate scores with trial.report() and check trial.should_prune() inside the CV loop. MedianPruner prunes any trial whose intermediate score falls below the median of completed trials at the same step. Saves 30–50% compute on expensive models.
- ✓ Tune in phases for large search spaces: learning_rate + n_estimators first (biggest impact), then tree structure, then regularisation, then joint refinement in a narrow range around the best values. This finds the optimum with far fewer total trials than a flat all-parameters-at-once search.
- ✓ Optuna studies are persistent — save to SQLite or PostgreSQL with the storage argument, set load_if_exists=True to resume. This lets you run 20 trials today, stop, and add 20 more tomorrow. The surrogate model continues improving from where it left off.