Skip to content

Development version

This is the latest (dev) documentation. It may contain unreleased features or breaking changes. For the stable release, use stable.

Pipeline Regression Tuning

This example tunes a scikit-learn Pipeline with GASearchCV. Pipeline parameters use the standard sklearn double-underscore syntax — regressor__max_depth reaches the max_depth of the step named regressor. We tune a StandardScaler + GradientBoostingRegressor pipeline on the diabetes dataset and measure the result the honest way: against a default, untuned model on an untouched test set.

Setup

We load the diabetes regression dataset (442 patients, 10 features) and hold out 30% for a final, untouched evaluation. Everything downstream is seeded for reproducibility.

python
import warnings
from pprint import pprint

import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import KFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

from sklearn_genetic import (
    EvolutionConfig, GASearchCV, OptimizationConfig, PopulationConfig, RuntimeConfig,
)
from sklearn_genetic.callbacks import ConsecutiveStopping, DeltaThreshold, TimerStopping
from sklearn_genetic.schedules import ExponentialAdapter, InverseAdapter
from sklearn_genetic.space import Categorical, Continuous, Integer

warnings.filterwarnings("ignore")

RANDOM_STATE = 42

data = load_diabetes(as_frame=True)
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=RANDOM_STATE
)
cv = KFold(n_splits=4, shuffle=True, random_state=RANDOM_STATE)

print(f"Training shape: {X_train.shape}")
print(f"Test shape: {X_test.shape}")
text
Training shape: (309, 10)
Test shape: (133, 10)

Baseline Pipeline

Before tuning anything, fit a pipeline with default boosting settings. This is the number to beat — a model anyone gets for free with one line of code.

python
def make_pipeline(**regressor_kwargs):
    return Pipeline([
        ("scaler", StandardScaler()),
        ("regressor", GradientBoostingRegressor(random_state=RANDOM_STATE, **regressor_kwargs)),
    ])


def regression_metrics(estimator, X_eval, y_eval):
    predictions = estimator.predict(X_eval)
    rmse = mean_squared_error(y_eval, predictions) ** 0.5
    return {
        "r2": round(r2_score(y_eval, predictions), 4),
        "rmse": round(rmse, 2),
        "mae": round(mean_absolute_error(y_eval, predictions), 2),
    }


baseline = make_pipeline()
baseline.fit(X_train, y_train)
baseline_metrics = regression_metrics(baseline, X_test, y_test)
baseline_metrics
text
{'r2': 0.4303, 'rmse': 55.46, 'mae': 44.72}

Pipeline Search Space

Parameter names use the sklearn step__param convention, so every key below starts with regressor__ to target the boosting step inside the pipeline.

python
param_grid = {
    "regressor__n_estimators": Integer(40, 180),
    "regressor__learning_rate": Continuous(0.01, 0.20, distribution="log-uniform"),
    "regressor__max_depth": Integer(1, 4),
    "regressor__min_samples_leaf": Integer(1, 12),
    "regressor__subsample": Continuous(0.65, 1.0),
    "regressor__loss": Categorical(["squared_error", "absolute_error", "huber"]),
}
sorted(param_grid)
text
['regressor__learning_rate', 'regressor__loss', 'regressor__max_depth', 'regressor__min_samples_leaf', 'regressor__n_estimators', 'regressor__subsample']

Regression scorers

For metrics where smaller is better, use sklearn's negative scorer: "neg_root_mean_squared_error". The GA maximizes fitness, so negative RMSE increases (toward zero) exactly as RMSE decreases.

Configure GASearchCV

We keep the population and generations modest so the search finishes quickly, and warm-start it with a sensible hand-picked configuration so generation 0 already has a reasonable candidate to improve on.

python
search = GASearchCV(
    estimator=make_pipeline(),
    random_state=RANDOM_STATE,
    param_grid=param_grid,
    scoring="neg_root_mean_squared_error",
    criteria="max",
    cv=cv,
    evolution_config=EvolutionConfig(
        population_size=10,
        generations=8,
        crossover_probability=ExponentialAdapter(initial_value=0.8, end_value=0.4, adaptive_rate=0.15),
        mutation_probability=InverseAdapter(initial_value=0.25, end_value=0.08, adaptive_rate=0.25),
        tournament_size=3,
        elitism=True,
        keep_top_k=3,
    ),
    population_config=PopulationConfig(
        initializer="smart",
        warm_start_configs=[{
            "regressor__n_estimators": 100,
            "regressor__learning_rate": 0.05,
            "regressor__max_depth": 2,
            "regressor__min_samples_leaf": 4,
            "regressor__subsample": 0.85,
            "regressor__loss": "squared_error",
        }],
    ),
    runtime_config=RuntimeConfig(
        n_jobs=-1,
        parallel_backend="auto",
        use_cache=True,
        verbose=False,
        return_train_score=False,
    ),
    optimization_config=OptimizationConfig(
        local_search=True,
        local_search_top_k=2,
        local_search_steps=1,
        local_search_radius=0.20,
        diversity_control=True,
        diversity_threshold=0.30,
        diversity_stagnation_generations=3,
        diversity_mutation_boost=1.8,
        random_immigrants_fraction=0.10,
        fitness_sharing=True,
        sharing_radius=0.40,
    ),
)

callbacks = [
    DeltaThreshold(threshold=0.01, generations=5, metric="fitness_best"),
    ConsecutiveStopping(generations=7, metric="fitness_best"),
    TimerStopping(total_seconds=75),
]

search.fit(X_train, y_train, callbacks=callbacks)
print("fitted:", search.best_score_ is not None)
text
INFO: TimerStopping callback met its criteria
INFO: Stopping the algorithm
fitted: True

Evaluate Predictions

GASearchCV refits the best pipeline automatically, so you can call predict directly on the search object. The holdout comparison is the fastest sanity check: the GA should earn its extra search cost by improving the metrics you actually care about outside cross-validation.

python
print("Best CV negative RMSE:", round(search.best_score_, 4))
print("Best params:")
pprint(search.best_params_)

ga_metrics = regression_metrics(search, X_test, y_test)
comparison = pd.DataFrame(
    [baseline_metrics, ga_metrics],
    index=["default GBR", "GA-tuned pipeline"],
)
comparison
text
Best CV negative RMSE: -58.5933
Best params:
{'regressor__learning_rate': 0.05,
 'regressor__loss': 'squared_error',
 'regressor__max_depth': 1,
 'regressor__min_samples_leaf': 9,
 'regressor__n_estimators': 119,
 'regressor__subsample': 0.7807245051751381}
                       r2   rmse    mae
default GBR        0.4303  55.46  44.72
GA-tuned pipeline  0.4987  52.02  41.77
python
r2_gain = ga_metrics["r2"] - baseline_metrics["r2"]
rmse_drop = baseline_metrics["rmse"] - ga_metrics["rmse"]
mae_drop = baseline_metrics["mae"] - ga_metrics["mae"]
print(f"R2  : {baseline_metrics['r2']:.4f}  ->  {ga_metrics['r2']:.4f}   ({r2_gain:+.4f})")
print(f"RMSE: {baseline_metrics['rmse']:.2f}  ->  {ga_metrics['rmse']:.2f}   ({-rmse_drop:+.2f})")
print(f"MAE : {baseline_metrics['mae']:.2f}  ->  {ga_metrics['mae']:.2f}   ({-mae_drop:+.2f})")
text
R2  : 0.4303  ->  0.4987   (+0.0684)
RMSE: 55.46  ->  52.02   (-3.44)
MAE : 44.72  ->  41.77   (-2.95)

Plotting the same numbers makes the improvement on every metric obvious at a glance.

python
import matplotlib.pyplot as plt

labels = ["R2", "RMSE", "MAE"]
base_vals = [baseline_metrics["r2"], baseline_metrics["rmse"], baseline_metrics["mae"]]
ga_vals = [ga_metrics["r2"], ga_metrics["rmse"], ga_metrics["mae"]]

fig, axes = plt.subplots(1, 3, figsize=(11, 3.6))
colors = ["#95a5a6", "#16a085"]
for ax, label, b, g in zip(axes, labels, base_vals, ga_vals):
    bars = ax.bar(["default", "GA-tuned"], [b, g], color=colors, width=0.6)
    ax.set_title(label)
    ax.bar_label(bars, fmt="%.3g", padding=3)
    ax.margins(y=0.18)
    ax.grid(axis="y", alpha=0.25)
axes[0].set_ylabel("score")
fig.suptitle("Default GradientBoostingRegressor vs GA-tuned pipeline (holdout)")
fig.tight_layout()

Bar chart comparing R2, RMSE and MAE for the default model versus the GA-tuned pipeline

Higher R2 and lower RMSE/MAE for the GA-tuned pipeline — tuning beats the defaults on every metric.

Search Cost and Telemetry

fit_stats_ reports the evaluation accounting (how many candidates were actually scored, cache hits, random immigrants, local refinements), and history carries the per-generation convergence and diversity signals.

python
print(search.fit_stats_)
text
{'evaluated_candidates': 72, 'unique_candidates': 72, 'cross_validate_calls': 72, 'cache_hits': 0, 'duplicate_candidates': 0, 'skipped_invalid_candidates': 0, 'population_parallel_batches': 5, 'population_serial_batches': 0, 'random_immigrants': 0, 'local_refinement_candidates': 2}
python
history = pd.DataFrame(search.history)
cols = ["gen", "fitness", "fitness_max", "unique_individual_ratio",
        "genotype_diversity", "stagnation_generations"]
history[[c for c in cols if c in history.columns]].tail()
text
   gen    fitness  fitness_max  unique_individual_ratio  genotype_diversity  stagnation_generations
0    0 -61.243523   -59.037963                      1.0            0.722222                       0
1    1 -59.857250   -59.037963                      0.8            0.407407                       1
2    2 -60.193052   -59.045795                      0.6            0.388889                       2
3    3 -59.580004   -58.664782                      0.7            0.351852                       0

A fitted search can answer more pointed questions than "what won?". This landscape colors each evaluated candidate by its CV score across a learning-rate / tree-depth slice, highlighting which region was consistently promising.

python
from sklearn_genetic.plots import plot_cv_scores, plot_score_landscape

plot_score_landscape(
    search,
    x="regressor__learning_rate",
    y="regressor__max_depth",
)
plt.tight_layout()

Score landscape over learning rate and max depth, colored by CV score

Each point is one evaluated candidate; brighter points scored better in cross-validation.

When several candidates have similar mean scores, inspect fold-level stability before trusting a tiny ranking difference.

python
plot_cv_scores(
    search,
    top_k=5,
    label_params=["regressor__learning_rate", "regressor__max_depth"],
)
plt.tight_layout()

Box plot of fold-level CV scores for the top five candidates

Per-fold spread for the strongest candidates — a narrow box means a stable choice.

Practical Notes

  • Use pipeline parameter names exactly as sklearn expects them (step__param).
  • For regression losses where smaller is better, use sklearn's negative scorers such as "neg_root_mean_squared_error"; the GA maximizes fitness.
  • Compare holdout metrics, not only CV fitness — the default model is the honest baseline to beat.
  • If the search revisits many candidates, inspect cache_hits in fit_stats_ and consider stronger diversity controls or a wider search space.

See Also

Released under the MIT License.