Run reports and experiment tracking
What a finished fit leaves behind: a versioned JSON report with the full
provenance of the run, and — only if you opt in — one record in an experiment
tracker. Every Python block on this page is self-contained: copy any one of
them and it runs as-is. Every block is also executed on each CI run, so the
examples cannot rot. They use the lightweight models=("baseline", "linear")
setting so each one finishes in seconds.
The run report
After fit, run_report_ is a single, tracker-independent JSON document: plain
dicts, lists and scalars, so json.dumps works on it directly. It is versioned
by run_manifest_version and evolves additively, so a consumer should read the
keys it knows and ignore unknown ones. The report records the selection outcome
(winner, leaderboard, band, significance, holdout_score, and failed per
candidate), the resolved inputs (config, task, metric, preset), every
opt-in stage (feature_selection, hpo, ensemble, budget, cache) and
the provenance trail (honestml_version, run_fingerprint, timings,
serving). A block for a feature you did not opt into is None or reports its
off-state; it is never silently missing.
from sklearn.datasets import make_classification
from honestml import AutoML
X, y = make_classification(n_samples=150, n_features=6, n_informative=4, random_state=0)
model = AutoML(task="binary", models=("baseline", "linear"), random_state=0).fit(X, y)
report = model.run_report_
print(sorted(report))
print(report["winner"], report["metric"], report["honestml_version"])
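Because the report evolves additively, a downstream consumer is safest reading only the keys it knows. The following is a minimal stdlib sketch of that pattern. The report dict is a hypothetical stand-in with invented values (only the top-level key names come from this page), and summarize is an illustrative helper, not part of honestml.

```python
import json

# Hypothetical stand-in for model.run_report_: top-level key names come from this
# page, but every value (and the "new_block" key) is invented for illustration.
report = {
    "run_manifest_version": 1,
    "winner": "linear",
    "metric": "roc_auc",
    "holdout_score": None,
    "honestml_version": "0.0.0",
    "hpo": None,  # opt-in stage that was not enabled
    "new_block": {"added_in": "a later version"},  # unknown to this consumer
}

# The keys this particular consumer understands; everything else is ignored.
KNOWN_KEYS = ("run_manifest_version", "winner", "metric", "holdout_score", "honestml_version")


def summarize(report: dict) -> dict:
    summary = {key: report.get(key) for key in KNOWN_KEYS}
    # This sketch treats only None as "off"; a block that reports an explicit
    # off-state instead would need its own check.
    summary["hpo_enabled"] = report.get("hpo") is not None
    return summary


# Plain dicts, lists and scalars survive a JSON round trip unchanged.
assert json.loads(json.dumps(report)) == report
print(summarize(report))
```

Using .get for every known key means a report that gains new blocks in a later version is still readable without changes to the consumer.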
The run fingerprint
run_fingerprint is the reproducibility contract: a hex SHA-256 over canonical
JSON of everything that can change a candidate's out-of-fold score — the
resolved config, the task and metric identity, a content digest of the data
(design matrix, target, row-aligned metadata, schema), the resolved estimator
set and the installed library versions. Same inputs give the same fingerprint
and therefore the same selection; the key is fail-closed, so any change to any
ingredient changes it. It is also the cache key for cache=/resume.
Post-selection observability — tracker, finalize, report rendering — is
deliberately outside the fingerprint, because it cannot change the model.
from sklearn.datasets import make_classification
from honestml import AutoML
X, y = make_classification(n_samples=150, n_features=6, n_informative=4, random_state=0)
a = AutoML(task="binary", models=("baseline", "linear"), random_state=0).fit(X, y)
b = AutoML(task="binary", models=("baseline", "linear"), random_state=0).fit(X, y)
print(a.run_report_["run_fingerprint"] == b.run_report_["run_fingerprint"]) # True
print(a.run_report_["run_fingerprint"][:16], "...")
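To make "a hex SHA-256 over canonical JSON" concrete, here is a rough sketch using only hashlib and json. The ingredient names, the placeholder version string and the sort_keys/separators canonical form are assumptions for illustration; honestml's actual canonicalization and data digest may differ.

```python
import hashlib
import json


def canonical_json(obj) -> bytes:
    # Sorted keys (at every nesting level) and fixed separators make logically
    # equal dicts serialize to identical bytes, whatever their insertion order.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def fingerprint(ingredients: dict) -> str:
    return hashlib.sha256(canonical_json(ingredients)).hexdigest()


# Hypothetical ingredients; the real fingerprint covers more (schema, metadata, estimator set).
data_digest = hashlib.sha256(b"stand-in for the design matrix and target bytes").hexdigest()
base = {
    "config": {"models": ["baseline", "linear"], "random_state": 0},
    "task": "binary",
    "metric": "roc_auc",
    "data_digest": data_digest,
    "versions": {"scikit-learn": "0.0.0"},  # placeholder version string
}
reordered = dict(reversed(list(base.items())))  # same content, different key order
changed = {**base, "config": {**base["config"], "random_state": 1}}

print(fingerprint(base) == fingerprint(reordered))  # True: key order does not matter
print(fingerprint(base) == fingerprint(changed))    # False: any ingredient change shows up
```

Sorting keys is what makes the digest independent of dict insertion order, while a change to any nested value, such as random_state, yields a different hex string.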
Saving and rendering
save_run_report(report, path) writes the report as indented UTF-8 JSON and
returns the written path. When path is an existing directory, the file is
written to path/run_report.json; with overwrite=False, an existing file raises
FileExistsError instead of being replaced. render_report(report, path, fmt="md")
renders a human-readable summary with the winner, the band, the leaderboard,
per-stage timings and the resolved config. report may be either the run_report_
mapping or a path to a saved run_report.json, so save-then-render round-trips.
Markdown rendering needs nothing beyond the standard library.
import tempfile
from pathlib import Path
from sklearn.datasets import make_classification
from honestml import AutoML, render_report, save_run_report
X, y = make_classification(n_samples=150, n_features=6, n_informative=4, random_state=0)
model = AutoML(task="binary", models=("baseline", "linear"), random_state=0).fit(X, y)
out = Path(tempfile.mkdtemp())
json_path = save_run_report(model.run_report_, out) # out/run_report.json
md_path = render_report(json_path, out, fmt="md") # out/run_report.md
print("\n".join(md_path.read_text(encoding="utf-8").splitlines()[:7]))
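The save contract described for save_run_report can be mimicked in a few lines of stdlib code, which helps when reasoning about what a pipeline step will do on a rerun. save_report_sketch is a hypothetical helper; its overwrite=False default and indent width are assumptions of this sketch, not documented honestml behavior. Use save_run_report itself in real code.

```python
import json
import tempfile
from pathlib import Path


def save_report_sketch(report: dict, path, *, overwrite: bool = False) -> Path:
    """Hypothetical stdlib mimic of the save_run_report contract."""
    target = Path(path)
    if target.is_dir():
        target = target / "run_report.json"  # a directory gets the default file name
    if target.exists() and not overwrite:
        raise FileExistsError(target)  # refuse to replace an existing report
    # Indented UTF-8 JSON; indent=2 is this sketch's choice.
    target.write_text(json.dumps(report, indent=2, ensure_ascii=False), encoding="utf-8")
    return target


out = Path(tempfile.mkdtemp())
written = save_report_sketch({"winner": "linear"}, out)
print(written.name)  # run_report.json
try:
    save_report_sketch({"winner": "baseline"}, out)  # same target, overwrite=False
except FileExistsError:
    print("refused to overwrite")
print(json.loads(written.read_text(encoding="utf-8")))  # {'winner': 'linear'}
```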
fmt="html" writes a single self-contained file; with the report extra
installed it embeds leaderboard and timing charts as inline PNG, and without it
degrades gracefully (chart-less HTML plus a WARNING — never an ImportError):
pip install "honestml[report]" # matplotlib, used only for the HTML charts
render_report(model.run_report_, out, fmt="html") # out/run_report.html
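The graceful-degradation rule (a WARNING, never an ImportError) is the usual optional-dependency pattern. A minimal sketch follows, assuming a plain try/except around importlib.import_module. load_optional and render_html_sketch are hypothetical names, and the HTML they produce is only a placeholder, not honestml's template.

```python
import html
import importlib
import logging

logger = logging.getLogger("report_sketch")


def load_optional(module_name: str):
    """Return the module if it imports, otherwise log a WARNING and return None."""
    try:
        return importlib.import_module(module_name)
    except ImportError:  # also catches ModuleNotFoundError, its subclass
        logger.warning("%s is not installed; rendering without charts", module_name)
        return None


def render_html_sketch(title: str, chart_module: str = "matplotlib") -> str:
    charts = load_optional(chart_module)
    body = f"<h1>{html.escape(title)}</h1>"
    if charts is not None:
        # A real renderer would draw figures here and embed them as inline PNG.
        body += "<p>[charts]</p>"
    return f"<!doctype html><html><body>{body}</body></html>"


# A module name that cannot exist exercises the fallback path: a WARNING, no exception.
page = render_html_sketch("run report", chart_module="definitely_not_installed_xyz")
print(page)
```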
MLflow tracking
Tracking is opt-in and post-fit: pass tracker="mlflow", or a TrackerConfig
to set the experiment name, tracking URI, run name and tags. After a completed
fit, honestml logs exactly one MLflow run: the flattened resolved config as
params, the leaderboard scores, holdout score and stage timings as metrics,
provenance tags (honestml.version, honestml.fingerprint,
honestml.winner) and the full run_report.json as a run artifact. A missing
mlflow install fails fast before training starts, while a tracking failure
after the fit is downgraded to a WARNING — it can never destroy a finished
model. The adapter never mutates global mlflow state (no set_tracking_uri,
set_experiment or fluent start_run); everything goes through a client bound
to an explicit run id.
pip install "honestml[mlflow]"
from sklearn.datasets import make_classification
from honestml import AutoML, TrackerConfig
X, y = make_classification(n_samples=150, n_features=6, n_informative=4, random_state=0)
model = AutoML(
    task="binary",
    models=("baseline", "linear"),
    random_state=0,
    tracker=TrackerConfig(
        experiment="churn",
        tracking_uri="http://mlflow:5000",  # None defers to MLFLOW_TRACKING_URI -> file:./mlruns
        run_name="weekly-refresh",
        tags={"team": "risk"},
    ),
).fit(X, y)
# tracker="mlflow" is shorthand for TrackerConfig() — experiment "honestml", default URI
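Two details from this section can be sketched with the standard library: flattening a nested resolved config into the single-level key/value pairs that MLflow stores as string params, and the tracking-URI fallback described in the TrackerConfig comment. The dotted key separator, the sample config keys (such as hpo.n_trials) and both helper names are assumptions for illustration; honestml's own flattening may name keys differently.

```python
import os


def flatten(config: dict, prefix: str = "", sep: str = ".") -> dict:
    """Flatten nested dicts into dotted keys: {"hpo": {"n_trials": 20}} -> {"hpo.n_trials": "20"}."""
    flat = {}
    for key, value in config.items():
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))  # recurse into nested sections
        else:
            flat[name] = str(value)  # MLflow stores param values as strings
    return flat


def resolve_tracking_uri(configured):
    # Explicit URI first, then the MLFLOW_TRACKING_URI env var, then a local store.
    if configured is not None:
        return configured
    return os.environ.get("MLFLOW_TRACKING_URI", "file:./mlruns")


params = flatten({"task": "binary", "models": ["baseline", "linear"], "hpo": {"n_trials": 20}})
print(params)
# {'task': 'binary', 'models': "['baseline', 'linear']", 'hpo.n_trials': '20'}
print(resolve_tracking_uri("http://mlflow:5000"))  # http://mlflow:5000
```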
Custom tracking backends
tracker= also accepts any object implementing the ExperimentTracker port —
a single method log_run(report), called once per completed fit with a deep
copy of run_report_, so a mutating implementation cannot corrupt the facade's
own report. Implementations should ignore unknown keys, because the report
evolves additively. The same fail-soft rule applies: an exception raised by
your backend becomes a WARNING, not a failed fit.
from sklearn.datasets import make_classification
from honestml import AutoML
class ListTracker:
    # Minimal ExperimentTracker: keeps every report it receives in memory.
    def __init__(self):
        self.runs = []
    def log_run(self, report):
        self.runs.append(report)
X, y = make_classification(n_samples=150, n_features=6, n_informative=4, random_state=0)
tracker = ListTracker()
AutoML(task="binary", models=("baseline", "linear"), random_state=0, tracker=tracker).fit(X, y)
print(len(tracker.runs), tracker.runs[0]["winner"])
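The deep-copy and fail-soft rules can be sketched as a small dispatch helper. ExperimentTracker is written here as a typing.Protocol only to show the shape of the port; honestml's own definition may differ, and dispatch, MutatingTracker and FailingTracker are hypothetical names used for this illustration.

```python
import copy
import logging
from typing import Protocol

logger = logging.getLogger("tracker_sketch")


class ExperimentTracker(Protocol):
    # Structural type: any object with a matching log_run method qualifies.
    def log_run(self, report: dict) -> None: ...


class MutatingTracker:
    def log_run(self, report: dict) -> None:
        report["winner"] = "tampered"  # only touches the copy it was handed


class FailingTracker:
    def log_run(self, report: dict) -> None:
        raise ConnectionError("tracking server unreachable")


def dispatch(tracker: ExperimentTracker, report: dict) -> bool:
    """Hand the tracker a deep copy and downgrade any error to a WARNING."""
    try:
        tracker.log_run(copy.deepcopy(report))
    except Exception as exc:  # fail-soft: a finished fit must survive a tracker failure
        logger.warning("tracker %s failed: %s", type(tracker).__name__, exc)
        return False
    return True


report = {"winner": "linear", "leaderboard": [{"model": "linear"}]}
print(dispatch(MutatingTracker(), report), report["winner"])  # True linear
print(dispatch(FailingTracker(), report))                     # False
```

Deep-copying (rather than a shallow copy) matters because the report nests lists and dicts; a shallow copy would still let a tracker mutate the shared leaderboard.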
The report describes the run; the model itself ships separately — see artifacts, serving and ONNX export.