API reference

Everything below is importable from the top-level honestml package. Heavy training dependencies are imported lazily, so import honestml stays fast — loading an artifact for serving never executes the training stack.

Facade

After fit, the estimator exposes best_model_id_ (the honest winner), leaderboard_ (absolute OOF scores), fitted_ (the FittedModel serving handle for save_artifact) and run_report_ (the JSON-serializable run report).

Bases: sklearn.base.BaseEstimator, sklearn.base.ClassifierMixin

Fit a small leaderboard for a tabular task and expose the winner.

fit(X, y, sample_weight=None, groups=None, time=None, label_time=None)

Fit the leaderboard and expose the winner.

groups (per-row group labels) enables group-aware CV with cv=CVConfig(scheme="group"): rows of the same group never span train and test. time declares the CV time axis for cv=CVConfig(scheme="timeseries") (purge/embargo, value-based order); label_time is the optional label-end-time t1 for full de Prado purge. All are row-aligned metadata like sample_weight — not features, not needed at predict time.
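The group constraint can be illustrated with a small standalone sketch. This is not honestml's splitter: `group_folds` is a hypothetical helper, and the round-robin assignment is an assumption chosen for brevity. The point is only that whole groups are assigned to folds, so no group label appears on both sides of a split.

```python
from collections import defaultdict


def group_folds(groups, n_splits):
    """Yield (train_idx, test_idx) pairs where no group spans both sides."""
    # Collect row indices per group label, keeping first-seen order.
    rows_by_group = defaultdict(list)
    for i, g in enumerate(groups):
        rows_by_group[g].append(i)

    # Assign whole groups to folds round-robin, largest groups first.
    ordered = sorted(rows_by_group, key=lambda g: -len(rows_by_group[g]))
    fold_of = {g: k % n_splits for k, g in enumerate(ordered)}

    for fold in range(n_splits):
        test = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train, test


groups = ["a", "a", "b", "c", "c", "c", "d", "b"]
splits = list(group_folds(groups, n_splits=2))
```

Each row lands in exactly one test fold, and the group labels of a fold's train and test rows never intersect.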

predict(X)

predict_proba(X)

score(X, y, sample_weight=None)

Metric score, sklearn convention (higher is better).

A lower-is-better metric (e.g. log_loss) is sign-flipped so grid-search and Pipeline maximize it; leaderboard_ carries the raw, unflipped value.
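A minimal sketch of the sign-flip convention, assuming a hand-written binary log loss; `sklearn_style_score` is a hypothetical helper, not a function of the library. It shows why the raw value and the flipped score rank two models in opposite numeric directions.

```python
import math


def log_loss(y_true, p_pos):
    """Mean binary cross-entropy; lower is better."""
    eps = 1e-15
    total = 0.0
    for y, p in zip(y_true, p_pos):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def sklearn_style_score(raw_value, greater_is_better):
    """Return a value where higher is always better."""
    return raw_value if greater_is_better else -raw_value


y = [1, 0, 1, 1]
good = log_loss(y, [0.9, 0.1, 0.8, 0.7])
bad = log_loss(y, [0.6, 0.5, 0.5, 0.4])
# The better model has the lower raw loss but the higher flipped score.
good_score = sklearn_style_score(good, greater_is_better=False)
bad_score = sklearn_style_score(bad, greater_is_better=False)
```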

available_models(task=None) staticmethod

Discoverable models (built-in + plugins) and their capabilities.

Read-only and lazy: reads descriptors without materializing any adapter, so a boosting plugin is listed even when its extra is not installed.

Artifacts and serving

save_artifact(model.fitted_, path) writes the fitted handle that AutoML.fit exposes as the fitted_ attribute; load_artifact reads it back as a FittedModel, the lightweight serving handle.

Serialize model to a versioned artifact directory.

Writes the data files first, then a checksums block (sha256 of every file plus a digest of the manifest payload) so load_artifact can verify integrity before deserializing the model body. sign is an optional hook: it receives the manifest digest (hex) and returns a signature string written to signature for an authenticated verify= on load.
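The checksum idea can be sketched with the standard library alone. The file name `model.bin`, the manifest keys and the `build_checksums` helper below are assumptions for illustration; the actual artifact layout is not specified here.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_file(path):
    """Hex sha256 of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def build_checksums(directory, manifest):
    """Hash every data file, then digest the manifest payload itself."""
    files = {
        p.name: sha256_file(p)
        for p in sorted(Path(directory).iterdir())
        if p.is_file()
    }
    # Canonical JSON so the digest does not depend on key order.
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return {"files": files, "manifest_digest": digest}


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name and manifest keys, for illustration only.
    (Path(tmp) / "model.bin").write_bytes(b"model-bytes")
    manifest = {"model_type": "demo", "format_version": 1}
    checksums = build_checksums(tmp, manifest)
    # Changing one byte changes the file digest, which a loader can detect.
    (Path(tmp) / "model.bin").write_bytes(b"model-bytez")
    tampered = sha256_file(Path(tmp) / "model.bin")
```

A loader that recomputes the per-file digests before deserializing would see `tampered` differ from the recorded value and could refuse the artifact.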

model_format picks the body serializer: "joblib" (the default) or "native" — a boosting body goes through the library's stable format (xgb ubj / cat cbm / lgbm text) instead of pickle; anything without a native format (sklearn models, a shipped ensemble) transparently stays joblib.

Load an artifact directory into a FittedModel.

Order: read manifest -> version-gate -> verify integrity -> model_type dispatch + deserialize. require_integrity makes a missing checksums block an error (older artifacts warn by default); verify is an optional signature hook (signature, manifest_digest) -> bool.

SECURITY: a joblib body and calibrator.joblib are deserialized via joblib/pickle (a native boosting body is a structural file instead). The sha256 integrity check detects corruption and naive substitution, NOT authenticity — a malicious author can embed code with a matching digest; use verify (a signature) and load only from a trusted source. The version-gate is compatibility-only, not a trust check.
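One possible pair of hooks matching the documented shapes (sign takes the manifest digest and returns a string; verify takes (signature, manifest_digest) and returns a bool) is sketched below. HMAC-SHA256 with a shared secret is an assumption made for this example: it authenticates only between parties holding the key, and an asymmetric signature scheme may suit distribution to third parties better.

```python
import hashlib
import hmac

# Assumption: the key is provisioned out of band and never stored in the artifact.
SECRET_KEY = b"replace-with-a-real-secret"


def sign(manifest_digest: str) -> str:
    """Return a hex HMAC-SHA256 signature over the manifest digest."""
    return hmac.new(
        SECRET_KEY, manifest_digest.encode("ascii"), hashlib.sha256
    ).hexdigest()


def verify(signature: str, manifest_digest: str) -> bool:
    """Accept only a signature produced with the same key."""
    expected = sign(manifest_digest)
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(signature, expected)


digest = hashlib.sha256(b"manifest payload").hexdigest()
signature = sign(digest)
ok = verify(signature, digest)
rejected = verify(signature, hashlib.sha256(b"other payload").hexdigest())
```

Under the description above, such callables would be supplied as sign on save and verify on load.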

A fitted model with its preprocessing schema — the unified inference path.

classes is the global class order for classification and None for regression, so the inference path is kind-aware: multiclass proba is aligned to it and a regression model has no probabilities.

predict(X)

predict_proba(X)

score(X, y, sample_weight=None)

Export model to a standalone ONNX bundle in directory; returns the parity report.

sample (raw rows, anything the model can predict on) is REQUIRED: the model retains no training matrix, and without data the honesty gate cannot run; there is no silent skip. The gate compares the converted graph (float32, onnxruntime) against the native estimator's RAW output and raises SchemaValidationError on a breach; a benign near-tie label flip (top-2 gap within the float32 noise band) is downgraded to a WARNING and recorded in onnx_manifest.json. Requires the onnx extra.

Run report

save_run_report writes the run_report_ mapping produced by AutoML.fit as JSON; render_report turns it into markdown or self-contained HTML.

Write report as indented UTF-8 JSON, returning the written file path.

If path is an existing directory, the report is written to path/run_report.json; otherwise path is the file itself. With overwrite=False an existing target raises FileExistsError.
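The path rule can be mirrored in a few lines. `write_report` is a hypothetical stand-in rather than the library function, and the default value of `overwrite` shown here is an assumption.

```python
import json
import tempfile
from pathlib import Path


def write_report(report, path, overwrite=True):
    """Write report as indented UTF-8 JSON and return the written path."""
    target = Path(path)
    if target.is_dir():
        # An existing directory receives the conventional file name.
        target = target / "run_report.json"
    if target.exists() and not overwrite:
        raise FileExistsError(target)
    target.write_text(
        json.dumps(report, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return target


with tempfile.TemporaryDirectory() as tmp:
    in_dir = write_report({"winner": "demo"}, tmp)
    as_file = write_report({"winner": "demo"}, Path(tmp) / "custom.json")
    in_dir_name, as_file_name = in_dir.name, as_file.name
    try:
        write_report({"winner": "demo"}, tmp, overwrite=False)
        refused = False
    except FileExistsError:
        refused = True
```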

Render the run report as markdown or self-contained HTML.

report is the run_report_ mapping or a path to a saved run_report.json (round-trip with save_run_report). fmt="md" needs nothing beyond the stdlib; fmt="html" embeds matplotlib charts as base64 PNG when the report extra is installed and degrades gracefully (WARNING, no charts) when it is not. If path is an existing directory the file is path/run_report.<fmt>.

Configuration

RunConfig is the resolved run configuration that AutoML.fit records in the run manifest (run_report_["config"]). You configure AutoML through its constructor arguments, which accept the section classes below directly: cv=CVConfig(...), budget=BudgetConfig(...), feature_engineering=FEConfig(...), feature_selection=FeatureSelectionConfig(...), hpo=HPOConfig(...), ensemble=EnsembleConfig(...). TrackerConfig stands apart: it configures the experiment tracker passed through the tracker argument of AutoML.

Bases: pydantic.main.BaseModel

Top-level run configuration; serializable basis of the run manifest.

model_config = {'extra': 'forbid', 'frozen': True} class-attribute

parse(data) classmethod

Validate untrusted input, raising ConfigError on failure.

Bases: pydantic.main.BaseModel

Cross-validation scheme and its parameters.

scheme="auto" resolves to Task.default_cv_scheme at composition time; unimplemented schemes/params fail fast there, never silently.

model_config = {'extra': 'forbid', 'frozen': True} class-attribute

Bases: pydantic.main.BaseModel

Run budget: "none" (unbounded, default), wall-clock "time" or "trials".

model_config = {'extra': 'forbid', 'frozen': True} class-attribute

Bases: pydantic.main.BaseModel

Feature-engineering catalog toggles; all transformers default off.

A fixed, configurable catalog (not a plugin port). datetime deltas are a separate per-row axis driven by Task.report_date, NOT part of this config. Target-encoding is binary-classification-only; multiclass/regression gracefully skip it.

model_config = {'extra': 'forbid', 'frozen': True} class-attribute

Bases: pydantic.main.BaseModel

Feature-selection catalog; opt-in, default OFF via fs=None.

Ranker strategies importance/random_probe/null_importance/shap (lazy shap extra) plus the wrapper sequential (FeatureSubsetSelector port). compare runs several strategies and picks one subset-winner; compare=None is the single-strategy path. arbitration chooses the locus: "holdout" (a DEV-internal selection-holdout) or "nested" (K-fold on DEV; timeseries = expanding-window) with an honest significance winner. Anti-leakage OOF ranking/scoring lives in the application; the winning subset serializes into FeatureSchema. cutoff applies only to ranker strategies — sequential returns its own subset (seq_*). null_importance works on every scheme: i.i.d. schemes permute uniformly, timeseries/group permute the target WITHIN structure blocks of null_block_size rows / per group. Per-strategy randomness is isolated via a stable seed hash.
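The structure-preserving permutation behind a null-importance baseline can be sketched as follows. Both helpers are hypothetical and use the standard library only; how the library seeds or sizes blocks beyond what is stated above is not assumed.

```python
import random


def permute_within_blocks(y, block_size, seed):
    """Shuffle y inside consecutive blocks of block_size rows."""
    rng = random.Random(seed)
    out = list(y)
    for start in range(0, len(out), block_size):
        block = out[start:start + block_size]
        rng.shuffle(block)
        out[start:start + block_size] = block
    return out


def permute_within_groups(y, groups, seed):
    """Shuffle y among the rows that share a group label."""
    rng = random.Random(seed)
    out = list(y)
    rows = {}
    for i, g in enumerate(groups):
        rows.setdefault(g, []).append(i)
    for idx in rows.values():
        values = [out[i] for i in idx]
        rng.shuffle(values)
        for i, v in zip(idx, values):
            out[i] = v
    return out


y = [0, 1, 2, 3, 4, 5, 6, 7]
blocked = permute_within_blocks(y, block_size=4, seed=0)
grouped = permute_within_groups(
    y, ["a", "b", "a", "b", "a", "b", "a", "b"], seed=0
)
```

Target values never leave their block or group, so the null distribution keeps the temporal or group structure that a uniform shuffle would destroy.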

model_config = {'extra': 'forbid', 'frozen': True} class-attribute

Bases: pydantic.main.BaseModel

Hyperparameter-optimization catalog; opt-in, default OFF via hpo=None.

When set, composition tunes each tunable model type on an inner-CV of DEV (before the outer honest selection): the tuned factory replaces (or, with keep_baseline, augments) the baseline in the leaderboard. n_trials is the per-model search budget (distinct from BudgetConfig.n_trials, the run candidate-loop); inner_cv is the inner fold count of the tuning objective. timeout_s (per-model wall-clock cap) makes the search non-deterministic — surfaced in the run-report. models=None tunes every type with a non-empty search_space. The whole config is in the run-fingerprint (changed HPO -> new cache key).

model_config = {'extra': 'forbid', 'frozen': True} class-attribute

Bases: pydantic.main.BaseModel

Ensembling catalog; opt-in, default OFF via ensemble=None.

When set (and run_mode='full'), composition blends the leaderboard candidates after the honest selection and ships a BlendedEstimator only if the blend is significantly better than the best single model (the same SignificanceTest gate selection uses); otherwise the single winner is shipped. method is the weight search: "caruana" (default, greedy with replacement + seeded bagging) or "weighted" (SLSQP simplex). size caps Caruana steps / library; n_bags is the bagging count (1 = no bagging). metric=None blends on the run metric. The whole config is in the run-fingerprint (a changed ensemble config -> a new cache key).
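For orientation, greedy selection with replacement can be sketched in plain Python. This is a simplified illustration: it uses mean squared error as the blend metric and omits bagging and the significance gate, and `caruana_weights` is a hypothetical name.

```python
from collections import Counter


def mse(pred, y):
    """Mean squared error; lower is better."""
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)


def caruana_weights(oof_preds, y, size):
    """Greedy forward selection with replacement; returns model weights."""
    chosen = []
    running_sum = [0.0] * len(y)
    for _ in range(size):
        best_name, best_loss = None, float("inf")
        for name, pred in oof_preds.items():
            # Loss of the blend if this model were added once more.
            trial = [
                (s + p) / (len(chosen) + 1) for s, p in zip(running_sum, pred)
            ]
            loss = mse(trial, y)
            if loss < best_loss:
                best_name, best_loss = name, loss
        chosen.append(best_name)
        running_sum = [
            s + p for s, p in zip(running_sum, oof_preds[best_name])
        ]
    # A model's weight is the share of steps in which it was picked.
    counts = Counter(chosen)
    return {name: counts[name] / size for name in oof_preds}


y = [1.0, 2.0, 3.0, 4.0]
oof_preds = {
    "high": [1.5, 2.5, 3.5, 4.5],  # biased upward by 0.5
    "low": [0.5, 1.5, 2.5, 3.5],   # biased downward by 0.5
    "noisy": [3.0, 0.0, 5.0, 1.0],
}
weights = caruana_weights(oof_preds, y, size=4)
```

The two oppositely biased models cancel each other, so the search splits the weight evenly between them and gives the noisy model none.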

model_config = {'extra': 'forbid', 'frozen': True} class-attribute

Bases: pydantic.main.BaseModel

Experiment-tracking opt-in; default OFF via tracker=None.

Post-selection observability: NOT part of RunConfig / the run-fingerprint, because tracking cannot change the model (like finalize). tracking_uri=None defers to the backend's own resolution (e.g. env MLFLOW_TRACKING_URI -> file:./mlruns); run_name=None lets the backend generate a neutral, data-independent name.

model_config = {'extra': 'forbid', 'frozen': True} class-attribute

Data and selection types

Task, FeatureSchema, ColumnRole and Dataset describe the input data; SelectionPolicy, Candidate and select_best implement final-model selection.

Bases: pydantic.main.BaseModel

Problem definition: kind + target metric name + split/typing policy.

default_cv_scheme property

Default cross-validation scheme when the user does not override it.

model_config = {'extra': 'forbid', 'frozen': True} class-attribute

target_metric property

The declared target metric name, or the default for this kind.

Bases: pydantic.main.BaseModel

Typed column contract: roles + schema-owned category tables + NaN policy.

Serializable, so the same schema (including fitted category tables and FE specs) is reused at inference. Built/validated by the Reader at the data boundary. The FE specs (datetime_spec/target_encoding/frequency_encoding/intersections) are additive and default None so an older artifact loads unchanged.

categorical property

CATEGORICAL features: original_categorical ⊕ intersections.

features property

Model-facing features in the pinned FE block order.

numeric ⊕ categorical where each block is itself FE-block-ordered, so this equals original_numeric ⊕ datetime ⊕ frequency ⊕ target_encoding ⊕ original_categorical ⊕ intersections. design_matrix materializes the numeric block then the categorical codes, so column j of the model input is exactly features[j]. Without FE this is the unchanged numeric + categorical.

model_config = {'extra': 'forbid'} class-attribute

numeric property

NUMERIC features: original_numeric ⊕ datetime ⊕ frequency ⊕ target_encoding.

Block order is derived from the FE specs, not the roles-dict insertion order, so it is deterministic and identical train==inference. Without FE this is the plain role view, unchanged.

time property

The TIME-role column (CV time axis), distinct from DATETIME features.

categorical_indices(cap=None)

Positions of natively-routed CATEGORICAL columns in the post-FS-projection design matrix.

Projects features to selected_features (in schema.features order, matching design_matrix) then takes the positions of the cardinality-gated categorical names (native_routable); includes intersections (a__b) subject to the same gate and excludes the FE numeric outputs (_te/_freq/datetime). cap=None keeps every categorical (ungated opt-out, ADR-0092/0094). Empty when the (possibly projected/gated) set carries no native categoricals, which is a legitimate native no-op.

with_categories(tables)

Return a copy of the schema with the fitted category tables attached.

with_datetime_spec(spec)

Return a copy with the fitted datetime-delta spec attached.

with_frequency_encoding(spec)

Return a copy with the fitted frequency-encoding spec attached.

with_intersections(spec)

Return a copy with the intersection spec attached; pair tables go in categories.

with_selected_features(names)

Return a copy carrying the selected feature subset; design_matrix projects to it.

with_target_encoding(spec)

Return a copy with the fitted full-train target-encoding spec attached.

Bases: builtins.str, enum.Enum

Role a column plays. The core never hard-codes domain column names.

CATEGORICAL = <ColumnRole.CATEGORICAL: 'categorical'> class-attribute

DATETIME = <ColumnRole.DATETIME: 'datetime'> class-attribute

FOLD = <ColumnRole.FOLD: 'fold'> class-attribute

GROUP = <ColumnRole.GROUP: 'group'> class-attribute

IGNORE = <ColumnRole.IGNORE: 'ignore'> class-attribute

NUMERIC = <ColumnRole.NUMERIC: 'numeric'> class-attribute

TARGET = <ColumnRole.TARGET: 'target'> class-attribute

TEXT = <ColumnRole.TEXT: 'text'> class-attribute

TIME = <ColumnRole.TIME: 'time'> class-attribute

Bases: typing.Protocol

Domain view over tabular data: numeric block, categorical codes, target.

categorical_codes()

Categorical feature codes as int64 with shape (n_rows, n_categorical).

Codes come from the schema-owned category table, so they are identical on train and inference.

groups()

Group-column values in row order, or None when there is no group role.

The single source of group labels for group-aware CV: the splitter and validate_fold both read this array, index-aligned with design_matrix, so the group/fold/feature ordering cannot drift.

label_time()

Optional per-row label-end-time t1 for full de Prado purge, or None.

Name-based secondary metadata (like sample_weight), present only when declared; used by the splitter to drop train rows whose label window overlaps the test interval.
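A minimal sketch of the purge rule, assuming each row's label window is the closed interval [time, label_time] and the test interval runs from the earliest test time to the latest test label end. `purge_train` is a hypothetical helper, and embargo is left out.

```python
def purge_train(train_idx, time, label_time, test_start, test_end):
    """Keep train rows whose label window misses the test interval."""
    kept = []
    for i in train_idx:
        # Two closed intervals overlap when each starts before the other ends.
        overlaps = time[i] <= test_end and label_time[i] >= test_start
        if not overlaps:
            kept.append(i)
    return kept


# Row i is observed at time[i]; its label is only known at label_time[i].
time = [0, 1, 2, 3, 4, 5, 6, 7]
label_time = [2, 3, 4, 5, 6, 7, 8, 9]
test_idx = [4, 5]
test_start = min(time[i] for i in test_idx)
test_end = max(label_time[i] for i in test_idx)
train_idx = [i for i in range(len(time)) if i not in test_idx]
purged = purge_train(train_idx, time, label_time, test_start, test_end)
```

Rows 2 and 3 are dropped because their labels resolve inside the test interval, and rows 6 and 7 because they start before the test labels have resolved.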

sample_weight()

Per-row sample weights, or None.

select(columns)

Return a dataset restricted to columns (schema updated accordingly).

take(indices)

Return a dataset with only the given row indices (fold slicing).

target()

Target values, or None for an inference dataset.

time()

TIME-role column values in row order, or None.

The single, index-aligned source of the CV time axis for TimeSeriesSplitter and the value-based validate_fold (same contract as groups()), so the splitter never reads a reserved column name from the frame. Distinct from DATETIME features.

to_numpy()

Numeric feature block as float64 with shape (n_rows, n_numeric).

with_selected_features(names)

Return a dataset whose schema carries the feature-selection subset.

Same rows/frame; only schema.selected_features is set, so design_matrix projects the model input to names on refit and inference (train==inference by construction).

Bases: pydantic.main.BaseModel

Selection rule: absolute primary metric + inert lexicographic tie-break.

model_config = {'extra': 'forbid', 'frozen': True} class-attribute

A leaderboard entry: its absolute score plus secondary attributes and OOF predictions.

oof_pred is the metric-ready out-of-fold vector the band aligns on: P(positive)/(n, K) proba for proba-metrics, else the predicted class/value. oof_mask marks which rows actually have an OOF prediction (holdout yields a partial OOF; degenerate folds are skipped), so validity is tracked by the mask, never np.isnan — which would crash on int/str class vectors.
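Mask-based validity can be shown with string class labels, where a NaN check would not apply. `masked_accuracy` is a hypothetical helper, and the empty-string filler for unpredicted rows is an assumption of this example.

```python
def masked_accuracy(y_true, oof_pred, oof_mask):
    """Accuracy over rows that actually received an OOF prediction."""
    pairs = [(t, p) for t, p, m in zip(y_true, oof_pred, oof_mask) if m]
    if not pairs:
        raise ValueError("no rows carry an OOF prediction")
    return sum(t == p for t, p in pairs) / len(pairs)


y_true = ["cat", "dog", "dog", "cat", "bird"]
# A holdout scheme predicts only part of the rows; the rest stay unfilled.
oof_pred = ["cat", "dog", "", "dog", ""]
oof_mask = [True, True, False, True, False]
acc = masked_accuracy(y_true, oof_pred, oof_mask)
```

Only the three masked-in rows are scored, so unfilled entries never reach the metric regardless of their dtype.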

n_features = 0 class-attribute

stability = 0.0 class-attribute

train_time = 0.0 class-attribute

Return the winning candidate: absolute argmax, then equivalence tie-break.

Runtime utilities

honestml.__version__ — the installed package version.

Mutable state of a single run: timings, logger, config.

manifest()

Serializable run manifest (config + timings) — basis for replay.

record_stage_time(key, stage, elapsed)

Record a stage time loaded from cache (no timer).

timed_stage(key, stage)

Time a stage and record the elapsed seconds under timings[key][stage].

total_time(key)

Sum of all stage times recorded for key.
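The timing methods above suggest a context-manager pattern along these lines. `StageTimer` is a hypothetical stand-in for the run-state object, and recording the elapsed time even when the stage raises is a design choice of this sketch, not documented behavior.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Hypothetical stand-in that records timings[key][stage] in seconds."""

    def __init__(self):
        self.timings = {}

    def record_stage_time(self, key, stage, elapsed):
        """Store a known elapsed time without running a timer."""
        self.timings.setdefault(key, {})[stage] = elapsed

    @contextmanager
    def timed_stage(self, key, stage):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record even when the stage raises, so partial runs stay visible.
            self.record_stage_time(key, stage, time.perf_counter() - start)

    def total_time(self, key):
        """Sum of all stage times recorded for key."""
        return sum(self.timings.get(key, {}).values())


timer = StageTimer()
with timer.timed_stage("model_a", "fit"):
    sum(range(10_000))
timer.record_stage_time("model_a", "predict", 0.25)  # e.g. loaded from cache
total = timer.total_time("model_a")
```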

Return the library logger, or a child logger honestml.<name>.

Exceptions

All errors derive from honestml.AutoMLError:

Bases: builtins.Exception

Base class for every error raised by the library.

Bases: honestml.core.exceptions.AutoMLError

Invalid configuration (wraps validation failures at the boundary).

Bases: honestml.core.exceptions.AutoMLError

Input does not satisfy the FeatureSchema/Task contract.

Covers X/y length mismatch, unknown/missing columns, targets outside Task.kind, empty or all-NaN inputs, and dtype drift. Also the artifact/serialization format boundary: an unknown model_type/model_format and a non-exportable estimator are the same kind of contract violation, not a new exception type.

Bases: honestml.core.exceptions.AutoMLError

An optional extra is required but not installed.

Raised by adapters (not by core imports) so a missing boosting/tracking library surfaces as an actionable message instead of an ImportError deep in an import chain.

Bases: honestml.core.exceptions.AutoMLError

An artifact failed integrity verification before deserialization.

reason is one of missing_checksums (no checksums block under require_integrity), missing_file (a checksummed file is absent or its name escapes the artifact directory), digest_mismatch (a file's sha256 differs — corruption or naive tampering) or signature_mismatch (the optional signature hook rejected the artifact). Integrity detects corruption/naive substitution, NOT authenticity: a malicious author can embed code with a matching digest — use a signature (and load only from a trusted source) for that.

Bases: honestml.core.exceptions.AutoMLError

A fitted artifact was used before fit (e.g. predict on a fresh model).

Bases: honestml.core.exceptions.AutoMLError

The run budget was exhausted before any candidate completed.

Raised only when the budget skipped candidates and none finished, as distinct from FitFailedError (every candidate that started failed on its own). Carries the budget mode and the completed/skipped/failed counts for an actionable message.

Bases: honestml.core.exceptions.AutoMLError

A feature-selection strategy failed during compare (fail-fast).

Raised when any strategy in FeatureSelectionConfig.compare raises while selecting its subset: the offending strategy name is reported and the original error chained, instead of silently dropping a strategy from the comparison (no silent defaults).

fit may also raise more specific subclasses — notably FitFailedError (importable from honestml.core) when every candidate fails; catch AutoMLError to cover them all.