MLflow

MLflow is a popular tool for tracking machine learning experiments and models. These pages focus on practical aspects of using MLflow.

Learn more:

Tracking server overview on official mlflow site;
MLflow tracking quickstart on official mlflow site;
Official MLflow docker image.

Configuration

MLflow configuration has two sides: the client and the server.

Client-side configuration mainly involves setting the tracking URI and, if necessary, the credentials needed to access it. The MLflow client can work without a running server. In that case it stores information in a special format in a local folder.

You can set up the server by running the `mlflow server` command from the Python environment where MLflow is installed.
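If you need to start the server from Python (for example, from a notebook), one option is to build the `mlflow server` command and launch it as a background process. This is a sketch. `--host`, `--port` and `--backend-store-uri` are options of `mlflow server`. The helper names `build_server_command` and `start_server`, the SQLite file `mlflow.db` and port `5000` are hypothetical choices for illustration.

```python
import subprocess


def build_server_command(host="127.0.0.1", port=5000, backend_store_uri="sqlite:///mlflow.db"):
    """Return the argument list for `mlflow server` (hypothetical helper)."""
    return [
        "mlflow", "server",
        "--host", host,
        "--port", str(port),
        # Where run metadata (params, metrics, tags) is stored.
        "--backend-store-uri", backend_store_uri,
    ]


def start_server(**kwargs):
    """Launch `mlflow server` in the background and return the process handle."""
    return subprocess.Popen(build_server_command(**kwargs))
```

Calling `start_server()` returns immediately. Clients then connect with `mlflow.set_tracking_uri("http://127.0.0.1:5000")`. To stop the server, call `terminate()` on the returned process.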

For more details check the Configuration page.

Tracking

Tracking is the MLflow component that records, organizes and visualizes information about the model fitting process, i.e. what happens during model training.

More specifically, MLflow Tracking lets you:

Record experiment parameters, such as hyperparameters and model configuration.
Log performance metrics and compare them across runs.
Store and manage output artifacts, such as trained models and plots.
Record which source code (the script and, when available, the Git commit) produced the run.

Fitting a model generally involves repeating similar computations with slight changes in the approach, implementation or hyperparameters. To organize the entities generated across these attempts, MLflow introduces a few terms, described in the following table.

Term	Description
tag	Key-value pair that annotates a run.
experiment	Group of runs that represent attempts to train a specific model.
run	A single attempt to train the model with a given set of hyperparameters.

Find more in:

MLflow Tracking Quickstart official tutorial.
Tracking page.
MLflow Tracking APIs reference, which documents the functions for interacting with the Tracking API.

Flavours

There are a ton of ML frameworks today, and MLflow aims to support them. For this purpose it uses the concept of a flavour: a wrapper that knows how to work with models from a particular framework. If a model doesn’t correspond to any of the available flavours, you are expected to use pyfunc.

Flavours are typically submodules of mlflow that you use in your pipelines. The following table lists some of the flavours available in MLflow.

Flavor Name	Description
`python_function`	The universal Python-oriented flavor (aka `pyfunc`). Enables model loading with `mlflow.pyfunc.load_model()`; great for generic deployment and inference workflows.
`mlflow.sklearn`	Flavor for scikit-learn models. Ensures easy logging and reproducibility of scikit-learn artifacts.
`mlflow.tensorflow`	For TensorFlow models. Manages saving and loading in TF’s native format.
`mlflow.keras`	Specifically for Keras models, tailored for Keras workflows.
`mlflow.pytorch`	Manages PyTorch models using TorchScript or native serialization.
`mlflow.spark`	For Spark MLlib models. Handles persistence via Spark’s `PipelineModel` or `Estimator`.
`mlflow.xgboost`	Supports XGBoost models. Saves and loads model artifacts via XGBoost’s binary format.
`mlflow.lightgbm`	For LightGBM models. Handles Booster-style models efficiently.
`mlflow.catboost`	For CatBoost models. Ensures correct serialization and logging of CatBoost estimators.
`mlflow.pyfunc`	Essentially the same as `python_function`, serving as a catch-all flavor that wraps custom Python inference logic.

Data

MLflow contains a special mlflow.data module. It makes it easy to record the datasets used for model training and evaluation in runs, and to retrieve dataset information from those runs later.

It has two core interfaces:

Dataset: Represents a dataset.
DatasetSource: Represents a source of the data.

You can create a Dataset from many Python data frameworks. The following table lists the classes that represent data objects from different frameworks and the methods used to create them:

Framework	Object	Create method
`pandas`	`mlflow.data.pandas_dataset.PandasDataset`	`mlflow.data.from_pandas`
`Numpy`	`mlflow.data.numpy_dataset.NumpyDataset`	`mlflow.data.from_numpy`
`Spark`	`mlflow.data.spark_dataset.SparkDataset`	`mlflow.data.from_spark`, `mlflow.data.load_delta`
`Hugging Face`	`mlflow.data.huggingface_dataset.HuggingFaceDataset`	`mlflow.data.huggingface_dataset.from_huggingface`
`TensorFlow`	`mlflow.data.tensorflow_dataset.TensorFlowDataset`, `mlflow.data.tensorflow_dataset.EvaluationDataset`	`mlflow.data.tensorflow_dataset.from_tensorflow`
`polars`	`mlflow.data.polars_dataset.PolarsDataset`	`mlflow.data.polars_dataset.from_polars`

For more details check:

The MLflow Dataset Tracking page.
The mlflow.data API reference.

Consider the process of creating an MLFlow dataset from a csv file that is published on the internet.

import os
import mlflow
import mlflow.data
import pandas as pd

mlflow.set_tracking_uri("file:///tmp/mlflow_temp")

data_url = "http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"

data = pd.read_csv(data_url, sep=";")
data.head()

	fixed acidity	volatile acidity	citric acid	residual sugar	chlorides	free sulfur dioxide	total sulfur dioxide	density	pH	sulphates	alcohol	quality
0	7.4	0.70	0.00	1.9	0.076	11.0	34.0	0.9978	3.51	0.56	9.4	5
1	7.8	0.88	0.00	2.6	0.098	25.0	67.0	0.9968	3.20	0.68	9.8	5
2	7.8	0.76	0.04	2.3	0.092	15.0	54.0	0.9970	3.26	0.65	9.8	5
3	11.2	0.28	0.56	1.9	0.075	17.0	60.0	0.9980	3.16	0.58	9.8	6
4	7.4	0.70	0.00	1.9	0.076	11.0	34.0	0.9978	3.51	0.56	9.4	5

The following cell starts a run and logs this dataset as a training input of that run.

dataset = mlflow.data.from_pandas(data, source=data_url)

experiment = mlflow.set_experiment("test_datasets")
with mlflow.start_run():
    mlflow.log_input(dataset=dataset, context="training")

/home/user/.virtualenvironments/python/lib/python3.13/site-packages/mlflow/data/dataset_source_registry.py:148: UserWarning: Failed to determine whether UCVolumeDatasetSource can resolve source information for 'http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv'. Exception: 
  return _dataset_source_registry.resolve(
/home/user/.virtualenvironments/python/lib/python3.13/site-packages/mlflow/types/utils.py:452: UserWarning: Hint: Inferred schema contains integer column(s). Integer columns in Python cannot represent missing values. If your input data contains missing values at inference time, it will be encoded as floats and will cause a schema enforcement error. The best way to avoid this problem is to infer the model schema based on a realistic data sample (training dataset) that includes missing values. Alternatively, you can declare integer columns as doubles (float64) whenever these columns may have missing values. See `Handling Integers With Missing Values <https://www.mlflow.org/docs/latest/models.html#handling-integers-with-missing-values>`_ for more details.
  warnings.warn(

Models

The Models component of MLflow is responsible for packaging, registering and versioning models. The following table summarizes its most useful features:

Feature	Description
Model Packaging	Packages models with their environment (conda, pip, or Docker) for reproducible deployment.
Multiple Flavors	Supports multiple “flavors” (e.g., `python_function`, `sklearn`, `pytorch`, `onnx`) to enable deployment across tools.
Model Registry Integration	Integrates with MLflow Model Registry to version, approve, and stage models for production.
Deployment Options	Deploys to local REST server, cloud services (AWS SageMaker, Azure ML), or custom targets.
Model Signature	Captures input and output schema (columns, types, shapes) to enforce interface consistency.
Input Example	Stores example input data for testing and documentation.
Custom Inference Logic	Allows defining custom `predict` logic using `python_function` flavor for flexible inference.
Environment Reproducibility	Captures dependencies using `conda.yaml`, `requirements.txt`, or `MLmodel` file for exact environment recreation.
CLI & API Support	Provides both CLI (`mlflow models serve`, `mlflow models predict`) and Python API for easy model serving and testing.
Cross-Framework Support	Works with Scikit-learn, TensorFlow, PyTorch, XGBoost, LightGBM, and many more ML frameworks.

For more details check:

MLflow Models page of the official documentation.
Models page.

The following cell sets up MLflow and generates an example dataset.

import mlflow
import mlflow.models
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

mlflow.set_tracking_uri("file:///tmp/model_registry_example")
client = mlflow.MlflowClient()

X, y = make_classification(
    n_samples=500,
    n_features=5,
    random_state=1
)

The following cell creates a very simple MLflow run. The model logged in it is not registered by default.

with mlflow.start_run():
    gbc = GradientBoostingClassifier().fit(X, y)
    signature = mlflow.models.infer_signature(X, y)
    mlflow.sklearn.log_model(
        gbc,
        signature=signature
    )

With MlflowClient.search_registered_models you can get a list of registered models.

for m in client.search_registered_models():
    print(m.name)

As expected, the list is empty, so nothing is printed.

To add a model to the model registry, specify the registered_model_name argument of log_model. The following cell creates the same run as the previous one, but with registered_model_name specified.

with mlflow.start_run():
    gbc = GradientBoostingClassifier().fit(X, y)
    signature = mlflow.models.infer_signature(X, y)
    mlflow.sklearn.log_model(
        gbc,
        signature=signature,
        registered_model_name="HEY! this is model"
    )

Successfully registered model 'HEY! this is model'.
Created version '1' of model 'HEY! this is model'.

Now the output of search_registered_models contains a model with the corresponding name.

for m in client.search_registered_models():
    print(m.name)

HEY! this is model

Evaluation

MLflow provides tools for evaluating classical models, LLMs and datasets.

For more details check:

Model Evaluation page.
mlflow.models.evaluate function that performs the evaluation.

The following cell generates the data frame and applies the evaluation to it. Column “0” is specified as the target and column “1” is specified as the predictions.

import numpy as np
import mlflow
import mlflow.models
import pandas as pd

mlflow.set_tracking_uri("file:///tmp/evaluation_example")


evaluation = mlflow.models.evaluate(
    data=pd.DataFrame(np.random.normal(0, 5, (200, 3)), columns=["0", "1", "2"]),
    targets="0",
    predictions="1",
    model_type="regressor"
)

2025/10/18 12:05:16 INFO mlflow.models.evaluation.default_evaluator: Testing metrics on first row...
2025/10/18 12:05:17 INFO mlflow.models.evaluation.evaluators.shap: Shap explainer ExactExplainer is used.
2025/10/18 12:05:22 WARNING mlflow.models.evaluation.evaluators.shap: Shap evaluation failed. Reason: TypeError("'NoneType' object is not callable"). Set logging level to DEBUG to see the full traceback.

MLflow either creates a new run for the evaluation or uses the active run in whose context the evaluation was called. The following cell shows the ID of the run used in this example.

evaluation.run_id

'0327f442c72545d9b7ca32ae69cb4040'

Typical regression metrics are logged to MLflow. The following cell displays them.

mlflow.get_run(evaluation.run_id).data.metrics

{'mean_on_target': -0.4862966540870622,
 'mean_absolute_error': 5.868862912486093,
 'example_count': 200.0,
 'mean_absolute_percentage_error': 2.4371270665250524,
 'sum_on_target': -97.25933081741243,
 'mean_squared_error': 53.270366873514504,
 'max_error': 20.886477207431845,
 'root_mean_squared_error': 7.298655141429447,
 'r2_score': -0.902927882934129}

Serving

MLflow provides tools that speed up the deployment of ML models. There are a few options to serve an MLflow model:

Run the model from the command line with the mlflow models serve command.
Package the model into a Docker container with the mlflow models build-docker command.
Invoke the same tools directly from Python, since all of them are implemented in Python.

Check the Serving page.

Consider the process of serving a model endpoint. The following cell creates an MLflow model that we’ll use as an example:

import mlflow

mlflow.set_tracking_uri("file:///tmp/mlflow_serving")
mlflow.set_registry_uri("file:///tmp/mlflow_serving")

@mlflow.pyfunc.utils.pyfunc
def model(model_input: list[float]) -> list[float]:
    return [x * 2 for x in model_input]

with mlflow.start_run() as run:
    mlflow.pyfunc.log_model(
        name="my_name",
        python_model=model,
        registered_model_name="my_registered_model",
        pip_requirements=[]
    )

Successfully registered model 'my_registered_model'.
Created version '1' of model 'my_registered_model'.

The command we use to run the server is:

MLFLOW_TRACKING_URI=file:///tmp/mlflow_serving mlflow models serve -m "models:/my_registered_model/1" -p 1234 --no-conda

To avoid blocking the notebook, the command is run in a separate process launched from Python.

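import os
import subprocess

# Popen returns immediately, leaving the server running in the background (stop it with server.terminate()).
env = {**os.environ, "MLFLOW_TRACKING_URI": "file:///tmp/mlflow_serving"}
server = subprocess.Popen(["mlflow", "models", "serve", "-m", "models:/my_registered_model/1", "-p", "1234", "--no-conda"], env=env)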
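The server needs a few seconds to start, so a request sent immediately may fail with a connection error. A small polling helper can wait until the server answers on `/ping`, the health-check route of the MLflow scoring server. The function name `wait_until_ready` and the timeout values are hypothetical choices.

```python
import time

import requests


def wait_until_ready(base_url, timeout=60.0, interval=1.0):
    """Poll `<base_url>/ping` until it returns HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if requests.get(f"{base_url}/ping", timeout=interval).status_code == 200:
                return True
        except requests.exceptions.RequestException:
            # Server not listening yet (connection refused) or too slow to answer.
            pass
        time.sleep(interval)
    return False
```

For the server launched above, call `wait_until_ready("http://127.0.0.1:1234")` before sending the request.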

The following cell sends the request to the server.

import requests

url = "http://127.0.0.1:1234/invocations"
headers = {'Content-Type': 'application/json'}
data = {"inputs": [1.0, 2.0, 3.0]}

response = requests.post(url, headers=headers, json=data)

print(response.text)

{"predictions": [2.0, 4.0, 6.0]}
