Models#

This page covers the MLflow Models component, which is responsible for model versioning and deployment.

For more details, see the MLflow pyfunc page of the official documentation.

import mlflow
import mlflow.pyfunc.utils

!rm -rf /tmp/models_set_up
mlflow.set_tracking_uri("file:///tmp/models_set_up")

PyFunc#

The mlflow.pyfunc module implements the MLflow features that allow you to build flavor-agnostic models. It is ideal for highly specific approaches that don’t fit within the boundaries of any popular framework.

There are two ways to create a model:

A function wrapped in the mlflow.pyfunc.utils.pyfunc decorator.
A class that inherits from mlflow.pyfunc.PythonModel.

See the mlflow.pyfunc API reference for more details.

Function#

The following cell defines the example_model function, which performs a simple computation that stands in for a real model. The cell then logs the function and registers it in the MLflow registry.

@mlflow.pyfunc.utils.pyfunc
def example_model(model_input: list[float]) -> list[float]:
    return list(map(lambda x: x**2, model_input))

with mlflow.start_run():
    mlflow.pyfunc.log_model(
        name="model",
        python_model=example_model,
        registered_model_name="pyfunc_model",
        pip_requirements=[]
    )

Successfully registered model 'pyfunc_model'.
Created version '1' of model 'pyfunc_model'.

The following cell loads the model from the registry and uses it to make a prediction.

model = mlflow.pyfunc.load_model("models:/pyfunc_model/1")
model.predict([1., 2., 3.])

[1.0, 4.0, 9.0]

The results correspond to the calculation specified in the example_model function.

Class#

The class has to define the predict method, which implements the model’s computation.

The following cell shows the class definition and how an instance of the class is logged to MLflow.

from mlflow.pyfunc.model import PythonModel

class MyModel(PythonModel):
    test = "I'm a field you want to access!!!"
    def predict(self, model_input: list[float]):
        return [x * 2 for x in model_input]

with mlflow.start_run():
    mlflow.pyfunc.log_model(
        name="model",
        python_model=MyModel(),
        pip_requirements=["pandas"],
        registered_model_name="pyclass_model"
    )

Successfully registered model 'pyclass_model'.
Created version '1' of model 'pyclass_model'.

The code for loading the model from the registry is the same as it was for the function approach.

model = mlflow.pyfunc.load_model("models:/pyclass_model/1")
model.predict([2., 4.])

[4.0, 8.0]

Versioning#

When you register a model under an existing registered model name, MLflow automatically saves it as a new version of that model and assigns it the next version number. To manage versions, you can attach an alias to a model: a name that refers to a specific version, so you can easily access that version by its alias. You can access a specific version of the model with a URI of the form models:/{registered model name}/{version number}.

The following cell registers two versions of the "versioning_example" model.

@mlflow.pyfunc.utils.pyfunc
def model1(model_input: list[int]) -> str:
    return "first model" 

@mlflow.pyfunc.utils.pyfunc
def model2(model_input: list[int]) -> str:
    return "second model"

registered_model_name = "versioning_example"

mlflow.pyfunc.log_model(
    name="model",
    python_model=model1,
    registered_model_name=registered_model_name,
    pip_requirements=[]
)
mlflow.pyfunc.log_model(
    name="model",
    python_model=model2,
    registered_model_name=registered_model_name,
    pip_requirements=[]
)

Successfully registered model 'versioning_example'.
Created version '1' of model 'versioning_example'.
Registered model 'versioning_example' already exists. Creating a new version of this model...
Created version '2' of model 'versioning_example'.

The log contains messages indicating that two versions of the model are stored.

The next cells invoke the two versions of the model. The version number, 1 or 2, is the last segment of the URI.

mlflow.pyfunc.load_model(f"models:/{registered_model_name}/1").predict([10, 20])

'first model'

mlflow.pyfunc.load_model(f"models:/{registered_model_name}/2").predict([40, 20])

'second model'

Each version returns the output defined in its own function.

Alias#

Aliases are a way of selecting the right model version. You can attach an alias to a specific version and move the alias to another version at any time. Therefore, instead of hardcoding the model version in the production environment and changing it every time a particular deployment needs a different version, you can simply reattach the alias to the relevant version.

Use mlflow.MlflowClient().set_registered_model_alias to attach an alias to a specified version of the model.
You can access the version with a specific alias by using a URI of the form models:/{registered model name}@{alias name}.

The following cell attaches MyAlias to the first version of the model.

from mlflow import MlflowClient
client = MlflowClient()

client.set_registered_model_alias(
    registered_model_name, "MyAlias", "1"
)

The first registered model is loaded from the URI ending with @MyAlias.

mlflow.pyfunc.load_model(
    f"models:/{registered_model_name}@MyAlias"
).predict([10])

'first model'

Latest#

Loading the URI models:/{registered model name}/latest returns the latest registered version of the model.

The following cell loads the model with the latest URI. It returns the model whose output corresponds to the second registered version.

mlflow.pyfunc.load_model(
    f"models:/{registered_model_name}/latest"
).predict([10])

'second model'
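The three URI forms used in this section (version number, alias, and latest) differ only in their last part. The helper below is a hypothetical convenience function, not part of MLflow, that builds each form from its components. It is a sketch of the naming scheme described above and nothing more.

```python
def model_uri(name, version=None, alias=None):
    """Build a registry URI; `model_uri` is a hypothetical helper, not an MLflow function."""
    if version is not None and alias is not None:
        raise ValueError("Pass either a version or an alias, not both.")
    if alias is not None:
        # Alias form: models:/{registered model name}@{alias name}
        return f"models:/{name}@{alias}"
    if version is None:
        # No version and no alias: fall back to the latest registered version.
        version = "latest"
    # Version form: models:/{registered model name}/{version number}
    return f"models:/{name}/{version}"


print(model_uri("versioning_example", version=1))        # models:/versioning_example/1
print(model_uri("versioning_example", alias="MyAlias"))  # models:/versioning_example@MyAlias
print(model_uri("versioning_example"))                   # models:/versioning_example/latest
```

The resulting strings can be passed to mlflow.pyfunc.load_model in the same way as the literal URIs in the cells above.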

Signature#

An MLflow model signature declares the inputs, outputs, and parameters of a model. It documents the model and tells potential consumers what the model expects; in particular, the built-in serving endpoint uses it to validate the model’s inputs.

The signature is optional and consists of the following elements, each of them optional as well:

Input schema: description of the single element that can be processed by the model.
Output schema: description of the output that corresponds to the single element of the model.
Parameters schema: schema of the parameters of the model (for example, the “temperature” of an LLM).

The main elements of the MLflow API associated with signatures are as follows:

Element	Type	Description
`ModelSignature`	Class	Represents the full input/output schema of a model.
`mlflow.types.Schema`	Class	Collection of column specs describing input/output structure.
`mlflow.types.ColSpec`	Class	Describes a single column (type, name, optional).
`TensorSpec`	Class	Describes a tensor (dtype, shape, name).
`ParamSchema`	Class	Schema for parameter-based inputs (non-data, e.g. hyperparams).
`DataType`	Enum	Supported MLflow types: `integer`, `long`, `float`, `double`, `boolean`, `string`, `binary`.
`infer_signature`	Function	Infers a `ModelSignature` from sample input/output.
`to_dict` / `from_dict`	Method	Serialize/deserialize `ModelSignature` objects.
`__eq__`	Method	Compare two signatures for equality (useful in tests).

There are two types of signatures:

Column-based: maps each feature name to its expected data type.
Tensor-based: expects an array of elements as input, where each value is identified by its position.

For more information, see the Model Signatures and Input Examples guide in the MLflow documentation.

The following cell logs the model with input_example, which will be used to generate the model’s signature.

import pandas as pd
import mlflow
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=4, noise=0.1)
X = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(X.shape[1])])
X["feature_1"] = X["feature_1"].astype(int)

with mlflow.start_run():
    rf = RandomForestRegressor().fit(X, y)
    mlflow.sklearn.log_model(
        rf,
        name="model",
        registered_model_name="rf_model",
        input_example=X.iloc[:5]
    )

/home/user/.virtualenvironments/python/lib/python3.13/site-packages/mlflow/types/utils.py:452: UserWarning: Hint: Inferred schema contains integer column(s). Integer columns in Python cannot represent missing values. If your input data contains missing values at inference time, it will be encoded as floats and will cause a schema enforcement error. The best way to avoid this problem is to infer the model schema based on a realistic data sample (training dataset) that includes missing values. Alternatively, you can declare integer columns as doubles (float64) whenever these columns may have missing values. See `Handling Integers With Missing Values <https://www.mlflow.org/docs/latest/models.html#handling-integers-with-missing-values>`_ for more details.
  warnings.warn(
Registered model 'rf_model' already exists. Creating a new version of this model...
Created version '2' of model 'rf_model'.

To obtain the signature of a registered model, retrieve the signature attribute from the ModelInfo object.

model = mlflow.models.get_model_info("models:/rf_model/1")
model.signature

inputs: 
  ['feature_0': double (required), 'feature_1': long (required), 'feature_2': double (required), 'feature_3': double (required)]
outputs: 
  [double (required)]
params: 
  None

The same information is available in the MLflow UI.

Manual definition#

To define a signature manually, use the mlflow.models.ModelSignature class, which is initialised with the following arguments:

inputs: mlflow.types.Schema.
outputs: mlflow.types.Schema.
params: mlflow.types.ParamSchema.

The following cell defines a ModelSignature. Here:

inputs is defined using the column-based approach and expects an array of elements with the corresponding column names.
outputs uses the tensor-based approach, where the output is a set of 2-element vectors.
params contains a single parameter whose default value is 0.7.

import numpy as np
from mlflow.types import Schema, ColSpec, ParamSchema, ParamSpec, TensorSpec
from mlflow.models import ModelSignature

input_schema = Schema([
    ColSpec("double", "value1"),
    ColSpec("integer", "value2"),
    ColSpec("string", "value3")
])
output_schema = Schema([TensorSpec(np.dtype(np.float32), (-1, 2))])
parameters_schema = ParamSchema([ParamSpec("temperature", "double", default=0.7)])

ModelSignature(inputs=input_schema, outputs=output_schema, params=parameters_schema)

inputs: 
  ['value1': double (required), 'value2': integer (required), 'value3': string (required)]
outputs: 
  [Tensor('float32', (-1, 2))]
params: 
  ['temperature': double (default: 0.7)]

MLflow file#

The MLmodel file is the single source of truth about how the model should be loaded and used. It is a YAML file, and its most important fields are:

artifact_path: The path to which the model was logged during the training job.
flavors: The flavors the model is saved in, which identify the machine learning library the model was created with.
model_uuid: The unique identifier of the model.
run_id: The unique identifier of the run during which the model was created.
signature: Specifies the schema of the model’s inputs and outputs.

The following cell creates the model and stores it in the registry.

@mlflow.pyfunc.utils.pyfunc
def example_model(model_input: list[float]) -> list[float]:
    return list(map(lambda x: x**2, model_input))

with mlflow.start_run():
    model = mlflow.pyfunc.log_model(
        name="model",
        python_model=example_model,
        registered_model_name="file_example",
        pip_requirements=[]
    )

Successfully registered model 'file_example'.
Created version '1' of model 'file_example'.

The following cell displays the MLmodel file for the model that has been created.

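# `model` is the ModelInfo returned by log_model above; strip the URI scheme to get a local path.
path = model.artifact_path.replace("file://", "")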
!cat $path/MLmodel

artifact_path: file:///tmp/models_set_up/0/models/m-ae18ef8017a44d8986e3162c3e3b3be3/artifacts
flavors:
  python_function:
    cloudpickle_version: 3.1.1
    code: null
    env:
      conda: conda.yaml
      virtualenv: python_env.yaml
    loader_module: mlflow.pyfunc.model
    python_model: python_model.pkl
    python_version: 3.13.7
    streamable: false
is_signature_from_type_hint: true
mlflow_version: 3.4.0
model_id: m-ae18ef8017a44d8986e3162c3e3b3be3
model_size_bytes: 2118
model_uuid: m-ae18ef8017a44d8986e3162c3e3b3be3
prompts: null
run_id: 7f8d210085554901812095e77c091082
signature:
  inputs: '[{"type": "double", "required": true}]'
  outputs: '[{"type": "double", "required": true}]'
  params: null
type_hint_from_example: false
utc_time_created: '2025-10-27 06:35:16.437546'
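In the file shown above, the inputs and outputs entries of the signature field are stored as JSON-encoded strings rather than as nested YAML. The sketch below copies those values from the output above into a plain dictionary (standing in for the parsed file) and decodes them with the standard library.

```python
import json

# Values copied from the `signature` field of the MLmodel file shown above.
raw_signature = {
    "inputs": '[{"type": "double", "required": true}]',
    "outputs": '[{"type": "double", "required": true}]',
    "params": None,
}

# Decode each JSON string; a null entry (no params) stays None.
decoded = {
    key: json.loads(value) if value is not None else None
    for key, value in raw_signature.items()
}

print(decoded["inputs"])  # [{'type': 'double', 'required': True}]
print(decoded["params"])  # None
```

Each decoded entry is a list with one item per input or output, which matches the list[float] type hints of example_model.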