Models#
This page discusses details related to the MLFlow model component, which is responsible for model versioning and deployment.
For more details, see the MLflow pyfunc page of the official documentation.
import mlflow
import mlflow.pyfunc.utils
!rm -rf /tmp/models_set_up
mlflow.set_tracking_uri("file:///tmp/models_set_up")
PyFunc#
The mlflow.pyfunc module implements the MLflow features that let you build flavor-agnostic models. It is well suited to highly specific approaches that don't fit within the boundaries of any popular framework.
There are two ways to create a model:
Function: a function wrapped in the mlflow.pyfunc.utils.pyfunc decorator.
Class: a class that inherits from mlflow.pyfunc.PythonModel.
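The two options differ mainly in how the computation is packaged. The following plain-Python sketch mimics both shapes so that it runs without MLflow; `PythonModelStub` is a hypothetical stand-in for mlflow.pyfunc.PythonModel, and calling `predict(None, ...)` directly only approximates how MLflow invokes a loaded model.

```python
from typing import Any, Optional


def square_model(model_input: list[float]) -> list[float]:
    """Function option: the whole model is a single function of model_input."""
    return [x**2 for x in model_input]


class PythonModelStub:
    """Hypothetical stand-in for mlflow.pyfunc.PythonModel (illustration only)."""

    def predict(self, context: Any, model_input: Any, params: Optional[dict] = None) -> Any:
        raise NotImplementedError


class DoubleModel(PythonModelStub):
    """Class option: the computation lives in the predict method."""

    def predict(self, context, model_input, params=None):
        return [x * 2 for x in model_input]


# Both shapes map a model input to predictions.
print(square_model([1.0, 2.0, 3.0]))            # [1.0, 4.0, 9.0]
print(DoubleModel().predict(None, [2.0, 4.0]))  # [4.0, 8.0]
```

The class form is the natural choice when the model needs state or setup logic (mlflow.pyfunc.PythonModel also provides a load_context method for loading artifacts); for a stateless computation, the decorated function is enough.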
Function#
The following cell defines the example_model function, which performs a simple computation that imitates a model, then logs it and registers it in the MLflow model registry.
@mlflow.pyfunc.utils.pyfunc
def example_model(model_input: list[float]) -> list[float]:
    # Square every element; this stands in for a real model's computation.
    return list(map(lambda x: x**2, model_input))

with mlflow.start_run():
    # Log the function as a pyfunc model and register it as "pyfunc_model".
    mlflow.pyfunc.log_model(
        name="model",
        python_model=example_model,
        registered_model_name="pyfunc_model",
        pip_requirements=[],
    )
Successfully registered model 'pyfunc_model'.
Created version '1' of model 'pyfunc_model'.
The following cell loads the registered model and uses it to make predictions.
model = mlflow.pyfunc.load_model("models:/pyfunc_model/1")
model.predict([1., 2., 3.])
[1.0, 4.0, 9.0]
The results correspond to the calculation specified in the example_model function.
Class#
The class must define the predict method, which implements the model's computation.
The following cell defines the class and logs it to MLflow.
class MyModel(mlflow.pyfunc.PythonModel):
    # Class attribute; it is not used by predict.
    test = "I'm a field you want to access!!!"

    def predict(self, context, model_input: list[float], params=None):
        # Double every element of the input.
        return [x * 2 for x in model_input]

with mlflow.start_run():
    mlflow.pyfunc.log_model(
        name="model",
        python_model=MyModel(),
        pip_requirements=["pandas"],
        registered_model_name="pyclass_model",
    )
Successfully registered model 'pyclass_model'.
Created version '1' of model 'pyclass_model'.
The code for loading the model from the registry is the same as it was for the function approach.
model = mlflow.pyfunc.load_model("models:/pyclass_model/1")
model.predict([2., 4.])
[4.0, 8.0]
Signature#
An MLflow model signature declares a model's inputs, outputs, and parameters. It documents the model and tells potential consumers how to call it; in particular, the built-in serving endpoint uses it to validate the model's inputs.
The signature is optional and consists of the following elements, each of which is optional as well:
Input schema: description of a single input element that the model can process.
Output schema: description of the output the model produces for a single input element.
Parameters schema: schema of the model's parameters (for example, the “temperature” of an LLM).
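To make input validation concrete, here is a simplified sketch of column-based checking in plain Python. The `validate_columns` helper and its type table are hypothetical; MLflow's real schema enforcement is more involved (for example, it can convert some compatible types), so treat this only as an illustration of the idea.

```python
# Hypothetical mapping from MLflow-style type names to Python types (illustration only).
PY_TYPES = {"double": float, "long": int, "string": str}


def validate_columns(schema: dict[str, str], records: list[dict]) -> list[str]:
    """Check each record against a column-based schema and collect the problems."""
    problems = []
    for i, row in enumerate(records):
        for name, type_name in schema.items():
            if name not in row:
                problems.append(f"row {i}: missing required column '{name}'")
                continue
            value = row[name]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, PY_TYPES[type_name]):
                problems.append(
                    f"row {i}: column '{name}' expected {type_name}, "
                    f"got {type(value).__name__}"
                )
    return problems


schema = {"feature_0": "double", "feature_1": "long"}
print(validate_columns(schema, [{"feature_0": 0.5, "feature_1": 3}]))  # []
print(validate_columns(schema, [{"feature_0": 0.5}]))
# ["row 0: missing required column 'feature_1'"]
print(validate_columns(schema, [{"feature_0": "high", "feature_1": 3}]))
# ["row 0: column 'feature_0' expected double, got str"]
```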
The main elements of the MLflow API associated with signatures are as follows:
| Element | Type | Description |
|---|---|---|
| | Class | Represents the full input/output schema of a model. |
| | Class | Collection of column specs describing input/output structure. |
| | Class | Describes a single column (type, name, optional). |
| | Class | Describes a tensor (dtype, shape, name). |
| | Class | Schema for parameter-based inputs (non-data, e.g. hyperparams). |
| | Enum | Supported MLflow types: |
| | Function | Infers a |
| | Method | Serialize/deserialize |
| | Method | Compare two signatures for equality (useful in tests). |
There are two types of signatures:
Column-based: maps each feature name to its expected data type.
Tensor-based: expects an array of elements as input, where each value must be in its expected position; the tensor is described by its dtype and shape.
For more information, see the Model Signatures and Input Examples page of the MLflow documentation.
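The two signature types describe differently shaped inputs. The plain-Python sketch below (no MLflow calls) contrasts a column-based batch, a list of records keyed by feature name, with a tensor-based batch, a nested list checked against a shape in which -1 accepts any size along that dimension. The `shape_of` and `matches_shape` helpers are hypothetical.

```python
# Column-based: each value is addressed by its feature name.
column_batch = [
    {"value1": 1.5, "value2": 3, "value3": "a"},
    {"value1": 2.0, "value2": 4, "value3": "b"},
]

# Tensor-based: each value is addressed by its position.
tensor_batch = [[0.1, 0.9], [0.4, 0.6], [0.7, 0.3]]


def shape_of(nested) -> tuple[int, ...]:
    """Shape of a rectangular nested list (assumes all rows have equal length)."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(shape)


def matches_shape(actual: tuple[int, ...], expected: tuple[int, ...]) -> bool:
    """Compare shapes dimension by dimension; -1 in `expected` accepts any size."""
    return len(actual) == len(expected) and all(
        e == -1 or a == e for a, e in zip(actual, expected)
    )


print([row["value2"] for row in column_batch])         # [3, 4]
print(shape_of(tensor_batch))                          # (3, 2)
print(matches_shape(shape_of(tensor_batch), (-1, 2)))  # True
print(matches_shape((3, 3), (-1, 2)))                  # False
```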
The following cell logs a model together with an input_example, from which MLflow infers the model's signature.
import pandas as pd
import mlflow
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression

# Synthetic regression data with 4 features; feature_1 is cast to int,
# so the inferred signature contains an integer column.
X, y = make_regression(n_samples=100, n_features=4, noise=0.1)
X = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(X.shape[1])])
X["feature_1"] = X["feature_1"].astype(int)

with mlflow.start_run():
    rf = RandomForestRegressor().fit(X, y)
    # The input example is stored with the model and used to infer its signature.
    mlflow.sklearn.log_model(
        rf,
        name="model",
        registered_model_name="rf_model",
        input_example=X.iloc[:5],
    )
/home/user/.virtualenvironments/python/lib/python3.13/site-packages/mlflow/types/utils.py:452: UserWarning: Hint: Inferred schema contains integer column(s). Integer columns in Python cannot represent missing values. If your input data contains missing values at inference time, it will be encoded as floats and will cause a schema enforcement error. The best way to avoid this problem is to infer the model schema based on a realistic data sample (training dataset) that includes missing values. Alternatively, you can declare integer columns as doubles (float64) whenever these columns may have missing values. See `Handling Integers With Missing Values <https://www.mlflow.org/docs/latest/models.html#handling-integers-with-missing-values>`_ for more details.
warnings.warn(
Registered model 'rf_model' already exists. Creating a new version of this model...
Created version '2' of model 'rf_model'.
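The integer-column warning above can be illustrated with a toy example. The sketch below is a hypothetical, stdlib-only stand-in for schema inference and enforcement (`infer_types` and `check_batch` are not MLflow functions): it infers "long" for a column whose sample values are all integers, after which a batch where a missing value has been encoded as NaN, which is a float, no longer matches that column type.

```python
import math


def infer_types(sample: list[dict]) -> dict[str, str]:
    """Infer a type per column: 'long' if every sample value is an int, else 'double'."""
    types = {}
    for name in sample[0]:
        values = [row[name] for row in sample]
        types[name] = "long" if all(isinstance(v, int) for v in values) else "double"
    return types


def check_batch(types: dict[str, str], batch: list[dict]) -> list[str]:
    """Report values whose Python type does not match the inferred column type."""
    expected = {"long": int, "double": float}
    return [
        f"column '{name}' expects {types[name]}, got {row[name]!r}"
        for row in batch
        for name in types
        if not isinstance(row[name], expected[types[name]])
    ]


# Sample without missing values: feature_1 holds only integers, so it becomes 'long'.
inferred = infer_types([{"feature_0": 0.3, "feature_1": 2}, {"feature_0": -1.2, "feature_1": 5}])
print(inferred)  # {'feature_0': 'double', 'feature_1': 'long'}

# At inference time a missing feature_1 arrives as NaN, which is a float.
print(check_batch(inferred, [{"feature_0": 0.1, "feature_1": math.nan}]))
# ["column 'feature_1' expects long, got nan"]
```

This mirrors the warning's advice: if the sample used for inference already contains a NaN in that column, this sketch infers "double" instead and the later batch passes.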
To obtain the signature of a registered model, get its ModelInfo object with mlflow.models.get_model_info and read the signature attribute.
model = mlflow.models.get_model_info("models:/rf_model/1")
model.signature
inputs:
['feature_0': double (required), 'feature_1': long (required), 'feature_2': double (required), 'feature_3': double (required)]
outputs:
[double (required)]
params:
None
The same information is also available in the MLflow UI.
Manual definition#
To define a signature manually, use the mlflow.models.ModelSignature class, which is initialized with:
inputs: an mlflow.types.Schema.
outputs: an mlflow.types.Schema.
params: an mlflow.types.ParamSchema.
The following cell defines a ModelSignature, where:
inputs uses the column-based approach and expects records with the columns value1 (double), value2 (integer), and value3 (string).
outputs uses the tensor-based approach: the output is a batch of 2-element float32 vectors, with -1 marking a batch dimension of any size.
params contains a single parameter, temperature, with a default value of 0.7.
import numpy as np
from mlflow.types import Schema, ColSpec, ParamSchema, ParamSpec, TensorSpec
from mlflow.models import ModelSignature
input_schema = Schema([
ColSpec("double", "value1"),
ColSpec("integer", "value2"),
ColSpec("string", "value3")
])
output_schema = Schema([TensorSpec(np.dtype(np.float32), (-1, 2))])
parameters_schema = ParamSchema([ParamSpec("temperature", "double", default=0.7)])
ModelSignature(inputs=input_schema, outputs=output_schema, params=parameters_schema)
inputs:
['value1': double (required), 'value2': integer (required), 'value3': string (required)]
outputs:
[Tensor('float32', (-1, 2))]
params:
['temperature': double (default: 0.7)]
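A params schema with a default means callers may omit temperature and get 0.7, or pass their own value. The sketch below shows that default-plus-override idea with a hypothetical `resolve_params` helper in plain Python; it is not MLflow's parameter handling code, and setting aside names that are not in the schema is a choice made for this example.

```python
from typing import Optional


def resolve_params(
    defaults: dict[str, float], supplied: Optional[dict[str, float]] = None
) -> tuple[dict[str, float], list[str]]:
    """Merge caller-supplied params over schema defaults.

    Returns the resolved params and the supplied names that are not in the
    schema (set aside rather than used, a choice made for this sketch).
    """
    supplied = supplied or {}
    resolved = {name: supplied.get(name, default) for name, default in defaults.items()}
    unknown = sorted(name for name in supplied if name not in defaults)
    return resolved, unknown


# Mirrors ParamSpec("temperature", "double", default=0.7) from the cell above.
defaults = {"temperature": 0.7}
print(resolve_params(defaults))                        # ({'temperature': 0.7}, [])
print(resolve_params(defaults, {"temperature": 0.2}))  # ({'temperature': 0.2}, [])
print(resolve_params(defaults, {"top_p": 0.9}))        # ({'temperature': 0.7}, ['top_p'])
```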