# Models

This page covers the MLflow model component, which is responsible for model versioning and deployment.

For more information, see:

- The [MLflow pyfunc](https://mlflow.org/docs/latest/api_reference/python_api/mlflow.pyfunc.html) page of the official documentation.

In [1]:
import mlflow
import mlflow.pyfunc.utils

!rm -rf /tmp/models_set_up
mlflow.set_tracking_uri("file:///tmp/models_set_up")

## PyFunc

The `mlflow.pyfunc` module implements the MLflow features that allow you to build flavor-agnostic models. It is ideal for highly specific approaches that don't fit within the boundaries of any popular framework.

There are two ways to create a model:

- Wrap a function in the `mlflow.pyfunc.utils.pyfunc` decorator.
- Define a class that inherits from `mlflow.pyfunc.PythonModel`.

### Function

---

The following cell defines the `example_model` function, which performs a computation intended to imitate a model. It then logs the function and registers it in the MLflow registry.

In [2]:
@mlflow.pyfunc.utils.pyfunc
def example_model(model_input: list[float]) -> list[float]:
    # Square every element of the input list.
    return [x**2 for x in model_input]

with mlflow.start_run():
    mlflow.pyfunc.log_model(
        name="model",
        python_model=example_model,
        registered_model_name="pyfunc_model",
        pip_requirements=[]
    )

Successfully registered model 'pyfunc_model'.
Created version '1' of model 'pyfunc_model'.


The following cell loads the registered model and uses it to make a prediction.

In [3]:
model = mlflow.pyfunc.load_model("models:/pyfunc_model/1")
model.predict([1., 2., 3.])

[1.0, 4.0, 9.0]

The results correspond to the calculation specified in the `example_model` function.

### Class

The class must define the `predict` method, which implements the model's computation.

---

The following cell defines the class and logs it to MLflow.

In [3]:
class MyModel(mlflow.pyfunc.PythonModel):
    test = "I'm a field you want to access!!!"
    def predict(self, context, model_input: list[float], params=None):
        return [x * 2 for x in model_input]

with mlflow.start_run():
    mlflow.pyfunc.log_model(
        name="model",
        python_model=MyModel(),
        pip_requirements=["pandas"],
        registered_model_name="pyclass_model"
    )

Successfully registered model 'pyclass_model'.
Created version '1' of model 'pyclass_model'.


The code for loading the model from the registry is the same as it was for the function approach.

In [4]:
model = mlflow.pyfunc.load_model("models:/pyclass_model/1")
model.predict([2., 4.])

[4.0, 8.0]

## Signature

An MLflow model signature declares the inputs, outputs, and parameters of a model. It documents the model for potential consumers; in particular, the built-in serving endpoint uses it to validate the model's inputs.

The signature is optional and consists of the following elements, each of which is also optional:

- Input schema: description of a single element that the model can process.
- Output schema: description of the output that the model produces for a single input element.
- Parameters schema: schema of the model's parameters (for example, the "temperature" of an LLM).
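
To make the idea of input validation concrete, here is a minimal pure-Python sketch of a column-based check. It is an illustration only and not how MLflow implements validation; the names `EXPECTED_COLUMNS` and `validate_row` are hypothetical and exist only in this sketch.

```python
# Illustrative only: a toy column-based check, not MLflow's actual validation logic.
# EXPECTED_COLUMNS and validate_row are hypothetical names introduced for this sketch.
EXPECTED_COLUMNS = {"feature_0": float, "feature_1": int}


def validate_row(row: dict) -> list[str]:
    """Return a list of problems found in a single input element."""
    problems = []
    for name, expected_type in EXPECTED_COLUMNS.items():
        if name not in row:
            problems.append(f"missing column: {name}")
        elif not isinstance(row[name], expected_type):
            problems.append(
                f"column {name}: expected {expected_type.__name__}, "
                f"got {type(row[name]).__name__}"
            )
    return problems


print(validate_row({"feature_0": 0.5, "feature_1": 3}))    # no problems
print(validate_row({"feature_0": 0.5, "feature_1": "3"}))  # wrong type for feature_1
print(validate_row({"feature_1": 3}))                      # feature_0 is missing
```

A real signature plays the role of `EXPECTED_COLUMNS` here: it records what a valid input looks like so that malformed requests can be rejected before they reach the model.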

The main elements of the MLflow API associated with signatures are as follows:

| Element | Type | Description |
|---------|------|-------------|
| `ModelSignature` | Class | Represents the full input/output schema of a model. |
| `mlflow.types.Schema` | Class | Collection of column specs describing input/output structure. |
| `mlflow.types.ColSpec` | Class | Describes a single column (type, name, optional). |
| `TensorSpec` | Class | Describes a tensor (dtype, shape, name). |
| `ParamSchema` | Class | Schema for parameter-based inputs (non-data, e.g. hyperparams). |
| `DataType` | Enum | Supported MLflow types: `integer`, `long`, `float`, `double`, `boolean`, `string`, `binary`. |
| `infer_signature` | Function | Infers a `ModelSignature` from sample input/output. |
| `to_dict` / `from_dict` | Method | Serialize/deserialize `ModelSignature` objects. |
| `__eq__` | Method | Compare two signatures for equality (useful in tests). |

There are two types of signatures:

- Column-based: maps each feature name to its expected data type.
- Tensor-based: expects an array as input, where each value is identified by its position.

For more information, see the [Model Signatures and Input Examples](https://mlflow.org/docs/latest/ml/model/signatures/) page.

---

The following cell logs the model with `input_example`, which will be used to generate the model's signature.

In [14]:
import pandas as pd
import mlflow
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=4, noise=0.1)
X = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(X.shape[1])])
X["feature_1"] = X["feature_1"].astype(int)

with mlflow.start_run():
    rf = RandomForestRegressor().fit(X, y)
    mlflow.sklearn.log_model(
        rf,
        name="model",
        registered_model_name="rf_model",
        input_example=X.iloc[:5]
    )

Registered model 'rf_model' already exists. Creating a new version of this model...
Created version '2' of model 'rf_model'.


To obtain the signature of a registered model, retrieve the `signature` attribute from the `ModelInfo` object.

In [17]:
model = mlflow.models.get_model_info("models:/rf_model/1")
model.signature

inputs: 
  ['feature_0': double (required), 'feature_1': long (required), 'feature_2': double (required), 'feature_3': double (required)]
outputs: 
  [double (required)]
params: 
  None

You can get the same information from the MLflow UI.

### Manual definition

To define a signature manually, use the `mlflow.models.ModelSignature` class, which is initialised with the following arguments:

- `inputs: mlflow.types.Schema`.
- `outputs: mlflow.types.Schema`.
- `params: mlflow.types.ParamSchema`.

---

The following cell defines a `ModelSignature`. Here:

- `inputs` uses the column-based approach: each input element must provide the columns `value1`, `value2`, and `value3` with the listed types.
- `outputs` uses the tensor-based approach: the output is a batch of two-element `float32` vectors (the `-1` in the shape denotes a variable-size dimension).
- `params` contains a single parameter, `temperature`, with a default value of 0.7.

In [34]:
import numpy as np
from mlflow.types import ColSpec, ParamSchema, ParamSpec, Schema, TensorSpec
from mlflow.models import ModelSignature

input_schema = Schema([
    ColSpec("double", "value1"),
    ColSpec("integer", "value2"),
    ColSpec("string", "value3")
])
output_schema = Schema([TensorSpec(np.dtype(np.float32), (-1, 2))])
parameters_schema = ParamSchema([ParamSpec("temperature", "double", default=0.7)])

ModelSignature(inputs=input_schema, outputs=output_schema, params=parameters_schema)

inputs: 
  ['value1': double (required), 'value2': integer (required), 'value3': string (required)]
outputs: 
  [Tensor('float32', (-1, 2))]
params: 
  ['temperature': double (default: 0.7)]