Gen AI#

This page provides the details of the MLflow GenAI module.

import mlflow
import logging

from langchain_ollama import ChatOllama

logging.getLogger("alembic").setLevel("WARNING")
logging.getLogger("mlflow").setLevel("WARNING")

!rm -f /tmp/llm_mlflow.db
mlflow.set_registry_uri("sqlite:////tmp/llm_mlflow.db")
mlflow.set_tracking_uri("sqlite:////tmp/llm_mlflow.db")
chat = ChatOllama(model="llama3.2:1b", temperature=0)

Trace#

Each invocation of an LLM-based pipeline is recorded as a “trace”.

A trace consists of:

  • Trace info: general information about the trace, primarily used for ordering and selecting traces.

  • Trace data: detailed information about the pipeline run. Consists of spans.


The following cell runs the experiment that produces the traces we will consider later.

# Enable automatic trace logging for LangChain components
mlflow.langchain.autolog()
ans = chat.invoke("Hello how are you?")

With mlflow.search_traces you can get the traces registered in your MLflow tracking store as a pandas.DataFrame.

traces = mlflow.search_traces()
traces
trace_id trace client_request_id state request_time execution_duration request response trace_metadata tags spans assessments
0 tr-06f1ef26ee7df2eee94b73b1af90c415 {"info": {"trace_id": "tr-06f1ef26ee7df2eee94b... None TraceState.OK 1764150102464 3665 [[{'content': 'Hello how are you?', 'additiona... {'generations': [[{'text': "I'm doing well, th... {'mlflow.user': 'user', 'mlflow.source.name': ... {'mlflow.artifactLocation': '/home/user/Docume... [{'trace_id': 'BvHvJu598u7pS3Oxr5DEFQ==', 'spa... []
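
The trace info fields also power programmatic search. The following sketch filters and orders traces; the filter string and ordering key are assumptions based on the documented trace attribute names:

mlflow.search_traces(
    filter_string="attributes.status = 'OK'",  # assumed filter syntax
    order_by=["timestamp_ms DESC"],  # assumed ordering key
    max_results=5,
)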

The following cell loads a specific trace.

trace_obj = mlflow.get_trace(traces["trace_id"].iloc[0])

The following cell loads the info of the trace.

trace_obj.info
TraceInfo(trace_id='tr-06f1ef26ee7df2eee94b73b1af90c415', trace_location=TraceLocation(type=<TraceLocationType.MLFLOW_EXPERIMENT: 'MLFLOW_EXPERIMENT'>, mlflow_experiment=MlflowExperimentLocation(experiment_id='0'), inference_table=None), request_time=1764150102464, state=<TraceState.OK: 'OK'>, request_preview='[[{"content": "Hello how are you?", "additional_kwargs": {}, "response_metadata": {}, "type": "human", "name": null, "id": null}]]', response_preview='{"generations": [[{"text": "I\'m doing well, thank you for asking. Is there anything I can help you with or would you like to talk about something in particular?", "generation_info": {"model": "llama3.2:1b", "created_at": "2025-11-26T09:41:46.127363708Z", "done": true, "done_reason": "stop", "total_duration": 3658687704, "load_duration": 1875642829, "prompt_eval_count": 30, "prompt_eval_duration": 394490334, "eval_count": 30, "eval_duration": 1350268885, "model_name": "llama3.2:1b", "model_provider": "ollama"}, "type": "ChatGeneration", "message": {"content": "I\'m doing well, thank you for asking. Is there anything I can help you with or would you like to talk about something in particular?", "additional_kwargs": {}, "response_metadata": {"model": "llama3.2:1b", "created_at": "2025-11-26T09:41:46.127363708Z", "done": true, "done_reason": "stop", "total_duration": 3658687704, "load_duration": 1875642829, "prompt_eval_count": 30, "prompt_eval_duration": 394490334, "eval_count": 30, "ev...', client_request_id=None, execution_duration=3665, trace_metadata={'mlflow.user': 'user', 'mlflow.source.name': '/home/user/.virtualenvironments/python/lib/python3.13/site-packages/ipykernel_launcher.py', 'mlflow.source.type': 'LOCAL', 'mlflow.source.git.commit': 'be083a43d6047491e3b0b41bdc5a34c561d79cb6', 'mlflow.source.git.repoURL': 'git@github.com:fedorkobak/python.git', 'mlflow.source.git.branch': 'main', 'mlflow.trace_schema.version': '3', 'mlflow.traceInputs': '[[{"content": "Hello how are you?", "additional_kwargs": {}, "response_metadata": {}, "type": "human", "name": null, "id": null}]]', 'mlflow.traceOutputs': '{"generations": [[{"text": "I\'m doing well, thank you for asking. Is there anything I can help you with or would you like to talk about something in particular?", "generation_info": {"model": "llama3.2:1b", "created_at": "2025-11-26T09:41:46.12736...', 'mlflow.trace.tokenUsage': '{"input_tokens": 30, "output_tokens": 30, "total_tokens": 60}', 'mlflow.trace.sizeStats': '{"total_size_bytes": 21122, "num_spans": 1, "max": 18572, "p25": 18572, "p50": 18572, "p75": 18572}', 'mlflow.trace.sizeBytes': '21122'}, tags={'mlflow.artifactLocation': '/home/user/Documents/code/python/ds_ml/mlflow/mlruns/0/traces/tr-06f1ef26ee7df2eee94b73b1af90c415/artifacts', 'mlflow.traceName': 'ChatOllama'}, assessments=[])
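
The fields of the TraceInfo object are available as regular attributes, so there is no need to parse the repr:

# Access selected trace info fields directly
print(trace_obj.info.state)
print(trace_obj.info.execution_duration)
print(trace_obj.info.tags["mlflow.traceName"])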

This cell displays the data from the trace.

trace_obj.data
TraceData(spans=[Span(name='ChatOllama', trace_id='tr-06f1ef26ee7df2eee94b73b1af90c415', span_id='c9eca46a0a240231', parent_id=None)])

Custom tracing#

To build custom tracing you have to specify the span creation logic and which information has to be logged to the spans:

  • The mlflow.trace decorator to create a span each time the decorated function is called.

  • The mlflow.start_span context manager to track everything that happens inside the span.

Note: Some spans can be nested inside other spans.


The following cell creates a separate experiment for the custom tracing examples.

ans = mlflow.set_experiment("custom_tracing")

The following cell uses context manager syntax to create the span.

with mlflow.start_span() as span:
    span.set_inputs("Input request")
    span.set_attribute("hello", "new_value")
    span.set_outputs("Span outputs")

An alternative approach is to wrap the function with the mlflow.trace decorator, so that its invocation is tracked as a span. The following some_tracing function simply wraps its input in extra text.

@mlflow.trace
def some_tracing(inp: str) -> str:
    return f"<extra information>{inp}<the data>"


some_tracing("hello")
'<extra information>hello<the data>'

The following cell shows the kind of span that would be obtained from the corresponding trace.

trace_id = mlflow.search_traces()["trace_id"].iloc[0]
trace = mlflow.get_trace(trace_id=trace_id)

span = trace.data.spans[0]
print(f"{span.inputs} -> {span.outputs}")
{'inp': 'hello'} -> <extra information>hello<the data>

Customizing spans#

You can attach the following to each span:

  • name.

  • span_type: specified as a value from the predefined mlflow.entities.SpanType.

  • attributes: key-value pairs.

Read more on the Spans page.


The following cell creates a span using the context manager and, inside it, invokes some functions that are decorated as spans. Each span has a different setup.

from mlflow.entities import SpanType


@mlflow.trace(
    name="my_span",
    span_type=SpanType.LLM,
    attributes={"attr1": "value1"}
)
def llm_span(input: str) -> str:
    return "some value"


@mlflow.trace(span_type=SpanType.RERANKER)
def reranker_span(input: str) -> str:
    return "output of rerunker span"


with mlflow.start_span(span_type="custom span", name="my_cool_span"):
    llm_span("input")
    reranker_span("reranker span input")
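
The configured properties can be read back from the recorded trace. This is a sketch that assumes span_type and attributes are exposed as properties of the Span entity:

trace = mlflow.get_trace(mlflow.search_traces()["trace_id"].iloc[0])
for s in trace.data.spans:
    # span_type and attributes are assumed Span properties
    print(s.name, s.span_type, s.attributes.get("attr1"))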

Span#

A span is a building block of a trace. Each span represents one step of the trace. In the context of LLM-based solutions, it is typically associated with a single message passed to the model or returned by it.

Check the official description on the MLflow Spans page.


The following cell creates the span and specifies the input and output information for the span.

from mlflow.entities import SpanType

with mlflow.start_span(
    name="my_span",
    span_type=SpanType.LLM
) as span:
    span.set_inputs("The input")
    span.set_outputs("The output")

Messages#

The typical output of an LLM-based system is a sequence of messages. MLflow uses a special UI representation if the data is prepared in an OpenAI-compatible format.


The following cell illustrates the correct format for publishing the message sequence in an MLflow span.

with mlflow.start_span(
    name="my_span",
    span_type=SpanType.LLM
) as span:
    span.set_inputs([
        {"role": "system", "content": "System prompt"},
        {"role": "user", "content": "Hello how are you"},
    ])
    span.set_outputs([
        {"role": "assistant", "content": "Answer of the model"},
        {"role": "tool", "content": "The tool call"}
    ])

Now check the UI of the MLflow server. Each message will be presented according to its type.

Nested#

Some spans can be nested within other spans.


The following code shows how an inner_span can be placed inside an outer_span.

with mlflow.start_span(
    name="outer_span",
) as outer_span:
    outer_span.set_inputs("data of outer span")
    with mlflow.start_span(
        name="innder_span"
    ) as inner_span:
        inner_span.set_inputs("data of inner span")

Only the outer span is represented in the list of traces. Go to the Details & Timeline section of the outer span to find more about the inner span.
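
The hierarchy is also visible programmatically: each span carries the parent_id of its enclosing span, with None for the root span:

trace = mlflow.get_trace(mlflow.search_traces()["trace_id"].iloc[0])
for s in trace.data.spans:
    print(f"{s.name}: parent_id={s.parent_id}")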

Evaluation#

The tool MLflow provides for evaluation is the mlflow.genai.evaluate function. You have to provide:

  • data: the dataset used for evaluation.

  • predict_fn: function that implements the model.

  • scorers: list of evaluation objects, custom or provided by mlflow.

Check more on evaluation and monitoring.


The following cell performs the evaluation.

from mlflow.genai import scorer
from langchain_core.messages import AIMessage

mlflow.set_experiment("evaluation_test")

eval_dataset = [
    {
        "inputs": {"question": "What is the Scotland's national animal?"},
        "expectations": {"expected_response": "Unicorn"}
    },
    {
        "inputs": {"question": "Who was the first person to build an airplane?"},
        "expectations": {"expected_response": "Wright Brothers"}
    },
    {
        "inputs": {"question": "Who wrote Romeo and Juliet?"},
        "expectations": {"expected_response": "William Shakespeare"}
    }
]


def predict_fn(question: str) -> AIMessage:
    return chat.invoke(question)


@scorer
def some_check(outputs: AIMessage) -> bool:
    return len(outputs.content) < 10


mlflow.genai.evaluate(
    data=eval_dataset,
    predict_fn=predict_fn,
    scorers=[some_check]
)

After that, you should see the corresponding interface in the MLflow UI that describes the outputs of the evaluation.
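
The mlflow.genai.evaluate call also returns a result object that can be inspected from code. A minimal sketch, assuming the result exposes the aggregated metrics and the run id:

result = mlflow.genai.evaluate(
    data=eval_dataset,
    predict_fn=predict_fn,
    scorers=[some_check]
)
print(result.metrics)  # assumed: aggregated scorer values
print(result.run_id)  # assumed: run the results were logged to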

Custom scorer#

Create a custom scorer by wrapping a function with the mlflow.genai.scorer decorator.

This function can take any of the following arguments: inputs, expectations, outputs, trace.


Consider the details of the custom scorer. The following cell defines and uses a scorer that takes the most complete set of arguments. The scorer prints its inputs to show their formats.

@mlflow.genai.scorer
def exp_out(
    inputs: dict,
    expectations: dict,
    outputs: str,
    trace: mlflow.entities.Trace
) -> bool:
    print("=" * 80)
    print("Inputs:", inputs)
    print("Expectations:", expectations)
    print("Outputs:", outputs)
    print("Trace:", type(trace))
    print("=" * 80)
    return True


mlflow.set_experiment("evaluation_test")


with mlflow.start_span(name="This trace"):
    mlflow.genai.evaluate(
        data=[
            {
                "inputs": {"inp": "inputs value"},
                "expectations": {"exp": "expectations value"}
            }
        ],
        predict_fn=lambda inp: "outputs value",
        scorers=[exp_out]
    )
2025/12/02 13:40:56 INFO mlflow.genai.utils.data_validation: Testing model prediction with the first sample in the dataset. To disable this check, set the MLFLOW_GENAI_EVAL_SKIP_TRACE_VALIDATION environment variable to True.
================================================================================
Inputs: {'inp': 'inputs value'}
Expectations: {'exp': 'expectations value'}
Outputs: outputs value
Trace: <class 'mlflow.entities.trace.Trace'>
================================================================================

✨ Evaluation completed.

Metrics and evaluation results are logged to the MLflow run:
  Run name: omniscient-moose-379
  Run ID: 5df7987647d947609052e69b33012d1c

To view the detailed evaluation results with sample-wise scores,
open the Traces tab in the Run page in the MLflow UI.
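
A scorer is not limited to returning a plain bool or number. The following sketch assumes the mlflow.entities.Feedback entity, which pairs the score value with a rationale:

from mlflow.entities import Feedback


@mlflow.genai.scorer
def feedback_scorer(outputs: str) -> Feedback:
    # Feedback attaches a human-readable rationale to the score
    return Feedback(
        value=len(outputs) > 0,
        rationale="Non-empty outputs are considered valid."
    )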

Prompts#

MLflow provides a tool for managing prompts, including prompt storage and prompt version control.

  • Create the prompt, or a new version of it, in the MLflow UI, or by using mlflow.genai.register_prompt.

  • To load the prompt from your code, use mlflow.genai.load_prompt.


The following cell creates the prompt.

mlflow.genai.register_prompt(
    name="knowledge_prompt",
    template="This is my prompt",
    commit_message="Initial prompt"
)

To publish a new version of a prompt, simply invoke mlflow.genai.register_prompt again with the same name. This registers the template as a new version of the prompt.

mlflow.genai.register_prompt(
    name="knowledge_prompt",
    template="Updated version of the prompt"
)
PromptVersion(name=knowledge_prompt, version=2, template=Updated version of the prompt)

The following two cells demonstrate how to load the prompt.

mlflow.genai.load_prompt("prompts:/knowledge_prompt/1")
PromptVersion(name=knowledge_prompt, version=1, template=This is my prompt)
prompt = mlflow.genai.load_prompt("prompts:/knowledge_prompt/2")
prompt
PromptVersion(name=knowledge_prompt, version=2, template=Updated version of the prompt)

Use the format method to convert the MLflow prompt object to a string.

prompt.format()
'Updated version of the prompt'
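
Templates become more useful with variables. The following sketch assumes the double-brace {{ }} variable syntax of the MLflow prompt registry; templated_prompt is a name invented for this example:

prompt = mlflow.genai.register_prompt(
    name="templated_prompt",
    template="Answer the question: {{question}}"
)
prompt.format(question="What is MLflow?")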

Registering/serving#

MLflow provides tools for registering and serving LLM-based applications using a workflow similar to that of classical ML models.

You must inherit from the mlflow.pyfunc.ResponsesAgent object and implement the predict method. The mlflow.pyfunc.ResponsesAgent is just an extension of the mlflow.pyfunc flavor specific to LLM-based applications.

Check official description in the ResponsesAgent for Model Serving page.


The following cell defines the SimpleResponsesAgent, which simply returns a predefined message.

%%writefile /tmp/some_code.py
from mlflow.models import set_model
from mlflow.pyfunc import ResponsesAgent
from mlflow.types.responses import ResponsesAgentRequest, ResponsesAgentResponse


class SimpleResponsesAgent(ResponsesAgent):
    def predict(self, request: ResponsesAgentRequest) -> ResponsesAgentResponse:
        return ResponsesAgentResponse(
            output=[
                self.create_text_output_item(
                    text="The result of 4 * 3 in Python is 12.",
                    id="msg_1",
                )
            ]
        )

set_model(SimpleResponsesAgent())
Overwriting /tmp/some_code.py

The following cell logs the agent as a regular model and registers it in the model registry.

with mlflow.start_run():
    mlflow.pyfunc.log_model(
        python_model="/tmp/some_code.py",
        name="agent",
        registered_model_name="agent"
    )
Registered model 'agent' already exists. Creating a new version of this model...
Created version '2' of model 'agent'.

The following cell loads the just-registered model and executes its predict method.

model = mlflow.pyfunc.load_model("models:/agent@latest")
model.predict(
    {
        "input": [{"role": "user", "content": "what is 4*3 in python"}],
        "context": {"conversation_id": "123", "user_id": "456"},
    }
)
{'object': 'response',
 'output': [{'type': 'message',
   'id': 'msg_1',
   'content': [{'text': 'The result of 4 * 3 in Python is 12.',
     'type': 'output_text'}],
   'role': 'assistant'}]}