Gen AI#

This page provides details on the MLflow GenAI module.

import mlflow
import logging
from mlflow.tracking import MlflowClient
from IPython.display import clear_output

import langchain
from langchain_ollama import ChatOllama

logging.basicConfig(level=logging.WARNING)

!rm -f /tmp/llm_mlflow.db
mlflow.set_registry_uri("sqlite:////tmp/llm_mlflow.db")
mlflow.set_tracking_uri("sqlite:////tmp/llm_mlflow.db")
chat = ChatOllama(model="llama3.2:1b", temperature=0)

Trace#

Each invocation of the LLM-based pipeline is recorded as a “trace”.

A trace consists of:

  • Trace info: general information about the trace (identifier, state, timestamps, tags), primarily used for ordering and selecting traces.

  • Trace data: detailed information about the pipeline run. It consists of spans.


The following cell runs the experiment that produces the traces we will consider later.

mlflow.langchain.autolog()
ans = chat.invoke("Hello how are you?")
clear_output()

With mlflow.search_traces you can get the traces registered in your MLflow tracking store as a pandas.DataFrame.

traces = mlflow.search_traces()
traces
trace_id trace client_request_id state request_time execution_duration request response trace_metadata tags spans assessments
0 tr-06f1ef26ee7df2eee94b73b1af90c415 {"info": {"trace_id": "tr-06f1ef26ee7df2eee94b... None TraceState.OK 1764150102464 3665 [[{'content': 'Hello how are you?', 'additiona... {'generations': [[{'text': "I'm doing well, th... {'mlflow.user': 'user', 'mlflow.source.name': ... {'mlflow.artifactLocation': '/home/user/Docume... [{'trace_id': 'BvHvJu598u7pS3Oxr5DEFQ==', 'spa... []

The following cell loads a specific trace.

trace_obj = mlflow.get_trace(traces["trace_id"].iloc[0])

The following cell loads the info of the trace.

trace_obj.info
TraceInfo(trace_id='tr-06f1ef26ee7df2eee94b73b1af90c415', trace_location=TraceLocation(type=<TraceLocationType.MLFLOW_EXPERIMENT: 'MLFLOW_EXPERIMENT'>, mlflow_experiment=MlflowExperimentLocation(experiment_id='0'), inference_table=None), request_time=1764150102464, state=<TraceState.OK: 'OK'>, request_preview='[[{"content": "Hello how are you?", "additional_kwargs": {}, "response_metadata": {}, "type": "human", "name": null, "id": null}]]', response_preview='{"generations": [[{"text": "I\'m doing well, thank you for asking. Is there anything I can help you with or would you like to talk about something in particular?", "generation_info": {"model": "llama3.2:1b", "created_at": "2025-11-26T09:41:46.127363708Z", "done": true, "done_reason": "stop", "total_duration": 3658687704, "load_duration": 1875642829, "prompt_eval_count": 30, "prompt_eval_duration": 394490334, "eval_count": 30, "eval_duration": 1350268885, "model_name": "llama3.2:1b", "model_provider": "ollama"}, "type": "ChatGeneration", "message": {"content": "I\'m doing well, thank you for asking. Is there anything I can help you with or would you like to talk about something in particular?", "additional_kwargs": {}, "response_metadata": {"model": "llama3.2:1b", "created_at": "2025-11-26T09:41:46.127363708Z", "done": true, "done_reason": "stop", "total_duration": 3658687704, "load_duration": 1875642829, "prompt_eval_count": 30, "prompt_eval_duration": 394490334, "eval_count": 30, "ev...', client_request_id=None, execution_duration=3665, trace_metadata={'mlflow.user': 'user', 'mlflow.source.name': '/home/user/.virtualenvironments/python/lib/python3.13/site-packages/ipykernel_launcher.py', 'mlflow.source.type': 'LOCAL', 'mlflow.source.git.commit': 'be083a43d6047491e3b0b41bdc5a34c561d79cb6', 'mlflow.source.git.repoURL': 'git@github.com:fedorkobak/python.git', 'mlflow.source.git.branch': 'main', 'mlflow.trace_schema.version': '3', 'mlflow.traceInputs': '[[{"content": "Hello how are you?", "additional_kwargs": {}, "response_metadata": {}, "type": "human", "name": null, "id": null}]]', 'mlflow.traceOutputs': '{"generations": [[{"text": "I\'m doing well, thank you for asking. Is there anything I can help you with or would you like to talk about something in particular?", "generation_info": {"model": "llama3.2:1b", "created_at": "2025-11-26T09:41:46.12736...', 'mlflow.trace.tokenUsage': '{"input_tokens": 30, "output_tokens": 30, "total_tokens": 60}', 'mlflow.trace.sizeStats': '{"total_size_bytes": 21122, "num_spans": 1, "max": 18572, "p25": 18572, "p50": 18572, "p75": 18572}', 'mlflow.trace.sizeBytes': '21122'}, tags={'mlflow.artifactLocation': '/home/user/Documents/code/python/ds_ml/mlflow/mlruns/0/traces/tr-06f1ef26ee7df2eee94b73b1af90c415/artifacts', 'mlflow.traceName': 'ChatOllama'}, assessments=[])

This cell displays the data from the trace.

trace_obj.data
TraceData(spans=[Span(name='ChatOllama', trace_id='tr-06f1ef26ee7df2eee94b73b1af90c415', span_id='c9eca46a0a240231', parent_id=None)])

Custom tracing#

To build custom tracing, you have to specify how spans are created and which information is logged to them:

  • The mlflow.trace decorator creates a span each time the decorated function is called.

  • The mlflow.start_span context manager tracks everything that happens inside the span.

Note: spans can be nested inside other spans, as sketched below.
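
A minimal sketch of nesting (not part of the original notebook): a span started while another span is active becomes its child.

with mlflow.start_span(name="parent") as parent:
    parent.set_inputs("parent input")
    # a span started inside the "parent" context becomes its child
    with mlflow.start_span(name="child") as child:
        child.set_outputs("child output")
    parent.set_outputs("parent output")

In the resulting trace, the child span's parent_id points to the parent span.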

Check the MLflow tracing documentation for more details.


The following cell creates a separate experiment for the custom tracing examples.

ans = mlflow.set_experiment("custom_tracing")
2025/11/26 15:08:42 INFO mlflow.tracking.fluent: Experiment with name 'custom_tracing' does not exist. Creating a new experiment.

The following cell uses context manager syntax to create the span.

with mlflow.start_span() as span:
    span.set_inputs("Input request")
    span.attributes["hello"] = "new_value"
    span.set_outputs("Span outputs")
clear_output()

An alternative approach is to wrap the function with the mlflow.trace decorator, so that its invocation is tracked as a span. The traced some_tracing function below simply wraps its input in some extra text.

@mlflow.trace
def some_tracing(inp: str) -> str:
    return f"<extra information>{inp}<the data>"

some_tracing("hello")
clear_output()

The following cell shows the span obtained from the corresponding trace.

trace_id = mlflow.search_traces()["trace_id"].iloc[0]
trace = mlflow.get_trace(trace_id=trace_id)

span = trace.data.spans[0]
print(f"{span.inputs} -> {span.outputs}")
{'inp': 'hello'} -> <extra information>hello<the data>

Customizing spans#

You can attach the following to each span:

  • name.

  • span_type: specified as a value from the predefined mlflow.entities.SpanType.

  • attributes: key-value pairs.


The following cell creates a span using the context manager and invokes inside it some functions that are decorated as spans. Each span has a different setup.

from mlflow.entities import SpanType

@mlflow.trace(
    name="my_span",
    span_type=SpanType.LLM,
    attributes={"attr1": "value1"}
)
def llm_span(input: str) -> str:
    return "some value"

@mlflow.trace(span_type=SpanType.RERANKER)
def reranker_span(input: str) -> str:
    return "output of rerunker span"

with mlflow.start_span(span_type="custom span", name="my_cool_span"):
    llm_span("input")
    reranker_span("reranker span input")
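
As a sanity check, the customization can be read back from the registered trace. This is a sketch that assumes the Span entity exposes the name, span_type, and attributes properties (attr1 is the key set above):

trace = mlflow.get_trace(mlflow.search_traces()["trace_id"].iloc[0])

# print the name, type, and custom attribute of every span in the latest trace
for span in trace.data.spans:
    print(span.name, span.span_type, span.attributes.get("attr1"))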

Evaluation#

MLflow provides the mlflow.genai.evaluate function for evaluation. You have to provide:

  • data: the dataset that will be used for evaluation.

  • predict_fn: a function that implements the model.

  • scorers: a list of evaluation objects, custom or provided by MLflow.

Check more on evaluation and monitoring.


The following cell performs the evaluation.

from mlflow.genai import scorer
from langchain.messages import AIMessage

mlflow.set_experiment("evaluation_test")

eval_dataset = [
    {
        "inputs": {"question": "What is the Scotland's national animal?"},
        "expectations": {"expected_response": "Unicorn"}
    },
    {
        "inputs": {"question": "Who was the first person to build an airplane?"},
        "expectations": {"expected_response": "Wright Brothers"}
    },
    {
        "inputs": {"question": "Who wrote Romeo and Juliet?"},
        "expectations": {"expected_response": "William Shakespeare"}
    }
]

def predict_fn(question: str) -> AIMessage:
    return chat.invoke(question)

@scorer
def some_check(outputs: AIMessage) -> bool:
    return len(outputs.content) < 10

mlflow.genai.evaluate(
    data=eval_dataset,
    predict_fn=predict_fn,
    scorers=[some_check]
)
clear_output()

After that, the MLflow UI shows the corresponding interface that describes the outputs of the evaluation.
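
The aggregated results can also be checked programmatically. A minimal sketch, assuming the scorer values are logged as metrics of the evaluation run (the exact metric names may differ between MLflow versions):

# load the runs of the evaluation experiment as a DataFrame;
# the aggregated scorer values appear among the "metrics.*" columns
runs = mlflow.search_runs(experiment_names=["evaluation_test"])
print([column for column in runs.columns if column.startswith("metrics.")])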

Custom scorer#

Create a custom scorer by wrapping a function with the mlflow.genai.scorer decorator.

This function can take any of the following arguments: inputs, expectations, outputs, trace.

For more details, check the MLflow documentation on custom scorers.


Consider the details of the custom scorer. The following cell defines and uses a scorer that takes the most complete set of arguments. The scorer prints its arguments to show their formats.

from langchain_core.messages import AIMessage

@mlflow.genai.scorer
def exp_out(
    inputs: dict,
    expectations: dict,
    outputs: str,
    trace: mlflow.entities.Trace
) -> bool:
    print("=" * 80)
    print("Inputs:", inputs)
    print("Expectations:", expectations)
    print("Outputs:", outputs)
    print("Trace:", type(trace))
    print("=" * 80)
    return True

mlflow.set_experiment("evaluation_test")

with mlflow.start_span(name="This trace"):
    mlflow.genai.evaluate(
        data = [
            {
                "inputs": {"inp": "inputs value"},
                "expectations": {"exp": "expectations value"}
            }
        ],
        predict_fn=lambda inp: "ouputs value",
        scorers=[exp_out]
    )
2025/12/02 13:40:56 INFO mlflow.genai.utils.data_validation: Testing model prediction with the first sample in the dataset. To disable this check, set the MLFLOW_GENAI_EVAL_SKIP_TRACE_VALIDATION environment variable to True.
================================================================================
Inputs: {'inp': 'inputs value'}
Expectations: {'exp': 'expectations value'}
Outputs: ouputs value
Trace: <class 'mlflow.entities.trace.Trace'>
================================================================================

✨ Evaluation completed.

Metrics and evaluation results are logged to the MLflow run:
  Run name: omniscient-moose-379
  Run ID: 5df7987647d947609052e69b33012d1c

To view the detailed evaluation results with sample-wise scores,
open the Traces tab in the Run page in the MLflow UI.

Prompts#

MLflow provides a tool for managing prompts, including prompt storage and prompt version control.

  • Create a prompt, or a new version of it, in the MLflow UI or by using mlflow.genai.register_prompt.

  • To load a prompt from your code, use mlflow.genai.load_prompt.

For more, check the MLflow prompt registry documentation.


The following cell creates the prompt.

mlflow.genai.register_prompt(
    name="knowledge_prompt",
    template="This is my prompt",
    commit_message="Initial prompt"
)
clear_output()

To publish a new version of a prompt, invoke mlflow.genai.register_prompt again with the same name. The new template is registered as a new version of the prompt.

mlflow.genai.register_prompt(
    name="knowledge_prompt",
    template="Updated version of the prompt"
)
PromptVersion(name=knowledge_prompt, version=2, template=Updated version of the prompt)

The following two cells demonstrate how to load the prompt.

mlflow.genai.load_prompt("prompts:/knowledge_prompt/1")
PromptVersion(name=knowledge_prompt, version=1, template=This is my prompt)
prompt = mlflow.genai.load_prompt("prompts:/knowledge_prompt/2")
prompt
PromptVersion(name=knowledge_prompt, version=2, template=Updated version of the prompt)

Use the format method to render the MLflow prompt object as a string.

prompt.format()
'Updated version of the prompt'
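
Templates become more useful with variables. The following sketch is not part of the original example; it assumes the double-brace {{ }} placeholder syntax used by the MLflow prompt registry, and qa_prompt is a hypothetical prompt name.

# register a hypothetical prompt whose template contains a placeholder
templated = mlflow.genai.register_prompt(
    name="qa_prompt",
    template="Answer the following question briefly: {{ question }}",
    commit_message="Prompt with a variable"
)

# format fills the {{ question }} placeholder and returns a plain string
templated.format(question="What is MLflow?")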