Gen AI#

This page provides the details of the MLflow GenAI module.

import mlflow
import logging

from langchain_ollama import ChatOllama

logging.getLogger("alembic").setLevel("WARNING")
logging.getLogger("mlflow").setLevel("WARNING")

!rm -f /tmp/llm_mlflow.db
mlflow.set_registry_uri("sqlite:////tmp/llm_mlflow.db")
mlflow.set_tracking_uri("sqlite:////tmp/llm_mlflow.db")
chat = ChatOllama(model="llama3.2:1b", temperature=0)

Trace#

Each invocation of an LLM-based pipeline is recorded as a “trace”.

A trace consists of:

  • Trace info: general information about the trace, primarily used for ordering and selecting traces.

  • Trace data: detailed information about the pipeline run. Consists of spans.


The following cell runs the experiment that produces the traces we will consider later.

# Enable automatic trace logging for LangChain components
mlflow.langchain.autolog()
ans = chat.invoke("Hello how are you?")

With mlflow.search_traces you can get the traces registered in your MLflow tracking store as a pandas.DataFrame.

traces = mlflow.search_traces()
traces
trace_id trace client_request_id state request_time execution_duration request response trace_metadata tags spans assessments
0 tr-06f1ef26ee7df2eee94b73b1af90c415 {"info": {"trace_id": "tr-06f1ef26ee7df2eee94b... None TraceState.OK 1764150102464 3665 [[{'content': 'Hello how are you?', 'additiona... {'generations': [[{'text': "I'm doing well, th... {'mlflow.user': 'user', 'mlflow.source.name': ... {'mlflow.artifactLocation': '/home/user/Docume... [{'trace_id': 'BvHvJu598u7pS3Oxr5DEFQ==', 'spa... []
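
The trace info fields also power programmatic search. The following sketch filters and orders traces; the filter string and ordering key are assumptions based on the documented trace attribute names:

mlflow.search_traces(
    filter_string="attributes.status = 'OK'",  # assumed filter syntax
    order_by=["timestamp_ms DESC"],  # assumed ordering key
    max_results=5,
)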

The following cell loads a specific trace.

trace_obj = mlflow.get_trace(traces["trace_id"].iloc[0])

The following cell loads the info of the trace.

trace_obj.info
TraceInfo(trace_id='tr-06f1ef26ee7df2eee94b73b1af90c415', trace_location=TraceLocation(type=<TraceLocationType.MLFLOW_EXPERIMENT: 'MLFLOW_EXPERIMENT'>, mlflow_experiment=MlflowExperimentLocation(experiment_id='0'), inference_table=None), request_time=1764150102464, state=<TraceState.OK: 'OK'>, request_preview='[[{"content": "Hello how are you?", "additional_kwargs": {}, "response_metadata": {}, "type": "human", "name": null, "id": null}]]', response_preview='{"generations": [[{"text": "I\'m doing well, thank you for asking. Is there anything I can help you with or would you like to talk about something in particular?", "generation_info": {"model": "llama3.2:1b", "created_at": "2025-11-26T09:41:46.127363708Z", "done": true, "done_reason": "stop", "total_duration": 3658687704, "load_duration": 1875642829, "prompt_eval_count": 30, "prompt_eval_duration": 394490334, "eval_count": 30, "eval_duration": 1350268885, "model_name": "llama3.2:1b", "model_provider": "ollama"}, "type": "ChatGeneration", "message": {"content": "I\'m doing well, thank you for asking. Is there anything I can help you with or would you like to talk about something in particular?", "additional_kwargs": {}, "response_metadata": {"model": "llama3.2:1b", "created_at": "2025-11-26T09:41:46.127363708Z", "done": true, "done_reason": "stop", "total_duration": 3658687704, "load_duration": 1875642829, "prompt_eval_count": 30, "prompt_eval_duration": 394490334, "eval_count": 30, "ev...', client_request_id=None, execution_duration=3665, trace_metadata={'mlflow.user': 'user', 'mlflow.source.name': '/home/user/.virtualenvironments/python/lib/python3.13/site-packages/ipykernel_launcher.py', 'mlflow.source.type': 'LOCAL', 'mlflow.source.git.commit': 'be083a43d6047491e3b0b41bdc5a34c561d79cb6', 'mlflow.source.git.repoURL': 'git@github.com:fedorkobak/python.git', 'mlflow.source.git.branch': 'main', 'mlflow.trace_schema.version': '3', 'mlflow.traceInputs': '[[{"content": "Hello how are you?", "additional_kwargs": {}, "response_metadata": {}, "type": "human", "name": null, "id": null}]]', 'mlflow.traceOutputs': '{"generations": [[{"text": "I\'m doing well, thank you for asking. Is there anything I can help you with or would you like to talk about something in particular?", "generation_info": {"model": "llama3.2:1b", "created_at": "2025-11-26T09:41:46.12736...', 'mlflow.trace.tokenUsage': '{"input_tokens": 30, "output_tokens": 30, "total_tokens": 60}', 'mlflow.trace.sizeStats': '{"total_size_bytes": 21122, "num_spans": 1, "max": 18572, "p25": 18572, "p50": 18572, "p75": 18572}', 'mlflow.trace.sizeBytes': '21122'}, tags={'mlflow.artifactLocation': '/home/user/Documents/code/python/ds_ml/mlflow/mlruns/0/traces/tr-06f1ef26ee7df2eee94b73b1af90c415/artifacts', 'mlflow.traceName': 'ChatOllama'}, assessments=[])
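
The fields of the TraceInfo object are available as regular attributes, so there is no need to parse the repr:

# Access selected trace info fields directly
print(trace_obj.info.state)
print(trace_obj.info.execution_duration)
print(trace_obj.info.tags["mlflow.traceName"])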

This cell displays the data from the trace.

trace_obj.data
TraceData(spans=[Span(name='ChatOllama', trace_id='tr-06f1ef26ee7df2eee94b73b1af90c415', span_id='c9eca46a0a240231', parent_id=None)])

Custom tracing#

To build custom tracing you have to specify the span creation logic and which information has to be logged to the spans:

  • The mlflow.trace decorator to create a span each time the decorated function is called.

  • The mlflow.start_span context manager to track everything that happens inside the span.

Note: Some spans can be nested inside other spans.


The following cell creates a separate experiment for the custom tracing examples.

ans = mlflow.set_experiment("custom_tracing")

The following cell uses context manager syntax to create the span.

with mlflow.start_span() as span:
    span.set_inputs("Input request")
    span.set_attribute("hello", "new_value")
    span.set_outputs("Span outputs")

An alternative approach is to wrap the function with the mlflow.trace decorator, so that its invocation is tracked as a span. The following some_tracing function simply wraps its input in extra text.

@mlflow.trace
def some_tracing(inp: str) -> str:
    return f"<extra information>{inp}<the data>"


some_tracing("hello")
'<extra information>hello<the data>'

The following cell shows the kind of span that would be obtained from the corresponding trace.

trace_id = mlflow.search_traces()["trace_id"].iloc[0]
trace = mlflow.get_trace(trace_id=trace_id)

span = trace.data.spans[0]
print(f"{span.inputs} -> {span.outputs}")
{'inp': 'hello'} -> <extra information>hello<the data>

Customizing spans#

You can attach the following to each span:

  • name.

  • span_type: specified as a value from the predefined mlflow.entities.SpanType.

  • attributes: key-value pairs.

Read more on the Spans page.


The following cell creates a span using the context manager and, inside it, invokes some functions that are decorated as spans. Each span has a different setup.

from mlflow.entities import SpanType


@mlflow.trace(
    name="my_span",
    span_type=SpanType.LLM,
    attributes={"attr1": "value1"}
)
def llm_span(input: str) -> str:
    return "some value"


@mlflow.trace(span_type=SpanType.RERANKER)
def reranker_span(input: str) -> str:
    return "output of rerunker span"


with mlflow.start_span(span_type="custom span", name="my_cool_span"):
    llm_span("input")
    reranker_span("reranker span input")
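
The configured properties can be read back from the recorded trace. This is a sketch that assumes span_type and attributes are exposed as properties of the Span entity:

trace = mlflow.get_trace(mlflow.search_traces()["trace_id"].iloc[0])
for s in trace.data.spans:
    # span_type and attributes are assumed Span properties
    print(s.name, s.span_type, s.attributes.get("attr1"))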

Span#

A span is a building block of a trace. Each span represents one step of the trace. In the context of LLM-based solutions, it is typically associated with a single message passed to the model or returned by it.

Check the official description on the MLflow Spans page.


The following cell creates the span and specifies the input and output information for the span.

from mlflow.entities import SpanType

with mlflow.start_span(
    name="my_span",
    span_type=SpanType.LLM
) as span:
    span.set_inputs("The input")
    span.set_outputs("The output")

Messages#

The typical output of an LLM-based system is a sequence of messages. MLflow uses a special UI representation if the data is prepared in an OpenAI-compatible format.


The following cell illustrates the correct format for publishing the message sequence in an MLflow span.

with mlflow.start_span(
    name="my_span",
    span_type=SpanType.LLM
) as span:
    span.set_inputs([
        {"role": "system", "content": "System prompt"},
        {"role": "user", "content": "Hello how are you"},
    ])
    span.set_outputs([
        {"role": "assistant", "content": "Answer of the model"},
        {"role": "tool", "content": "The tool call"}
    ])

Now check the UI of the MLflow server. Each message will be presented according to its type.

Nested#

Some spans can be nested within other spans.


The following code shows how an inner_span can be placed inside an outer_span.

with mlflow.start_span(
    name="outer_span",
) as outer_span:
    outer_span.set_inputs("data of outer span")
    with mlflow.start_span(
        name="innder_span"
    ) as inner_span:
        inner_span.set_inputs("data of inner span")

Only the outer span is represented in the list of traces. Go to the Details & Timeline section of the outer span to find more about the inner span.
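
The hierarchy is also visible programmatically: each span carries the parent_id of its enclosing span, with None for the root span:

trace = mlflow.get_trace(mlflow.search_traces()["trace_id"].iloc[0])
for s in trace.data.spans:
    print(f"{s.name}: parent_id={s.parent_id}")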

Evaluation#

The tool MLflow provides for evaluation is the mlflow.genai.evaluate function. You have to provide:

  • data: the dataset used for evaluation.

  • predict_fn: function that implements the model.

  • scorers: list of evaluation objects, custom or provided by mlflow.

Check more on evaluation and monitoring.


The following cell performs the evaluation.

from mlflow.genai import scorer
from langchain_core.messages import AIMessage

mlflow.set_experiment("evaluation_test")

eval_dataset = [
    {
        "inputs": {"question": "What is the Scotland's national animal?"},
        "expectations": {"expected_response": "Unicorn"}
    },
    {
        "inputs": {"question": "Who was the first person to build an airplane?"},
        "expectations": {"expected_response": "Wright Brothers"}
    },
    {
        "inputs": {"question": "Who wrote Romeo and Juliet?"},
        "expectations": {"expected_response": "William Shakespeare"}
    }
]


def predict_fn(question: str) -> AIMessage:
    return chat.invoke(question)


@scorer
def some_check(outputs: AIMessage) -> bool:
    return len(outputs.content) < 10


mlflow.genai.evaluate(
    data=eval_dataset,
    predict_fn=predict_fn,
    scorers=[some_check]
)

After that, you should see the corresponding interface in the MLflow UI that describes the outputs of the evaluation.
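
The mlflow.genai.evaluate call also returns a result object that can be inspected from code. A minimal sketch, assuming the result exposes the aggregated metrics and the run id:

result = mlflow.genai.evaluate(
    data=eval_dataset,
    predict_fn=predict_fn,
    scorers=[some_check]
)
print(result.metrics)  # assumed: aggregated scorer values
print(result.run_id)  # assumed: run the results were logged to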

Custom scorer#

Create a custom scorer by wrapping a function with the mlflow.genai.scorer decorator.

This function can take any of the following arguments: inputs, expectations, outputs, trace.


Consider the details of the custom scorer. The following cell defines and uses a scorer that takes the most complete set of arguments. The scorer prints its inputs to show their formats.

@mlflow.genai.scorer
def exp_out(
    inputs: dict,
    expectations: dict,
    outputs: str,
    trace: mlflow.entities.Trace
) -> bool:
    print("=" * 80)
    print("Inputs:", inputs)
    print("Expectations:", expectations)
    print("Outputs:", outputs)
    print("Trace:", type(trace))
    print("=" * 80)
    return True


mlflow.set_experiment("evaluation_test")


with mlflow.start_span(name="This trace"):
    mlflow.genai.evaluate(
        data=[
            {
                "inputs": {"inp": "inputs value"},
                "expectations": {"exp": "expectations value"}
            }
        ],
        predict_fn=lambda inp: "outputs value",
        scorers=[exp_out]
    )
2025/12/02 13:40:56 INFO mlflow.genai.utils.data_validation: Testing model prediction with the first sample in the dataset. To disable this check, set the MLFLOW_GENAI_EVAL_SKIP_TRACE_VALIDATION environment variable to True.
================================================================================
Inputs: {'inp': 'inputs value'}
Expectations: {'exp': 'expectations value'}
Outputs: outputs value
Trace: <class 'mlflow.entities.trace.Trace'>
================================================================================

✨ Evaluation completed.

Metrics and evaluation results are logged to the MLflow run:
  Run name: omniscient-moose-379
  Run ID: 5df7987647d947609052e69b33012d1c

To view the detailed evaluation results with sample-wise scores,
open the Traces tab in the Run page in the MLflow UI.
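
A scorer is not limited to returning a plain bool or number. The following sketch assumes the mlflow.entities.Feedback entity, which pairs the score value with a rationale:

from mlflow.entities import Feedback


@mlflow.genai.scorer
def feedback_scorer(outputs: str) -> Feedback:
    # Feedback attaches a human-readable rationale to the score
    return Feedback(
        value=len(outputs) > 0,
        rationale="Non-empty outputs are considered valid."
    )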

Prompts#

MLflow provides a tool for managing prompts, including prompt storage and prompt version control.

  • Create the prompt, or a new version of it, in the MLflow UI, or by using mlflow.genai.register_prompt.

  • To load the prompt from your code, use mlflow.genai.load_prompt.


The following cell creates the prompt.

mlflow.genai.register_prompt(
    name="knowledge_prompt",
    template="This is my prompt",
    commit_message="Initial prompt"
)

To publish a new version of a prompt, simply invoke mlflow.genai.register_prompt again with the same name. This registers the template as a new version of the prompt.

mlflow.genai.register_prompt(
    name="knowledge_prompt",
    template="Updated version of the prompt"
)
PromptVersion(name=knowledge_prompt, version=2, template=Updated version of the prompt)

The following two cells demonstrate how to load the prompt.

mlflow.genai.load_prompt("prompts:/knowledge_prompt/1")
PromptVersion(name=knowledge_prompt, version=1, template=This is my prompt)
prompt = mlflow.genai.load_prompt("prompts:/knowledge_prompt/2")
prompt
PromptVersion(name=knowledge_prompt, version=2, template=Updated version of the prompt)

Use the format method to convert the MLflow prompt object to a string.

prompt.format()
'Updated version of the prompt'
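
Templates become more useful with variables. The following sketch assumes the double-brace {{ }} variable syntax of the MLflow prompt registry; templated_prompt is a name invented for this example:

prompt = mlflow.genai.register_prompt(
    name="templated_prompt",
    template="Answer the question: {{question}}"
)
prompt.format(question="What is MLflow?")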

Registering/serving#

MLflow provides tools for registering and serving LLM-based applications using a workflow similar to that of classical ML models.

You must inherit from the mlflow.pyfunc.ResponsesAgent object and implement the predict method. The mlflow.pyfunc.ResponsesAgent is just an extension of the mlflow.pyfunc flavor specific to LLM-based applications.

Check official description in the ResponsesAgent for Model Serving page.


The following cell defines the SimpleResponsesAgent, which simply returns a predefined message.

%%writefile /tmp/some_code.py
from mlflow.models import set_model
from mlflow.pyfunc import ResponsesAgent
from mlflow.types.responses import ResponsesAgentRequest, ResponsesAgentResponse


class SimpleResponsesAgent(ResponsesAgent):
    def predict(self, request: ResponsesAgentRequest) -> ResponsesAgentResponse:
        return ResponsesAgentResponse(
            output=[
                self.create_text_output_item(
                    text="The result of 4 * 3 in Python is 12.",
                    id="msg_1",
                )
            ]
        )

set_model(SimpleResponsesAgent())
Overwriting /tmp/some_code.py

The following cell logs the agent as a regular model and registers it in the model registry.

with mlflow.start_run():
    mlflow.pyfunc.log_model(
        python_model="/tmp/some_code.py",
        name="agent",
        registered_model_name="agent"
    )
Registered model 'agent' already exists. Creating a new version of this model...
Created version '2' of model 'agent'.

The following cell loads the just-registered model and executes its predict method.

model = mlflow.pyfunc.load_model("models:/agent@latest")
model.predict(
    {
        "input": [{"role": "user", "content": "what is 4*3 in python"}],
        "context": {"conversation_id": "123", "user_id": "456"},
    }
)
{'object': 'response',
 'output': [{'type': 'message',
   'id': 'msg_1',
   'content': [{'text': 'The result of 4 * 3 in Python is 12.',
     'type': 'output_text'}],
   'role': 'assistant'}]}