Prompts#

MLflow has its own prompts registry for storing and versioning prompts and associated metadata.

import mlflow
import logging
from mlflow.tracking import MlflowClient
from IPython.display import clear_output

import logging
from langchain_ollama import ChatOllama

logging.basicConfig(level=logging.WARNING)

DATABASE_NAME = "mlflow_prompts.db"

mlflow.set_registry_uri(f"sqlite:////tmp/{DATABASE_NAME}")
mlflow.set_tracking_uri(f"sqlite:////tmp/{DATABASE_NAME}")
chat = ChatOllama(model="llama3.2:1b", temperature=0)

Alias#

An alias is a short name assigned to a specific version of a prompt. It usually reflects the unique role or status of that version. Code that retrieves the prompt only needs to reference the alias, so you don't have to modify the code when switching to a new version for a particular purpose: just reassign the alias to the desired version.


The following cell registers two versions of a prompt that will be used in the experiments.

prompt_name = "alias_prompt"
mlflow.genai.register_prompt(name=prompt_name, template="Prompt1")
mlflow.genai.register_prompt(name=prompt_name, template="Prompt2")
clear_output()

You can use mlflow.genai.set_prompt_alias to assign an alias to the second prompt version.

mlflow.genai.set_prompt_alias(
    alias="production",
    name=prompt_name,
    version=2
)

The following cell shows the aliases for the second version of the prompt.

mlflow.genai.load_prompt(f"prompts:/{prompt_name}/2").aliases
['production']

It also shows that you can refer to the corresponding version of the prompt by alias.

mlflow.genai.load_prompt(f"prompts:/{prompt_name}@production")
PromptVersion(name=alias_prompt, version=2, template=Prompt2)
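The alias mechanism is plain indirection: a name resolves to a version number, so callers never hard-code versions. The following is a minimal pure-Python sketch of the idea (not MLflow's implementation).

```python
# Minimal sketch of alias indirection: a name resolves to a version number.
versions = {1: "Prompt1", 2: "Prompt2"}
aliases = {"production": 2}

def load_prompt(ref: str) -> str:
    # "name@alias" resolves through the alias table; "name/2" addresses a version directly
    if "@" in ref:
        version = aliases[ref.split("@")[1]]
    else:
        version = int(ref.split("/")[-1])
    return versions[version]

print(load_prompt("alias_prompt@production"))  # Prompt2
aliases["production"] = 1                      # promote a different version
print(load_prompt("alias_prompt@production"))  # Prompt1
```

Only the alias table changes during a promotion; every caller that references `@production` picks up the new version automatically.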

Format#

You can mark substitution points in a template using the pattern {{ var_name }}. Use the format method, with the substitutions provided as keyword arguments, to get a string with the patterns filled in. Popular frameworks such as LangChain or LlamaIndex support substitution patterns but use single-brace syntax; the prompt object's to_single_brace_format method converts the template to meet this requirement.


The following cell creates the prompt.

mlflow.genai.register_prompt(
    name="format_prompt",
    template="This is {{ some_pattern }}"
)
clear_output()

And substitutes the information.

prompt = mlflow.genai.load_prompt("prompts:/format_prompt/1")
prompt.format(some_pattern="<inserted information>")
'This is <inserted information>'

An example of converting to single-brace syntax.

prompt.to_single_brace_format()
'This is {some_pattern}'
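Both operations amount to simple pattern rewriting. The following sketch (an illustration, not MLflow's actual code) shows one way the double-brace substitution and the single-brace conversion could be implemented, and why the single-brace form matters: it is directly compatible with Python's str.format, which LangChain-style templates build on.

```python
import re

# Matches {{ var_name }} with optional spaces inside the braces
DOUBLE_BRACE = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def format_double_brace(template: str, **values) -> str:
    # Replace each {{ var }} pattern with the corresponding keyword value
    return DOUBLE_BRACE.sub(lambda m: str(values[m.group(1)]), template)

def to_single_brace(template: str) -> str:
    # Rewrite {{ var }} into {var}, the syntax str.format and LangChain expect
    return DOUBLE_BRACE.sub(r"{\1}", template)

template = "This is {{ some_pattern }}"
print(format_double_brace(template, some_pattern="<inserted information>"))
print(to_single_brace(template))  # 'This is {some_pattern}'
```

The converted string can then be formatted with plain `to_single_brace(template).format(some_pattern=...)`.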

Structured output#

You can save the expected output format alongside the prompt by using the response_format argument. You can provide either a Pydantic model or a JSON schema.


The following cell defines a Pydantic model and saves it with the prompt.

from pydantic import BaseModel

class ExampleModel(BaseModel):
    str_var: str
    int_var: int

mlflow.genai.register_prompt(
    name="strucutred_output",
    template="",
    response_format=ExampleModel
)
clear_output()

You can retrieve the response format in the form of a JSON schema by using the response_format attribute of the prompt object.

mlflow.genai.load_prompt("prompts:/strucutred_output/1").response_format
{'properties': {'str_var': {'title': 'Str Var', 'type': 'string'},
  'int_var': {'title': 'Int Var', 'type': 'integer'}},
 'required': ['str_var', 'int_var'],
 'title': 'ExampleModel',
 'type': 'object'}
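A typical use of the stored schema is checking an LLM's JSON reply against it before passing the reply downstream. The following is a minimal hand-rolled sketch against the schema shown above (in practice you would reuse the Pydantic model or a full JSON Schema validator; the `matches_schema` helper is purely illustrative).

```python
import json

# The schema as returned by the prompt's response_format attribute
schema = {
    "properties": {
        "str_var": {"title": "Str Var", "type": "string"},
        "int_var": {"title": "Int Var", "type": "integer"},
    },
    "required": ["str_var", "int_var"],
    "title": "ExampleModel",
    "type": "object",
}

TYPE_MAP = {"string": str, "integer": int}

def matches_schema(payload: str, schema: dict) -> bool:
    # Parse the reply and check every required field has the declared type
    obj = json.loads(payload)
    for field in schema["required"]:
        expected = TYPE_MAP[schema["properties"][field]["type"]]
        if not isinstance(obj.get(field), expected):
            return False
    return True

print(matches_schema('{"str_var": "hi", "int_var": 3}', schema))  # True
print(matches_schema('{"str_var": "hi"}', schema))                # False
```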

Registering/serving#

MLflow provides tools for registering and serving LLM-based applications using a workflow similar to classical ML models.

You must inherit from mlflow.pyfunc.ResponsesAgent and implement the predict method. ResponsesAgent is an extension of the mlflow.pyfunc flavor specific to LLM-based applications.


The following cell defines the SimpleResponsesAgent, which simply returns a predefined message.

from mlflow.entities import SpanType
from mlflow.pyfunc import ResponsesAgent
from mlflow.types.responses import ResponsesAgentRequest, ResponsesAgentResponse


class SimpleResponsesAgent(ResponsesAgent):
    def predict(self, request: ResponsesAgentRequest) -> ResponsesAgentResponse:
        return ResponsesAgentResponse(
            output=[
                self.create_text_output_item(
                    text="The result of 4 * 3 in Python is 12.",
                    id="msg_1",
                )
            ]
        )

The following cell logs the agent and registers it in the model registry, just like a regular model.

simple_responses_agent = SimpleResponsesAgent()
with mlflow.start_run():
    mlflow.pyfunc.log_model(
        python_model=simple_responses_agent,
        name="agent",
        registered_model_name="agent"
    )
2025/11/27 14:12:00 INFO mlflow.store.db.utils: Creating initial MLflow database tables...
2025/11/27 14:12:00 INFO mlflow.store.db.utils: Updating database tables
2025-11-27 14:12:00 INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
2025-11-27 14:12:00 INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
2025-11-27 14:12:00 INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
2025-11-27 14:12:00 INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
2025/11/27 14:12:00 INFO mlflow.pyfunc: Predicting on input example to validate output
2025/11/27 14:12:03 INFO mlflow.store.db.utils: Creating initial MLflow database tables...
2025/11/27 14:12:03 INFO mlflow.store.db.utils: Updating database tables
2025-11-27 14:12:03 INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
2025-11-27 14:12:03 INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
Registered model 'agent' already exists. Creating a new version of this model...
Created version '4' of model 'agent'.

The following cell loads the just-registered model and executes its predict method.

model = mlflow.pyfunc.load_model("models:/agent@latest")
model.predict(
    {
        "input": [{"role": "user", "content": "what is 4*3 in python"}],
        "context": {"conversation_id": "123", "user_id": "456"},
    }
)
{'object': 'response',
 'output': [{'type': 'message',
   'id': 'msg_1',
   'content': [{'text': 'The result of 4 * 3 in Python is 12.',
     'type': 'output_text'}],
   'role': 'assistant'}]}
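Once registered, the agent can be served over REST like any other MLflow model. The sketch below assumes the same SQLite URIs configured at the top of this notebook; the port and environment-manager choice are arbitrary.

```shell
# Point the CLI at the same tracking/registry store used above
export MLFLOW_TRACKING_URI="sqlite:////tmp/mlflow_prompts.db"
export MLFLOW_REGISTRY_URI="sqlite:////tmp/mlflow_prompts.db"

# Serve the registered agent in the current Python environment
mlflow models serve -m "models:/agent@latest" -p 5001 --env-manager local
```

The server's /invocations endpoint then accepts request payloads like the one passed to predict above.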