LLMs#
This page covers the LangChain interfaces for LLMs.
import os
from langchain_ollama import ChatOllama
os.environ["HF_HUB_DISABLE_PROGRESS_BARS"] = "1"
model = ChatOllama(model="llama3.1", temperature=0)
Structured output#
Some providers support structured output: the model returns its data in the specified format.
To make the model follow a given format, use the with_structured_output method. It returns a modified chat object that follows the specified schema.
Check whether the provider supports structured output in the JSON mode column of the provider features section.
The following cell illustrates how user characteristics are extracted from the given text.
from pydantic import BaseModel
class OutputSchema(BaseModel):
id: str
name: str
structured_model = model.with_structured_output(OutputSchema)
response = structured_model.invoke(
"Extract data: 'User llm_lover with id 777 tries to acess the database.'"
)
response
OutputSchema(id='777', name='llm_lover')
Format mismatch#
In some cases, the format specified by the schema cannot be satisfied. The model then returns whatever value it has generated, without raising any errors. However, if validation is attached to the output, for example when the format is specified with a Pydantic model, the validation errors will surface.
Consider an example in which the model cannot satisfy the output format because it does not have enough output tokens. The following cell creates a model that can generate only 3 output tokens.
model = ChatOllama(
model="llama3.1",
num_predict=3,
temperature=0
)
Consider the example where the model is restricted to return just "this is a long literal": 3 tokens is insufficient for this kind of output.
structured_model = model.with_structured_output({
"type": "string",
"const": "this is a long literal"
})
structured_model.invoke("Print anything you can")
'this is'
The model simply returned an incomplete literal instead of the full value specified in the format, and no error was raised.
Now consider a very similar case in which the long literal is wrapped in a Pydantic model.
import typing
from pydantic import BaseModel
class OutputScema(BaseModel):
val: typing.Literal["this is a long output"]
try:
model.with_structured_output(OutputScema).invoke("Print anything you can")
except Exception as e:
print(type(e))
print(e)
<class 'langchain_core.exceptions.OutputParserException'>
Failed to parse OutputScema from completion {}. Got: 1 validation error for OutputScema
val
Field required [type=missing, input_value={}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.11/v/missing
For troubleshooting, visit: https://docs.langchain.com/oss/python/langchain/errors/OUTPUT_PARSING_FAILURE
In this case, an OutputParserException is raised.
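If you prefer to get the validation error back as data instead of an exception, with_structured_output in recent LangChain versions also accepts an include_raw=True argument; the chain then returns a dict with "raw", "parsed", and "parsing_error" keys, so the error is returned rather than raised. The following is only a sketch, assuming the installed langchain-ollama version supports this flag.
# Sketch, assuming include_raw is supported by the installed langchain-ollama version.
structured_model = model.with_structured_output(OutputScema, include_raw=True)
result = structured_model.invoke("Print anything you can")
# Expected keys: "raw" (the AIMessage), "parsed" (None on failure), "parsing_error" (the exception).
print(result["parsing_error"])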
Tokens#
There is a set of methods in langchain_core.language_models.base.BaseLanguageModel that allows you to estimate the number of tokens a piece of text will take:
get_token_ids: returns the token indices for a given text.
get_num_tokens: returns the number of tokens in a given text.
get_num_tokens_from_messages: returns the number of tokens for a given list of messages.
Note. These methods use a default tokenizer, so the tokenizer actually used by a particular model may be different and the counts are only estimates.
The following cell defines two models wrapped with Ollama.
deepseek = ChatOllama(model="deepseek-r1:1.5b")
llama = ChatOllama(model="llama3")
test_text = "This is some tricky text: olala"
The output of the get_token_ids method for the different models.
print(deepseek.get_token_ids(test_text))
print(llama.get_token_ids(test_text))
[1212, 318, 617, 17198, 2420, 25, 25776, 6081]
[1212, 318, 617, 17198, 2420, 25, 25776, 6081]
The outputs are the same because the default tokenizer was used, even though the models actually use different tokenizers.
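If exact counts are needed, you can plug the model's real tokenizer into the wrapper. The following is only a sketch: it assumes the transformers package is installed, that your langchain_core version exposes the custom_get_token_ids field on the model, and the Hugging Face checkpoint name is an illustrative guess for the tokenizer matching deepseek-r1:1.5b.
# Sketch: count tokens with the model's own tokenizer instead of the default one.
from transformers import AutoTokenizer

# Assumed checkpoint; use whichever tokenizer matches your local model.
hf_tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")

deepseek_exact = ChatOllama(
    model="deepseek-r1:1.5b",
    custom_get_token_ids=lambda text: hf_tokenizer.encode(text),
)
print(deepseek_exact.get_token_ids(test_text))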
The following cell shows the output of the get_num_tokens method.
deepseek.get_num_tokens(test_text)
8
And the application of the get_num_tokens_from_messages method.
from langchain_core import messages
deepseek.get_num_tokens_from_messages(
[messages.HumanMessage(test_text)]
)
10
The output is higher because the number includes the service tokens that wrap the messages.
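Under the hood, the default implementation (an assumption about current langchain_core behaviour, worth verifying against your version) renders each message to text with a role prefix via get_buffer_string before counting it, which is where the extra tokens come from.
# Sketch: how a message is rendered to text before its tokens are counted.
from langchain_core.messages import get_buffer_string

# Prints something like "Human: This is some tricky text: olala";
# the "Human:" prefix accounts for the extra tokens.
print(get_buffer_string([messages.HumanMessage(test_text)]))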
Num predict#
The num_predict parameter sets how many tokens the model's output can contain.
The following cell initialises the chat model object with num_predict=5.
model = ChatOllama(
model="llama3.1",
temperature=0,
num_predict=5
)
ans = model.invoke("How to find a true Love?")
ans.content, ans.usage_metadata["output_tokens"]
('Finding true love can be', 5)
As a result, the model's output contains exactly 5 tokens, and the answer looks like it was cut off before completion.
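A practical way to detect that the answer was cut off is to inspect the response metadata. The sketch below assumes that Ollama reports done_reason as "length" when the num_predict limit is hit, in contrast to the "stop" value produced by a naturally finished answer.
# Sketch: detect a truncated answer via response metadata.
# Assumes Ollama sets done_reason to "length" when the token limit is reached.
reason = ans.response_metadata.get("done_reason")
if reason != "stop":
    print(f"Answer was truncated (done_reason={reason!r})")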
LLMResult#
LLMResult is a LangChain abstraction that is usually hidden under the hood. However, you sometimes need to interact with it, for example, to customize invocation calls.
You can obtain an LLMResult from the LLM by using the generate method.
The following cell invokes the corresponding method.
from langchain_core.messages import HumanMessage
llm_result = model.generate(messages=[[HumanMessage(content="Hello")]])
llm_result
LLMResult(generations=[[ChatGeneration(text='Hello! How can I assist you today?', generation_info={'model': 'llama3.1', 'created_at': '2026-01-19T19:56:24.281124679Z', 'done': True, 'done_reason': 'stop', 'total_duration': 2962383221, 'load_duration': 96997024, 'prompt_eval_count': 11, 'prompt_eval_duration': 352428668, 'eval_count': 10, 'eval_duration': 2512357167, 'logprobs': None, 'model_name': 'llama3.1', 'model_provider': 'ollama'}, message=AIMessage(content='Hello! How can I assist you today?', additional_kwargs={}, response_metadata={'model': 'llama3.1', 'created_at': '2026-01-19T19:56:24.281124679Z', 'done': True, 'done_reason': 'stop', 'total_duration': 2962383221, 'load_duration': 96997024, 'prompt_eval_count': 11, 'prompt_eval_duration': 352428668, 'eval_count': 10, 'eval_duration': 2512357167, 'logprobs': None, 'model_name': 'llama3.1', 'model_provider': 'ollama'}, id='lc_run--019bd7d4-6bc4-7b30-91c5-0343016c42de-0', usage_metadata={'input_tokens': 11, 'output_tokens': 10, 'total_tokens': 21}))]], llm_output={}, run=[RunInfo(run_id=UUID('019bd7d4-6bc4-7b30-91c5-0343016c42de'))], type='LLMResult')
type(llm_result)
langchain_core.outputs.llm_result.LLMResult
The generations attribute contains the generated information.
llm_result.generations[0][0]
ChatGeneration(text='Hello! How can I assist you today?', generation_info={'model': 'llama3.1', 'created_at': '2026-01-19T19:56:24.281124679Z', 'done': True, 'done_reason': 'stop', 'total_duration': 2962383221, 'load_duration': 96997024, 'prompt_eval_count': 11, 'prompt_eval_duration': 352428668, 'eval_count': 10, 'eval_duration': 2512357167, 'logprobs': None, 'model_name': 'llama3.1', 'model_provider': 'ollama'}, message=AIMessage(content='Hello! How can I assist you today?', additional_kwargs={}, response_metadata={'model': 'llama3.1', 'created_at': '2026-01-19T19:56:24.281124679Z', 'done': True, 'done_reason': 'stop', 'total_duration': 2962383221, 'load_duration': 96997024, 'prompt_eval_count': 11, 'prompt_eval_duration': 352428668, 'eval_count': 10, 'eval_duration': 2512357167, 'logprobs': None, 'model_name': 'llama3.1', 'model_provider': 'ollama'}, id='lc_run--019bd7d4-6bc4-7b30-91c5-0343016c42de-0', usage_metadata={'input_tokens': 11, 'output_tokens': 10, 'total_tokens': 21}))
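The ChatGeneration exposes both the plain generated text and the original AIMessage, so, for example, the usage metadata can be read directly from it.
generation = llm_result.generations[0][0]
print(generation.text)                    # the generated text
print(generation.message.usage_metadata)  # token usage recorded on the AIMessage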