IMDb example
This page illustrates the process of building a RAG system. In more detail, it uses:
- reviews from the IMDb dataset as a knowledge base;
- the sentence_transformers package to prepare embeddings;
- Qdrant as a vector database;
- Qwen2 as a generation model.
The resulting RAG system is not very useful, but it walks through the major steps of building your own.
import os
import uuid
import numpy as np
import ollama
from pprint import pprint
from transformers import pipeline
from datasets import load_dataset
from qdrant_client import QdrantClient, models
from sentence_transformers import SentenceTransformer
Building embeddings
The following cell loads a dataset and shows one record from it.
data = list(load_dataset("stanfordnlp/imdb", split="train")['text'])
pprint(data[20])
('If the crew behind "Zombie Chronicles" ever read this, here\'s some advice '
'guys: <br /><br />1. In a "Twist Ending"-type movie, it\'s not a good idea '
'to insert close-ups of EVERY DEATH IN THE MOVIE in the opening credits. That '
"tends to spoil the twists, y'know...? <br /><br />2. I know you produced "
'this on a shoestring and - to be fair - you worked miracles with your budget '
'but please, hire people who can actually act. Or at least, walk, talk and '
"gesture at the same time. Joe Haggerty, I'm looking at you...<br /><br />3. "
"If you're going to set a part of your movie in the past, only do this if you "
'have the props and costumes of the time.<br /><br />4. Twist endings are '
"supposed to be a surprise. Sure, we don't want twists that make no sense, "
'but signposting the "reveal" as soon as you introduce a character? That\'s '
'not a great idea.<br /><br />Kudos to the guys for trying, but in all '
"honesty, I'd rather they hadn't...<br /><br />Only for zombie completists.")
The reviews are generally short, and each one expresses a fairly consistent idea. Therefore, we skip chunking and embed each complete review as a single vector.
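If some reviews were long enough to exceed the embedding model's input length, chunking would be needed after all. A minimal sliding-window chunker is sketched below; `chunk_words` is a hypothetical helper, and it counts whitespace-separated words, which only approximates the tokens the model actually sees.

```python
def chunk_words(text: str, max_words: int = 100, overlap: int = 20) -> list[str]:
    """Split text into overlapping windows of at most max_words words.

    Hypothetical helper: word counts are only a rough proxy for model tokens.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("require max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    if not words:
        return []
    step = max_words - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once a window reaches the end of the text
        if start + max_words >= len(words):
            break
    return chunks


long_text = " ".join(str(i) for i in range(250))
print(len(chunk_words(long_text, max_words=100, overlap=20)))
```

To test the "generally short" assumption on the real data, one could compare `max(len(text.split()) for text in data)` with `embedding_model.max_seq_length`; sentence_transformers truncates longer inputs to that limit rather than raising an error.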
embedding_model = SentenceTransformer(
    "paraphrase-MiniLM-L3-v2",
    model_kwargs={'dtype': 'float16'}
)
The following code loads the embeddings from a cache file if one exists; otherwise, it encodes all the texts, which can take some time, and saves the result for later runs.
# Cache the embeddings on disk: encoding all reviews is slow
cache_dir = "imdb_example_files"
cache_path = os.path.join(cache_dir, "embeddings.npy")
if os.path.exists(cache_path):
    embeddings = np.load(cache_path)
else:
    embeddings = embedding_model.encode(data, normalize_embeddings=True)
    os.makedirs(cache_dir, exist_ok=True)
    np.save(cache_path, embeddings)
    # Keep the large cache file out of version control
    with open(os.path.join(cache_dir, ".gitignore"), "w") as f:
        f.write("embeddings.npy\n")
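With `normalize_embeddings=True`, every embedding is scaled to unit length. For unit vectors, cosine similarity reduces to a plain dot product, which makes cosine distance a natural choice for the vector database. The standard-library sketch below (illustrative helpers, not part of the original code) checks this identity on two toy vectors:

```python
import math


def l2_normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit length, as normalize_embeddings=True does."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [4.0, 3.0]
# For unit-length vectors, the dot product equals the cosine similarity
print(cosine(a, b), dot(l2_normalize(a), l2_normalize(b)))
```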
Vector database
This section demonstrates how to set up the vector database and upload the embeddings to it.
client = QdrantClient(":memory:")
embedding_size = embeddings.shape[1]
client.create_collection(
    collection_name="imdb",
    on_disk_payload=True,
    vectors_config=models.VectorParams(
        size=embedding_size,
        distance=models.Distance.COSINE,
        on_disk=True
    )
)
True
The next cell converts the raw embeddings and their texts into PointStruct objects, the input format Qdrant expects, and uploads them to the collection.
points = [
    models.PointStruct(
        id=str(uuid.uuid4()),
        vector=embeddings[i],
        payload={"text": data[i]}
    )
    for i in range(len(embeddings))
]
client.upsert(collection_name="imdb", points=points)
/tmp/ipykernel_411806/985311365.py:9: UserWarning: Local mode is not recommended for collections with more than 20,000 points. Current collection contains 25000 points. Consider using Qdrant in Docker or Qdrant Cloud for better performance with large datasets.
client.upsert(collection_name="imdb", points=points)
UpdateResult(operation_id=0, status=<UpdateStatus.COMPLETED: 'completed'>)
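The cell above assigns random `uuid4` ids and sends all 25,000 points in one call. A variant worth considering, sketched here as an assumption rather than the original approach, derives a deterministic `uuid5` id from each review's position and uploads in batches. `point_id`, `batched`, and the namespace string are hypothetical names:

```python
import uuid
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

# Hypothetical namespace; any fixed value works as long as it never changes
IMDB_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "imdb-example")


def point_id(index: int, split: str = "train") -> str:
    """Deterministic UUID for the review at `index`, stable across runs."""
    return str(uuid.uuid5(IMDB_NAMESPACE, f"{split}/{index}"))


def batched(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


print(point_id(0) == point_id(0), list(batched(range(5), 2)))
```

In the upload cell, this would mean `id=point_id(i)` in `PointStruct` and a loop such as `for batch in batched(points, 1000): client.upsert(collection_name="imdb", points=batch)`. Since `upsert` overwrites points that share an id, re-running the cell in the same session then replaces existing points instead of adding duplicates. Python 3.12 also provides `itertools.batched`, which yields tuples instead of lists.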
The next cell defines the load_relevant_reviews function, which embeds the query, retrieves the five most similar reviews from Qdrant, and returns their texts as a list of strings.
def load_relevant_reviews(query: str) -> list[str]:
    # Embed the query the same way as the reviews (unit-length vectors)
    embedding = embedding_model.encode(
        [query], normalize_embeddings=True
    )
    relevant_info = client.query_points(
        collection_name="imdb",
        query=embedding[0],
        limit=5,
        with_payload=True
    )
    # query_points returns a QueryResponse; its .points hold the scored matches
    return [point.payload["text"] for point in relevant_info.points]
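Conceptually, `query_points` with cosine distance returns the stored vectors closest in angle to the query. The sketch below does the same by brute force in pure Python, which is only practical for small collections; it illustrates the idea and is not how Qdrant is implemented. The optional `score_threshold` mirrors the argument of the same name that `query_points` also accepts; `top_k_cosine` itself is a hypothetical helper.

```python
import heapq
import math
from typing import Optional


def normalize(vector: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def top_k_cosine(
    query: list[float],
    vectors: list[list[float]],
    k: int = 5,
    score_threshold: Optional[float] = None,
) -> list[tuple[int, float]]:
    """Exact cosine search over all vectors: (index, score) pairs, best first."""
    q = normalize(query)
    scored = [
        (i, sum(a * b for a, b in zip(q, normalize(v))))
        for i, v in enumerate(vectors)
    ]
    if score_threshold is not None:
        # Drop weak matches instead of always returning k results
        scored = [pair for pair in scored if pair[1] >= score_threshold]
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(top_k_cosine([1.0, 0.2], vectors, k=2))
```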
The following code illustrates which texts are loaded from the database for a specific request.
ans = load_relevant_reviews("What is the typcal plot for a horror movie?")
for res in ans:
    pprint(res)
    print('\n')
("How can you tell that a horror movie is terrible? when you can't stop "
'laughing about it of course! The plot has been well covered by other '
"reviewers, so I'll just add a few things on the hilarity of it all.<br /><br "
'/>Some reviews have placed the location in South America, others in Africa, '
'I thought it was in some random island in the Pacific. Where exactly does '
'this take place, seems to be a mystery. The cannibal tribe is conformed by a '
'couple of black women some black men, and a man who looks like a young Frank '
'Zappa banging the drums... the Devil God is a large black man with a '
'terrible case of pink eyes.<br /><br />One of the "freakiest" moments in the '
'film is when, "Pablito" find his partner hanging from a tree covered in what '
'seems to be an orange substance that I assume is blood, starts screaming for '
"minutes on and on (that's actually funny), and then the head of his partner "
'falls in the ground and "Pablito" kicks it a bit for what I assume is "shits '
'n\' giggles" and the eyes actually move...<br /><br />But, of course, then '
'the "freak" is gone when you realize the eyes moved because the movie is '
"just bad...<br /><br />I hadn't laughed like this in a loooong while, and I "
'definitely recommend this film for a Sunday afternoon with your friends and '
'you have nothing to do... grab a case of beers and start watching this film, '
"you'll love it! If you are looking for a real horror or gore movie, "
"though... don't' bother.")
('This movie still chills me to the bone thinking of it. This movie was not '
'just bad as in low-budget, badly acted, etc. although it certainly WAS all '
'of those things. The problem with this movie is that it seemed to be '
'intentionally trying to annoy the viewer, and doing it with great success. '
"What I want to know is, is this supposed to be a horror movie? I mean, it's "
'definately horrifying, but not in the way horror movies are supposed to be. '
'I could see the first segment trying to be horror and failing, but what the '
"hell is the second segment? It's just annoying. The third segment is like "
'watching an artsy student film, which amazingly enough makes it the least '
"painful segment. It's an atrocity that this movie isn't way low on the "
'bottom 100, so get your votes (1/10) in people!! I know some people gave '
"this good reviews, but, well, they're lying in a sadistic attempt to trick "
'you. Trust me, it is impossible to like this movie. The only benefit of this '
"movie is an amazing life-extending effect: it feels like you've been "
'watching this movie for years after only the first half hour has passed.')
('Chilling, majestic piece of cinematic fright, this film combines all the '
'great elements of an intellectual thriller, with the grand vision of a '
'director who has the instinctual capacity to pace a moody horror flick '
'within the realm of his filmmaking genius that includes an eye for the '
'original shot, an ice-cold soundtrack and an overall sense of '
'dehumanization. This movie cuts through all the typical horror movies like a '
'red-poker through a human eye, as it allows the viewer to not only feel the '
'violence and psychosis of its protagonist, but appreciate the seed from '
'which the derangement stems. One of the scariest things for people to face '
'is the unknown and this film presents its plotting with just that thought in '
'mind. The setting is perfect, in a desolate winter hideaway. The quietness '
'of the moment is a character in itself, as the fermenting aggressor in Jack '
"Torrance's mind wallows in this idle time, and breeds the devil's new "
'playground. I always felt like the presence of evil was dormant in all of '
'our minds, with only the circumstances of the moment, and the reasons given '
'therein, needed to wake its violent ass and pounce over its unsuspecting '
'victims. This film is a perfect example of this very thought.<br /><br />And '
"it is within this film's subtle touches of the canvas, the clackity-clacks "
"of the young boy's big wheel riding along the empty hallways of the hotel, "
"the labyrinthian garden representing the mind's fine line between sane and "
"insane, Kubrick's purposely transfixed editing inconsistencies, continuity "
'errors and set mis-arrangements, that we discover a world guided by the '
'righteous and tangible, but coaxed away by the powerful and unknown. I have '
'never read the book upon which the film is based, but without that as a '
'comparison point, I am proud to say that this is one of the most terrifying '
"films that I have ever seen. I thought that the runtime of the film could've "
'been cut by a little bit, but then again, I am not one of the most acclaimed '
'directors in the history of film, so maybe I should keep my two-cent '
'criticisms over a superb film, to myself. All in all, this movie captures '
'your attention with its grand form and vision, ropes you in with some terror '
'and eccentric direction, and ties you down and stabs you in the heart with '
"its cold-eyed view of the man's mind gone overboard, creepy atmosphere and "
'the loss of humanity.<br /><br />Rating: 9/10')
('I am an avid B-Rate horror film buff and have viewed my fair share of '
'slasher pictures, so I have a substantial gauge to judge this film by. It '
"easily ranks in the upper echelon of the worst horror films the 1980's has "
"to offer. It isn't as scary as Night of the Demons, it isn't as gory as "
"Re-Animator and lacks the camp value of There's Nothing Out There. That "
'being said, this film has no value. Keep in mind, the movie artwork is for a '
'completely different film. The stills shots on the back of the DVD box '
"aren't taken from this film.<br /><br />VIOLENCE: $$$ (There is plenty of "
"violence but we've seen it all before. A murderer kills nubile students and "
'the occasional facility member by slitting throats and all the other tired '
'methods of murder that horror films utilize).<br /><br />NUDITY: None <br '
'/><br />STORY: $$ (The story focuses on Francine Forbes - who wisely changed '
'her name to Forbes Riley after this film was made - who accepts a job '
'teaching at a university. People start to die and Forbes believes the killer '
'is targeting her. Is it her new heartthrob with a checkered past or the '
'libido-crazed student? To be honest, it is impossible to care because the '
"script doesn't flesh out any character outside Forbes).<br /><br />ACTING: $ "
'(Terrible on all levels. This slasher has the feel of a school production '
'-high school that is because college students could make a better flick than '
'this. Forbes showcases a modicum of talent as does Seminara as one of the '
'students, but everyone else is of the "extras" caliber of acting).')
('There were a lot of truly great horror movies produced in the seventies - '
"but this film certainly isn't one of them! It's a shame The Child isn't "
'better as it works from a decent idea that takes in a couple of sometimes '
'successful horror themes. We have the idea of a vengeful child, which worked '
'so well in classic films such as The Bad Seed and then we have the central '
'zombie theme, which of course has been the backbone of many a successful '
'horror movie. The plot is basically this: young girl blames a load of people '
'for the death of her mother, so she goes to the graveyard and raises the '
'dead to get revenge (as you do). This is all well and good, except for the '
"fact that it's boring! Nothing happens for most of the film, and although it "
"does pick up at the end with some nice gore; it's not enough of a finale to "
'justify sitting through the rest of it. The film was obviously shot on a '
"budget as the locations look cheap and all the actors are rubbish. There's "
"really not much I can say about the film overall as there isn't much to it. "
"The Child is a dismal seventies horror flick and I certainly don't recommend "
'it.')
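The retrieved reviews still contain HTML line breaks (`<br />`), and these also end up in the prompt sent to the model. The pipeline on this page uses the raw text; if you prefer to strip the markup first, a minimal standard-library sketch could look like this (`clean_review` is a hypothetical helper, not part of the original code):

```python
import html
import re

# Matches <br />, <br/>, <br> in any letter case
BR_TAG = re.compile(r"<br\s*/?>", flags=re.IGNORECASE)


def clean_review(text: str) -> str:
    """Replace HTML line breaks with newlines and unescape HTML entities."""
    text = BR_TAG.sub("\n", text)
    text = html.unescape(text)
    # Collapse runs of three or more newlines into a single blank line
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


sample = 'guys: <br /><br />1. In a "Twist Ending"-type movie'
print(clean_review(sample))
```

Applying it would mean running `data = [clean_review(t) for t in data]` before encoding, and rebuilding the embeddings cache afterwards.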
Generation part
The prompt for the model must include the information retrieved by the RAG system. The next cell defines the generate_system_prompt function, which inserts the retrieved reviews into the system prompt.
system_template = """
You are a movie expert. You are provided with reviews from the IMDb dataset that are relevant to the user's request.
Reviews:
{reviews}
""".strip()
def generate_system_prompt(reviews: list[str]) -> str:
    return system_template.format(reviews="\n\n".join(reviews))
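`generate_system_prompt` concatenates all retrieved reviews, however long they are. A variant that numbers each review (so an answer could refer to `[1]`, `[2]`, and so on) and stops adding reviews once a size budget is reached could look like the sketch below; `build_system_prompt` and the 4,000-character default are hypothetical, and characters are only a rough proxy for model tokens. Because Qdrant returns results ordered by similarity, the least similar reviews are the ones dropped.

```python
SYSTEM_TEMPLATE = """
You are a movie expert. You are provided with reviews from the IMDb dataset that are relevant to the user's request.
Reviews:
{reviews}
""".strip()


def build_system_prompt(reviews: list[str], max_chars: int = 4000) -> str:
    """Number the reviews and keep adding them while the prompt fits max_chars.

    Hypothetical budget: if even the first review does not fit, no reviews are included.
    """
    kept: list[str] = []
    for number, review in enumerate(reviews, start=1):
        candidate = kept + [f"[{number}] {review}"]
        prompt = SYSTEM_TEMPLATE.format(reviews="\n\n".join(candidate))
        if len(prompt) > max_chars:
            break
        kept = candidate
    return SYSTEM_TEMPLATE.format(reviews="\n\n".join(kept))


print(build_system_prompt(["A tense slasher.", "A slow ghost story."]))
```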
The next cell prints the system prompt that the model receives for a sample request, with the retrieved reviews filled in.
print(
    generate_system_prompt(
        load_relevant_reviews("what is the typcal plot for a horror movie?")
    )
)
You are a movie expert. You are provided with reviews from the IMDb dataset that are relevant to the user's request.
Reviews:
How can you tell that a horror movie is terrible? when you can't stop laughing about it of course! The plot has been well covered by other reviewers, so I'll just add a few things on the hilarity of it all.<br /><br />Some reviews have placed the location in South America, others in Africa, I thought it was in some random island in the Pacific. Where exactly does this take place, seems to be a mystery. The cannibal tribe is conformed by a couple of black women some black men, and a man who looks like a young Frank Zappa banging the drums... the Devil God is a large black man with a terrible case of pink eyes.<br /><br />One of the "freakiest" moments in the film is when, "Pablito" find his partner hanging from a tree covered in what seems to be an orange substance that I assume is blood, starts screaming for minutes on and on (that's actually funny), and then the head of his partner falls in the ground and "Pablito" kicks it a bit for what I assume is "shits n' giggles" and the eyes actually move...<br /><br />But, of course, then the "freak" is gone when you realize the eyes moved because the movie is just bad...<br /><br />I hadn't laughed like this in a loooong while, and I definitely recommend this film for a Sunday afternoon with your friends and you have nothing to do... grab a case of beers and start watching this film, you'll love it! If you are looking for a real horror or gore movie, though... don't' bother.
This movie still chills me to the bone thinking of it. This movie was not just bad as in low-budget, badly acted, etc. although it certainly WAS all of those things. The problem with this movie is that it seemed to be intentionally trying to annoy the viewer, and doing it with great success. What I want to know is, is this supposed to be a horror movie? I mean, it's definately horrifying, but not in the way horror movies are supposed to be. I could see the first segment trying to be horror and failing, but what the hell is the second segment? It's just annoying. The third segment is like watching an artsy student film, which amazingly enough makes it the least painful segment. It's an atrocity that this movie isn't way low on the bottom 100, so get your votes (1/10) in people!! I know some people gave this good reviews, but, well, they're lying in a sadistic attempt to trick you. Trust me, it is impossible to like this movie. The only benefit of this movie is an amazing life-extending effect: it feels like you've been watching this movie for years after only the first half hour has passed.
Chilling, majestic piece of cinematic fright, this film combines all the great elements of an intellectual thriller, with the grand vision of a director who has the instinctual capacity to pace a moody horror flick within the realm of his filmmaking genius that includes an eye for the original shot, an ice-cold soundtrack and an overall sense of dehumanization. This movie cuts through all the typical horror movies like a red-poker through a human eye, as it allows the viewer to not only feel the violence and psychosis of its protagonist, but appreciate the seed from which the derangement stems. One of the scariest things for people to face is the unknown and this film presents its plotting with just that thought in mind. The setting is perfect, in a desolate winter hideaway. The quietness of the moment is a character in itself, as the fermenting aggressor in Jack Torrance's mind wallows in this idle time, and breeds the devil's new playground. I always felt like the presence of evil was dormant in all of our minds, with only the circumstances of the moment, and the reasons given therein, needed to wake its violent ass and pounce over its unsuspecting victims. This film is a perfect example of this very thought.<br /><br />And it is within this film's subtle touches of the canvas, the clackity-clacks of the young boy's big wheel riding along the empty hallways of the hotel, the labyrinthian garden representing the mind's fine line between sane and insane, Kubrick's purposely transfixed editing inconsistencies, continuity errors and set mis-arrangements, that we discover a world guided by the righteous and tangible, but coaxed away by the powerful and unknown. I have never read the book upon which the film is based, but without that as a comparison point, I am proud to say that this is one of the most terrifying films that I have ever seen. I thought that the runtime of the film could've been cut by a little bit, but then again, I am not one of the most acclaimed directors in the history of film, so maybe I should keep my two-cent criticisms over a superb film, to myself. All in all, this movie captures your attention with its grand form and vision, ropes you in with some terror and eccentric direction, and ties you down and stabs you in the heart with its cold-eyed view of the man's mind gone overboard, creepy atmosphere and the loss of humanity.<br /><br />Rating: 9/10
I am an avid B-Rate horror film buff and have viewed my fair share of slasher pictures, so I have a substantial gauge to judge this film by. It easily ranks in the upper echelon of the worst horror films the 1980's has to offer. It isn't as scary as Night of the Demons, it isn't as gory as Re-Animator and lacks the camp value of There's Nothing Out There. That being said, this film has no value. Keep in mind, the movie artwork is for a completely different film. The stills shots on the back of the DVD box aren't taken from this film.<br /><br />VIOLENCE: $$$ (There is plenty of violence but we've seen it all before. A murderer kills nubile students and the occasional facility member by slitting throats and all the other tired methods of murder that horror films utilize).<br /><br />NUDITY: None <br /><br />STORY: $$ (The story focuses on Francine Forbes - who wisely changed her name to Forbes Riley after this film was made - who accepts a job teaching at a university. People start to die and Forbes believes the killer is targeting her. Is it her new heartthrob with a checkered past or the libido-crazed student? To be honest, it is impossible to care because the script doesn't flesh out any character outside Forbes).<br /><br />ACTING: $ (Terrible on all levels. This slasher has the feel of a school production -high school that is because college students could make a better flick than this. Forbes showcases a modicum of talent as does Seminara as one of the students, but everyone else is of the "extras" caliber of acting).
There were a lot of truly great horror movies produced in the seventies - but this film certainly isn't one of them! It's a shame The Child isn't better as it works from a decent idea that takes in a couple of sometimes successful horror themes. We have the idea of a vengeful child, which worked so well in classic films such as The Bad Seed and then we have the central zombie theme, which of course has been the backbone of many a successful horror movie. The plot is basically this: young girl blames a load of people for the death of her mother, so she goes to the graveyard and raises the dead to get revenge (as you do). This is all well and good, except for the fact that it's boring! Nothing happens for most of the film, and although it does pick up at the end with some nice gore; it's not enough of a finale to justify sitting through the rest of it. The film was obviously shot on a budget as the locations look cheap and all the actors are rubbish. There's really not much I can say about the film overall as there isn't much to it. The Child is a dismal seventies horror flick and I certainly don't recommend it.
The following cell defines the model_interface function, which provides access to the generation model. If the OLLAMA_AVAILABLE flag is set to True, the model is served by the ollama inference server, which runs it considerably faster. Otherwise, the text-generation pipeline from the transformers package is used.
OLLAMA_AVAILABLE = True
if OLLAMA_AVAILABLE:
    def model_interface(messages: list[dict[str, str]]) -> str:
        ans = ollama.chat(
            model='qwen2:1.5b',
            messages=messages
        )
        if ans.message.content is None:
            raise ValueError("No response from the model.")
        return ans.message.content
else:
    generation_pipeline = pipeline(
        "text-generation",
        model="Qwen/Qwen2-1.5B-Instruct"
    )
    def model_interface(messages: list[dict[str, str]]) -> str:
        ans = generation_pipeline(
            messages, max_new_tokens=512, temperature=0.1, top_p=0.7
        )
        # With chat input, the pipeline returns the whole conversation;
        # the last message is the model's reply
        return ans[0]["generated_text"][-1]["content"]
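Instead of setting OLLAMA_AVAILABLE by hand, the flag could be derived from whether an Ollama server answers on its default port, 11434. The sketch below (`ollama_reachable` is a hypothetical helper) only checks that a TCP connection succeeds; it does not verify that the `qwen2:1.5b` model has been pulled.

```python
import socket


def ollama_reachable(host: str = "127.0.0.1", port: int = 11434, timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections at host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable
        return False


OLLAMA_AVAILABLE = ollama_reachable()
print(OLLAMA_AVAILABLE)
```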
The following cell combines retrieval, prompt construction, and generation into a single generate function that takes a request and returns only the model's response.
def generate(request: str) -> str:
    system_prompt = generate_system_prompt(
        load_relevant_reviews(request)
    )
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": request}
    ]
    return model_interface(messages)
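Because `generate` only composes retrieval, prompt construction, and the model call, its data flow can be tested without Qdrant or an LLM by passing the steps in as functions. In the sketch below, `make_generate`, `fake_retrieve`, and `echo_model` are hypothetical stand-ins for `generate`, `load_relevant_reviews`, and `model_interface`, and the system prompt is simplified to a bare list of reviews.

```python
from typing import Callable

Message = dict[str, str]


def make_generate(
    retrieve: Callable[[str], list[str]],
    model: Callable[[list[Message]], str],
) -> Callable[[str], str]:
    """Build a generate() function from pluggable retrieval and model steps."""
    def generate(request: str) -> str:
        reviews = retrieve(request)
        system_prompt = "Reviews:\n" + "\n\n".join(reviews)
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": request},
        ]
        return model(messages)
    return generate


def fake_retrieve(request: str) -> list[str]:
    # Hypothetical stand-in for load_relevant_reviews
    return ["A slow-burning haunted house story.", "Teenagers stalked by a masked killer."]


def echo_model(messages: list[Message]) -> str:
    # Hypothetical stand-in for model_interface: report what it received
    return f"{len(messages)} messages; last from {messages[-1]['role']}"


generate = make_generate(fake_retrieve, echo_model)
print(generate("What is the typical plot for a horror movie?"))  # prints: 2 messages; last from user
```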
The next cells show example responses from the system.
pprint(generate("What is the typical plot for a horror movie?"))
('The typical plot for a horror movie often involves a protagonist who becomes '
'suspicious about an event, such as a death in their family or witnessing '
'something disturbing, and then seeks out answers through unusual means. This '
'leads them into situations that often involve dark secrets, supernatural '
'occurrences, or unexpected adversaries. The suspense is built with '
'increasing tension as the audience learns more about the mystery at hand.\n'
'\n'
'The plot may include elements of fear, dread, and psychological horror. '
'Characters facing personal tragedies are often revealed to have been '
'harboring some kind of secret, which leads them down a path of discovery, '
'where they must confront their own fears and confront what has been kept '
'hidden from them. The genre typically uses jump scares, suspenseful moments, '
'and supernatural elements such as ghosts, vengeful spirits, or other '
'entities that haunt people for reasons unknown.\n'
'\n'
'A horror movie often features action sequences involving the protagonist '
'escaping from dangerous situations and fighting off evil forces, as well as '
'scenes of violence and gore. The climax might involve a final confrontation '
'between the protagonist and their nemesis, with potentially satisfying '
'payoffs in terms of character development and resolution.\n'
'\n'
'In summary, the typical plot for a horror movie involves a compelling '
'mystery or intrigue that leads to a confrontational ending, often filled '
'with tension, suspense, action, and violence. This genre is designed to keep '
'viewers on edge throughout and often includes supernatural elements as part '
'of its appeal.')
It is hard to tell what role retrieval played here, since the answer reads like a generic description of the genre, but the description itself is not bad.
The next request is more specific.
pprint(generate("What are the best roles of Robert De Niro?"))
('Based on the reviews provided, some of the best roles that Robert De Niro '
'has played include:\n'
'\n'
'1. Taxi Driver (1976): This role was groundbreaking and marked a career '
'shift for De Niro as he transitioned from a comedy star to a serious drama '
'actor.\n'
'\n'
'2. The King of Comedy (1982): Another iconic film, this role showcased De '
"Niro's ability to embody a range of characters with his acting skills.\n"
'\n'
'3. Cape Fear (1991): De Niro delivered one of the greatest performances in '
'cinematic history, portraying the title character as a man willing to use '
'any means necessary to achieve what he wants.\n'
'\n'
"4. Taxi Driver: The film showcased De Niro's ability to deliver emotional "
'depth and humor through his acting skills.\n'
'\n'
'5. The Untouchables (1987): De Niro won an Oscar for this role, showing off '
'his range as a character actor in both comedy and serious drama roles.\n'
'\n'
"6. New York, New York: Although not the best film, De Niro's performance was "
'praised for his "sublime" portrayal of a troubled city businessman.\n'
'\n'
'7. Casino (1995): De Niro won an Oscar for this role as he portrayed the '
'corrupt mobster Frank "Lefty" Luciano.\n'
'\n'
"8. Taxi (1982): This role highlighted De Niro's comedic timing and ability "
'to deliver emotional depth through his performances.\n'
'\n'
"In summary, these roles showcased De Niro's versatility, range of "
'performance styles, and ability to transition between genres with ease.')
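Robert De Niro did appear in most of the suggested titles, but the answer also contains errors: Taxi Driver is listed twice, De Niro did not win Oscars for The Untouchables or Casino (his Academy Awards are for The Godfather Part II and Raging Bull), and his character in Casino is Sam "Ace" Rothstein.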
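One way to probe what retrieval contributed is to check whether the titles the model lists appear in the retrieved reviews at all. The sketch below assumes the numbered "Title (Year):" format of the answer above, which will not hold for every response, and uses plain substring matching; `listed_titles` and `grounding_report` are hypothetical helpers.

```python
import re

# Numbered items such as "3. Cape Fear (1991): ..." or "4. Taxi Driver: ..."
ITEM = re.compile(r"^\s*\d+\.\s+(.+?)(?:\s*\(\d{4}\))?:", flags=re.MULTILINE)


def listed_titles(answer: str) -> list[str]:
    """Extract the titles of a numbered "Title (Year):" list, in order."""
    return [match.group(1).strip() for match in ITEM.finditer(answer)]


def grounding_report(answer: str, reviews: list[str]) -> list[tuple[str, bool]]:
    """Pair each listed title with whether it occurs in any retrieved review.

    Case-insensitive substring matching: crude, but enough for a first check.
    """
    corpus = "\n".join(reviews).lower()
    return [(title, title.lower() in corpus) for title in listed_titles(answer)]


answer = "1. Taxi Driver (1976): great.\n2. Casino (1995): good.\n3. Taxi Driver: again."
print(grounding_report(answer, ["I rewatched Taxi Driver last week."]))
```

With the real pipeline, one would store the response (`answer = generate(...)`) and pass it together with `load_relevant_reviews(...)` for the same request, since `generate` does not return the reviews it used. Titles marked False most likely come from the model's own training data rather than from the knowledge base, and repeated titles, such as Taxi Driver above, show up as repeated entries.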