Find movies given a search prompt

Once movie nodes have an embedding that encodes their title and plot, and the database has a vector index on those embeddings, you can retrieve movies matching a loose description (a search prompt), much as you would query a search engine for relevant web pages with a few keywords.

The examples on this page show how to retrieve movies that relate to the prompt `a criminal is changed through love`. You can think of it as going to the movie theater and asking: "What movies would you recommend to me? Today I feel inspired by movies in which a criminal is changed through love." The main difference is that at the theater the conversation takes place in natural language, whereas to search the database the prompt is first turned into an embedding. The vector index can then use the prompt embedding to retrieve the nodes whose embeddings are most similar to it.

Always use the same model to generate embeddings: pick one and use it to generate embeddings for both the dataset and search prompts. Attempting to mix models with different vector sizes will result in an error. Models with the same vector size will work, but are unlikely to interact well with one another, as they most likely differ in the way they were trained.
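A vector size mismatch is easy to catch in client code before the query reaches the database. The following is a minimal sketch, not part of the tutorial: the helper `check_dimensions` and the constant `INDEX_DIMENSIONS` are hypothetical names, and 384 is used only because it is the output size of `all-MiniLM-L6-v2`. Use the dimension your own index was created with.

```python
def check_dimensions(index_dimensions, query_embedding):
    """Raise ValueError if the query embedding cannot match the index."""
    actual = len(query_embedding)
    if actual != index_dimensions:
        raise ValueError(
            f'query embedding has {actual} dimensions, '
            f'but the index expects {index_dimensions}'
        )


# Assumption: the vector index was created for 384-dimensional vectors.
INDEX_DIMENSIONS = 384

# A placeholder vector stands in for a real embedding here.
check_dimensions(INDEX_DIMENSIONS, [0.0] * 384)  # passes silently
```

Note that a check like this only guards against different vector sizes. It cannot detect two different models that happen to produce vectors of the same size.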

Similarity for embeddings generated with open source libraries

This example uses SentenceTransformers to retrieve nodes relating to the description `a criminal is changed through love`.

from sentence_transformers import SentenceTransformer
import neo4j


URI = '<URI for Neo4j database>'
AUTH = ('<Username>', '<Password>')
DB_NAME = '<Database name>'  # examples: 'recommendations-50', 'neo4j'

driver = neo4j.GraphDatabase.driver(URI, auth=AUTH)
driver.verify_connectivity()

model = SentenceTransformer('all-MiniLM-L6-v2')  # (1)

query_prompt = 'a criminal is changed through love'  # (2)
query_embedding = model.encode(query_prompt)

# (3)
related_movies, _, _ = driver.execute_query('''
    CALL db.index.vector.queryNodes('moviePlots', 5, $queryEmbedding)
    YIELD node, score
    RETURN node.title AS title, node.plot AS plot, score
    ''', queryEmbedding=query_embedding,
    database_=DB_NAME)
print(f'Movies whose plot and title relate to `{query_prompt}`:')
for record in related_movies:
    print(record)

1	It is important to generate the embedding for the search prompt with the same model that generated the embeddings you search among. This tutorial used `all-MiniLM-L6-v2` to generate embeddings, so it is reused here.
2	The query prompt contains a loose description of movies to retrieve. It is then encoded into an embedding, so that it can be used to query for similar nodes.
3	To query the vector index, use the Cypher procedure `db.index.vector.queryNodes`. The database returns the `5` most similar nodes to `query_embedding` from the vector index `moviePlots`, together with the `score` of how well they match the query embedding.

Movies whose plot and title relate to `a criminal is changed through love`:
<Record title='I Love You Phillip Morris' plot="A cop turns con man once he comes out of the closet. Once imprisoned, he meets the second love of his life, whom he'll stop at nothing to be with." score=0.792834997177124>
<Record title='Laura' plot='A police detective falls in love with the woman whose murder he is investigating.' score=0.7741715908050537>
<Record title='Despicable Me' plot='When a criminal mastermind uses a trio of orphan girls as pawns for a grand scheme, he finds their love is profoundly changing him for the better.' score=0.772994875907898>
<Record title='Laws of Attraction' plot='Amidst a sea of litigation, two New York City divorce lawyers find love.' score=0.7727792263031006>
<Record title='Love the Hard Way' plot='The story of a petty thief who meets an innocent young woman and brings her into his world of crime while she teaches him the lessons of enjoying life and being loved.' score=0.7681001424789429>
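Conceptually, the lookup above ranks stored embeddings by their similarity to the prompt embedding and keeps the best few. The sketch below does this by brute force in plain Python, assuming cosine similarity (this page does not state which similarity function the `moviePlots` index uses). It is only an illustration of the idea: a real vector index typically relies on an approximate search structure rather than comparing every vector, and the score it reports may be scaled differently from raw cosine similarity. The function names and the toy vectors are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, candidates, k):
    """Return the k (name, score) pairs most similar to query, best first."""
    scored = [
        (name, cosine_similarity(query, vector))
        for name, vector in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors, invented for illustration only.
toy_embeddings = {
    'A': [1.0, 0.0, 0.0],
    'B': [0.9, 0.1, 0.0],
    'C': [0.0, 1.0, 0.0],
}

for name, score in top_k([1.0, 0.0, 0.0], toy_embeddings, k=2):
    print(name, round(score, 3))
```

The ranking depends only on the vectors themselves, which is why the choice of embedding model matters so much for the results.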

Similarity for embeddings generated with OpenAI and other cloud services

This example uses OpenAI to retrieve nodes relating to the description `a criminal is changed through love`.

import neo4j


URI = '<URI for Neo4j database>'
AUTH = ('<Username>', '<Password>')
DB_NAME = '<Database name>'  # examples: 'recommendations-50', 'neo4j'
driver = neo4j.GraphDatabase.driver(URI, auth=AUTH)
driver.verify_connectivity()

openAI_token = '<OpenAI API token>'  # (1)

search_prompt = 'a criminal is changed through love'  # (2)

search_query = '''
WITH genai.vector.encode($searchPrompt, 'OpenAI', { token: $token }) AS queryVector  // (3)
CALL db.index.vector.queryNodes('moviePlots', 5, queryVector)  // (4)
YIELD node, score
RETURN node.title AS title, node.plot, score
'''
records, summary, _ = driver.execute_query(
    search_query, searchPrompt=search_prompt, token=openAI_token,
    database_=DB_NAME)
print(f'Movies whose plot and title relate to `{search_prompt}`:')
for record in records:
    print(record)

1	Your OpenAI API token, such as `sk-proj-XXXX`.
2	The query prompt contains a loose description of movies to retrieve.
3	The query prompt is encoded into an embedding via the Cypher function `genai.vector.encode()`, so that it can be used to query for similar nodes.
4	To query the vector index, use the Cypher procedure `db.index.vector.queryNodes`. The database returns the `5` most similar nodes to `queryVector` from the vector index `moviePlots`, together with the `score` of how well they match the query embedding.

Movies whose plot and title relate to `a criminal is changed through love`:
<Record title='I Love You Phillip Morris' node.plot="A cop turns con man once he comes out of the closet. Once imprisoned, he meets the second love of his life, whom he'll stop at nothing to be with." score=0.9272396564483643>
<Record title='Love the Hard Way' node.plot='The story of a petty thief who meets an innocent young woman and brings her into his world of crime while she teaches him the lessons of enjoying life and being loved.' score=0.9221653938293457>
<Record title='Laura' node.plot='A police detective falls in love with the woman whose murder he is investigating.' score=0.9215129017829895>
<Record title='Despicable Me' node.plot='When a criminal mastermind uses a trio of orphan girls as pawns for a grand scheme, he finds their love is profoundly changing him for the better.' score=0.9206478595733643>
<Record title='Cook the Thief His Wife & Her Lover, The' node.plot='The wife of an abusive criminal finds solace in the arms of a kind regular guest in her husbands restaurant.' score=0.9205931425094604>

Quality of matches

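The quality of matches depends mainly on the embedding model and on the dataset, rather than on the Neo4j vector index. In both examples the embeddings are computed by an external model: SentenceTransformers runs in the client, and `genai.vector.encode()` sends the prompt to OpenAI from within Cypher. The database stores the resulting vectors as node properties and indexes them, but it has no say in what they encode.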

Consider the nodes retrieved with the search prompt `a criminal is changed through love` using SentenceTransformers (OpenAI’s results are similar):

Movies whose plot and title relate to `a criminal is changed through love`:
<Record title='I Love You Phillip Morris' plot="A cop turns con man once he comes out of the closet. Once imprisoned, he meets the second love of his life, whom he'll stop at nothing to be with." score=0.792834997177124>
<Record title='Laura' plot='A police detective falls in love with the woman whose murder he is investigating.' score=0.7741715908050537>
<Record title='Despicable Me' plot='When a criminal mastermind uses a trio of orphan girls as pawns for a grand scheme, he finds their love is profoundly changing him for the better.' score=0.772994875907898>
<Record title='Laws of Attraction' plot='Amidst a sea of litigation, two New York City divorce lawyers find love.' score=0.7727792263031006>
<Record title='Love the Hard Way' plot='The story of a petty thief who meets an innocent young woman and brings her into his world of crime while she teaches him the lessons of enjoying life and being loved.' score=0.7681001424789429>

This example shows that the embeddings work as expected: Despicable Me comes up third, with a similarity score of about 0.77. At the same time, it shows the limitations of embedding models, as the search also retrieved movies that don’t really relate to the prompt:

Laura has no "criminal changed through love", but it has a police detective (who often works with criminals) who falls in love in the context of a murder (again related to criminals).
Laws of Attraction has no criminals at all, but it has: attraction, which relates to love; litigation, which usually happens in courts, which are related to criminals; lawyers, who are often associated with criminals; and love, although among lawyers.
Love the Hard Way has it almost the other way around: an innocent young woman meets a small-time criminal (a petty thief) and is drawn into his world of crime.

Even if these movies hardly relate to the search prompt, the database is right: they are the most relevant ones, according to the embeddings. Why the embeddings don’t encode meaning in the way one may expect is a question that has nothing to do with vector indexes, and everything to do with the external AI models. If your search prompts return poor results, investigate the embedding model and the dataset it is applied to, rather than tweaking the Neo4j side of things.
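One way to see how little the ranking separates good matches from poor ones is to look at the gaps between consecutive scores. The sketch below uses only the five SentenceTransformers scores listed above; the helper name `score_gaps` is made up for illustration and is not part of any library.

```python
def score_gaps(scores):
    """Differences between each score and the next one, in ranked order."""
    return [higher - lower for higher, lower in zip(scores, scores[1:])]


# Scores copied from the SentenceTransformers results above, best first.
titles = [
    'I Love You Phillip Morris',
    'Laura',
    'Despicable Me',
    'Laws of Attraction',
    'Love the Hard Way',
]
scores = [
    0.792834997177124,
    0.7741715908050537,
    0.772994875907898,
    0.7727792263031006,
    0.7681001424789429,
]

for title, next_title, gap in zip(titles, titles[1:], score_gaps(scores)):
    print(f'{title} -> {next_title}: {gap:.4f}')
print(f'total spread: {max(scores) - min(scores):.4f}')
```

Despicable Me and Laws of Attraction differ by only about 0.0002, and all five results sit within roughly 0.025 of each other. With scores this close together, the embeddings barely distinguish a movie that fits the prompt from one that does not.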