Compute embeddings similarity
Once embeddings are in the database, you can compute the similarity of two of them using the Cypher® function vector.similarity.cosine().
Example 1. Compare embeddings for two related movies
Query
MATCH (a:Movie {title: "Despicable Me"})
MATCH (b:Movie {title: "Despicable Me 2"})
RETURN vector.similarity.cosine(a.embedding, b.embedding)
vector.similarity.cosine(a.embedding, b.embedding) |
---|
|
Example 2. Compare embeddings for two unrelated movies
Query
MATCH (a:Movie {title: "Despicable Me"})
MATCH (b:Movie {title: "Emperor's New Groove, The"})
RETURN vector.similarity.cosine(a.embedding, b.embedding)
vector.similarity.cosine(a.embedding, b.embedding) |
---|
|
The absolute similarity score of two nodes is rarely meaningful on its own.
You are normally interested in retrieving the most relevant node(s) given some criteria, so what matters most is that the desired node scores highest relative to the others.
For example, it doesn’t matter what similarity score Despicable Me and Despicable Me 2 have; what matters is that it is the highest among all the Movie nodes.
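To make the point about relative ranking concrete, the following is a minimal sketch that scores every other Movie node against Despicable Me and returns the top matches. It reuses only the names from the examples above (the Movie label and the title and embedding properties). The LIMIT of 5 and the IS NOT NULL guard are illustrative assumptions, not part of the original examples.

```cypher
// Brute-force ranking: compare one movie against every other Movie node.
MATCH (a:Movie {title: "Despicable Me"})
MATCH (b:Movie)
// Skip the movie itself and any node without an embedding (assumed guard).
WHERE b <> a AND b.embedding IS NOT NULL
RETURN b.title AS title,
       vector.similarity.cosine(a.embedding, b.embedding) AS score
// Only the relative order of the scores matters, not their absolute values.
ORDER BY score DESC
LIMIT 5
```

This query computes a similarity score for every Movie node, so it is only practical on small datasets.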
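Calling vector.similarity.cosine() on a pair of nodes works fine for comparing two movies, but it is not an efficient way to find the nodes most similar to Despicable Me, because every candidate node has to be scored individually.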
To retrieve nodes quickly based on their embeddings, go on to create a vector index on the movie embeddings and query the database through that index.