Compute embeddings similarity

Once embeddings are in the database, you can compute the similarity of two of them using the Cypher® function vector.similarity.cosine().
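To make the idea of a cosine-based similarity concrete, here is a minimal Python sketch of the underlying arithmetic. It is not the database's implementation. The helper names cosine and normalized_score are hypothetical, the toy vectors are invented, and the rescaling of the raw cosine to the range 0 to 1 as (1 + cos) / 2 is an assumption about how the score is normalized, so check the function reference for your version.

```python
import math


def cosine(u, v):
    """Raw cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def normalized_score(u, v):
    """Assumption: rescale the raw cosine from [-1, 1] to [0, 1]."""
    return (1 + cosine(u, v)) / 2


# Toy 3-dimensional vectors, invented for illustration only.
# Real embeddings typically have hundreds of dimensions.
a = [1.0, 0.0, 0.0]
b = [0.0, 1.0, 0.0]

print(normalized_score(a, a))  # vectors pointing the same way
print(normalized_score(a, b))  # orthogonal vectors
```

Under this assumption, vectors pointing in the same direction score at the top of the range, orthogonal vectors land in the middle, and opposite vectors score at the bottom.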

Example 1. Compare embeddings for two related movies

Query

MATCH (a:Movie {title: "Despicable Me"})
MATCH (b:Movie {title: "Despicable Me 2"})
RETURN vector.similarity.cosine(a.embedding, b.embedding)

Table 1. Result with SentenceTransformer embeddings
vector.similarity.cosine(a.embedding, b.embedding)
`0.7020013332366943`

Example 2. Compare embeddings for two unrelated movies

Query

MATCH (a:Movie {title: "Despicable Me"})
MATCH (b:Movie {title: "Emperor's New Groove, The"})
RETURN vector.similarity.cosine(a.embedding, b.embedding)

Table 2. Result with SentenceTransformer embeddings
vector.similarity.cosine(a.embedding, b.embedding)
`0.6120055317878723`

The absolute similarity score of two nodes matters little in practice. You are normally interested in retrieving the most relevant node(s) given some criteria, so what matters is that the desired node scores highest relative to the others. For example, the exact similarity score between Despicable Me and Despicable Me 2 is unimportant; what matters is that it is the highest among all the movie nodes.
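The point about relative ranking can be illustrated with a short Python sketch that uses the two scores from Table 1 and Table 2. The rank helper is hypothetical, and the rescaling 2 * s - 1 is an arbitrary strictly increasing transformation chosen only to show that changing the absolute values leaves the order intact.

```python
# Scores copied from Table 1 and Table 2 (similarity to "Despicable Me").
scores = {
    "Despicable Me 2": 0.7020013332366943,
    "Emperor's New Groove, The": 0.6120055317878723,
}


def rank(score_by_title):
    """Return titles ordered from most to least similar."""
    return sorted(score_by_title, key=score_by_title.get, reverse=True)


# Rescale every score with a strictly increasing function.
# The numbers change, but the order of the titles does not.
rescaled = {title: 2 * s - 1 for title, s in scores.items()}

print(rank(scores))
print(rank(rescaled))
```

Both calls list Despicable Me 2 first, which is the only property a retrieval step relies on.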

This method works fine for comparing two movies, but it is not suited to finding the nodes most similar to Despicable Me, since that would mean scoring it against every other movie in the database. To quickly retrieve nodes based on their embeddings, go on to create a vector index on the movie embeddings and query the database through it.