Similarity search in GraphQL

This post walks through a simple example: a GraphQL schema with a vector embedding field, a mutation that adds data, and queries that run similarity search.

Deploy the following GraphQL schema:

type Project {
  id: ID!
  title: String! @id
  title_v: [Float!]
    @embedding
    @search(by: ["hnsw(metric: euclidean, exponent: 4)"])
}

In this schema, the title_v field is an embedding, and the HNSW algorithm is used to build a vector search index on it. The metric used to compute the distance between vectors is, in this example, Euclidean distance. Two directives make this work: the new @embedding directive designates one or more fields as vector embeddings, and the extended @search directive defines the HNSW index and its distance metric. The exponent value sets reasonable defaults for HNSW's internal tuning parameters. It is an integer giving the approximate number of vectors expected in the index, expressed as a power of 10. The default is 4 (10^4 vectors).
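As a rough illustration of the exponent value, the sketch below picks the smallest power of 10 that covers an expected vector count. The helper name suggest_exponent is hypothetical, and rounding up is an assumption made here for illustration; the schema only asks for an approximate figure.

```python
def suggest_exponent(expected_vectors: int) -> int:
    """Return the smallest integer e such that 10**e >= expected_vectors.

    Hypothetical helper, not part of Dgraph. Rounding up is an assumption;
    the exponent is only meant to be an approximate size hint.
    """
    if expected_vectors < 1:
        raise ValueError("expected_vectors must be at least 1")
    exponent = 0
    # Integer arithmetic avoids floating-point surprises from log10.
    while 10 ** exponent < expected_vectors:
        exponent += 1
    return exponent


if __name__ == "__main__":
    for count in (10_000, 50_000, 2_000_000):
        print(count, "->", suggest_exponent(count))
```

With this rule, an index expected to hold about 10,000 vectors maps to the default exponent of 4.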

fig1

Once deployed successfully:

fig2

Let’s add some data using the auto-generated addProject mutation.

mutation {
  addProject(
    input: [
      {
        title: "iCreate with a Mini iPad"
        title_v: [0.12, 0.53, 0.9, 0.11, 0.32]
      }
      {
        title: "Resistive Touchscreen"
        title_v: [0.72, 0.89, 0.54, 0.15, 0.26]
      }
      { title: "Fitness Band", title_v: [0.56, 0.91, 0.93, 0.71, 0.24] }
      { title: "Smart Ring", title_v: [0.38, 0.62, 0.99, 0.44, 0.25] }
    ]
  ) {
    project {
      id
      title
      title_v
    }
  }
}

fig3

The auto-generated querySimilarProjectByEmbedding query allows us to run semantic (aka similarity) search using the vector index specified in our schema.

Execute the query:

query {
  querySimilarProjectByEmbedding(
    by: title_v
    topK: 3
    vector: [0.1, 0.2, 0.3, 0.4, 0.5]
  ) {
    id
    title
    vector_distance
  }
}

fig4

The results of the querySimilarProjectByEmbedding query include the 3 closest projects, ordered by vector_distance. The vector_distance is the Euclidean distance between each project's title_v embedding vector and the input vector used in our query.

Note: you can omit the vector_distance field from the query; the results will still be ordered by vector_distance.
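To make the meaning of vector_distance concrete, here is a minimal brute-force sketch in Python that computes the Euclidean distance from the query vector to each of the four sample embeddings and keeps the 3 closest. It only illustrates the metric; it is not how Dgraph evaluates the query, and because HNSW is an approximate index, results on larger datasets can differ from an exact ranking. The function names euclidean and top_k are made up for this example.

```python
import math

# Sample embeddings copied from the addProject mutation in this post.
projects = {
    "iCreate with a Mini iPad": [0.12, 0.53, 0.9, 0.11, 0.32],
    "Resistive Touchscreen": [0.72, 0.89, 0.54, 0.15, 0.26],
    "Fitness Band": [0.56, 0.91, 0.93, 0.71, 0.24],
    "Smart Ring": [0.38, 0.62, 0.99, 0.44, 0.25],
}

# Input vector copied from the querySimilarProjectByEmbedding example.
query_vector = [0.1, 0.2, 0.3, 0.4, 0.5]


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(vector, items, k):
    """Exact nearest neighbours by brute force (illustrative helper)."""
    scored = [(title, euclidean(vector, emb)) for title, emb in items.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


if __name__ == "__main__":
    for title, distance in top_k(query_vector, projects, 3):
        print(f"{title}: {distance:.4f}")
```

For this sample data the exact ranking is "iCreate with a Mini iPad", then "Smart Ring", then "Resistive Touchscreen"; "Fitness Band" is the farthest and falls outside the top 3.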

The distance metric is specified when the index is created. In this example we have used:

title_v: [Float!] @embedding @search(by: ["hnsw(metric: euclidean, exponent: 4)"])

We can also query for objects similar to an existing object, given its ID, using the querySimilar<Object>ById function.

query {
  querySimilarProjectById(by: title_v, topK: 3, id: "0xef7") {
    id
    title
    vector_distance
  }
}

fig5

In the example below, we use title to identify the project for which we want to find similar projects. Here, the title field is an external ID, annotated with the @id directive in the schema. You can designate multiple fields as external IDs using the @id directive.

query {
  querySimilarProjectById(by: title_v, topK: 3, title: "Smart Ring") {
    title
    vector_distance
  }
}

fig5
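The queries in this post can also be sent from a script rather than a GraphQL console. The following is a minimal sketch using only the Python standard library. It assumes a Dgraph Alpha reachable at http://localhost:8080/graphql; adjust the URL for your deployment. The helper names build_request and run_query are made up for this example, and the query text is the one shown above.

```python
import json
import urllib.request

# Assumption: local Dgraph Alpha exposing GraphQL at this address.
GRAPHQL_URL = "http://localhost:8080/graphql"

# Query copied from the example above.
SIMILAR_BY_TITLE = """
query {
  querySimilarProjectById(by: title_v, topK: 3, title: "Smart Ring") {
    title
    vector_distance
  }
}
"""


def build_request(query, url=GRAPHQL_URL):
    """Wrap a GraphQL query string in a JSON POST request (illustrative helper)."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query, url=GRAPHQL_URL):
    """Send the query and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(query, url)) as response:
        return json.load(response)


if __name__ == "__main__":
    # Requires a running server with the schema and data from this post.
    print(json.dumps(run_query(SIMILAR_BY_TITLE), indent=2))
```

A GraphQL response is a JSON object, so the matching projects would normally be found under its "data" key.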

JUNE 5 2024

Using Vector similarity search in GraphQL

Dgraph 24.0.0 supports GraphQL similarity search based on vector indexes for efficient retrieval of similar contents.