Dgraph with Native Vector Support - The Best of Both Worlds

With Dgraph's native vector support, you can not only find the needle in the haystack but also understand its context, its value, and its connections—all within a unified platform!

Introduction

Dgraph integrates vector support directly into its graph database, enabling a seamless combination of structured, interconnected data with the semantic power of vector embeddings. This innovation allows you to build intelligent and adaptive search solutions without relying on external vector databases.

The Goal

Fuzzy matching and semantic search are essential for modern applications, but traditional databases often struggle to combine flexibility with structured relationships. Dgraph’s unique architecture makes this challenge easy to address.

Dgraph models your data as a network of interconnected information — a graph. With native vector support, it enables embedding-based operations directly on the graph. Vectors, which represent the semantic meaning of text or data, allow you to:

Perform similarity searches.
Create intelligent associations.
Enhance search and discovery.

This new capability transforms your knowledge graph into an Association Graph, combining structured knowledge with inferred, context-rich relationships.

Sample Use Case

Previously, we explored how to use word embeddings in Dgraph to implement automatic classification (see blog post). In that use case, projects are associated with categories using word embeddings.

We will leverage the vector embeddings associated with projects to expose a natural language search API. Dgraph’s native vector indexing enables you to store embeddings, compute distances between them, and retrieve context-rich associations — all natively within Dgraph.

Native Vector Support in Dgraph

Dgraph’s vector functionality eliminates the need for external vector databases. Here’s how it works:

Storing Vectors: Nodes in Dgraph can directly store vectors, which are numerical representations of text or other data.
Similarity Search: Perform semantic similarity searches using vector distance computations (e.g., cosine similarity or dot product).
Integrated Metadata: Dgraph can store additional metadata, such as similarity scores or the models used for embedding, ensuring full data lineage.
Unified API: Query structured and semantic data together via DQL or GraphQL, simplifying development and integration.

This unified approach lets you mix reliable, curated information (your knowledge graph) with AI-inferred relationships (associations) without relying on external services.
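To make the distance computations mentioned above concrete, here is a minimal TypeScript sketch of the dot product and cosine similarity between two vectors. It is purely illustrative: Dgraph performs these computations inside the database, and the function names `dot` and `cosineSimilarity` are chosen for this example only.

```typescript
// Illustrative only: Dgraph evaluates vector distances inside the database.
// The names `dot` and `cosineSimilarity` are hypothetical helpers for this sketch.

// Dot product of two vectors of the same dimension.
function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same dimension");
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

// Cosine similarity: the dot product divided by the product of the vector norms.
function cosineSimilarity(a: number[], b: number[]): number {
  const normA = Math.sqrt(dot(a, a));
  const normB = Math.sqrt(dot(b, b));
  if (normA === 0 || normB === 0) {
    throw new Error("cosine similarity is undefined for a zero vector");
  }
  return dot(a, b) / (normA * normB);
}
```

Vectors pointing in the same direction score 1 and orthogonal vectors score 0, which is why cosine similarity is a common choice for comparing text embeddings.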

Facts and inferred knowledge

Hands-On with Dgraph

Data Model

Here’s an example of a GraphQL schema that incorporates Dgraph’s vector embedding capabilities:

type Project {
  id: ID!
  title: String! @search(by: [term])
  grade: String @search(by: [hash])
  category: Category
  score: Float
  embedding: [Float!] @embedding @search(by: ["hnsw"])
}

type Category {
  id: ID!
  name: String!
  embedding: [Float!] @embedding @search(by: ["hnsw"])
}

The embedding field stores vector embeddings for projects and categories.

Embedding Generation

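You can use any model to compute embeddings. The sample code uses the Hugging Face model sentence-transformers/all-MiniLM-L6-v2, but OpenAI text-embedding-3-small or any other embedding model works just as well. Vectors produced by different models are not comparable, so use the same model for the stored embeddings and for the search text.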

Compute the embedding for a project title or category name.
Save the embedding as a vector field in the respective node.
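The two steps above can be sketched in plain TypeScript as follows. This is a sketch under stated assumptions, not the code used in the sample project: `toyEmbedder` is a deterministic stand-in that hashes words into buckets so the example runs without a model, and `ProjectNode`, `Embedder`, and `attachEmbeddings` are hypothetical names. In a real setup the embedder would call the chosen embedding model.

```typescript
// Shape of an embedding function: one vector per input text.
// This mirrors how embedText is called in the sample code, but the type is an assumption.
type Embedder = (texts: string[]) => number[][];

// Hypothetical in-memory shape of a project node.
interface ProjectNode {
  title: string;
  embedding?: number[];
}

// Toy stand-in for a real model (NOT semantically meaningful):
// hashes each word into one of `dim` buckets, then L2-normalizes the counts.
function toyEmbedder(dim: number): Embedder {
  return (texts) =>
    texts.map((text) => {
      const v = new Array<number>(dim).fill(0);
      const words = text.toLowerCase().split(/\W+/).filter((w) => w.length > 0);
      for (const word of words) {
        let h = 0;
        for (let i = 0; i < word.length; i++) {
          h = (h * 31 + word.charCodeAt(i)) % dim;
        }
        v[h] += 1;
      }
      const norm = Math.sqrt(v.reduce((s, x) => s + x * x, 0));
      return norm === 0 ? v : v.map((x) => x / norm);
    });
}

// Step 1: compute one embedding per title. Step 2: save it on the node.
function attachEmbeddings(projects: ProjectNode[], embed: Embedder): ProjectNode[] {
  const vectors = embed(projects.map((p) => p.title));
  return projects.map((p, i) => ({ ...p, embedding: vectors[i] }));
}
```

Passing the embedder as a parameter keeps the storage logic independent of the model, which makes it straightforward to swap one model for another.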

Semantic Search Logic

Dgraph’s native similarity search greatly simplifies the semantic search logic:

Compute an embedding for the search text.
Use GraphQL or Dgraph Query Language to find similar vectors in the graph.
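As a mental model for the second step, the sketch below ranks a small in-memory list by cosine similarity and keeps the best matches above a threshold. It is an exhaustive scan written for illustration only: Dgraph answers the same kind of question using its HNSW vector index rather than scanning every node, and the names `Item`, `Hit`, `cosine`, and `topKSimilar` are hypothetical.

```typescript
// Hypothetical shapes for this sketch.
interface Item {
  id: string;
  embedding: number[];
}

interface Hit {
  id: string;
  score: number;
}

// Cosine similarity; returns 0 when either vector has zero length.
function cosine(a: number[], b: number[]): number {
  let ab = 0;
  let aa = 0;
  let bb = 0;
  for (let i = 0; i < a.length; i++) {
    ab += a[i] * b[i];
    aa += a[i] * a[i];
    bb += b[i] * b[i];
  }
  return aa === 0 || bb === 0 ? 0 : ab / Math.sqrt(aa * bb);
}

// Exhaustive search: score every item, drop weak matches, keep the topK best.
function topKSimilar(query: number[], items: Item[], topK: number, threshold: number): Hit[] {
  return items
    .map((item) => ({ id: item.id, score: cosine(query, item.embedding) }))
    .filter((hit) => hit.score > threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

An exhaustive scan like this grows linearly with the number of nodes, which is the cost a vector index is meant to avoid.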

Modus

The easiest way to add custom logic for embedding-based operations is to front Dgraph with Modus. Modus is an open-source, serverless framework designed for building intelligent APIs and functions. Here is an example of an API created in Modus:

import { JSON } from "json-as";
import { embedText } from "./embeddings";
import { dgraph } from "@hypermode/modus-sdk-as";
import { searchBySimilarity } from "./dgraph-utils";

const DGRAPH_CONNECTION = "dgraph";

@json
class Project {
  @alias("dgraph.type")
  type: string | null = "Project";
  @alias("uid") @omitnull()
  id: string | null = null;
  @alias("Project.title")
  title!: string;
  @alias("Project.category") @omitnull()
  category: Category | null = null
  @alias("Project.embedding") @omitnull()
  embedding: string | null = null
}

@json
class Category {
  @alias("dgraph.type")
  type: string | null = "Category";
  @alias("uid") @omitnull()
  id: string | null = null;
  @alias("Category.name")
  name!: string;
  @alias("Category.embedding") @omitnull()
  embedding: f32[] | null = null
}

// Adds the given projects to Dgraph and returns the UID map from the response.
export function addProject(input: Project[]): Map<string, string> | null {
  const uids = new Map<string, string>();
  // add dgraph.type and embedding to each project
  for (let i=0; i < input.length; i++) {
    const project = input[i];
    project.type = 'Project';
    project.embedding = JSON.stringify(embedText([project.title])[0]);
  }
  const payload = JSON.stringify(input);
  const mutations: dgraph.Mutation[] = [new dgraph.Mutation(payload)];
  const res = dgraph.execute(DGRAPH_CONNECTION, new dgraph.Request(null, mutations));

  return res.Uids;
}

The addProject function:

Computes an embedding for each Project to create.
Performs a mutation in Dgraph to store the Project data with the embedding.
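To show what the mutation payload looks like, here is a plain TypeScript sketch that builds the same kind of JSON document outside of Modus. The key names follow the @alias annotations in the classes above, and the vector is serialized to a string as in addProject. The names `ProjectTitleInput` and `buildProjectPayload` are hypothetical, and the embedder is passed in so the sketch runs without a model.

```typescript
// Hypothetical input shape: only the title is supplied by the caller.
interface ProjectTitleInput {
  title: string;
}

// Builds the JSON payload for a Dgraph mutation, one node per project.
// `embed` is an assumption: any function returning one vector per input text.
function buildProjectPayload(
  inputs: ProjectTitleInput[],
  embed: (texts: string[]) => number[][]
): string {
  const nodes = inputs.map((p) => ({
    "dgraph.type": "Project",
    "Project.title": p.title,
    // The vector is stored as its JSON string form, as in addProject above.
    "Project.embedding": JSON.stringify(embed([p.title])[0]),
  }));
  return JSON.stringify(nodes);
}
```

Each node carries its type, its title, and its embedding, so a single mutation stores both the structured data and the vector.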

Adding Data

Use the following mutation to add projects:

mutation AddProject($input: [ProjectInput!]!) {
  addProject(input: $input) {
    key
    value
  }
}

with the following variables:

{
  "input": [
    { "title": "Multi-Use Chairs for Music Classes" },
    { "title": "Photography and Memories....Yearbook in the Works" },
    { "title": "Current Events in Second Grade" },
    { "title": "Great Green Garden Gables" },
    { "title": "Albert.io Prepares South LA students for AP Success!" },
    { "title": "Learning and Growing Through Collaborative Play in TK!" },
    { "title": "Sit Together, Learn Together, Grow Together!" },
    { "title": "Help Special Children Succeed with Social Skills!" },
    { "title": "iCreate with a Mini iPad" },
    { "title": "Photography and Memories....Yearbook in the Works" },
    { "title": "The Truth About Junk Food" },
    { "title": "I Can Listen" },
    { "title": "Making Math A Group Learning Experience" },
    { "title": "The Center Of Learning: Kindergarten Fun!" }
  ]
}
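If you prefer to send this mutation from a script, the request body can be assembled as in the sketch below. It only builds the standard GraphQL JSON body (a `query` string plus `variables`); the names `GraphQLRequestBody` and `buildAddProjectRequest` are hypothetical, and the URL of your Modus GraphQL endpoint is not shown because it depends on your deployment.

```typescript
// Standard GraphQL-over-HTTP body: a query string plus its variables.
interface GraphQLRequestBody {
  query: string;
  variables: Record<string, unknown>;
}

// Same mutation as above, kept as a constant.
const ADD_PROJECT_MUTATION = `
mutation AddProject($input: [ProjectInput!]!) {
  addProject(input: $input) {
    key
    value
  }
}`;

// Turns a list of titles into the request body for the AddProject mutation.
function buildAddProjectRequest(titles: string[]): GraphQLRequestBody {
  return {
    query: ADD_PROJECT_MUTATION,
    variables: { input: titles.map((title) => ({ title })) },
  };
}
```

The resulting object can be serialized with JSON.stringify and sent as the body of an HTTP POST with a Content-Type of application/json.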

Semantic Search

Semantic search is added with a simple function:

/**
 * Search projects by similarity to a given text
 */
export function searchProjects(search: string): Project[]{
  const embedding = embedText([search])[0];
  const topK = 3;
  const body = `
    uid
    Project.title
    Project.category {
      Category.name
    }
  `
  return searchBySimilarity<Project>(DGRAPH_CONNECTION,embedding,"Project.embedding",body, topK, 0.5);
}

The searchProjects function:

Computes an embedding of the input string.
Performs a similarity search in Dgraph using the Project.embedding vector predicate.

The searchBySimilarity helper simply runs a Dgraph query that uses the similar_to function:

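/**
 * Runs a vector similarity query and returns the matching nodes.
 * ListOf<T> is not shown in this post; it is assumed to be a @json class
 * with a `list: T[]` field matching the `list` block of the query.
 */
export function searchBySimilarity<T>(connection: string, embedding: f32[], predicate: string, body: string, topK: i32 = 10, threshold: f32 = 0.75): T[] {
  // similar_to returns the topK nearest nodes. The var block turns the squared
  // Euclidean distance into a score (equal to cosine similarity when the
  // embeddings are unit-length), and the list block filters and sorts on it.
  const query = `
    query search($vector: float32vector) {
        var(func: similar_to(${predicate},${topK},$vector))  {
            vemb as ${predicate}
            dist as math((vemb - $vector) dot (vemb - $vector))
            score as math(1 - (dist / 2.0))
        }

        list(func:uid(score),orderdesc:val(score))  @filter(gt(val(score),${threshold})){
            ${body}
        }
    }`;
  // The query vector is passed as a variable in its JSON string form.
  const vars = new dgraph.Variables();
  vars.set("$vector", JSON.stringify(embedding));

  const dgraph_query = new dgraph.Query(query, vars);

  const response = dgraph.execute(connection, new dgraph.Request(dgraph_query));
  console.log(response.Json);
  return JSON.parse<ListOf<T>>(response.Json).list;
}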
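The score formula in the query deserves a short justification. For two unit-length vectors a and b, the squared Euclidean distance is |a - b|² = |a|² + |b|² - 2(a · b) = 2 - 2 cos(a, b), so 1 - dist / 2 is exactly the cosine similarity. The TypeScript sketch below checks this numerically. It assumes the embeddings are L2-normalized, which the formula relies on, and the helper names are hypothetical.

```typescript
// Scales a vector to unit length (assumes a non-zero vector).
function l2Normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return v.map((x) => x / norm);
}

// Squared Euclidean distance, i.e. (a - b) dot (a - b) as in the query.
function squaredEuclidean(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Same conversion as in the query: score = 1 - dist / 2.
function scoreFromSquaredDistance(dist: number): number {
  return 1 - dist / 2.0;
}

// Plain dot product, which equals cosine similarity for unit-length vectors.
function dotProduct(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

const unitA = l2Normalize([3, 4]);
const unitB = l2Normalize([1, 0]);
const scoreFromDist = scoreFromSquaredDistance(squaredEuclidean(unitA, unitB));
const cosineDirect = dotProduct(unitA, unitB);
// Both values agree up to floating-point rounding.
console.log(Math.abs(scoreFromDist - cosineDirect) < 1e-12);
```

If the embeddings are not normalized, the score is no longer a cosine similarity and the threshold loses its usual interpretation, so it is worth confirming that your embedding model returns unit-length vectors.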

Perform semantic searches with the searchProjects query:

query SearchProjects {
  searchProjects(search: "Photography and Memories....Yearbook in the Works") {
    id
    title
  }
}

The result will include the most relevant projects based on semantic similarity.

Conclusion

Dgraph’s native vector support brings the power of semantic search and associations directly into your graph database. By combining structured data with embedding-based intelligence, you can build sophisticated applications without relying on external tools.

This unified approach simplifies architecture, reduces latency, and enhances the capabilities of your knowledge graph.

Start exploring Modus and Dgraph’s vector capabilities today by signing up for Dgraph Cloud!

AUGUST 25 2023