JULY 28 2017
Building apps with Dgraph's Go client

We've just released v0.8 and it contains lots of new features and improvements. The Go client saw some nice improvements, so this post will walk you through the client interface and demonstrate some example code.
The GoDoc already contains the full specs and small examples of how to use the client. This post takes you on a guided tour, with examples closer to the scale of a real app.
There are basically three ways to use the client:
- queries,
- request based mutations, and
- batched mutations.
Of course you can mix those up in a single client session, but to give each its own treatment, I'll deal with each separately here. This post will walk you through three programs available on our GitHub, a few hundred lines each, that demonstrate interactions in those three modes. The examples also show how concurrent goroutines can safely use the client.
dgraphloader uses the client interface to batch updates it reads from gzipped RDF files. It will save you from writing code to read RDF into Dgraph, and it's another good example of how to use the client interface.
Before reading the rest of the post, you might like to watch this introductory video on using the Go client.
We're developing more tutorials and presentations on our YouTube channel.
Getting Started
These shouldn't be your first Go programs, so let's assume that you have $GOPATH set up and thus can go get the version 0.8 release branch
go get -u -v github.com/dgraph-io/dgraph/client
cd $GOPATH/src/github.com/dgraph-io/dgraph && git checkout release/v0.8.0
or work with the master branch.
Fundamental types
A graph is about nodes and edges, so it's no surprise that the two fundamental graph types in the client are client.Node and client.Edge.
The other two main types are client.Dgraph, which is the connection to the Dgraph backend, and client.Req, which stores mutations and queries to be sent to the backend.
Starting the client
The client takes a slice of grpc connections to the Dgraph database. This can be a single connection, multiple connections to the one server, or connections to multiple servers if connecting to a cluster. The client will spread your requests and batches across the given connections. All connections from the client to Dgraph are to the ports given at --grpc_port when the Dgraph instances were started (default 9080).
Start a single connection by dialing the Dgraph backend (check out dgraphloader's setupConnection() to see how to enable TLS).
conn, err := grpc.Dial("127.0.0.1:9080", grpc.WithInsecure()) // the address given at --grpc_port
if err != nil {
log.Fatal(err)
}
defer conn.Close()
Then put either the single connection or multiple connections into a slice when starting the client.
The client stores maps of blank-node-name -> Node and XID -> Node, so you can do quick look-ups while using the client. The following places these maps in a temporary directory, to be deleted when the program exits, but you can also keep the directory to persist the maps across multiple sessions (check out dgraphloader option --c to see how you could persist XID maps across multiple loads of RDF data).
clientDir, err := ioutil.TempDir("", "client_")
if err != nil {
log.Fatal(err)
}
defer os.RemoveAll(clientDir)
For batching, the client builds multiple batches concurrently, which it submits to the backend as the batches fill. Set the batch options with the BatchMutationOptions type, or supply the defaults.
dgraphClient := client.NewDgraphClient(connections, client.DefaultOptions, clientDir)
defer dgraphClient.Close()
That starts a client and all interaction with Dgraph goes through the started client.
Queries through the Go client
When we built our tour, we needed a dataset that was complex enough to teach the whole query language, small enough to load quickly and engaging enough that people could relate immediately to the data. The dataset of 21 million edges about movies and actors was about right, but we wanted to take a subset, so loading it didn't break the flow of the tour. Because it's a graph, we couldn't just grab part of the input file, so we crawled it.
Here's a program that does just that.
The 21million data is all about movies, directors and actors. Here's a conceptual view of some of what the dataset contains.
Directors are linked to their films with director.film. Films have an initial_release_date, a genre, and are linked to performances by starring. Performances tell us about an actor playing a character. Actors are linked to their roles and thus movies. Actors, directors, movies and genres all have a name. I drew the actors and directors overlapping because some people are both actors and directors. The data doesn't contain typing information for directors etc. We know, however, that a node represents a director when it has the director.film edge, or an actor when it has actor.film.
Most of the queries I'd thought of for the tour were about directors, so I decided to make the crawl based around directors. For each director the crawl sees, it grabs all their movies and pushes any director the actors in those movies have worked for onto the queue of directors to visit. That way the crawl will finish with each director and movie it's seen completed, but won't complete all movies for every actor it encounters.
Warning
Query syntax and features have changed in the v1.0 release. For the latest syntax and features, see the docs.
Query
Queries in the client are pretty straightforward. There are two options:
- A standalone query, which is added to a request req with req.SetQuery(<query-string>).
- A query with embedded variables, which is added to a request req with req.SetQueryWithVariables(<query-string>, <map[string]string>). The query will contain variables $a, $b, etc. and the map will have keys mapping the variables to values. If a query is used multiple times, it's generally easier to just update the map than to manipulate a raw string (the example program has examples of both).
For example
directorsMoviesTemplate = `{
movies(func: uid($a)) {
movie: director.film {
_uid_
EnglishName: name@en
GermanName: name@de
ItalianName: name@it
starring {
performance.actor {
_uid_
name@en
}
performance.character {
_uid_
name@en
}
}
genre {
_uid_
name@en
}
~director.film {
_uid_
name@en
}
initial_release_date
}
}
}`
directorMoviesMap = make(map[string]string)
...
req := client.Req{}
directorMoviesMap["$a"] = <some director UID>
req.SetQueryWithVariables(directorsMoviesTemplate, directorMoviesMap)
sets up a request with a query for all a director's movies. That's then run with
resp, err := dgraphClient.Run(context.Background(), &req)
Note that the query uses name@en, name@de and name@it. Version 0.8 introduced new language preference rules, and a query for name won't work if there is no untagged name --- name@. would return a name in some language if name didn't exist.
If there were mutations in the request too, those would be run first.
The question with a query is what to do with the response resp. It's got latency information resp.L, assigned nodes resp.AssignedUids (if the query string contained a mutation with blank nodes), resp.Schema if there was a schema query, and resp.N, a slice of protos.Node representing the nodes returned by the query. The response can be printed with
fmt.Printf("Raw Response: %+v\n", proto.MarshalTextString(resp))
And you'll see that each protos.Node has an attribute, the edge that led to this node; a slice of properties, the scalar edges out of this node; and a slice of children, the edges out to other nodes. Here's a small part of such a print for director Peter Jackson's movies.
Raw Response: n: <
attribute: "_root_"
children: <
attribute: "movies"
children: <
attribute: "movie"
properties: <
prop: "_uid_"
value: <
uid_val: 1891953090925962368
>
>
properties: <
prop: "EnglishName"
value: <
str_val: "The Hobbit: The Battle of the Five Armies"
>
>
properties: <
prop: "GermanName"
value: <
str_val: "Der Hobbit - Hin und zur\303\274ck"
>
>
properties: <
prop: "ItalianName"
value: <
str_val: "Lo Hobbit - La battaglia delle cinque armate"
>
>
properties: <
prop: "initial_release_date"
value: <
str_val: "2014-12-10T00:00:00Z"
>
>
children: <
attribute: "starring"
children: <
attribute: "performance.actor"
properties: <
prop: "_uid_"
value: <
uid_val: 1834782200806344758
>
>
properties: <
prop: "name@en"
value: <
str_val: "Benedict Cumberbatch"
>
>
>
children: <
attribute: "performance.character"
properties: <
prop: "_uid_"
value: <
uid_val: 151357
>
>
properties: <
prop: "name@en"
value: <
str_val: "The Necromancer"
>
>
>
>
...
...
Unmarshal
You can walk around the response programmatically --- check functions printNode() and visitActor() for examples of that. But one of the best new features of the client in version 0.8 is client.Unmarshal(). It works just like json.Unmarshal() in the standard library to unpack directly into a struct.
Here are some structures representing the types in our movie graph.
type movie struct {
ReleaseDate time.Time `dgraph:"initial_release_date"` // Often just use the edge name and a reasonable type.
ID uint64 `dgraph:"_uid_"` // _uid_ is extracted to uint64 just like any other edge.
Name string `dgraph:"EnglishName"` // If there is an alias on the edge, use the alias.
NameDE string `dgraph:"GermanName"`
NameIT string `dgraph:"ItalianName"`
Genre []genre `dgraph:"genre"` // The struct types can be nested. As long as the tags match up, all is well.
Starring []*performance `dgraph:"starring"` // Pointers to structures are fine too - that might save copying structures later.
Director []*director `dgraph:"~director.film"` // reverse edges work just like forward edges.
}
type performance struct {
Actor *actor `dgraph:"performance.actor"`
Character *character `dgraph:"performance.character"`
}
type movieQuery struct {
Root []movie `dgraph:"movie"`
}
Now give Unmarshal a struct with tags matching the edges in the query (note how these tags match the query above) and the part of the query response you want, and, bang, you have the whole query result in the types that make sense in your program. Nice!
var movs movieQuery
err = client.Unmarshal(resp.N[0].Children, &movs)
So model your data with types that make sense, write queries that extract out the data you need and Dgraph does the rest.
From here the example program uses those data structures to write out the crawled information to a file.
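As a sketch of that last step, here's what writing unmarshalled movies back out can look like. The genre struct and the summary helper below are illustrative, not from the example program; only the struct tags mirror the query above:

```go
package main

import (
	"fmt"
	"strings"
)

// A minimal genre type, assumed here; the post's movie struct references it.
type genre struct {
	Name string `dgraph:"name@en"`
}

// A trimmed-down movie with just the fields this sketch uses.
type movie struct {
	Name  string  `dgraph:"EnglishName"`
	Genre []genre `dgraph:"genre"`
}

// summary flattens one unmarshalled movie into a line of output,
// the kind of record the crawler writes to its file.
func summary(m movie) string {
	var gs []string
	for _, g := range m.Genre {
		gs = append(gs, g.Name)
	}
	return fmt.Sprintf("%s [%s]", m.Name, strings.Join(gs, ", "))
}

func main() {
	// In the real program this value comes from client.Unmarshal.
	m := movie{
		Name:  "The Hobbit: The Battle of the Five Armies",
		Genre: []genre{{Name: "Fantasy"}, {Name: "Adventure"}},
	}
	fmt.Println(summary(m))
}
```

Once the data is in plain Go structs, the output format is entirely up to you; Dgraph is out of the picture.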
Request-based mutations in the Go client
The previous example shows how to query and unmarshal results in the Go client. It wrote results to a file. That's the sort of interaction you'd need to query an existing store and send the results somewhere. To get the data into Dgraph in the first place the client allows you to build and run mutations.
Instead of running a crawl that's written to a file, how about a crawler that queries one Dgraph store and, based on that data, builds mutations that it commits to another Dgraph.
Here's a program that does just that.
The code has similar ideas to the last example, but it steps the interaction up a notch. Firstly, there are two clients: one for the source (only queried) and one for the target (only written to). Secondly, for both of those clients, the example allows multiple grpc connections; for example, if the source and target are clusters. Thirdly, since there are multiple connections, it runs concurrent crawlers in goroutines to make use of them.
Adding edges
Function visitMovie() is the interesting one for a discussion about mutations. First it queries data for a movie from the source and unmarshals the result into the movie struct from the previous example. From there it builds a mutation for the edges representing the movie and submits that to the target.
A graph is about nodes and edges between nodes, so that's what we've got to build to make a graph.
First, a new request
req := client.Req{}
then make a node
mnode, err := target.NodeBlank("")
then attach edges to the node. This one adds a scalar edge for the English name.
e = mnode.Edge("name@en")
err = e.SetValueString(m.Name)
if err != nil {
...
}
err = req.Set(e)
if err != nil {
...
}
After Set(), the edge has been added into the request and it's safe to reuse e. So, once the code has a node for a genre, gnode, it then connects the two nodes with another edge.
e = mnode.ConnectTo("genre", gnode)
err = req.Set(e)
visitMovie() continues in this fashion, adding edges to the request for the movie name, release date, genres, directors and all the actors and characters. If it fails at some point, req is discarded and none of the edges are added to the store, so we don't get half-completed movies in our result. If it successfully adds all the edges, it runs the mutation.
resp, err := target.Run(context.Background(), &req)
And all the edges are committed to the store.
Deleting edges works in the same way. Build the edge, then instead of adding to the request with req.Set(), add with req.Delete().
Batching updates in the Go client
Just a few weeks back we wrote a series of posts about recommendation engines in Dgraph. Our sample data, in text files, and the Go program that turned it into RDF are here.
Instead of writing RDF and then using dgraphloader to load into Dgraph, we can use the client to write directly to Dgraph. Dgraph isn't really an RDF database. It's a graph database --- a graph is just about nodes and edges (and maybe facets on the edges). dgraphloader is a helper app that loads RDF because there's lots of RDF data around and it's a standard format. In this example we'll skip the intermediate format and go straight from source data to Dgraph.
Here's a program that reads the text input files and submits batched mutations to Dgraph.
Batches
Four goroutines parse the input files. Those goroutines create edges just like in the previous example. But rather than submitting to Dgraph with a request, the edges are added to a batch with
err := dgraphClient.BatchSet(e)
The client gathers the submitted edges into batches, which are submitted when full. Set up the batching by starting the client with a BatchMutationOptions struct.
bmOpts := client.BatchMutationOptions{
Size: *numRdf, // number of edges in each batch
Pending: *concurrent, // number of concurrent batches to build
PrintCounters: true, // if you want the client to print running stats
}
The client controls which edges are in which batches and when the batches are submitted to the Dgraph server. There's no guarantee that sequential calls to BatchSet() will put edges in the same batch, nor that the order edges are submitted will be the same as the order the mutations reach the database. Make sure you finish with
dgraphClient.BatchFlush()
to flush out all the buffers.
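The batch/flush behaviour is the familiar buffer-until-full pattern. Here's a rough, stdlib-only sketch of the idea (this is not the client's actual implementation; batcher, Add and Flush are made-up names that mirror the shape of BatchSet and BatchFlush):

```go
package main

import "fmt"

// batcher collects items and hands off a full batch; Flush drains the remainder.
type batcher struct {
	size    int        // like BatchMutationOptions.Size
	pending []string   // the batch currently being built
	flushed [][]string // batches that have been "submitted"
}

// Add buffers one edge and submits the batch when it fills.
func (b *batcher) Add(edge string) {
	b.pending = append(b.pending, edge)
	if len(b.pending) == b.size {
		b.Flush()
	}
}

// Flush submits whatever is buffered, full or not.
func (b *batcher) Flush() {
	if len(b.pending) > 0 {
		b.flushed = append(b.flushed, b.pending)
		b.pending = nil
	}
}

func main() {
	b := &batcher{size: 2}
	for _, e := range []string{"e1", "e2", "e3"} {
		b.Add(e)
	}
	b.Flush() // like dgraphClient.BatchFlush()
	fmt.Println(len(b.flushed)) // two batches: [e1 e2] and [e3]
}
```

The real client additionally builds Pending batches concurrently and sends them over grpc, but the reason BatchFlush() matters is visible here: without the final Flush, the trailing partial batch never leaves the buffer.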
Node maps
That's all pretty standard batch updating. The fun thing here is the client-side blank-node maps. As the movie file is parsed, each movie needs to be linked with genres from the genres file. As the ratings file is parsed, users from the users file need to be linked to movies from the movies file. We could build a set of data structures to record all this and then read back out of those structures to ensure that the graph nodes for genres, movies and users link up correctly.
However, reading from a source and matching up nodes like this is such a common pattern in data uploading that we've built the client to take care of it. A call to
node, err := dgraphClient.NodeBlank(<node-identifier>)
reserves a Node in the graph and keeps track (client side) of which identifier relates to which Node. In the input data, users are given ID numbers, and so are movies and genres. Those IDs aren't important after we've loaded the data; they just need to be used to make the right links between users and their ratings of movies. We can't use the numbers alone to keep track, because there will be a user 10 and a movie 10. Instead, the program asks for blank nodes with labels like movie10. For example, when it links a movie to a genre, it does so with this pattern
m, err := dgraphClient.NodeBlank("movie10")
...
g, err := dgraphClient.NodeBlank("genre3")
...
e = m.ConnectTo("genre", g)
dgraphClient.BatchSet(e)
That will give a genre edge connecting the Node for movie 10 to the Node for genre 3.
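The namespacing itself is nothing more than prefixing the numeric ID with the entity kind before calling NodeBlank. A tiny sketch (blankLabel is an illustrative helper, not part of the client):

```go
package main

import "fmt"

// blankLabel builds the client-side blank-node name for an entity,
// e.g. ("movie", 10) -> "movie10", so IDs from different files can't collide.
func blankLabel(kind string, id int) string {
	return fmt.Sprintf("%s%d", kind, id)
}

func main() {
	fmt.Println(blankLabel("movie", 10)) // the label passed to NodeBlank
	fmt.Println(blankLabel("user", 10))  // distinct from the movie's label
}
```

As long as every goroutine derives labels the same way, the client's map resolves them to the same nodes.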
It doesn't matter which goroutine gets there first, or what order the edges are committed. The goroutine parsing the ratings data might read movie 10 before the goroutine reading the movie data gets there, or it might happen the other way around. The batch containing the mutation for a user's rating of movie 10 might hit the database before the edges with the movie's name and genre get there, or even before the user's other data is stored. It doesn't matter. The client guarantees that it will hook the nodes up correctly because we consistently called NodeBlank("movie10") every time we wanted to add an edge involving movie 10.
This gives the code the freedom to read the data in any order and still link up the nodes correctly without any bookkeeping. So our goroutines don't even need to know about the other goroutines, let alone share data or synchronize.
Compare that with the previous example, which had to use mutexes to protect shared data between goroutines. In that instance we needed some bookkeeping to control the crawl and record what we'd seen, but often a client program only needs to read data and add nodes, so the pattern from this batch example will be simpler.
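For contrast, the shared-state bookkeeping the crawler needs looks something like this stdlib-only sketch: a visited set guarded by a mutex so that only one goroutine ever claims a node (visitTracker and firstVisit are illustrative names, not from the example program):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// visitTracker records which UIDs a concurrent crawl has already seen.
type visitTracker struct {
	mu   sync.Mutex
	seen map[uint64]bool
}

// firstVisit reports whether uid is new, marking it seen atomically,
// so only one goroutine ever processes each node.
func (v *visitTracker) firstVisit(uid uint64) bool {
	v.mu.Lock()
	defer v.mu.Unlock()
	if v.seen[uid] {
		return false
	}
	v.seen[uid] = true
	return true
}

func main() {
	v := &visitTracker{seen: make(map[uint64]bool)}
	var wg sync.WaitGroup
	var wins int32
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if v.firstVisit(42) {
				atomic.AddInt32(&wins, 1)
			}
		}()
	}
	wg.Wait()
	fmt.Println(wins) // exactly one goroutine wins the first visit
}
```

Every goroutine that shares the tracker has to go through the lock; the batch example avoids all of this because the client's blank-node map does the coordination for you.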
External IDs
Here we are storing blank nodes, meaning that the names picked on the client side to identify them during loading aren't persisted in the store. If the nodes had identifiers that were important outside of Dgraph, we'd do much the same thing, but with what we call external IDs, or XIDs.
For example, IMDB gives each movie a unique URL. For "Toy Story" it's https://www.imdb.com/title/tt0114709/. If you are from the RDF or linked data communities, you might recognize that as a URI. If our input data used that, or if we needed such external keys for movies, genres and actors (e.g. Tom Hanks gets https://www.imdb.com/name/nm0000158, while the genre comedy gets https://www.imdb.com/genre/comedy), then we could load with
m, err := c.NodeXid("https://www.imdb.com/title/tt0114709/", true)
and the client would give a consistent map to the right node every time the XID for "Toy Story" was used, and, with the true flag, the client persists an edge (xid) in the store linking the node to its XID.
Now you write some code
Well, now you've got the docs, the GoDocs with examples for everything in the interface, and three larger examples showing how to use the Go client in real programs.
Time to load up your favorite editor and start writing your Dgraph app.