JULY 28 2017
Building apps with Dgraph's Go client

We've just released v0.8 and it contains lots of new features and improvements. The Go client saw some nice improvements, so this post will walk you through the client interface and demonstrate some example code.
The GoDoc already contains the full specs and small examples of how to use the client. This post takes you on a guided tour, with examples closer to the scale of a real app.
There are basically three ways to use the client:
- queries,
- request based mutations, and
- batched mutations.
Of course you can mix those up in a single client session, but to give each its own treatment, I'll deal with each separately here. This post will walk you through three programs available on our GitHub, a few hundred lines each, that demonstrate interactions in those three modes. The examples also show how concurrent goroutines can safely use the client.
dgraphloader uses the client interface to batch updates it reads from gzipped RDF files. It will save you from writing code to read RDF into Dgraph, and it's another good example of how to use the client interface.
Before reading the rest of the post, you might like to watch this introductory video on using the Go client.
We're developing more tutorials and presentations on our YouTube channel.
Getting Started
These shouldn't be your first Go programs, so let's assume that you have $GOPATH set up and thus can go get the version 0.8 release branch
go get -u -v github.com/dgraph-io/dgraph/client
cd $GOPATH/src/github.com/dgraph-io/dgraph && git checkout release/v0.8.0
or work with the master branch.
Fundamental types
A graph is about nodes and edges, so it's no surprise that the two fundamental graph types in the client are client.Node and client.Edge.
The other two main types are client.Dgraph, which is the connection to the Dgraph backend, and client.Req, which stores mutations and queries to be sent to the backend.
Starting the client
The client takes a slice of grpc connections to the Dgraph database. This can be a single connection, multiple connections to the one server, or connections to multiple servers if connecting to a cluster. The client will spread your requests and batches across the given connections. All connections from the client to Dgraph are to the ports given at --grpc_port when the Dgraph instances were started (default 9080).
Start a single connection by dialing the Dgraph backend (check out dgraphloader's setupConnection() to see how to enable TLS).
conn, err := grpc.Dial("127.0.0.1:9080", grpc.WithInsecure()) // the address given at --grpc_port
if err != nil {
log.Fatal(err)
}
defer conn.Close()
Then put either the single connection or multiple connections into a slice when starting the client.
The client stores maps of blank-node-name -> Node and XID -> Node, so you can do quick look-ups while using the client. The following places these maps in a temporary directory, to be deleted when the program exits, but you can also keep the directory to persist the maps across multiple sessions (check out dgraphloader option --c to see how you could persist XID maps across multiple loads of RDF data).
clientDir, err := ioutil.TempDir("", "client_")
if err != nil {
log.Fatal(err)
}
defer os.RemoveAll(clientDir)
For batching, the client builds multiple batches concurrently, which it submits to the backend as the batches fill. Set the batch options with the BatchMutationOptions type, or supply the defaults.
dgraphClient := client.NewDgraphClient(connections, client.DefaultOptions, clientDir)
defer dgraphClient.Close()
That starts a client and all interaction with Dgraph goes through the started client.
Queries through the Go client
When we built our tour, we needed a dataset that was complex enough to teach the whole query language, small enough to load quickly and engaging enough that people could relate immediately to the data. The dataset of 21 million edges about movies and actors was about right, but we wanted to take a subset, so loading it didn't break the flow of the tour. Because it's a graph, we couldn't just grab part of the input file, so we crawled it.
Here's a program that does just that.
The 21million data is all about movies, directors and actors. Here's a conceptual view of some of what the dataset contains.
Directors are linked to their films with director.film. Films have an initial_release_date, a genre, and are linked to performances by starring. Performances tell us about an actor playing a character. Actors are linked to their roles and thus movies. Actors, directors, movies and genres all have a name. I drew the actors and directors overlapping because some people are both actors and directors. The data doesn't contain typing information for directors etc. We know, however, that a node represents a director when it has the director.film edge, or an actor when it has actor.film.
Most of the queries I'd thought of for the tour were about directors, so I decided to make the crawl based around directors. For each director the crawl sees, it grabs all their movies and pushes any director the actors in those movies have worked for onto the queue of directors to visit. That way the crawl will finish with each director and movie it's seen completed, but won't complete all movies for every actor it encounters.
Warning
Query syntax and features have changed in the v1.0 release. For the latest syntax and features, see the docs.
Query
Queries in the client are pretty straightforward. There are two options:
- A standalone query, which is added to a request req with req.SetQuery(<query-string>).
- A query with embedded variables, which is added to a request req with req.SetQueryWithVariables(<query-string>, <map[string]string>). The query will contain variables $a, $b, etc. and the map will have keys mapping the variables to values. If a query is used multiple times, it's generally easier to just update the map than to manipulate a raw string (the example program has examples of both).
For example
directorsMoviesTemplate = `{
movies(func: uid($a)) {
movie: director.film {
_uid_
EnglishName: name@en
GermanName: name@de
ItalianName: name@it
starring {
performance.actor {
_uid_
name@en
}
performance.character {
_uid_
name@en
}
}
genre {
_uid_
name@en
}
~director.film {
_uid_
name@en
}
initial_release_date
}
}
}`
directorMoviesMap = make(map[string]string)
...
req := client.Req{}
directorMoviesMap["$a"] = <some director UID>
req.SetQueryWithVariables(directorsMoviesTemplate, directorMoviesMap)
sets up a request with a query for all a director's movies. That's then run with
resp, err := dgraphClient.Run(context.Background(), &req)
Note that the query uses name@en, name@de and name@it. Version 0.8 introduced new language preference rules, and a query for name won't work if there is no untagged name --- name@. would return a name in some language if name didn't exist.
If there were mutations in the request too, those would be run first.
The question with a query is what to do with the response resp. It's got latency information resp.L, assigned nodes resp.AssignedUids (if the query string contained a mutation with blank nodes), resp.Schema if there was a schema query, and resp.N, a slice of protos.Node representing the nodes returned by the query. The response can be printed with
fmt.Printf("Raw Response: %+v\n", proto.MarshalTextString(resp))
And you'll see that each protos.Node has an attribute, the edge that led to this node; a slice of properties, the scalar edges out of this node; and a slice of children, the edges out to other nodes. Here's a small part of such a print for director Peter Jackson's movies.
Raw Response: n: <
attribute: "_root_"
children: <
attribute: "movies"
children: <
attribute: "movie"
properties: <
prop: "_uid_"
value: <
uid_val: 1891953090925962368
>
>
properties: <
prop: "EnglishName"
value: <
str_val: "The Hobbit: The Battle of the Five Armies"
>
>
properties: <
prop: "GermanName"
value: <
str_val: "Der Hobbit - Hin und zur\303\274ck"
>
>
properties: <
prop: "ItalianName"
value: <
str_val: "Lo Hobbit - La battaglia delle cinque armate"
>
>
properties: <
prop: "initial_release_date"
value: <
str_val: "2014-12-10T00:00:00Z"
>
>
children: <
attribute: "starring"
children: <
attribute: "performance.actor"
properties: <
prop: "_uid_"
value: <
uid_val: 1834782200806344758
>
>
properties: <
prop: "name@en"
value: <
str_val: "Benedict Cumberbatch"
>
>
>
children: <
attribute: "performance.character"
properties: <
prop: "_uid_"
value: <
uid_val: 151357
>
>
properties: <
prop: "name@en"
value: <
str_val: "The Necromancer"
>
>
>
>
...
...
Unmarshal
You can walk around the response programmatically --- check functions printNode() and visitActor() for examples of that. But one of the best new features of the client in version 0.8 is client.Unmarshal(). It works just like json.Unmarshal() in the standard library to unpack directly into a struct.
Here are some structures representing the types in our movie graph.
type movie struct {
ReleaseDate time.Time `dgraph:"initial_release_date"` // Often just use the edge name and a reasonable type.
ID uint64 `dgraph:"_uid_"` // _uid_ is extracted to uint64 just like any other edge.
Name string `dgraph:"EnglishName"` // If there is an alias on the edge, use the alias.
NameDE string `dgraph:"GermanName"`
NameIT string `dgraph:"ItalianName"`
Genre []genre `dgraph:"genre"` // The struct types can be nested. As long as the tags match up, all is well.
Starring []*performance `dgraph:"starring"` // Pointers to structures are fine too - that might save copying structures later.
Director []*director `dgraph:"~director.film"` // reverse edges work just like forward edges.
}
type performance struct {
Actor *actor `dgraph:"performance.actor"`
Character *character `dgraph:"performance.character"`
}
type movieQuery struct {
Root []movie `dgraph:"movie"`
}
Now give Unmarshal a struct with tags matching the edges in the query (note how these tags match the query above) and the part of the query response you want, and, bang, you have the whole query result in the types that make sense in your program. Nice!
var movs movieQuery
err = client.Unmarshal(resp.N[0].Children, &movs)
So model your data with types that make sense, write queries that extract out the data you need and Dgraph does the rest.
From here the example program uses those data structures to write out the crawled information to a file.
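As a sketch of that last step, here's what writing unmarshalled movies back out can look like. The genre struct and the summary helper below are illustrative, not from the example program; only the struct tags mirror the query above:

```go
package main

import (
	"fmt"
	"strings"
)

// A minimal genre type, assumed here; the post's movie struct references it.
type genre struct {
	Name string `dgraph:"name@en"`
}

// A trimmed-down movie with just the fields this sketch uses.
type movie struct {
	Name  string  `dgraph:"EnglishName"`
	Genre []genre `dgraph:"genre"`
}

// summary flattens one unmarshalled movie into a line of output,
// the kind of record the crawler writes to its file.
func summary(m movie) string {
	var gs []string
	for _, g := range m.Genre {
		gs = append(gs, g.Name)
	}
	return fmt.Sprintf("%s [%s]", m.Name, strings.Join(gs, ", "))
}

func main() {
	// In the real program this value comes from client.Unmarshal.
	m := movie{
		Name:  "The Hobbit: The Battle of the Five Armies",
		Genre: []genre{{Name: "Fantasy"}, {Name: "Adventure"}},
	}
	fmt.Println(summary(m))
}
```

Once the data is in plain Go structs, the output format is entirely up to you; Dgraph is out of the picture.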
Request-based mutations in the Go client
The previous example shows how to query and unmarshal results in the Go client. It wrote results to a file. That's the sort of interaction you'd need to query an existing store and send the results somewhere. To get the data into Dgraph in the first place the client allows you to build and run mutations.
Instead of running a crawl that's written to a file, how about a crawler that queries one Dgraph store and, based on that data, builds mutations that it commits to another Dgraph.
Here's a program that does just that.
The code has similar ideas to the last example, but it steps the interaction up a notch. Firstly, there are two clients: one for the source (only queried) and one for the target (only written to). Secondly, for both of those clients, the example allows multiple grpc connections; for example, if the source and target are clusters. Thirdly, since there are multiple connections, it runs concurrent crawlers in goroutines to make use of them.
Adding edges
Function visitMovie() is the interesting one for a discussion about mutations. First it queries data for a movie from the source and unmarshals the result into the movie struct from the previous example. From there it builds a mutation for the edges representing the movie and submits that to the target.
A graph is about nodes and edges between nodes, so that's what we've got to build to make a graph.
First, a new request
req := client.Req{}
then make a node
mnode, err := target.NodeBlank("")
then attach edges to the node. This one adds a scalar edge for the English name.
e = mnode.Edge("name@en")
err = e.SetValueString(m.Name)
if err != nil {
...
}
err = req.Set(e)
if err != nil {
...
}
After Set(), the edge has been added into the request and it's safe to reuse e. So, once the code has a node for a genre, gnode, it then connects the two nodes with another edge.
e = mnode.ConnectTo("genre", gnode)
err = req.Set(e)
visitMovie() continues in this fashion, adding edges to the request for the movie name, release date, genres, directors and all the actors and characters. If it fails at some point, req is discarded and none of the edges are added to the store, so we don't get half-completed movies in our result. If it successfully adds all the edges, it runs the mutation.
resp, err := target.Run(context.Background(), &req)
And all the edges are committed to the store.
Deleting edges works in the same way. Build the edge, then instead of adding to the request with req.Set(), add with req.Delete().
Batching updates in the Go client
Just a few weeks back we wrote a series of posts about recommendation engines in Dgraph. Our sample data, in text files, and the Go program that turned it into RDF are here.
Instead of writing RDF and then using dgraphloader to load into Dgraph, we can use the client to write directly to Dgraph. Dgraph isn't really an RDF database. It's a graph database --- a graph is just about nodes and edges (and maybe facets on the edges). dgraphloader is a helper app that loads RDF because there's lots of RDF data around and it's a standard format. In this example we'll skip the intermediate format and go straight from source data to Dgraph.
Here's a program that reads the text input files and submits batched mutations to Dgraph.
Batches
Four goroutines parse the input files. Those goroutines create edges just like in the previous example. But rather than submitting to Dgraph with a request, the edges are added to a batch with
err := dgraphClient.BatchSet(e)
The client gathers the submitted edges into batches, which are submitted when full. Set up the batching by starting the client with a BatchMutationOptions struct.
bmOpts := client.BatchMutationOptions{
Size: *numRdf, // number of edges in each batch
Pending: *concurrent, // number of concurrent batches to build
PrintCounters: true, // if you want the client to print running stats
}
The client controls which edges are in which batches and when the batches are submitted to the Dgraph server. There's no guarantee that sequential calls to BatchSet() will put edges in the same batch, nor that the order edges are submitted will be the same as the order the mutations reach the database. Make sure you finish with
dgraphClient.BatchFlush()
to flush out all the buffers.
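The batch/flush behaviour is the familiar buffer-until-full pattern. Here's a rough, stdlib-only sketch of the idea (this is not the client's actual implementation; batcher, Add and Flush are made-up names that mirror the shape of BatchSet and BatchFlush):

```go
package main

import "fmt"

// batcher collects items and hands off a full batch; Flush drains the remainder.
type batcher struct {
	size    int        // like BatchMutationOptions.Size
	pending []string   // the batch currently being built
	flushed [][]string // batches that have been "submitted"
}

// Add buffers one edge and submits the batch when it fills.
func (b *batcher) Add(edge string) {
	b.pending = append(b.pending, edge)
	if len(b.pending) == b.size {
		b.Flush()
	}
}

// Flush submits whatever is buffered, full or not.
func (b *batcher) Flush() {
	if len(b.pending) > 0 {
		b.flushed = append(b.flushed, b.pending)
		b.pending = nil
	}
}

func main() {
	b := &batcher{size: 2}
	for _, e := range []string{"e1", "e2", "e3"} {
		b.Add(e)
	}
	b.Flush() // like dgraphClient.BatchFlush()
	fmt.Println(len(b.flushed)) // two batches: [e1 e2] and [e3]
}
```

The real client additionally builds Pending batches concurrently and sends them over grpc, but the reason BatchFlush() matters is visible here: without the final Flush, the trailing partial batch never leaves the buffer.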
Node maps
That's all pretty standard batch updating. The fun thing here is the client-side blank-node maps. As the movie file is parsed, each movie needs to be linked with genres from the genres file. As the ratings file is parsed, users from the users file need to be linked to movies from the movies file. We could build a set of data structures to record all this and then read back out of those structures to ensure that the graph nodes for genres, movies and users link up correctly.
However, reading from a source and matching up nodes like this is such a common pattern in data uploading that we've built the client to take care of it. A call to
node, err := dgraphClient.NodeBlank(<node-identifier>)
reserves a Node in the graph and keeps track (client side) of which identifier relates to which Node. In the input data, users are given ID numbers, and so are movies and genres. Those IDs aren't important after we've loaded the data; they just need to be used to make the right links between users and their ratings of movies. We can't use the numbers alone to keep track, because there will be a user 10 and a movie 10. Instead, the program asks for blank nodes with labels like movie10. For example, when it links a movie to a genre, it does so with this pattern
m, err := dgraphClient.NodeBlank("movie10")
...
g, err := dgraphClient.NodeBlank("genre3")
...
e = m.ConnectTo("genre", g)
dgraphClient.BatchSet(e)
That will give a genre edge connecting the Node for movie 10 to the Node for genre 3.
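The namespacing itself is nothing more than prefixing the numeric ID with the entity kind before calling NodeBlank. A tiny sketch (blankLabel is an illustrative helper, not part of the client):

```go
package main

import "fmt"

// blankLabel builds the client-side blank-node name for an entity,
// e.g. ("movie", 10) -> "movie10", so IDs from different files can't collide.
func blankLabel(kind string, id int) string {
	return fmt.Sprintf("%s%d", kind, id)
}

func main() {
	fmt.Println(blankLabel("movie", 10)) // the label passed to NodeBlank
	fmt.Println(blankLabel("user", 10))  // distinct from the movie's label
}
```

As long as every goroutine derives labels the same way, the client's map resolves them to the same nodes.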
It doesn't matter which goroutine gets there first, or what order the edges are committed. The goroutine parsing the ratings data might read movie 10 before the goroutine reading the movie data gets there, or it might happen the other way around. The batch containing the mutation for a user's rating of movie 10 might hit the database before the edges with the movie's name and genre get there, or even before the user's other data is stored. It doesn't matter. The client guarantees that it will hook the nodes up correctly because we consistently called NodeBlank("movie10") every time we wanted to add an edge involving movie 10.
This gives the code the freedom to read the data in any order and still link up the nodes correctly without any bookkeeping. So our goroutines don't even need to know about the other goroutines, let alone share data or synchronize.
Compare that with the previous example, which had to use mutexes to protect shared data between goroutines. In that instance we needed some bookkeeping to control the crawl and record what we'd seen, but often a client program only needs to read data and add nodes, so the pattern from this batch example will be simpler.
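For contrast, the shared-state bookkeeping the crawler needs looks something like this stdlib-only sketch: a visited set guarded by a mutex so that only one goroutine ever claims a node (visitTracker and firstVisit are illustrative names, not from the example program):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// visitTracker records which UIDs a concurrent crawl has already seen.
type visitTracker struct {
	mu   sync.Mutex
	seen map[uint64]bool
}

// firstVisit reports whether uid is new, marking it seen atomically,
// so only one goroutine ever processes each node.
func (v *visitTracker) firstVisit(uid uint64) bool {
	v.mu.Lock()
	defer v.mu.Unlock()
	if v.seen[uid] {
		return false
	}
	v.seen[uid] = true
	return true
}

func main() {
	v := &visitTracker{seen: make(map[uint64]bool)}
	var wg sync.WaitGroup
	var wins int32
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if v.firstVisit(42) {
				atomic.AddInt32(&wins, 1)
			}
		}()
	}
	wg.Wait()
	fmt.Println(wins) // exactly one goroutine wins the first visit
}
```

Every goroutine that shares the tracker has to go through the lock; the batch example avoids all of this because the client's blank-node map does the coordination for you.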
External IDs
Here we are storing blank nodes, meaning that the names picked on the client side to identify them during loading aren't persisted in the store. If the nodes had identifiers that were important outside of Dgraph, we'd do much the same thing, but with what we call external IDs, or XIDs.
For example, IMDB gives each movie a unique URL. For "Toy Story" it's https://www.imdb.com/title/tt0114709/. If you are from the RDF or linked data communities, you might recognize that as a URI. If our input data used that, or if we needed such external keys for movies, genres and actors (e.g. Tom Hanks gets https://www.imdb.com/name/nm0000158, while the genre comedy gets https://www.imdb.com/genre/comedy), then we could load with
m, err := c.NodeXid("https://www.imdb.com/title/tt0114709/", true)
and the client would give a consistent map to the right node every time the XID for "Toy Story" was used, and, with the true flag, the client persists an edge (xid) in the store linking the node to its XID.
Now you write some code
Well, now you've got the docs, the GoDocs with examples for everything in the interface, and three larger examples showing how to use the Go client in real programs.
Time to load up your favorite editor and start writing your Dgraph app.