Data federation is the creation of a virtual database that aggregates data fromdistributed sources, giving them a common data model.It is an approach to data integration that provides a single source of data forfront-end applications.It also gives backend developers flexibility in design and service isolation.
To get the most out of GraphQL, your organization should expose a single datagraph that provides a unified interface for querying any combination of yourbacking data sources.However, it can be challenging to represent an enterprise-scale data graph witha single, monolithic GraphQL server.
To remedy this, you can divide your graph's implementation across multiplecomposable services with Apollo Federation.Unlike other distributed GraphQL architectures (such as schema stitching),Apollo Federation uses a declarative programming model that enables each subgraphto implement only the part of your composed supergraph that it's responsible for.
An Apollo Federation architecture consists of:
- A collection of subgraphs (usually represented by different back-end services)that each define a distinct GraphQL schema
- A gateway that composes the subgraphs into a federated data graph and executesqueries across multiple subgraphs
- 1
To achieve data federation, schemas need to be created and annotated to indicate howownership is distributed.Let’s look at an example with three core entities:
Book: In a library, books are stored.The Book type uses a title and ISBN to uniquely identify each Book object, whilealso storing author information.
Reader: Readers read books and write reviews.Each Reader is uniquely identified by a name and user_id, while also storingbirthdate, email addresses, street addresses, and reviews that a read has written.Each review consists of the book title, comment, rating, and review date.
Order: When a reader checks out books from the library, a record of thecheckout_id, reader, and the books checked out are stored, uniquely identifiedby the checkout_id.
These three domains could be owned by three separate engineering teams responsible fortheir own data sources, business logic, and corresponding microservices.In an unfederated implementation, we would have to have this simple schema andthe associated resolvers owned and implemented by a single team.
Stargate GraphQL schema
In this example, Book and Reader schema are supplied from a Stargate instanceas the first data source, and the schema is described in theGraphQL schema-first section.Stargate also automatically adds the following:
Javascript Apollo server schema
For Order , a simple Javascript, orders.js will be used to define the Order schema, supply some resolvers, insert some data, and start an Apollo server asthe second data source. Let's break down the script into four parts.
First, set up the script to require apollo server and apollo federation .Set the port for the server to 4001 .
Next, we need to load the schema for this server.The schema consists of the object type Order , extensions of the object types Book and Reader gathered from the Stargate instance, queries and mutationsrequired to insert data and make queries.
In order to fetch field values from the objects that belong to another service,an extension of that object must be included in the Apollo server.Two directives are used in this example, `@key and @external that pertain todata federation.The @key directive defines a combination of fields that uniquely identify andare used to fetch an object or interface.Only the primary key fields can be used in @key .The @external` directive marks a field as owned by another service.
In Stargate schema, you can use just `@key without specifying the fields`,because the primary key is inferred.
In this example, for the Book object type, the fields title and isbn aredefined in the `@key` directive as the fields that are required to uniquely identifya specific book (the primary key) and fetch the requested information from the object.The `@external` directive further marks the same fields are owned by the Stargateservice, not the Apollo server.The Reader object type has a similar extension defined.
Now let's define the resolvers that this script will use.A resolver is a function that's responsible for populating the data for a singlefield in your schema.Stargate resolves objects based on the schema supplied, but the Javascript serverrequires resolver definition.
Finally, a server is started, building federated schema, listening on port 4001.Data is provided on the orders that are inserted into the service.
The full script:
Let's look at a simple Javascript that runs the Apollo gateway:Let's break this script down as well, into three parts.
In Gateway1, an Apollo server and Apollo gateway are defined as required.In order for the gateway to access the Stargate data source, a token is required.That token is specified as stargateIntrospectionToken .
In Gateway2, a serviceList that defines the data sources lists the Stargateinstance and the Javascript Apollo server that are described above.It builds the services based on logic tied to the name of the data source, library .Note that if more data sources are specified, you'll need to change the logicthat specifies which service is used for queries and mutations.
An experimental flag is set to allow GraphQL Playground to display the queryplan for the federated service.
In Gateway3, a new Apollo server, designated as a gateway is started.There are some settings currently set to false; the code gives an explanation.If the fetched data is from the Stargate instance, the token must be passedin the header of the request as x-cassandra-token .
The full script:
Federated queries
Simple example
Now that we have two subgraphs supplied from two different data sources,and a gateway running, we can explore how data from all data sources can bereturned in a query.
For example, if I want to discover all the books that were checked out at thesame time, and get all the book data, I can use this query in GraphQL Playground,pointed to the URL where the gateway is running, or localhost:4000 in this case:
While the resulting return seems unremarkable, think about the result for a moment.The Apollo server returning an order did supply the reader name and user_id , butthe email and address information is fetched from the Stargate instance.Likewise, the Stargate instance is also supplying the author information forthe books returned in the order query.Thus, the federated supergraph of Order, Book, and Reader, along with theresolvers in the Apollo server, are fetching data from two different servers!That is a very valuable feature, especially for application developers who needto fetch and use data from data sources that they do not control.