How to design a kick-ass GraphQL schema

February 7, 2020

Adam Hannigan

Engineering Team Lead

This article provides practical tips to help you design an intuitive, scalable and powerful GraphQL schema.

What is a GraphQL schema?

A schema is a structural representation of a product domain. It describes the key concepts of your product, the relations between these concepts and the core actions your system supports.

GraphQL is simply a tool that lets us interact with the schemas we define using an easy-to-use syntax. It does not enforce any standards around where the data comes from or how we define our schema.

Because there are no enforced guidelines, good schema design is often neglected until it is too late. This leads to schemas that are hard to understand, difficult to maintain and nearly impossible to scale as new features are introduced.

Product Driven

A significant advantage of GraphQL is that it lets you create an API that is intuitive to engineers and product teams. A GraphQL schema should reveal the items, fields and actions that your end-users will interact with.

Databases are structured and designed in a highly technical and performant manner. GraphQL lets us simplify these structures into items and actions that more closely reflect the nature of our product.

‘First of all we have to be experts at our domain ... Second of all we have to be good at GraphQL’ —GraphQL Schema Design @ Scale (Marc-André Giroux)

Shopify has standardised their schema design in a friendly readme. The main takeaway is that when designing a schema, the API does not need to directly model the user interface, “the implementation and the UI can both be used for inspiration and input into your API design, but the final driver of your decisions must always be the business domain.”

Why?

A product is easier to understand than a complicated data architecture.
It promotes a common product language that engineers, designers and all stakeholders can use to discuss and iterate on complex concepts.

Example

Imagine we are creating a system that lets employees submit leave requests at their company. The main items in our product would be Employees, Admins and Leave Requests. The main actions would be requesting and approving a leave request.

Below is the database table that was created to represent users in our system.

user_db_table
`id` (INTEGER)
`is_admin` (BOOLEAN)
`e_id` (VARCHAR)
`full_name` (VARCHAR)

Bad Schema Design

query {
  UserDbTable {
    id
    full_name
    e_id
    is_admin
  }
}

Good Schema Design

query {
  Employee {
    id
    fullName
    employeeId
  }

  Admin {
    id
    fullName
  }
}

By splitting the table into two concepts, we have encapsulated the behaviours and fields of each type of user. Instead of passing an is_admin argument every time we want to fetch users, we can interact directly with Employees and Admins when implementing features.

Additionally, we have abstracted the e_id column into a descriptive field, employeeId, that is associated only with an Employee. This prevents confusion about what the field is and signals to our engineers that it is only used for Employees.
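The split above could be sketched in Python as two resolver functions over the raw table rows. This is a minimal sketch, not the article's implementation: the in-memory `USER_ROWS` list stands in for `user_db_table`, and `resolve_employees` and `resolve_admins` are hypothetical names. It also assumes, for illustration, that only non-admin rows carry an `e_id`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Employee:
    id: int
    full_name: str
    employee_id: str  # exposed as `employeeId`; backed by the cryptic `e_id` column


@dataclass
class Admin:
    id: int
    full_name: str  # no employee_id: only Employees carry one


# Hypothetical rows standing in for user_db_table.
USER_ROWS = [
    {"id": 1, "is_admin": False, "e_id": "EMP-001", "full_name": "Jane Citizen"},
    {"id": 2, "is_admin": True, "e_id": None, "full_name": "Joe Bloggs"},
]


def resolve_employees(rows: List[dict]) -> List[Employee]:
    # The is_admin filter lives on the server, so clients never pass it as an argument.
    return [
        Employee(id=r["id"], full_name=r["full_name"], employee_id=r["e_id"])
        for r in rows
        if not r["is_admin"]
    ]


def resolve_admins(rows: List[dict]) -> List[Admin]:
    return [Admin(id=r["id"], full_name=r["full_name"]) for r in rows if r["is_admin"]]
```

A GraphQL server could call functions like these from its `Employee` and `Admin` resolvers; the point is that `is_admin` and `e_id` never reach the client.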

Think carefully about what you want to expose

“It is easier to add a field than it is to remove a field” — Rule #4 Shopify Schema standards

It is always better to hand-craft a schema to ensure you are creating a usable API. Tools that generate a schema from a database are tempting, but they should be avoided because they act as a thin middleware that adds no product value to the underlying structure. Think carefully before you add a field or entity: the more we expose, the less focussed the schema becomes and the more confusing it is for our engineers.

Michael Watson suggests slowly evolving your graph based on clients' use cases in ‘The Do’s and Don’ts for your schema and GraphQL operations’. The main takeaway here is to ensure that every field and resolver has a use case before you add it to your schema.

Simplify complex structures

While designing and scaling databases we often end up with a spider-web of tables and relationships from multiple different data sources. These structures are needed for optimisation and implementation of complicated features.

However, this leads to headaches for new engineers and confused product managers. GraphQL lets us abstract the underlying architecture into a friendly API that engineers can understand, interact with and build features on sooner.

A good rule to follow is that a resolver should not expose the underlying data source and should reflect a single concept in the product.
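As an illustration of that rule, here is a minimal Python sketch of an `approvedBy` resolver. It assumes, hypothetically, that leave requests store an `approver_user_id` foreign key into a users table; the resolver follows that key and returns only the fields the Admin type exposes, so neither the foreign key nor the table layout appears in the schema. All table and function names are illustrative.

```python
from typing import Dict, Optional

# Hypothetical storage: two tables keyed by primary key.
USERS: Dict[int, dict] = {
    2: {"id": 2, "is_admin": True, "full_name": "Joe Bloggs"},
}
LEAVE_REQUESTS: Dict[int, dict] = {
    1: {"id": 1, "approver_user_id": 2},
    3: {"id": 3, "approver_user_id": None},
}


def resolve_approved_by(leave_request_row: dict) -> Optional[dict]:
    """Resolver for LeaveRequest.approvedBy.

    Follows the internal foreign key and returns an Admin-shaped object
    (id, fullName). Clients never see approver_user_id.
    """
    approver_id = leave_request_row["approver_user_id"]
    if approver_id is None:
        return None  # not approved by an admin (yet)
    user = USERS[approver_id]
    return {"id": user["id"], "fullName": user["full_name"]}
```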

Aggregate fields on the server side

Where possible, perform complex calculations on the server side and expose the result as a value within the product. This lets us reuse the logic, avoids “client consumers having to manipulate the data” (Apollo) and reduces the cognitive load on front-end developers.

Example

In our HR company, a leave request is considered approved when it meets any of the following criteria: an admin has manually approved it, the type of leave is medical, or the requested date is more than six months in the future.

Below is what a typical GraphQL response would look like.

Bad Output Design

{
  "data": {
    "LeaveRequests": [{
      "id": 1,
      "type": "medical",
      "approvedBy": {
        "id": 2,
        "fullName": "Joe Bloggs"
      },
      "startDate": "2021-09-09",
      "endDate": "2021-09-13"
    }]
  }
}

The problem here is that the UI has to calculate whether a leave request is approved every time it uses one. A business rule should live in a central place, not be scattered throughout the code base. Scattering it adds complexity, and every engineer must understand the exact conditions under which a leave request becomes approved.

Good Output Design

{
  "data": {
    "LeaveRequests": [{
      "id": 1,
      "isApproved": true
    }]
  }
}

The approval logic now lives in one place on the server, where it can be reused, and clients only need to read a single isApproved field.
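A minimal Python sketch of where that rule could live. It assumes the three criteria are alternatives (any one is enough), that "the requested date" means the start date, and that "six months" means calendar months; `LeaveRequest`, `is_approved` and `add_months` are hypothetical names, and `today` is passed in so the rule is easy to test.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class LeaveRequest:
    id: int
    type: str
    start_date: date
    approved_by_admin_id: Optional[int] = None


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day (e.g. Aug 31 + 6 months -> Feb 28/29)."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def is_approved(request: LeaveRequest, today: date) -> bool:
    """Resolver for LeaveRequest.isApproved: the single home of the business rule."""
    return (
        request.approved_by_admin_id is not None      # manually approved by an admin
        or request.type == "medical"                   # medical leave
        or request.start_date > add_months(today, 6)   # requested far enough ahead
    )
```

Every client now reads `isApproved`; if the rule changes, only this function changes.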

Note: if your product also shows these individual fields to users, supply them in your schema as well.

Create a schema that can be configured easily

“Build APIs that stand the test of time” (GitHub)

The key message here is to design a schema that lets you easily add features and seamlessly deprecate areas of your product. This is especially important in agile development, where we need to iterate quickly on customer needs and deliver value as soon as possible.

Rigid database structures are difficult to iterate on because changes require complicated migrations. Luckily, GraphQL schemas are more malleable and, when designed correctly, simple to configure.

ALWAYS use a single input object for a mutation

This makes it much easier to add new fields and to deprecate old ones.

For this example, we will create a mutation that lets an Employee request a LeaveRequest.

Bad Input Design

mutation {
  LeaveRequests {
    request(type: "holiday", description: "Going to Bali", startDate: "2018-09-09", endDate: "2018-09-14")
  }
}

This is the approach I was guilty of when first creating mutations. The problem here is that it makes it very difficult to add functionality to the mutation, or remove it, later.

If we wanted to add functionality such as an image attachment, the only way to achieve it would be to add a fifth argument. This clearly does not scale well and is difficult to maintain.

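Using a single argument also makes execution easier on the client side, because the client can pass all of the values in one variable instead of declaring a separate variable for each argument.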
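With a single input argument, a client can send a whole form as one GraphQL variable. Below is a minimal Python sketch, assuming a hypothetical `LeaveRequestInput` input type and the common JSON request body with `query` and `variables` keys; `build_request_body` is an illustrative name.

```python
import json

# The operation text stays the same when fields are added to the input type.
# LeaveRequestInput is an assumed name for the mutation's input object type.
REQUEST_LEAVE = """
mutation RequestLeave($input: LeaveRequestInput!) {
  LeaveRequests {
    request(input: $input)
  }
}
"""


def build_request_body(form_values: dict) -> str:
    # Every form field travels inside the single `input` variable.
    return json.dumps({"query": REQUEST_LEAVE, "variables": {"input": form_values}})
```

Adding an attachment field later would only change the `form_values` the client collects, not the operation text.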

Good Input Design

mutation {
  LeaveRequests {
    request(input: {
      type: "holiday",
      description: "Going to Bali",
      startDate: "2020-09-09",
      endDate: "2020-09-12",
    })
  }
}

This pattern allows us to add fields to the input object without introducing any breaking changes.

If we wanted to deprecate the description field, we could make it nullable, add a deprecation reason and phase it out of our front-end code.
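On the server, the same idea can be sketched in Python by parsing the `input` object into one structure whose optional fields have defaults. This is a hypothetical sketch: `LeaveRequestInput`, `parse_input` and the later `attachmentUrl` field are illustrative names, not part of the original schema. Because `description` is optional, clients that stop sending it keep working, and clients that never send the new field are unaffected.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeaveRequestInput:
    type: str
    start_date: str
    end_date: str
    description: Optional[str] = None     # nullable, so it can be deprecated and phased out
    attachment_url: Optional[str] = None  # hypothetical field added later; optional, so non-breaking


def parse_input(raw: dict) -> LeaveRequestInput:
    # Map the GraphQL input object (camelCase keys) onto the internal structure.
    return LeaveRequestInput(
        type=raw["type"],
        start_date=raw["startDate"],
        end_date=raw["endDate"],
        description=raw.get("description"),
        attachment_url=raw.get("attachmentUrl"),
    )
```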

You will always want to return a single output object, not a value

“When working with mutations it is considered good design to return mutated records as a result of the mutation. This allows us to update the state on the frontend accordingly and keep things consistent” (Atheros)

For similar reasons, returning an object in the response makes it easier to add fields to, and remove fields from, that response. Our clients can continue using these endpoints without breaking changes when we extend functionality and add new fields to the response.

Using the above example, we will also return the new Leave Request so we can update our UI without having to perform a second request.

Good Output Design

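# LeaveRequestInput: assumed name for the input object type
mutation($input: LeaveRequestInput!) {
  LeaveRequests {
    request(input: $input) {
      id
      description
      type
      startDate
      endDate
      isApproved
    }
  }
}

{
  "data": {
    "LeaveRequests": {
      "request": { "id": 12, "description": "Going to Bali", "type": "holiday", "startDate": "2020-09-09", "endDate": "2020-09-12", "isApproved": false }
    }
  }
}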

Map mutations in GraphQL to a specific user flow

Mapping our mutations to actions in our product ensures that we create smaller and more focussed requests.

“Avoid trying to build ‘One Size Fits All’ API that supports mobile, desktop and all features. Embrace different use cases and clients and build around that.” (GitHub)

Why

By making smaller, intuitive actions, we make it easier to understand and reason about what a specific endpoint does
Less code breaks: if fewer features depend on the same generic mutation, each change has a smaller impact
Steers us towards good architectural patterns, especially ‘Single Responsibility’

Anemic Design

Anemic Design is an anti-pattern where you design your system as purely data, without any behaviour built in. Simply put, it means that when you want to change some underlying state, you interact directly with the data layer using generic create, read, update and delete methods. In Anemic Design, business rules and behaviours exist, but they live inside the engineer’s brain.

For this example, an Admin wants to approve a LeaveRequest.

Bad Output Design

mutation {
  LeaveRequests {
    update(input: {
      id: 12,
      type: "holiday",
      description: "Going to Bali",
      startDate: "2020-09-09",
      endDate: "2020-09-12",

      # This is the only field we want to update
      isApproved: true,
    })
  }
}

Why should we avoid anemic design?

You have to send the entire payload of what you need to update or create.
Engineers need to understand the underlying data structure and the side effects of changing each field
A single mutation has to cater for lots of different use cases

Good Output Design

mutation {
  LeaveRequests {
    approve(input: {
      id: 12,
    })
  }
}

The action is now more specific, there is less room for unintended side effects, and the logic inside the resolver can be more focussed.
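The difference shows up in the resolver. Below is a minimal Python sketch of an `approve` resolver that owns the behaviour: it loads the request by id, checks who is acting and changes only the approval field. The in-memory data, the admin check via `PermissionError` and the `approvedById` field are assumptions for illustration, not the article's implementation.

```python
from typing import Dict

# Hypothetical in-memory data.
USERS: Dict[int, Dict] = {
    2: {"id": 2, "is_admin": True},
    7: {"id": 7, "is_admin": False},
}
LEAVE_REQUESTS: Dict[int, Dict] = {
    12: {"id": 12, "type": "holiday", "description": "Going to Bali",
         "startDate": "2020-09-09", "endDate": "2020-09-12",
         "approvedById": None},
}


def approve(input_data: Dict, acting_user_id: int) -> Dict:
    """Resolver for LeaveRequests.approve: one action, one business rule."""
    if not USERS[acting_user_id]["is_admin"]:
        raise PermissionError("only admins can approve leave requests")
    request = LEAVE_REQUESTS[input_data["id"]]
    # Only the approval field changes; dates, type and description are untouched.
    request["approvedById"] = acting_user_id
    return {**request, "isApproved": True}
```

Unlike a generic `update`, the client sends only an id, and the rule about who may approve lives in exactly one place.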

Use consistent naming conventions

Shopify Rule #9 — Choose field names based on what makes sense, not based on the implementation or what the field is called in legacy APIs.

The main takeaway here is to use a standard that works for your team and to stick with it. Consistent naming lets your team instantly understand what a specific resolver or mutation does.

A common rule throughout the industry is to use the verb first and the noun second (for example, approveLeaveRequest).
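If your team adopts verb-first names for flat mutation names, a small check can help keep them consistent. A hypothetical Python sketch; `is_verb_first` and the list of allowed verbs are assumptions that each team would define for itself.

```python
import re

# Hypothetical team vocabulary of leading verbs.
ALLOWED_VERBS = ("create", "request", "approve", "reject", "cancel", "update", "delete")


def is_verb_first(mutation_name: str) -> bool:
    """True if a camelCase name starts with an allowed verb followed by a capitalised noun."""
    match = re.match(r"([a-z]+)([A-Z]\w*)$", mutation_name)
    return bool(match) and match.group(1) in ALLOWED_VERBS
```

A check like this could run in CI over the mutation names in your schema, so drift from the convention is caught in review rather than in production.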

Final Note

When building consumer software, we want an API that reflects our product. GraphQL was built to ‘give clients the power to ask for exactly what they need’ and to make ‘it easier to evolve APIs over time’ (graphql.org).

It is time to expand our knowledge beyond the principles we learnt from REST- and SOAP-based endpoints. In order to create a kick-ass schema, think carefully about the entities, fields and actions you want to expose.

Take the time to ensure your schema is flexible and closely aligned with your business domain. Early forethought about your schema design will make your front-end engineers' jobs a lot smoother, help new starters onboard faster and enable quicker iterations on features that improve the lives of your users.

Article originally posted by Adam Hannigan on Medium.com
