Optimizing GraphQL N+1 query problems in Ruby on Rails

Here at FullStack Labs, we use GraphQL, an amazingly powerful tool, to establish a common language between teams of back-end developers and front-end developers. It allows teams to focus on what the underlying domain model is instead of how to expose it to clients.

Front-end applications can now declare what data they need, and in what shape, and trust that the server will deliver it, avoiding waterfall scenarios where multiple sequential requests are needed to render a single screen.

Hitting the N+1 Query Problem

Seems like a win-win scenario, right? And it is, mostly. However, this approach shifts some of the complexity of gathering data back to the server side and, if we’re not careful, we might end up firing several database queries for each record fetched by the client. We’ve found in our Ruby on Rails projects that this is particularly common because ActiveRecord executes queries lazily.

Setup

For our purposes, we’ll assume that you’re familiar with Ruby on Rails, have a basic understanding of GraphQL, and have set up the graphql-ruby gem in your application.

We’ll be using Shopify’s graphql-batch gem to load records efficiently. Let’s start by adding the following line to your Gemfile:

gem 'graphql-batch'

Now add use GraphQL::Batch to your application’s schema class (BlogSchema in this example):

class BlogSchema < GraphQL::Schema
  query QueryType
  mutation MutationType

  use GraphQL::Batch
end

We’re now ready to walk through some common scenarios where you might encounter an N+1 problem and see how to optimize each one.

Loading a belongs_to association

Let’s consider a typical Blog application:

class Post < ApplicationRecord
  belongs_to :author
end

class Author < ApplicationRecord
  has_many :posts
end

Now imagine that our blog has an index page where we show all the posts with their respective authors, and our client is fetching this data with the following Query:

posts {
  id
  title
  description
  author {
    id
    name
  }
}
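
On the server, assume PostType simply resolves the author through the ActiveRecord association. A minimal sketch of such an unoptimized type (the fields mirror the query above):

module Types
  class PostType < Types::BaseObject
    field :id, ID
    field :title, String
    field :description, String
    field :author, AuthorType

    # Each post resolves its author on its own, one query per post.
    def author
      object.author
    end
  end
end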

This will result in the following queries being performed, before we apply any optimization:

SELECT "posts".* FROM "posts"SELECT "authors".* FROM "authors" WHERE "authors"."id" = $1
SELECT "authors".* FROM "authors" WHERE "authors"."id" = $1
SELECT "authors".* FROM "authors" WHERE "authors"."id" = $1
SELECT "authors".* FROM "authors" WHERE "authors"."id" = $1

We can clearly see that for every Post loaded, there’s an additional query being performed to retrieve its author, thus raising an N+1 Query Problem. We can solve this with the help of graphql-batch by modifying the PostType class:

module Types
  class PostType < Types::BaseObject
    field :id, ID
    field :title, String
    field :description, String
    field :content, String
    field :author, AuthorType

    def author
      RecordLoader.for(Author).load(object.author_id)
    end
  end
end
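
Note that RecordLoader isn’t defined for you by graphql-ruby or graphql-batch; graphql-batch’s documentation includes an example loader along these lines. A minimal sketch that batches lookups by primary key:

class RecordLoader < GraphQL::Batch::Loader
  def initialize(model)
    @model = model
  end

  # perform receives every id queued up during the current batch and
  # resolves them all with a single WHERE id IN (...) query.
  def perform(ids)
    @model.where(id: ids).each { |record| fulfill(record.id, record) }
    ids.each { |id| fulfill(id, nil) unless fulfilled?(id) }
  end
end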

If we run the previous query one more time, we’ll see the output change to:

SELECT "posts".* FROM "posts"
SELECT "authors".* FROM "authors" WHERE "authors"."id" IN ($1, $2, $3)

Loading a has_many association

Now let’s consider the reverse scenario, where we might have a page that displays the different authors, each with all the posts they have written:

authors {
  id
  name
  posts {
    id
    title
    description
  }
}

As in the previous example, this would cause an additional Posts query for each author loaded. Luckily, we can fix it using an AssociationLoader:

class AuthorType < Types::BaseObject
  field :id, ID, null: false
  field :name, String, null: false
  field :posts, [PostType], null: false

  def posts
    AssociationLoader.for(object.class, :posts).load(object)
  end
end

Finally, let’s add the AssociationLoader class to our project:

class AssociationLoader < GraphQL::Batch::Loader
 def self.validate(model, association_name)
   new(model, association_name)
   nil
 end

 def initialize(model, association_name)
   @model = model
   @association_name = association_name
   validate
 end

 def load(record)
   unless record.is_a?(@model)
     raise TypeError,
           "#{@model} loader can't load association for #{record.class}"
   end
   return Promise.resolve(read_association(record)) if association_loaded?(record)

   super
 end

 # We want to load the associations on all records, even if they have the same id
 def cache_key(record)
   record.object_id
 end

 def perform(records)
   preload_association(records)
   records.each { |record| fulfill(record, read_association(record)) }
 end

 private

 def validate
   raise ArgumentError, "No association #{@association_name} on #{@model}" unless @model.reflect_on_association(@association_name)
 end

 def preload_association(records)
   ::ActiveRecord::Associations::Preloader.new.preload(
     records,
     @association_name
   )
 end

 def read_association(record)
   record.public_send(@association_name)
 end

 def association_loaded?(record)
   record.association(@association_name).loaded?
 end
end

And now we can see that our posts are being batched together:

SELECT "authors".* FROM "authors"
SELECT "posts".* FROM "posts" WHERE "posts"."author_id" IN ($1, $2, $3)  [["author_id", 1], ["author_id", 2], ["author_id", 3]]

Loading a has_many through association

Let’s enrich our Blog application by allowing Posts to have multiple tags. We’d model that relationship the following way:

class Post < ApplicationRecord
 belongs_to :author
 has_many :taggings
 has_many :tags, through: :taggings
end
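
The join model and the Tag class aren’t shown above, but following standard Rails conventions they would look something like this:

class Tagging < ApplicationRecord
  belongs_to :post
  belongs_to :tag
end

class Tag < ApplicationRecord
  has_many :taggings
  has_many :posts, through: :taggings
end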

The interesting part about this scenario is that we don’t actually care to expose the Taggings part of the relationship to our clients, since all they care about is the actual Tags. The Tagging is just an implementation detail, but we still somehow need to go through it to get the data the client is asking for. Rails makes it extremely easy to achieve this, but we end up with yet another N+1 scenario:

SELECT "posts".* FROM "posts"
SELECT "tags".* FROM "tags" INNER JOIN "taggings" ON "tags"."id" = "taggings"."category_id" WHERE "taggings"."post_id" = $1  [["post_id", 1]]
SELECT "tags".* FROM "tags" INNER JOIN "taggings" ON "tags"."id" = "taggings"."category_id" WHERE "taggings"."post_id" = $1  [["post_id", 3]]
SELECT "tags".* FROM "tags" INNER JOIN "taggings" ON "tags"."id" = "taggings"."category_id" WHERE "taggings"."post_id" = $1  [["post_id", 4]]
SELECT "tags".* FROM "tags" INNER JOIN "taggings" ON "tags"."id" = "taggings"."category_id" WHERE "taggings"."post_id" = $1  [["post_id", 2]]

We can see that Rails is smart enough to do the join between our tables, but we can still do even better and batch them all together by using (once again) the association loader:

class PostType < Types::BaseObject
 # no need to expose the taggings
 field :tags, [TagType]

 def tags
   AssociationLoader.for(object.class, :tags).load(object)
 end
end
SELECT "posts".* FROM "posts"
 SELECT "taggings".* FROM "taggings" WHERE "taggings"."post_id" IN ($1, $2, $3, $4)  [["post_id", 1], ["post_id", 3], ["post_id", 4], ["post_id", 2]]
 SELECT "categories".* FROM "categories" WHERE "categories"."id" IN ($1, $2)  [["id", 2], ["id", 1]]

Loading dependent associations

Sometimes you’ll arrive at a scenario where you need to load a record that depends on the result of a previous query. Luckily, graphql-batch provides us with a mechanism to resolve lazy loading operations via a Promise API. 

Continuing with our Blog application example, let's assume that Authors have an Image associated with their accounts and that we expose a query that returns said image given a post id.

Using the RecordLoader class from the previous examples, we can chain the Post, Author, and Image lookups together so that all of them are resolved lazily:

def author_image(post_id:)
  RecordLoader.for(Post).load(post_id).then do |post|
    RecordLoader.for(Author).load(post.author_id).then do |author|
      RecordLoader.for(Image).load(author.image_id)
    end
  end
end
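
For completeness, this resolver would back an ordinary field on the query root. A sketch, assuming a top-level author_image field and an existing ImageType (both names are assumptions, not part of the setup above):

module Types
  class QueryType < Types::BaseObject
    # The author_image resolver defined above lives in this class.
    # Types::ImageType is assumed to exist alongside the other types.
    field :author_image, Types::ImageType, null: true do
      argument :post_id, ID, required: true
    end
  end
end

Because every load returns a Promise, none of the three queries run until graphql-batch flushes the batch, so requesting the image for many posts in a single GraphQL query still produces batched SQL.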

Summary

We’ve addressed the most common scenarios where you’d run into N+1 queries on a GraphQL endpoint in Rails. These examples demonstrate how powerful batching is for this class of problem, and they hint at what custom loaders make possible: loaders for ActiveStorage attachments, cache-backed loaders that avoid hitting the database entirely, and much more.

A word of caution

When addressing performance issues, it’s good to keep the trade-offs in mind and avoid premature optimization. These techniques, while powerful, do introduce complexity into your code base. To optimize properly, always measure before and after introducing a new technique to make sure you’re on the right track.
