Hitting the N+1 Query Problem
Seems like a win-win scenario, right? And it is, mostly. However, this approach shifts some of the complexity of gathering data back to the server side and, if we’re not careful, we might end up firing several queries for each fetched by the client. We have found during our Ruby on Rails projects that this is particularly common because queries are executed lazily.
Setup
For our purposes,we’ll assume that you’re familiar with Ruby on Rails, have a basic understanding of GraphQL, and have set up the graphql-ruby gem on your application.
We’ll be using Shopify’s graphql-batch to load records efficiently. Let’s start by adding the following line to your gemfile:
gem 'graphql-batch'
Now add the line use GraphQL::Batch to your AppSchema.rb:
class BlogSchema < GraphQL::Schema
query QueryType
mutation MutationType
use GraphQL::Batch
end
We’re now ready to start going through some common scenarios where you might encounter an N+1 problem, and we’ll go through how to optimize it.
Loading a belongs_to association
Let’s consider a typical Blog application:
class Post < ApplicationRecord belongs_to :authorend
class Author < ApplicationRecord has_many :postsend
Now imagine that our blog has an index page where we show all the posts with their respective authors, and our client is fetching this data with the following Query:
posts{
id
title
description
author {
id
name
}
}
This will result in the following queries being performed, before we apply any optimization:
SELECT "posts".* FROM "posts"SELECT "authors".* FROM "authors" WHERE "authors"."id" = $1
SELECT "authors".* FROM "authors" WHERE "authors"."id" = $1
SELECT "authors".* FROM "authors" WHERE "authors"."id" = $1
SELECT "authors".* FROM "authors" WHERE "authors"."id" = $1
We can clearly see that for every Post loaded, there’s an additional query being performed to retrieve its author, thus raising an N+1 Query Problem. We can solve this with the help of graphql-batch by modifying the PostType class:
module Types
class PostType < Types::BaseObject
field :id, ID
field :title, String
field :description, String
field :content, String
field :author, AuthorType
def author
RecordLoader.for(Author).(object.author_id)
end
end
end
If we run the previous query one more time, we’ll see the output changed to:
SELECT "posts".* FROM "posts"
SELECT "authors".* FROM "authors" WHERE "authors"."id" IN ($1, $2, $3)
Loading a has_many association
Now let’s consider the reverse scenario, where we might have a page that displays the different authors whose children are all the posts each has written:
authors {
id
name
posts {
id
title
description
}
}
As in the previous example, this would cause an additional Posts query for each author loaded. Luckily, we can fix it using an AssociationLoader:
class AuthorType < Types::BaseObject
field :id, ID, null: false
field :name, String, null: false
field :posts, [PostType], null: false
def posts
AssociationLoader.for(object.class, :posts).load(object)
end
end
Finally let’s include the AssociationLoader class in our project:
class AssociationLoader < GraphQL::Batch::Loader
def self.validate(model, association_name)
new(model, association_name)
nil
end
def initialize(model, association_name)
@model = model
@association_name = association_name
validate
end
def tload(record)
unless record.is_a?(@model)
raise TypeError,
"#{@model} loader can't load association for #{record.class}"
end
return Promise.resolve(read_association(record)) if association_loaded?(record)
super
end
# We want to load the associations on all records, even if they have the same id
def cache_key(record)
record.object_id
end
def perform(records)
preload_association(records)
records.each { |record| fulfill(record, read_association(record)) }
end
private
def validate
raise ArgumentError, "No association #{@association_name} on #{@model}" unless @model.reflect_on_association(@association_name)
end
def preload_association(records)
::ActiveRecord::Associations::Preloader.new.preload(
records,
@association_name
)
end
def read_association(record)
record.public_send(@association_name)
end
def association_loaded?(record)
record.association(@association_name).loaded?
end
end
And now we can see that our posts are being batched together:
SELECT "authors".* FROM "authors"
SELECT "posts".* FROM "posts" WHERE "posts"."author_id" IN ($1, $2, $3) [["author_id", 1], ["author_id", 2], ["author_id", 3]]
Loading a has_many through association
Let’s enrich our Blog application by allowing Posts to have multiple tags. We’d model that relationship the following way:
class Post < ApplicationRecord
belongs_to :author
has_many :taggings
has_many :tags, through: :taggings
end
The interesting part about this scenario is that we don’t actually care to expose the Taggings part of the relationship to our clients, since all they care about is the actual Tags. The Tagging is just an implementation detail, but we still somehow need to go through it to get the data the client is asking for. Rails makes it extremely easy to achieve this, but we end up with yet another N+1 scenario:
SELECT "posts".* FROM "posts"
SELECT "tags".* FROM "tags" INNER JOIN "taggings" ON "tags"."id" = "taggings"."category_id" WHERE "taggings"."post_id" = $1 [["post_id", 1]]
SELECT "tags".* FROM "tags" INNER JOIN "taggings" ON "tags"."id" = "taggings"."category_id" WHERE "taggings"."post_id" = $1 [["post_id", 3]]
SELECT "tags".* FROM "tags" INNER JOIN "taggings" ON "tags"."id" = "taggings"."category_id" WHERE "taggings"."post_id" = $1 [["post_id", 4]]
SELECT "tags".* FROM "tags" INNER JOIN "taggings" ON "tags"."id" = "taggings"."category_id" WHERE "taggings"."post_id" = $1 [["post_id", 2]]
We can see that Rails is smart enough to do the join between our tables, but we can still do even better and batch them all together by using (once again) the association loader:
class PostType < Types::BaseObject
# no need to expose the taggings
field :tags, [TagType]
def tags
AssociationLoader.for(object.class, :tags).load(object)
end
end
SELECT "posts".* FROM "posts"
SELECT "taggings".* FROM "taggings" WHERE "taggings"."post_id" IN ($1, $2, $3, $4) [["post_id", 1], ["post_id", 3], ["post_id", 4], ["post_id", 2]]
SELECT "categories".* FROM "categories" WHERE "categories"."id" IN ($1, $2) [["id", 2], ["id", 1]]
Loading dependent associations
Sometimes you’ll arrive at a scenario where you need to load a record that depends on the result of a previous query. Luckily, graphql-batch provides us with a mechanism to resolve lazy loading operations via a Promise API.
Continuing with our Blog application example, let's assume that Authors have an Image associated with their accounts and that we expose a query that returns said image given a post id.
Using our RecordLoader class from previous examples, we can chain together the Post and Image queries in a way that both will be executed lazily:
def author_image(post_id:)
RecordLoader.for(Post).load(post_id).then do |post|
RecordLoader.for(Image).load(post.image_id)
end
end
Summary
We’ve addressed the most common scenarios where you’d run into N+1 queries on a GraphQL endpoint in Rails. These examples demonstrate how batching is a powerful tool to address these types of problems and the possibilities that custom loaders uncover, such as having loaders for ActiveStorage attachments, Cache loaders that avoid hitting the database and much more.
A word of caution
When addressing performance issues and optimizations it’s good to remember the trade-offs and avoid early optimizations. These techniques, while powerful, do introduce complexity into your code base. To properly optimize, always measure things before and after introducing new techniques to make sure that you’re on the right track.