Microservices and the N+1 Problem

Jul 22, 2019

Tags: API, Microservices, Patterns, Performance, RESTful API

One of the core tenets of microservices is that a service owns its data, and the only way to access that data is through the service’s interface. As a result, one of the first hurdles any serious system has to clear is combining data across two or more services, since a database join is no longer an option.

The Rise of the N+1 Problem

The N+1 problem arises when an application makes N+1 queries for data: 1 query against one table and N queries against a related table. Suppose you have a Customers table and an Orders table, where Orders has a CustomerId foreign key. A single join returns the orders placed by each customer:

       SELECT * FROM Orders o JOIN Customers c ON c.CustomerId = o.CustomerId;

Alternatively, you could do this in a loop by first selecting from Customers and then iterating over that dataset, getting the CustomerId, and selecting the Order details for each Customer. It might look something like this in C#:

       var customers = GetCustomers();            // 1 query

       foreach (var c in customers)
       {
          var orders = GetOrders(c.CustomerId);   // N queries, one per customer
          // …
       }

Not a great implementation, but what’s worse is that, at a glance, it doesn’t look too bad. Even from a database perspective, each of those queries in isolation is likely to perform well, but you are still making N additional round trips to the database for something that could have been done in a single call.

This problem was made even more prevalent with the rise of ORMs, relationships between entities, and the concept of lazy loading. In that case, the code would look similar and it might not be obvious what is happening behind the scenes because the ORM is handling when data is loaded:

       var customers = dbContext.Customers.ToList();

       foreach (var c in customers)
       {
          var orders = c.Orders.ToList();
          // …
       }

The solution to this problem, in both cases, is to eagerly load dependent relationships before iterating them.
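As a rough illustration of the difference, here is a minimal sketch that counts round trips for both approaches. FakeDb, its data, and the Loading helper are hypothetical stand-ins invented for this example, not part of any library. If the ORM happens to be Entity Framework (an assumption; nothing above names one), the same idea is usually expressed with Include, as in dbContext.Customers.Include(c => c.Orders).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public record Customer(int CustomerId, string Name);
public record Order(int OrderId, int CustomerId, decimal Total);

// Hypothetical in-memory stand-in for a database; QueryCount tracks round trips.
public class FakeDb
{
    private readonly List<Customer> _customers = new()
    {
        new Customer(1, "Ada"), new Customer(2, "Grace"), new Customer(3, "Linus")
    };
    private readonly List<Order> _orders = new()
    {
        new Order(10, 1, 25m), new Order(11, 1, 40m), new Order(12, 2, 15m)
    };

    public int QueryCount { get; private set; }

    public List<Customer> GetCustomers() { QueryCount++; return _customers.ToList(); }

    // One round trip per customer: the call the N+1 loop makes.
    public List<Order> GetOrders(int customerId)
    {
        QueryCount++;
        return _orders.Where(o => o.CustomerId == customerId).ToList();
    }

    // One round trip that returns the orders for every customer at once.
    public List<Order> GetAllOrders() { QueryCount++; return _orders.ToList(); }
}

public static class Loading
{
    // N+1: one query for the customers, then one more per customer.
    public static Dictionary<int, List<Order>> Lazy(FakeDb db)
    {
        var result = new Dictionary<int, List<Order>>();
        foreach (var c in db.GetCustomers())
            result[c.CustomerId] = db.GetOrders(c.CustomerId);
        return result;
    }

    // Eager: two queries in total, then the orders are grouped in memory.
    public static Dictionary<int, List<Order>> Eager(FakeDb db)
    {
        var customers = db.GetCustomers();
        var ordersByCustomer = db.GetAllOrders().ToLookup(o => o.CustomerId);
        // A lookup yields an empty sequence for customers without orders.
        return customers.ToDictionary(
            c => c.CustomerId,
            c => ordersByCustomer[c.CustomerId].ToList());
    }
}
```

With the three sample customers, Lazy issues four queries and Eager issues two. The gap grows linearly with the number of customers, while both return the same data.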

API Composition and N+1

It should be obvious how the N+1 problem has resurfaced with the advent of microservices. Because you can no longer join across, or eagerly load, data that crosses service boundaries, the usual advice is to handle this with API Composition, in which one service calls several other services and performs the “join” in memory across the data each one returns. Looks familiar, doesn’t it?

       var customers = CustomerService.All();

       foreach (var c in customers)
       {
          var orders = OrderService.Get(c.CustomerId);
          // …
       }

It definitely should, because it’s the exact same situation as before, except now you bear some overhead in the form of network latency. You’ve done everything right. Now what?
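To make the cost concrete, the sketch below counts the remote calls a composer makes. The client interfaces, the fake implementations, and the CallCounter are all hypothetical names made up for this example. Real clients would issue HTTP or gRPC requests, so every increment of the counter would be a network round trip.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public record Customer(int CustomerId, string Name);
public record Order(int OrderId, int CustomerId);
public record CustomerWithOrders(Customer Customer, IReadOnlyList<Order> Orders);

// Hypothetical client interfaces for the two services.
public interface ICustomerClient { IReadOnlyList<Customer> All(); }
public interface IOrderClient { IReadOnlyList<Order> Get(int customerId); }

// Shared counter so the fakes can record how many remote calls were made.
public class CallCounter { public int Calls { get; set; } }

public class FakeCustomerClient : ICustomerClient
{
    private readonly CallCounter _counter;
    private readonly int _customerCount;

    public FakeCustomerClient(CallCounter counter, int customerCount)
    {
        _counter = counter;
        _customerCount = customerCount;
    }

    public IReadOnlyList<Customer> All()
    {
        _counter.Calls++; // one call returns every customer
        return Enumerable.Range(1, _customerCount)
            .Select(i => new Customer(i, $"Customer {i}"))
            .ToList();
    }
}

public class FakeOrderClient : IOrderClient
{
    private readonly CallCounter _counter;

    public FakeOrderClient(CallCounter counter) { _counter = counter; }

    public IReadOnlyList<Order> Get(int customerId)
    {
        _counter.Calls++; // one call per customer
        return new List<Order> { new Order(customerId * 100, customerId) };
    }
}

public static class Composer
{
    // The in-memory "join": 1 call for customers plus N calls for orders.
    public static List<CustomerWithOrders> Compose(ICustomerClient customers, IOrderClient orders) =>
        customers.All()
            .Select(c => new CustomerWithOrders(c, orders.Get(c.CustomerId)))
            .ToList();
}
```

Composing 50 customers this way makes 51 calls. At an assumed 20 ms per call (an illustrative figure, not a measurement), that is roughly a second of sequential network time before any real work is done.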

Command Query Responsibility Segregation to the Rescue

An alternative to API Composition is the Command Query Responsibility Segregation (CQRS) pattern: when a write model is updated, an event is propagated to interested services so that they can populate their own read-only replicas of the data. Your service can then answer queries by joining against its local read replica instead of calling other services.

Problem solved, right? Well, yes and no. Your service is responsible for updating its own data, not the read replicas that depend on a change. Those replicas are updated by some event mechanism, and handling the events is subject to a propagation delay. This introduces eventual consistency: the idea that once updates to a given model stop, all the events advertising those changes will eventually be consumed and the read replicas brought up to date, at which point the system is consistent.
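The following is a minimal sketch of that flow, under several assumptions: the event types, the ReadModel class, and the in-process queue standing in for a message broker are all hypothetical and exist only to show the shape of the idea. The gap between Publish and ProcessPending plays the role of the propagation delay.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical events published by the Customer and Order services.
public abstract record DomainEvent { }
public record CustomerCreated(int CustomerId, string Name) : DomainEvent;
public record OrderPlaced(int OrderId, int CustomerId, decimal Total) : DomainEvent;

// Read-side view, denormalized so the "join" has already been done.
public class CustomerOrdersView
{
    public string Name { get; set; } = "";
    public List<int> OrderIds { get; } = new();
}

public class ReadModel
{
    // Stands in for a message broker; a real system would not use an in-process queue.
    private readonly Queue<DomainEvent> _pending = new();
    private readonly Dictionary<int, CustomerOrdersView> _views = new();

    // Called by the publishing side; the event is queued, not yet applied.
    public void Publish(DomainEvent e) => _pending.Enqueue(e);

    // Called later by the consumer; until then, queries see stale data.
    public void ProcessPending()
    {
        while (_pending.Count > 0) Apply(_pending.Dequeue());
    }

    private void Apply(DomainEvent e)
    {
        switch (e)
        {
            case CustomerCreated c:
                GetOrAdd(c.CustomerId).Name = c.Name;
                break;
            case OrderPlaced o:
                GetOrAdd(o.CustomerId).OrderIds.Add(o.OrderId);
                break;
        }
    }

    private CustomerOrdersView GetOrAdd(int customerId)
    {
        if (!_views.TryGetValue(customerId, out var view))
            _views[customerId] = view = new CustomerOrdersView();
        return view;
    }

    // Query side: a single local lookup, with no cross-service calls.
    public int OrderCount(int customerId) =>
        _views.TryGetValue(customerId, out var view) ? view.OrderIds.Count : 0;
}
```

If a CustomerCreated and an OrderPlaced event are published for customer 1, OrderCount(1) returns 0 until ProcessPending runs and 1 afterwards. That window is the stale read that eventual consistency asks you to tolerate.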

Eventual consistency is a pretty common idea today. In many cases, reading stale data will have little or no impact on your system. Where you cannot tolerate it, there are mechanisms to force strong consistency. For example, in specific instances, you could make the update to both the write model and the read replica a single blocking operation, which in turn introduces other potential pitfalls in terms of availability.

As with everything, there are no silver bullets. Both CQRS and strong consistency come with trade-offs, and both are probably best used sparingly, only when absolutely necessary.

With these tools in your inventory, you are equipped to handle a problem we thought we had solved long ago and that has resurfaced with the popularity of microservices: the N+1 problem.