The infamous N+1 problem is something that I have yet to see discussed in the Magento development circles and that worries me some since the popularity of Magento has drawn in quite a few “green” developers. These queries are particularly insidious because they “work”, but can bring a site to a slow crawl. Where are they especially dangerous? As you’ve probably guessed, the problem really becomes obvious when large numbers of records need to be processed–such as batch report jobs.

The N+1 query problem is typically discussed in regards to database object relational mapping (ORM) with one-to-many or many-to-many foreign relations. Traditionally an ORM will default to using “lazy loading” to populate collections of related objects. This means that the related objects aren’t actually populated until data is requested from each record.

To make it easier to picture, I will use the database models below as an example.

So with a traditional ORM like PHP ActiveRecord you might see code like this:

This code seems nice and simple, but what is actually happening under the covers? First it runs:

SELECT * FROM registries LIMIT 1;

Then for each associated registry item it loads data as it loops by running:

SELECT * FROM registry_items where registry_id=?;

If a registry had 50 items that means there would be 51 queries to retrieve data for that particular registry. Hence the N+1. Well that’s kinda crappy right? Fortunately PHP ActiveRecord has an option to include or “eager load” specific associations. This would reduce the number of queries in the preceding scenario to 2.

SELECT * FROM registries LIMIT 1;
SELECT * FROM registry_items WHERE registry_id in (1);

At this point you’re probably saying “well that’s nice, but what does this have to do with Magento?” That is a good question especially since Magento’s ORM doesn’t provide management for one-to-many or many-to-many relationships. For the most part when using Magento’s Collection models, the number of queries will be kept to a minimum. For instance if we wanted to print the names of all the products on a registry we might whip out the following code:

 

But is this optimal? Well let’s see, we run 1 query to fetch the registry and n queries(one for each item) to load the associated product. Hmm n+1…not good. Now those in the know realize there are more than n queries running when loading an EAV model such as a catalog product, but the point is we are running many more than necessary. So what can we do?

Joins are your friend!

Don’t be afraid to join a table if possible to get the data you need. If we were able to join to a flat catalog product table the number of queries for the example above could be reduced to 2. But as of writing this, I am not aware of a “nonhackish” way of doing so.

Examine similar models in the Magento source code!

I was able to consider the core wishlist model to find a solution for my registry items problem. The Magento team implemented the _afterLoad() method on the item collection which calls a method _assignProducts() that collects the associated product IDs into an array and passes that to the addIdFilter on product collection.

This method would reduce the number of queries by retrieving data for all of the associated products at the same time.

To sum up this lengthy post, the point I’m trying to convey is to think about the SQL that your Magento module is generating and establish strategies for dealing with relational objects. As a general rule the fewer times you have to query the database the better. In the eCommerce world site performance has a direct impact on sales performance, so it’s critical for us developers to keep performance in mind while writing code.