myapp.tests.test_models module

class myapp.tests.test_models.ManyToManyTestCase(methodName='runTest')[source]

Bases: django.test.testcases.TestCase

do some performance tests on large numbers of many-to-many relationships

Try to understand why prefetch_related does a giant WHERE IN (…) which doesn’t scale well as it requires a parameter for each row of the intermediate myapp_instructor_course_terms table. This breaks databases like MS SQLServer that has a limit of 2100 parameters (which is huge but fewer than mySQL supports).

I think this is actually a _feature_ and is based on using prefetch with paginated queries.

However, for a related child, prefetching that Model can break the database if it is not paginated and has a large number of objects. This is an `issue<https://github.com/django-json-api/django-rest-framework-json-api/issues/178>`_ raised for DJA that is worked around by `skipping the related data<https://github.com/django-json-api/django-rest-framework-json-api/pull/445>`_, and leaving only the relationship hyperlink in place.

According to the JSON:API spec: “A relationship object that represents a to-many relationship MAY also contain pagination links under the links member, as described below. Any pagination links in a relationship object MUST paginate the relationship data, not the related resources.” – https://jsonapi.org/format/#document-resource-object-relationships

First it gets the list of instructors:

SELECT [myapp_instructor].[id], [myapp_instructor].[effective_start_date], …

FROM [myapp_instructor] ORDER BY [myapp_instructor].[id] ASC

Then uses the ids in a WHERE IN list:

SELECT ([myapp_instructor_course_terms].[instructor_id])

AS [_prefetch_related_val_instructor_id], [myapp_courseterm].[id], …

FROM [myapp_courseterm] INNER JOIN [myapp_instructor_course_terms]

ON ([myapp_courseterm].[id] = [myapp_instructor_course_terms].[courseterm_id])

WHERE [myapp_instructor_course_terms].[instructor_id] IN (%s, %s, …)

But if you paginate the query (page size 10), then a ‘SELECT TOP 10 …’ or ‘SELECT … OFFSET 10 FETCH FIRST 10 ROWS ONLY’ happens

To test this with different database engines, set DJANGO_MYSQL=true or DJANGO_SQLSERVER=true (see settings)

Right now the SQL database debug messages just go to the console. TODO: Find a way to capture the SQL queries here in the test case so we can do something with them.

See also https://medium.com/@hansonkd/performance-problems-in-the-django-orm-1f62b3d04785 https://docs.djangoproject.com/en/2.1/ref/models/querysets/#django.db.models.Prefetch

Returns