myapp.tests.test_models module¶
-
class
myapp.tests.test_models.
ManyToManyTestCase
(methodName='runTest')[source]¶ Bases:
django.test.testcases.TestCase
do some performance tests on large numbers of many-to-many relationships
Try to understand why prefetch_related does a giant WHERE IN (…) which doesn’t scale well as it requires a parameter for each row of the intermediate myapp_instructor_course_terms table. This breaks databases like MS SQLServer that has a limit of 2100 parameters (which is huge but fewer than mySQL supports).
I think this is actually a _feature_ and is based on using prefetch with paginated queries.
However, for a related child, prefetching that Model can break the database if it is not paginated and has a large number of objects. This is an `issue<https://github.com/django-json-api/django-rest-framework-json-api/issues/178>`_ raised for DJA that is worked around by `skipping the related data<https://github.com/django-json-api/django-rest-framework-json-api/pull/445>`_, and leaving only the relationship hyperlink in place.
According to the JSON:API spec: “A relationship object that represents a to-many relationship MAY also contain pagination links under the links member, as described below. Any pagination links in a relationship object MUST paginate the relationship data, not the related resources.” – https://jsonapi.org/format/#document-resource-object-relationships
First it gets the list of instructors:
- SELECT [myapp_instructor].[id], [myapp_instructor].[effective_start_date], …
FROM [myapp_instructor] ORDER BY [myapp_instructor].[id] ASC
Then uses the ids in a WHERE IN list:
- SELECT ([myapp_instructor_course_terms].[instructor_id])
AS [_prefetch_related_val_instructor_id], [myapp_courseterm].[id], …
- FROM [myapp_courseterm] INNER JOIN [myapp_instructor_course_terms]
ON ([myapp_courseterm].[id] = [myapp_instructor_course_terms].[courseterm_id])
WHERE [myapp_instructor_course_terms].[instructor_id] IN (%s, %s, …)
But if you paginate the query (page size 10), then a ‘SELECT TOP 10 …’ or ‘SELECT … OFFSET 10 FETCH FIRST 10 ROWS ONLY’ happens
To test this with different database engines, set DJANGO_MYSQL=true or DJANGO_SQLSERVER=true (see settings)
Right now the SQL database debug messages just go to the console. TODO: Find a way to capture the SQL queries here in the test case so we can do something with them.
See also https://medium.com/@hansonkd/performance-problems-in-the-django-orm-1f62b3d04785 https://docs.djangoproject.com/en/2.1/ref/models/querysets/#django.db.models.Prefetch
- Returns