EF Core creates SQL query with ORDER BY at the end massively impacting performance for cases with many included collections #27026

Closed
desdoades opened this issue Dec 16, 2021 · 3 comments

Comments

@desdoades

I know that since EF Core 3.0 a query with multiple included collections (via .Include(x => x.my_collection)) is fetched via a single SQL query which can result in a huge result set containing duplicate information due to many join operations (i.e. cartesian explosion).
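A rough illustration of how this explosion scales, assuming hypothetical collection sizes (not numbers taken from the database in question): when sibling collections are joined in one query, the row count grows roughly with the product of their sizes, while loading each collection with its own query grows roughly with the sum. The class and method names below are made up for the sketch.

```csharp
using System;
using System.Linq;

public static class JoinRowEstimate
{
    // Rows returned when all sibling collections are LEFT JOINed in one query:
    // every combination of child rows appears, so the sizes multiply.
    // An empty collection still contributes one (NULL-padded) row.
    public static long SingleQueryRows(int[] collectionSizes) =>
        collectionSizes.Aggregate(1L, (rows, size) => rows * Math.Max(1, size));

    // Rows returned when each collection is loaded by its own query:
    // one row for the root entity plus one row per child.
    public static long SplitQueryRows(int[] collectionSizes) =>
        1 + collectionSizes.Sum(size => (long)size);

    public static void Main()
    {
        // Hypothetical sizes of four sibling collections on one parent.
        int[] sizes = { 8, 12, 15, 25 };
        Console.WriteLine($"single query: {SingleQueryRows(sizes)} rows");   // 8 * 12 * 15 * 25 = 36000
        Console.WriteLine($"split queries: {SplitQueryRows(sizes)} rows");   // 1 + 8 + 12 + 15 + 25 = 61
    }
}
```

Nested collections (a collection reached through another collection) multiply in the same way, so the real figure depends on the shape of the include tree.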

I also know that since EF Core 5.0 we can use .AsSplitQuery() or even enable split queries globally. However, as I understand it, this carries potential risks and should only be used when necessary.
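For reference, a minimal sketch of the global option, assuming the Npgsql provider and a placeholder connection string. The `MyContext` class and the trimmed-down `MyObject` entity are hypothetical stand-ins, not the real model.

```csharp
using Microsoft.EntityFrameworkCore;

// Hypothetical, trimmed-down entity; the real one has many navigations.
public class MyObject
{
    public int id { get; set; }
}

public class MyContext : DbContext
{
    public DbSet<MyObject> MyObject { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        optionsBuilder.UseNpgsql(
            // Placeholder connection string, replace with your own.
            "Host=localhost;Database=mydb;Username=me;Password=secret",
            // Make split queries the default for every query on this context.
            o => o.UseQuerySplittingBehavior(QuerySplittingBehavior.SplitQuery));
    }
}
```

With this default in place, an individual query can still opt back into a single SQL statement with `.AsSingleQuery()`. The main risks are that split queries need one roundtrip per included collection, and that the separate queries may see inconsistent data if the database changes between them.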

In my concrete case I want to retrieve a single object from the database with .SingleOrDefault(x => x.uuid == ...). The object has many included collections, some of them included via .ThenInclude(...).
The C# looks like this (names changed):

MyObject myObject = context.MyObject 
	.Include(x => x.single_item_1)
	.Include(x => x.single_item_2).ThenInclude(x => x.single_item_3).ThenInclude(x => x.single_item_4)
	.Include(x => x.single_item_2).ThenInclude(x => x.single_item_3).ThenInclude(x => x.collection_1).ThenInclude(x => x.single_item_5)
	.Include(x => x.single_item_2).ThenInclude(x => x.single_item_6)
	.Include(x => x.single_item_2).ThenInclude(x => x.single_item_7)
	.Include(x => x.single_item_2).ThenInclude(x => x.single_item_8)
	.Include(x => x.single_item_2).ThenInclude(x => x.collection_2).ThenInclude(x => x.single_item_9)
	.Include(x => x.single_item_2).ThenInclude(x => x.collection_2).ThenInclude(x => x.single_item_10)
	.Include(x => x.single_item_2).ThenInclude(x => x.collection_3)
	.Include(x => x.single_item_2).ThenInclude(x => x.collection_4)
	.Include(x => x.single_item_12)
	.Include(x => x.collection_5).ThenInclude(x => x.single_item_13)
	.Include(x => x.collection_6).ThenInclude(x => x.single_item_14)
	.Include(x => x.collection_6).ThenInclude(x => x.collection_7)
	.Include(x => x.collection_8).ThenInclude(x => x.single_item_15)
	.SingleOrDefault(x => x.uuid == uuid);

For one rather bad case I get a result set back with ~40k rows and ~100 columns because of all the joins. The SQL query takes ~15 seconds.
Interestingly, when I benchmarked this query in pgAdmin, I found that more than 95% of those 15 seconds seem to be spent on the last line of the query, which is an ORDER BY.
The last line looks like this:

ORDER BY t0.id, t0.id0, t0.id1, t0.id2, t0.id3, t0.id4, t0.id5, t0.id6, t0.id7, t1.id, t1.id0, t2.blobsid, t2.claim_statesid, t2.id, t2.id0, t2.id1, c2.id, c3.id, t3.id, t3.id0, t5.id, t5.id0, t5.id1, t6.id, t6.id0

So my question is: why is there an ORDER BY in this query when my C# code does not call for it? (The C# code contains no ordering operation and just wants a single element by id.)
I can only guess that this makes it easier for EF Core to translate the query result into CLR objects, but when this helper increases the SQL query time roughly twentyfold, I doubt it's worth it in this case.
Can I somehow prevent the generation of the ORDER BY? If not, I will have to resort to split queries.

EF Core version: 5.0.5
Database provider: Npgsql 5.0.5.1
Database: PostgreSQL 12
Target framework: .NET 5.0
Operating system: Windows 10

If this question belongs in the npgsql repository, just let me know and I will create it there.
If more information is needed, I will try to add it.

@desdoades changed the title from "EF Core creates SQL query with ORDER BY at the end massively impacting performance for cases of cartesian explosion" to "EF Core creates SQL query with ORDER BY at the end massively impacting performance for cases with many included collections" on Dec 16, 2021
@roji
Member

roji commented Dec 16, 2021

Duplicate of #19571

@roji marked this as a duplicate of #19571 on Dec 16, 2021
@roji
Member

roji commented Dec 16, 2021

@desdoades the orderings are necessary for EF Core to properly load related entities (#19571), see also #18022. #19828 removed the last ordering for 6.0, but the others are necessary. You can indeed use split query for this; given the number of includes in your query above, that looks like a good idea even regardless of the ordering.

@desdoades
Author

Thank you for the quick answer, and sorry for creating a duplicate. I searched the issues but did not find it.
I will read your links and then use query splitting.
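A minimal sketch of what per-query splitting could look like for a reduced version of the query above. The entity classes, the `MyContext` constructor, and the `Loader` helper are hypothetical stand-ins for the real (renamed) model; only `.AsSplitQuery()` is the actual change.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

// Hypothetical, trimmed-down entities standing in for the real model.
public class Item13 { public int id { get; set; } }
public class Child5 { public int id { get; set; } public Item13 single_item_13 { get; set; } }
public class Child8 { public int id { get; set; } }

public class MyObject
{
    public int id { get; set; }
    public Guid uuid { get; set; }
    public List<Child5> collection_5 { get; set; } = new();
    public List<Child8> collection_8 { get; set; } = new();
}

public class MyContext : DbContext
{
    public MyContext(DbContextOptions<MyContext> options) : base(options) { }
    public DbSet<MyObject> MyObject { get; set; }
}

public static class Loader
{
    // Loads one object by uuid; each included collection is fetched by its
    // own SQL query instead of being joined into one large result set.
    public static MyObject LoadByUuid(MyContext context, Guid uuid) =>
        context.MyObject
            .Include(x => x.collection_5).ThenInclude(x => x.single_item_13)
            .Include(x => x.collection_8)
            .AsSplitQuery()
            .SingleOrDefault(x => x.uuid == uuid);
}
```

The remaining `.Include(...)` chains from the original query would go in the same place, before `.AsSplitQuery()`.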

@ajcvickers reopened this on Oct 16, 2022
@ajcvickers closed this as not planned on Oct 16, 2022