Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query: Selecting scalar properties alongwith entity fails to lift and causes N+1 eval #11186

Closed
alexk8 opened this issue Mar 8, 2018 · 10 comments
Assignees
Labels
closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-enhancement
Milestone

Comments

@alexk8
Copy link

alexk8 commented Mar 8, 2018

//With classic Parent-Children (like in issue #10472)
This works well (within single select):

ctx.Parents.Select(p => new
	{
		ParentId = p.Id,
		ChildrenCount = p.Children.Count(),
		ChildrenSumXX = p.Children.Sum(c=>c.XX)
	}).ToList()

But this works as 2N+1 queries:

ctx.Parents.Select(p => new
	{
		Parent = p,
		ChildrenCount = p.Children.Count(),
		ChildrenSumXX = p.Children.Sum(c=>c.XX)
	}).ToList()

That's a bug? Or what's the problem, I cannot see any difference...

That's in 2.1.0-preview1

@smitpatel
Copy link
Member

The difference is
in first query, individual property (-ies) are being selected from entity rather than whole entity as in second query. Selecting whole entity requires us to use materialization code path.
At present materialization is tied closely with shaper and there would be one shaper for each query source, we cannot do materialization and some other scalar value processing both on single query source (in this case p). Future enhancement.

As a work-around you can construct object Parent manually by selecting individual properties and avoid materialization code path.

@alexk8
Copy link
Author

alexk8 commented Mar 9, 2018

It looks like just a technical imperfections at first, but I think it should be fixed with a high priority, because these queries are commonly used here and there.
In order to materialize it easily, query should look like just this

   SELECT 
     [anonymType1.p].[id] AS [anonymType1.p.id], 
     [anonymType1.p].[someproperty1] AS [anonymType1.p.someproperty1], 
     [anonymType1.p].[someproperty2] AS [anonymType1.p.someproperty2], 
     
	  (
          SELECT COUNT(*)
          FROM [children] AS [c]
          WHERE [anonymType1.p].[id] = [c].[parentid]
      ) AS [anonymType1.cnt], 
	  (
          SELECT SUM([c0].[XX])
          FROM [children] AS [c0]
          WHERE [anonymType1.p].[id] = [c0].[parentid]
      ) AS [anonymType1.sum]
      FROM [parents] AS [anonymType1.p]

@divega
Copy link
Contributor

divega commented Apr 17, 2018

@smitpatel Seems like the title should be improved.

@smitpatel smitpatel changed the title sum() and count() subquery translation = N+1 DB trips Query: Selecting scalar properties alongwith entity fails to lift and causes N+1 eval Apr 17, 2018
@ghost
Copy link

ghost commented Apr 24, 2018

Can this be added to the 2.1 milestone?

@divega
Copy link
Contributor

divega commented Oct 24, 2018

Clearing up the milestone to discuss taking these scenarios into account in the query design for 3.0. We are already overbooked for 3.0 so we may not be able to adress it, however I would like if possible that the design doesn't prevent us from being able to lift in this case.

@martin-hronsky
Copy link

I had similar issue with following query (MIN was unexpectedly submited to SQL server as separate query):

var query = from r in insuredRisks
                        where r.PolicyId == insuredRisk.PolicyId &&
                              r.VersionNo == insuredRisk.PolicyVersionNo &&
                              r.InsuredRiskNo == insuredRisk.InsuredRiskNo
                        select new InsuredRiskData
                                   {
                                       BasicInsuredRiskData = new BasicInsuredRiskData
                                                                  {
                                                                      InsuredRisk = r,
                                                                      InceptionDate = insuredRisks
                                                                      .Where(ir => ir.PolicyId == insuredRisk.PolicyId &&
                                                                          ir.InsuredRiskNo == insuredRisk.InsuredRiskNo &&
                                                                          ir.CreationCommissionLineNo ==                                                                           r.InsuredRisk.CreationCommissionLineNo &&
                                                                          ir.RecodVersionState != "D" && ir.RecodVersionState != "X")
                                                                      .Select(ir => ir.ValidFrom)
                                                                      .Min()
                                                                  }
                                   };

and as a workaround I wrote it this way:

var query = from r in insuredRisks
                        from ri in insuredRisks.Where(ri => ri.PolicyId == r.PolicyId &&
                                                            ri.InsuredRiskNo == r.InsuredRiskNo &&
                                                            ri.CreationCommissionLineNo == r.CreationCommissionLineNo &&
                                                            ri.RecodVersionState != PolicyVersionRecordState.SoftDeleted &&
                                                            ri.RecodVersionState != PolicyVersionRecordState.Cancelled)
                                               .GroupBy(ri => ri.PolicyId, ri => ri.ValidFrom)
                                               .Select(ri => new
                                                                 {
                                                                     InceptionDate = ri.Min()
                                                                 })
                        where r.PolicyId == insuredRisk.PolicyId &&
                              r.VersionNo == insuredRisk.PolicyVersionNo &&
                              r.InsuredRiskNo == insuredRisk.InsuredRiskNo
                        select new InsuredRiskData
                                   {
                                       BasicInsuredRiskData = new BasicInsuredRiskData
                                                                  {
                                                                      InsuredRisk = r,
                                                                      InceptionDate = ri.InceptionDate
                                                                  }
                                   };

After translation to SQL, it generated for that InceptionDate column nice subquery in select clause with MIN statement, just like one would do when writing SQL directly..

@urogdiaron
Copy link

urogdiaron commented Dec 13, 2018

I just ran into this problem, and read that in EF Core 3.0 this might get addressed but seems hard.
I believe my case is much simpler than the ones above so you might want to take a look, hopefully this one is easier to fix.

This generates N + 1 queries

from item in Items
select new
{
   Owned = OwnedItems.Any(own => own.UserId == userId && own.ItemId == item.Id),
   Item = item
}

This one generates only 1 query

from item in Items
select new
{
   Owned = OwnedItems.Any(own => own.UserId == userId && own.ItemId == item.Id),
   Item = new Item { Id = item.Id, Name = item.Name, ... Every single property listed one by one... }
}

I understand that the first version needs to use the 'materialization code path' so it's different, but i can manually materialize the item by typing a lot.

Some helpful people lead me here from
https://stackoverflow.com/questions/53746633/entity-framework-core-linq-trouble-creating-select-case-exists-queries

@mevansdev
Copy link

Not sure if I have the same issue or if I'm doing something wrong, see code example below, might be isolated to the Max operator;

var query = DatabaseContext.Parents
    .Select(parent => 
        new ParentModel
        { 
            Name = parent.Name, 
            DobOfEldest = parent.Children.Max(child => child.DateOfBirth)
        });

This is generating multiple queries to get the Max(child.DateOfBirth)

@ghost
Copy link

ghost commented Apr 4, 2019

Another example from me.

Query definition:

post: A board post
post.User: Poster
UserPhoto: user has photos where only one of them is uniquely IsPrimary = true
UserLike: did the viewing user like this photo?
PostPhotos: Post might have multiple attached photos (like a Facebook post's photos)
.Select(post => new
{
	post,
	post.User,
	UserPhoto = post.User.Photos.Where(ph => ph.IsPrimary).Take(1).SingleOrDefault(),
	UserLike = post.Likes.Any(likes => likes.UserId == forUserId),
	PostPhotos = post.PostPhotos.Take(4).Select(pp => pp.Photo),
}).ToList();

Issue Explanation

Main query generated:

SELECT p..., "p.User"...
	FROM "Posts" AS p
	INNER JOIN "Users" AS "p.User" ON p."UserId" = "p.User"."Id"
	WHERE p."Id" = $1
	ORDER BY p."Id"
	LIMIT $2

So Entity Framework understands the user and fetches it directly in 1 query.
Every single other join is a separate query.
if we want to get 10 posts, 1 query for all the posts and users, 10 * 3 = 30 for the joins, our infamous N+1 problem here. (I also tested if it is 10 posts are we having only 1+3=4 queries in total or 1 + 3 * 10, and unfortunately it is the latter.)

Expected:

EF Core could have joined UserPhotos and UserLikes by understanding the relation is a one-to-zero or one so they can be achieved with a LEFT JOIN each.

post.User: one-to-one defined in the entity
UserPhoto: one-to-zero or one could be inferred from query
UserLike: one-to-zero or one could be inferred from query
PostPhotos: one-to-many

PostPhotos is a one-to-many so having separate queries for that is okay, though I do think with some aggregation functions it might be possible to also get them but I do understand the complexity and vendor difference.

Further technical details

EF Core: 3.0.0-preview3.19153.1
Provider: Npgsql 3.0.0-preview3
OS: (Development machine, MacOS)
IDE: Rider 2019.1

But I do want to emphasize that the join functionality actually works, what's missing is understanding a join in the query is one to zero or one at runtime. If it is defined in the entity the results are expected.

So I can actually provide the Primary User Photo Id in the entity and refer to the Photo table as one to one for that property, that should work, but that would mean changing the database design to make EF work which would result in redundant data that needs to be kept updated.

@smitpatel
Copy link
Member

This has been done.

@smitpatel smitpatel modified the milestones: 3.0.0, 3.0.0-preview6 Jun 5, 2019
@smitpatel smitpatel added the closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. label Jun 5, 2019
@ajcvickers ajcvickers modified the milestones: 3.0.0-preview6, 3.0.0 Nov 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-enhancement
Projects
None yet
Development

No branches or pull requests

7 participants