Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GroupBy Children - N+1 DB trips #10472

Closed
mudrz opened this issue Dec 4, 2017 · 3 comments
Closed

GroupBy Children - N+1 DB trips #10472

mudrz opened this issue Dec 4, 2017 · 3 comments

Comments

@mudrz
Copy link

mudrz commented Dec 4, 2017

I couldn't find such existing issues or information anywhere

Further technical details

EF Core version: 2.0.0
Database Provider: SQLite (same with Npgsql.EntityFrameworkCore.PostgreSQL)
Operating system: MacOS High Sierra 10.13.1
IDE: Visual Studio Code 1.18.1

Steps to reproduce

A complete solution with the test scenario https://github.com/mudrz/TestsMaxGroupByEF

If we query children and group - they get pulled in memory and grouped - expected
If we query parents and group children - we get N+1 trips to DB

Models

    public class Parent
    {
        public int Id { get; set; }
        public virtual ICollection<Child> Children { get; set; }
    }
    public class Child
    {
        public int Id { get; set; }
        public int Group { get; set; }
        public int ParentId { get; set; }
        public virtual Parent Parent { get; set; }
    }

Views

    public class FooVm
    {
        public int? ParentId { get; set; }
        public IEnumerable<BarVm> Bars { get; set; }
    }
    public class BarVm
    {
        public int Group { get; set; }
        public int GroupCount { get; set; }
    }

Query the children

This one works as expected

  • we query the children and group them
  • only column Group gets pulled in memory and locally it is grouped.
curl localhost:5005/api/group/6
_context.Children
        .Select(c => c.Group)
        .GroupBy(g => g)
        .Select(g => new BarVm
        {
            Group = g.Key,
            GroupCount = g.Count()
        })

Query the parents

This one though

  • we query the parents and group the children
  • we get a separate (N+1) query to the DB for each parent, as if we looped through all parents
curl localhost:5005/api/group/2
_context.Parents
        .Select(p => new FooVm
            {
                ParentId = p.Id,
                Bars = p.Children
                        .Select(c => c.Group)
                        .GroupBy(g => g)
                        .Select(g => new BarVm
                        {
                            Group = g.Key,
                            GroupCount = g.Count()
                        }),
                ChildrenCount = p.Children.Count,
            })

Double selecting still produced the same result

_context.Parents
        .Select(p => new
            {
                Groups = p.Children.Select(c => c.Group),
                Parent = p
            })
        .Select(s => new FooVm
            {
                ParentId = s.Parent.Id,
                Bars = s.Groups
                        .GroupBy(g => g)
                        .Select(g => new BarVm
                        {
                            Group = g.Key,
                        }),
            })

Questions

  • Is this the expected result - multiple queries?
  • Is there a way to force select all groups and then group them in memory?
  • EF 2.1 will address the in-memory grouping, but will it allow the illustrated example here - grouping of children without multiple queries?
  • Is there any other workaround to avoid multiple DB trips, that I am missing (FromSql is not a solution, since the real-world scenario pulls various different expressions, merges them dynamically and produces the query)
@ajcvickers ajcvickers added this to the 2.1.0 milestone Dec 4, 2017
@ajcvickers ajcvickers modified the milestones: 2.1.0-preview1, 2.1.0 Jan 17, 2018
@maumar maumar removed their assignment Mar 7, 2018
@maumar maumar removed this from the 2.1.0 milestone Mar 7, 2018
@ajcvickers ajcvickers added this to the Backlog milestone Mar 7, 2018
@maumar
Copy link
Contributor

maumar commented Mar 7, 2018

Multiple queries is the expected behavior currently and this specific case won't be improved in 2.1 release. We can translate subqueries that result in just one element (e.g. p.Children.Count). If subquery produces a collection we often can translate it into two queries (see #9282), however optimization has a number of limitations, including GroupBy result operator. In the future release we might improve the translation.

You can try performing GroupBy operation on the client, which might be faster than N+1, depending on the data.

ctx.Parents
	.Select(p => new
	{
		Groups = p.Children.Select(c => c.Group).ToList(),
		ChildrenCount = p.Children.Count,
		ParentId = p.Id,
	})
	.ToList()
	.Select(s => new FooVm
	{
		Bars = s.Groups
			.GroupBy(g => g)
			.Select(g => new BarVm
			{
				Group = g.Key,
			}),
	}).ToList();

produces the following two queries that are later grouped on the client:

SELECT [p].[Id] AS [ParentId], (
	SELECT COUNT(*)
	FROM [Children] AS [c]
	WHERE [p].[Id] = [c].[ParentId]
) AS [ChildrenCount]
FROM [Parents] AS [p]
ORDER BY [p].[Id]

SELECT [t].[Id], [p.Children].[Group], [p.Children].[ParentId]
FROM [Children] AS [p.Children]
INNER JOIN (
	SELECT [p0].[Id]
	FROM [Parents] AS [p0]
) AS [t] ON [p.Children].[ParentId] = [t].[Id]
ORDER BY [t].[Id]

@maumar
Copy link
Contributor

maumar commented Sep 29, 2020

this scenario is not supported now due to #15873- we need key of the child to know how to properly bucket the results in the correlated collection scenario. Due to Distinct/GroupBy semantics we can't add these necessary columns to the projection, so the scenario is blocked. If the user manually provides those necessary columns (by grouping by them) we still hit the problem tracked by #21965 (we try to add parent side of the correlation predicate into the inner projection, which causes an error because now we project a column that is not part of the grouping key or aggregate)

@maumar maumar removed the verify-fixed This issue is likely fixed in new query pipeline. label Sep 29, 2020
@maumar
Copy link
Contributor

maumar commented Sep 29, 2020

closing as duplicate

@maumar maumar closed this as completed Sep 29, 2020
@maumar maumar removed this from the MQ milestone Sep 29, 2020
@ajcvickers ajcvickers reopened this Oct 16, 2022
@ajcvickers ajcvickers closed this as not planned Won't fix, can't repro, duplicate, stale Oct 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants