Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to deal with array_agg(column1, column2) in Entity Framework and Postgres? #2631

Closed
CarstenK1 opened this issue Jan 31, 2023 · 7 comments

Comments

@CarstenK1
Copy link

I've created a custom object to get the result of an Raw SQL. I used JSONB to aggregate some data. The object is defined like this:

modelBuilder.Entity<PermissionWithScopesByPropertyDto>(entity =>
            {
                entity.HasNoKey();
                entity.Property(e => e.Permissions)
                    .HasColumnType("jsonb")
                    .HasColumnName("permissions");
                
                entity.Property(e => e.ResourceId)
                    .HasColumnName("resource_id");

            });

This works good, but than I noticed that using array_agg is much faster and I changed the query to use that aggregation function like this:

Select array_agg((code, permission_scope_ids)) ..... that creates an Array of Tupels <string,int[]>. The return datatype is record[]. Now I try to match that to the Entity Framework. The NPGSQL support that Datatype. I've tried using object[] or record as Type, but does not work. Can entity Framework map that to an Array of Objects?

@roji
Copy link
Member

roji commented Jan 31, 2023

I'm not fully understanding what you're looking to do - can you please show the current C# code and query you're using, and the full query you'd like to see? It's difficult to understand from snippets and partial info exactly you're looking for.

@CarstenK1
Copy link
Author

CarstenK1 commented Jan 31, 2023

Well there is not much code to show. I wasn't able to create a really fast query with Entity Framework, since this would be to complex.

 var result = await _dbContext.Set<PermissionWithScopesByPropertyDto>().FromSqlRaw
        (
            "WITH resources AS (" +
                "Select array_agg(distinct iam_assignment.role_id)::bigint[] as roles, iam_resource.id " +
                "from iam_user " +
                "inner join iam_assignment " +
                "on iam_user.id = iam_assignment.user_id " +
                "inner join iam_role " +
                "on iam_role.id = iam_assignment.role_id and iam_role.level <= iam_user.level " +
                "inner join iam_resource " +
                "on iam_assignment.resource_id = Any(iam_resource.accessable_by) " +
                "where iam_assignment.user_id in (select unnest(accessable_by) as id from iam_user where external_id = {0} ) " +
                "group by iam_resource.id"+
            "), " +
            "permission_ids as (select distinct unnest(accessable_by) as id, iam_permission.code from iam_permission " +
            "where iam_permission.code = Any(Array[{1}]) group by id) " +

          
                "Select array_agg((permission_ids.code,iam_role_permission.permission_scope_ids)) as permissions, resources.id as resource_id " +
                "FROM iam_role " +
                "inner join iam_role_permission " +
                "on iam_role_permission.role_id = iam_role.id " +
                 "inner join permission_ids  " +
                "on iam_role_permission.permission_id = permission_ids.id " +
                "inner join resources " +
                "on iam_role.id = Any(roles)  " +
                "group by resource_id", userExternalId, permissions).ToArrayAsync(cancellationToken);
        return result.ToDictionary(kvp => kvp.ResourceId, kvp => kvp.Permissions);

To get a fast db query I decided split up the things using CTE. One issues is, that two tables have large number of rows. With a normal join it would work, but the result set would grow. So I decided to aggregate both parts by creating arrays and to join via Any() Operator . One side has a array of assigned roleIds which are matiching to the roleId of the table. And to have one row per resource I did a second aggregate on permissions and scopes.

image

The pro on this solution is, that number of rows and that means also number of objects to be mapped does not grow very fast with a larger DB. Without aggregate the result set would have in the worst case 500000 objects or more depending on the passed parameters.

@CarstenK1
Copy link
Author

CarstenK1 commented Jan 31, 2023

Well the only other solution what just come up in my mind is creating a user defined type. That should work with Entity Framework.

Would be cool if EF could deal with that aggregate objects. Just by adding a mapping to the model like this:

.HasColumnType("varchar, int[]")

or in case of an Array something like

.HasColumnType("(varchar, int[])[]")

@CarstenK1
Copy link
Author

Is there a way to map user defined types in EF7?

@roji
Copy link
Member

roji commented Feb 1, 2023

There's a lot going on in your comments, I'm still not sure I understand exactly what it is you're trying to do (the overall goal as opposed to "how I can do X specifically"). But here are some answers:

One issues is, that two tables have large number of rows. With a normal join it would work, but the result set would grow.

First, it seems you're trying to avoid using SQL JOIN because of the data duplication it creates (e.g. joining Blogs with Posts duplicated a Blog for each Post it has). I'd advise doing this only if you've confirmed that the duplication causes an actual performance problem in your specific scenario, rather than doing any sort of premature optimization. See the EF docs on this.

Assuming you do need to avoid JOIN, the usual EF way to do this is via split querying. I'd again advise considering this, and only looking into advanced PG-specific techniques if you actually need to.

I do aim to add the possibility to use array_agg to load collection includes on PG - that's #1691. But that issue hasn't been implemented yet.

I've created a custom object to get the result of an Raw SQL

You can include your custom type in your EF model (but e.g. exclude it from migrations to prevent EF from creating a table); at that point you can use SQL querying and map the results back to it (docs). This should allow you to write whatever SQL you want and have EF materialize the results back to your type.

Note that we've just merged an enhancement for EF 8.0 that allows you to do this without including your type in the EF model (docs).

Select array_agg((code, permission_scope_ids)) ..... that creates an Array of Tupels <string,int[]>. The return datatype is record[]. Now I try to match that to the Entity Framework. The NPGSQL support that Datatype. I've tried using object[] or record as Type, but does not work. Can entity Framework map that to an Array of Objects?

Version 7.0 of the provider does support translating array_agg, at least in some scenarios, e.g.:

_ = ctx.Posts
    .GroupBy(p => p.Blog)
    .Select(g => EF.Functions.ArrayAgg(g.Select(p => p.Id)))
    .ToArray();

However, you cannot currently do ArrayAgg over a record/tuple, I've opened #2633 to track this. You also can't translate unnest at this point.

To get a fast db query I decided split up the things using CTE.

I'm not sure how a CTE helps making the query faster; note that non-recursive CTEs can be rewritten as a simple subquery, so there's nothing you should be unable to do without them. Regardless, EF can't currently generate CTEs (WITH), this is tracked by dotnet/efcore#26486.

@CarstenK1
Copy link
Author

Thank you very much that sounds good. Not quite sure how split query is implemented. Does it execute each sub query, wait for the response and do the next one? One thought on using CTE was that I avoid data transfer between Client and DB.

@roji
Copy link
Member

roji commented Feb 1, 2023

@CarstenK1 yes, at this point split queries are executed in separate roundtrips, although this is something we want to fix soon (dotnet/efcore#10878). This situation indeed isn't ideal, but I'd make sure it's enough of a problem for you before doing something more complicated.

Note that again, non-recursive CTEs can generally be expressed as subqueries (not that there's anything wrong with using them, if you're writing your own SQL).

@roji roji closed this as not planned Won't fix, can't repro, duplicate, stale Mar 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants