Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add middleware layer. #1841

Closed
wants to merge 12 commits into from

Conversation

al45tair
Copy link
Contributor

Added a Middleware layer that can be used to alter the results of authentication
by a connector.

Also added storage support for the new Middleware layer.

Signed-off-by: Alastair Houghton alastair@alastairs-place.net
Issues: #1635

@al45tair
Copy link
Contributor Author

OK, @sagikazarmark, I think this is ready for a review now.

It has a "groups" middleware that can manipulate group names in various ways, a "claims" middleware for static manipulation of custom claims, and a "grpc" middleware for external things to plug in to (for instance, to add/remove custom claims on the basis of their own data).

@sagikazarmark
Copy link
Member

Thanks for working on this @al45tair. I'll try to review it soon, but it's quite a huge change. :)

@sagikazarmark
Copy link
Member

sagikazarmark commented Nov 24, 2020

@al45tair I briefly reviewed your PR and I think it's really great. Top quality! Thanks a lot! ❤️

I really like the concept as well, it's consistent and easy to understand. However, I had a slightly different concept in mind and before reviewing the PR in more details or committing to this approach, I'd like to run it by you.

My initial idea was something like this:

middleware:
  - filterGroups:
      - groupA
      - groupB
  - addClaims:
      key: value
      key2: value2
  - allowGroups:
      - groupC
  - denyGroups:
      - groupD

At first, it might seem to be a config difference only (and this style is probably also slightly harder to implement). But overall, the underlying middleware implementation would become slightly simpler. Interestingly enough, the current Middleware interface already allows this configuration style.

The above configuration in the current style would look like this:

middleware:
  - type: groups
    config:
      actions:
        - filter: "groupA"
        - filter: "groupB"

  - type: claims
    config:
      inject:
        key: value
        key2: value2

  - type: groups
    config:
      actions:
        - allow: "groupc"
        - deny: "groupD"

While this is probably somewhat easier to implement, I think it's slightly less readable and obvious than the above example.

Implementing my version in config is probably slightly harder though. It requires weird pointer hacks, but I think it's doable:

type MiddlewareConfig struct {
	FilterGroups *[]string               `yaml:"filterGroups"`
	AddClaims    *map[string]interface{} `yaml:"addClaims"`
	AllowGroups  *[]string               `yaml:"allowGroups"`
	DenyGroups   *[]string               `yaml:"denyGroups"`
}

We can probably tweak this configuration further, by adding a type identifier:

middleware:
  - type: filterGroups
    config:
      - "groupA"
      - "groupB"

# ...

tl;dr: instead of categorizing middleware, in this scenario each would be an indepentent implementation.

What do you think? I'm not 100% convinced that my approach is better, but it seems to be easier to read (in config) and more separated (in implementation).

@sagikazarmark
Copy link
Member

@al45tair happy new year! I'd love to hear your thoughts about the above ☝️

@al45tair
Copy link
Contributor Author

al45tair commented Jan 2, 2021

@sagikazarmark Happy New Year!

I've been thinking about this a lot. The pointer hack thing is fine, I think, if we don't intend to add too many middleware things to Dex itself. If we add a lot, we'll end up with a lot of "if" statements to unpack the data, plus it makes the MiddlewareConfig struct large. I've done something very similar for the "actions" in some of the middleware modules I've written.

The "type" field version has the advantage that the config itself is a separately defined struct, so we aren't wasting so much space and it's easier to modularise things, but it does come at the cost of some code complexity and you're right that it makes the configuration YAML less readable.

The other option worth mentioning is that we could parse the middleware YAML as a map[string]interface{}, rather than using a struct, then we don't need the separate "type" field (we can just use the string key as the type). That might be a better way to get the YAML looking the way we'd like while also making things modular.

In that case, I think what you're then suggesting is that the modules should be smaller, so rather than a "groups" module, you'd have a "filterGroups" module, an "allowGroups" module and so on. That's doable, I think.

I'll have to scrounge up the time to work on this some more. I've got quite a few things I need to do in the immediate future (not least, I'm going to need SPNEGO support in Dex, I think).

@sagikazarmark
Copy link
Member

@al45tair did you have time to check on this?

@candlerb
Copy link
Contributor

I think the idea is great in principle, but I have a few questions/thoughts.

(1) I couldn't see a full example, but looking at the code I believe there is a separate middleware chain configured per connector. For example, using static config, I think you would write something like:

connectors:
- type: google
  id: google
  name: Google
  config:
    issuer: https://accounts.google.com
    ... blah
  middlewares:
  - type: claims
    config:
      actions:
        - discard: "^foo\.example\.com/.*$"

Is that correct?

(2) I would like to be able to add custom attributes when generating JWTs for a particular Dex client (audience).

Is that possible in this design? Is the middleware chain traversed when generating JWTs, or only when receiving one from the connector?

If not, then perhaps a similar middleware chain could be configured under the client.

(3) In most cases I'll want to do conditional logic, e.g. "add claim X: Y if user is a member of group Z".

As far as I can see, to do this you need to write an external gRPC service. I don't really have a problem with this, and it avoids getting caught by Greenspun's Tenth Rule.

However I think being able to qualify each rule with a simple precondition might avoid the need for this in many cases:

  • check if a given claim is present, with a particular value, or matching a regex
  • check if a given group membership is present

Typical use: in an Azure AD connector, if in group "uuid-foo-bar-baz", then add group "admins"

This could be as simple as extending the rule type+config to preconditions+type+config.

(4) I would really like the option for middlewares to be database-backed, specifically so I can add users to groups without having to restart the server.

Obviously, an external gRPC service can have its own database. But even then, it would be nice to be able to use the Dex database, to avoid a separate DB instance, and to enable it to be managed via Dex's gRPC admin interface.

I can understand that you're keen to avoid extending the storage backend API, but I wonder if a simple K/V table which middlewares can use would be a good idea? (But in any case this can be added later)

@sagikazarmark
Copy link
Member

(1) I couldn't see a full example, but looking at the code I believe there is a separate middleware chain configured per connector.

Yes, the current proposal works on per connector basis.

(2) I would like to be able to add custom attributes when generating JWTs for a particular Dex client (audience).

The current proposal targets connectors at the moment. Frankly, we didn't have many client-focused use cases for configuration so far.

(3) In most cases I'll want to do conditional logic, e.g. "add claim X: Y if user is a member of group Z".

I think there are simple conditional features in the current proposal, but something like gRPC or an embedded solution (lua, rego) would be better for that use case.

(4) I would really like the option for middlewares to be database-backed, specifically so I can add users to groups without having to restart the server.

This isn't something that we are considering right now, but for the long term we have plans to extend configuration capabilities (eg. use CRDs to configure connectors?)

@candlerb
Copy link
Contributor

candlerb commented Feb 11, 2021

I've tried running the WorldProgrammingLtd/middleware branch now. (Aside: it could do with rebase: "178 commits behind dexidp:master").

I am confused about configuration. Clearly static middleware is a top level construct:

        // StaticConnectors are user defined connectors specified in the ConfigMap
        // Write operations, like updating a connector, will fail.
        StaticConnectors []Connector `json:"connectors"`

        // StaticMiddleware are global middleware specified in the ConfigMap.
        StaticMiddleware []Middleware `json:"middleware"`

but it is also specified under the connector:

// Connector is a magical type that can unmarshal YAML dynamically. The
// Type field determines the connector type, which is then customized for Config.
type Connector struct {
        Type string `json:"type"`
        Name string `json:"name"`
        ID   string `json:"id"`

        Config server.ConnectorConfig `json:"config"`

        Middleware []Middleware `json:"middleware"`
}

So when I'm using a statically-configured connector, it's unclear where the middleware goes - and if StaticMiddleware is configured, when it's run (for all connectors?? Adding a static claim to all users from all connectors seems a not very useful thing to do). And if you configure both, in what order are they run?

When testing this, at first I was caught out by a usability problem: I initially put middlewares: instead of middleware:, and the config was silently accepted with no error, even though it didn't do anything. I think it should reject unknown YAML keys. (I have a solution for that somewhere if you like, but it uses the yaml v3 library)

Anyway, in the end I got it to work with the following config:

connectors:
- type: mockCallback
  id: mock
  name: Example
  middleware:
  - type: claims
    config:
      inject:
        "example.com/foo": bar
  - type: groups
    config:
      inject:
        - DummyGroup

middleware:
- type: claims
  config:
    inject:
      "example.com/foo": baz
- type: groups
  config:
    inject:
      - BlankGroup

And the result (with the example app, and requesting groups scope) is:

{
  "at_hash": "4wuKZcTY38WVNwSp-P3QSQ",
  "aud": "example-app",
  "email": "kilgore@kilgore.trout",
  "email_verified": true,
  "example.com/foo": "baz",
  "exp": 1613144004,
  "groups": [
    "authors",
    "DummyGroup",
    "BlankGroup"
  ],
  "iat": 1613057604,
  "iss": "https://dex1.home.deploy2.net:5554/dex",
  "name": "Kilgore Trout",
  "nonce": "",
  "preferred_username": "",
  "sub": "Cg0wLTM4NS0yODA4OS0wEgRtb2Nr"
}

That's definitely a good start. (I notice that all custom claims are provided whether you ask for them or not; and if I request scope example.com/foo it tells me that's an invalid scope. But I have no problem with the current behaviour).

Also, I think this demonstrates that the global middleware is run after the per-connector middleware.


The current proposal targets connectors at the moment. Frankly, we didn't have many client-focused use cases for configuration so far.

I gave one here: application PostgREST requires a custom "role" claim for users authorized to connect to the database. It would be safer to present this only to the application which needs it, rather than give it to every other application as well.

Another example might be where several applications require group: ["admins"], but I want different groups of people to have admin rights in different applications.

(3) In most cases I'll want to do conditional logic, e.g. "add claim X: Y if user is a member of group Z".

I think there are simple conditional features in the current proposal, but something like gRPC or an embedded solution (lua, rego) would be better for that use case.

In the current branch, I can't see any conditions along the lines of "only do this for <sub X> or <group Y>"

Here's a suggestion. Extend the Middleware to add some conditions:

type Middleware struct {
        Type      string            `json:"type"`
        Subs      []string          `json:"subs"`
        Groups    []string          `json:"groups"`
        AllGroups []string          `json:"all_groups"`
        Claims    map[string]string `json:"claims"`

        Config server.MiddlewareConfig `json:"config"`
}

You can provide zero or more conditions, and for every one which is present, it must be true.

subs: [foo, bar, baz]   # sub must be any one of these
groups: [foo, bar]      # foo OR bar must be in the list of groups
all_groups: [foo, bar]  # foo AND bar must be in the list of groups
claims:
  email: a@example.com  # claim must exist with this value

I expect people would eventually want others (e.g. claim matching regexp or list of values), but to be honest you could drop the 'claims' one for now. Just having 'subs' and 'groups' would be sufficient for me to build groups, which is what I actually want. (Specific use case: build a group out of generic Google users, when not using a Google Docs domain)

One point about matching on "sub". As long as the middleware chain is running under a particular connector, then there is no ambiguity over the "sub" value. However if there is a global middleware chain as well, then the "sub" value potentially needs to be qualified by the connector, as in theory two different connectors could return the same sub value. In this case, probably it should use dex's own synthesised "sub" claim, rather than the "sub" claim provided by the connector.

@candlerb
Copy link
Contributor

With gopkg.in/yaml.v3, the trick to failing on unknown fields is:

import yaml "gopkg.in/yaml.v3"
...
        dec := yaml.NewDecoder(some_io_reader)
        dec.KnownFields(true)
        err = dec.Decode(&s)
        if err != nil { ... }

But it looks like dex uses https://github.com/ghodss/yaml instead.

@candlerb
Copy link
Contributor

Looking at the grpc protocol:

// The identity information being processed.
message Identity {
  string user_id = 1;
  string username = 2;
  string preferred_username = 3;
  string email = 4;
  bool email_verified = 5;
  repeated string groups = 6;

  // Custom claims are sent as a JSON blob
  bytes custom_claims = 7;
}
  1. I think the connector_id ought to be provided as a field (also the client_id, if client-specific middleware ever gets implemented). Otherwise, clashing user_ids from different connectors can't be distinguished. connector_id and user_id together are what you get from federated_claims.
  2. From the protocol it was unclear to me what the distinction between user_id and username is, in terms of what openidc claims they represent. Digging in code, these come from similarly-named fields in Dex's main connector.Identity. Are they sub and name respectively? If so, a comment would be helpful (since someone might be implementing this gRPC spec, without being familiar with the internal data structures of Dex itself)
  3. This patch introduces custom_claims. I guess that standard claims from the profile scope, such as "given_name" and "family_name", would be considered as "custom_claims", is that right?

Added a Middleware layer that can be used to alter the results of authentication
by a connector.

Also added storage support for the new Middleware layer.

Signed-off-by: Alastair Houghton <alastair@alastairs-place.net>
Issues: dexidp#1635
Adds support for custom claims to the storage layer.  This will be used in a
subsequent commit to allow manipulation of claims from middleware.

Signed-off-by: Alastair Houghton <alastair@alastairs-place.net>
Added support for custom claims, and a simple middleware to allow the
injection of static claims, which is useful for testing.

Signed-off-by: Alastair Houghton <alastair@alastairs-place.net>
This should have been a Fatalf() call.

Signed-off-by: Alastair Houghton <alastair@alastairs-place.net>
This is a middleware implementation that lets you implement your middleware
outside of Dex itself, in a separate gRPC server.

Signed-off-by: Alastair Houghton <alastair@alastairs-place.net>
Claims and grpc weren't in the middleware list in server/middleware.go.

Signed-off-by: Alastair Houghton <alastair@alastairs-place.net>
Apparently I missed off the code for the inject feature in the claims
middleware.  Also added a test for it.

Signed-off-by: Alastair Houghton <alastair@alastairs-place.net>
There's no reason to pass connector data outside Dex, so we don't send
it over gRPC.  Unfortunately, we also weren't preserving it, which
meant that refreshing didn't work if the gRPC middleware was being
used.  Fix this and add an explicit test for it.

Signed-off-by: Alastair Houghton <alastair@alastairs-place.net>
Upstream dex has updated golangci-lint with new linters.

Signed-off-by: Alastair Houghton <alastair@alastairs-place.net>
The linter was updated.  Fix the things it complained about.

Signed-off-by: Alastair Houghton <alastair@alastairs-place.net>
ClaimTokHash doesn't exist any more because that isn't how things work now.

Signed-off-by: Alastair Houghton <alastair@alastairs-place.net>
The linter is really very good :-)

Signed-off-by: Alastair Houghton <alastair@alastairs-place.net>
@sagikazarmark
Copy link
Member

Further thinking about this feature: I'm not sure middleware should receive an Identity, but rather all claims. That way we can implement claim mapping as generic middlware as well.

I'd really like to keep the custom claims discussion separate from this feature. I understand the value of it, but it's not a high priority right now and with middleware accepting claims directly, it's not a requirement either.

Personally, I think this PR grew a bit large. I'll try to take a stab at an alternative, smaller implementation (no grpc, no static middleware, etc) and see what happens. I don't mind spending a little bit more effort on finding the ideal solution.

@al45tair
Copy link
Contributor Author

I'm going to close this PR as I don't think it's likely to get merged. The code will still be available in the upstream repo.

@al45tair al45tair closed this Jul 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants