Issue configuring GRPC TLS #6580

Open
nbjohnson opened this issue Nov 6, 2023 · 5 comments

nbjohnson commented Nov 6, 2023

Describe the bug

I have been attempting to follow the GRPC TLS guide: https://grafana.com/docs/mimir/latest/manage/secure/securing-communications-with-tls/

However, when setting clientAuthType: RequireAndVerifyClientCert, I get the following error:

error reading server preface: remote error: tls: certificate required

It seems like some of the gRPC clients aren't presenting the certs they are given when they connect. I'm not sure whether I am misunderstanding the configuration or whether there is a bug where some clients make calls without the cert.

The connections that seem to be failing are:

distributor -> ingester
querier -> query scheduler
query frontend -> query scheduler

To Reproduce

Steps to reproduce the behavior:

I have tried both the latest released version (helm 5.1.2 and mimir 2.10.3) and the latest beta (helm 5.2.0-weekly.260 and mimir r262-fa3a8df).

It also doesn't feel like the issue is me not providing the cert files. When I have forgotten to mount the cert files into a pod that needs them, the pod usually complains that it cannot find the cert file.

These are the grpc tls configs I have set in my helm config:

mimir:
  structuredConfig:
    server:
      grpc_tls_config:
        ...
    ingester_client:
      grpc_client_config:
        ...
    frontend_worker:
      grpc_client_config:
        ...
      query_scheduler_grpc_client_config:  # newer beta only
        ...
    frontend:
      grpc_client_config:
        ...
    querier:
      store_gateway_client:
        ...
    query_scheduler:
      grpc_client_config:
        ...

Expected behavior

All Mimir components should be able to talk to one another without issue when gRPC TLS is enabled with the strictest requirement, RequireAndVerifyClientCert.

Environment

  • Infrastructure: Kubernetes
  • Deployment tool: helm

Additional Context

I know there was a code change this morning that updated the gRPC TLS handling: #6573. However, it doesn't seem like that would be my issue. If it were, I think I would expect to see a name error of some sort.

@fayzal-g
Contributor

fayzal-g commented Nov 9, 2023

Hey @nbjohnson - I'm unable to replicate this on my side.

Let's focus on one of the connections, namely querier -> query-scheduler

This error looks like the client certificate isn't present, but as you said, I would also expect a different error. A couple of questions:

  • To double check, what does your configmap look like: kubectl describe configmap xxx-mimir-config - specifically the frontend_worker block
  • Have you checked the querier pod to make sure the client cert/key specified in the above config is actually mounted there?

@nbjohnson
Author

@fayzal-g Thanks for trying to help me debug this. I checked the querier pod and it does have the expected certs mounted at the paths the config points to. If they weren't mounted, I would have expected the pod to complain that it couldn't find the cert file my config points to (which I have seen before with other pods).

As for the config, the mimir config file in the querier pod looks like:

frontend_worker:
  grpc_client_config:
    tls_ca_path:
    tls_cert_path:
    tls_enabled:
    tls_key_path:
    tls_min_version:
    tls_server_name:
  query_scheduler_grpc_client_config:
    tls_ca_path:
    tls_cert_path:
    tls_enabled:
    tls_key_path:
    tls_min_version:
    tls_server_name:

But yeah, it looks like for some reason the pod isn't using a cert for the connection. I'm not sure if it is a misconfiguration on my end or something else.

@fayzal-g
Contributor

fayzal-g commented Nov 9, 2023

Looking at the error, it seems to be coming from here: https://github.com/golang/go/blob/master/src/crypto/tls/handshake_server.go#L878 - I have a feeling the client certificate is misconfigured.

Is the output as expected when you run openssl x509 -noout -text -in /path/to/your/client.crt?
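
For a scripted version of the same check, here is a minimal Go sketch that reads a PEM certificate and reports the fields most relevant to client authentication. The certSummary type and the summarize, sampleCert, and demo helpers are hypothetical names made up for this example, and the certificates are throwaway self-signed ones generated in memory, not anything from the deployment discussed here.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// certSummary holds the fields that matter most when debugging client auth.
type certSummary struct {
	Subject, Issuer string
	NotAfter        time.Time
	ClientAuth      bool // true if the certificate may be used as a TLS client certificate
}

// summarize parses the first CERTIFICATE block in pemBytes.
func summarize(pemBytes []byte) (certSummary, error) {
	block, _ := pem.Decode(pemBytes)
	if block == nil || block.Type != "CERTIFICATE" {
		return certSummary{}, errors.New("no PEM certificate found")
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return certSummary{}, err
	}
	s := certSummary{Subject: cert.Subject.CommonName, Issuer: cert.Issuer.CommonName, NotAfter: cert.NotAfter}
	// A certificate with no extended key usage at all is not restricted.
	s.ClientAuth = len(cert.ExtKeyUsage) == 0 && len(cert.UnknownExtKeyUsage) == 0
	for _, u := range cert.ExtKeyUsage {
		if u == x509.ExtKeyUsageClientAuth || u == x509.ExtKeyUsageAny {
			s.ClientAuth = true
		}
	}
	return s, nil
}

// sampleCert builds a throwaway self-signed certificate so the sketch runs on its own.
func sampleCert(cn string, usages []x509.ExtKeyUsage) ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: cn},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
		ExtKeyUsage:  usages,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}

// demo summarizes one server-only certificate and one client-capable certificate.
func demo() (serverOnly, clientCapable certSummary, err error) {
	a, err := sampleCert("server-only", []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth})
	if err != nil {
		return
	}
	b, err := sampleCert("querier", []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth})
	if err != nil {
		return
	}
	if serverOnly, err = summarize(a); err != nil {
		return
	}
	clientCapable, err = summarize(b)
	return
}

func main() {
	serverOnly, clientCapable, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n%+v\n", serverOnly, clientCapable)
}
```

Against a real file, summarize would be called on the bytes returned by os.ReadFile for the mounted client certificate. A certificate whose extended key usage lists only server authentication would be rejected as a client certificate by a Go server.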

@nbjohnson
Author

nbjohnson commented Nov 9, 2023

@fayzal-g Hmm, maybe my issue is that I am trying to pass a cert chain as the CA file to validate the client certs against. Do we know if either server.grpc_tls_config.client_ca_file or <component>.grpc_client_config.tls_ca_path supports passing a chain file rather than just a single cert file?
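
As far as Go's standard library is concerned, x509.CertPool.AppendCertsFromPEM reads every certificate in a PEM file, not just the first one. The sketch below only demonstrates that standard-library behavior; whether Mimir loads client_ca_file and tls_ca_path this way is an assumption that this thread does not confirm. The selfSigned, trustedFromBundle, and run helpers and the component names are hypothetical.

```go
package main

import (
	"bytes"
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"math/big"
	"time"
)

// selfSigned returns a parsed, self-signed certificate for the given common name.
func selfSigned(cn string, serial int64) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(serial),
		Subject:      pkix.Name{CommonName: cn},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// trustedFromBundle concatenates the certificates into one PEM bundle, loads the
// bundle with AppendCertsFromPEM, and counts how many of them verify against it.
func trustedFromBundle(certs []*x509.Certificate) int {
	var bundle bytes.Buffer
	for _, c := range certs {
		pem.Encode(&bundle, &pem.Block{Type: "CERTIFICATE", Bytes: c.Raw})
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(bundle.Bytes()) {
		return 0
	}
	trusted := 0
	for _, c := range certs {
		// ExtKeyUsageAny keeps the check about trust only, not intended usage.
		opts := x509.VerifyOptions{Roots: pool, KeyUsages: []x509.ExtKeyUsage{x509.ExtKeyUsageAny}}
		if _, err := c.Verify(opts); err == nil {
			trusted++
		}
	}
	return trusted
}

// run builds three sample certificates and reports how many the bundle trusts.
func run() (trusted, total int, err error) {
	var certs []*x509.Certificate
	for i, cn := range []string{"server", "querier", "distributor"} {
		c, err := selfSigned(cn, int64(i+1))
		if err != nil {
			return 0, 0, err
		}
		certs = append(certs, c)
	}
	return trustedFromBundle(certs), len(certs), nil
}

func main() {
	trusted, total, err := run()
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d of %d bundled certificates are trusted\n", trusted, total)
}
```

If the file is loaded this way, a multi-certificate bundle is accepted as a trust store, so a failure with such a bundle would more likely come from how the certificates in it relate to the one being presented than from only the first entry being read.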

@nbjohnson
Author

What I want to do is essentially mutual client auth without a root CA. I want to give each component a bundle containing the public certs of every component plus the server public cert to validate against, rather than having them all validate against the root they were signed by.

Is it possible that only the first cert in this bundle (which would be the server public cert) is being used for validation? Then connections would validate on the client side, since the client checks the server cert, which is first, but fail on the server side, since the client cert the server needs to validate is never first in the list. I'm not sure about this theory, because I would expect to see TLS errors on more pods, and I would hope for a clearer error message if this were a validation issue. I tested using the root as the CA for both clients and server and all the errors seemed to go away, so it must have something to do with using mutual auth rather than a root cert.
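
One behavior of Go's crypto/tls may be relevant here. A server that requests client certificates advertises the subjects in its ClientCAs pool as acceptable issuers, and a Go client only sends a certificate whose chain was issued by one of them. If nothing matches, the client sends no certificate at all and the server answers with the "certificate required" alert. A pool of leaf certificates advertises the leaves' own subjects, which are not the issuer of a root-signed client certificate. The sketch below reproduces that with plain crypto/tls over an in-memory pipe. It is not confirmed to be what happens inside Mimir; the issue, poolOf, handshake, and run helpers and all certificate names are hypothetical; and the client side trusts the root in both runs so that only the server's trust store varies.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"net"
	"time"
)

// issue returns a self-signed CA when parent is nil, otherwise a leaf signed by
// parent. It panics on failure to keep the sketch short. Names are hypothetical.
func issue(cn string, serial int64, parent *tls.Certificate) tls.Certificate {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(serial),
		Subject:      pkix.Name{CommonName: cn},
		DNSNames:     []string{cn},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth, x509.ExtKeyUsageClientAuth},
	}
	signerCert, signerKey := tmpl, interface{}(key)
	if parent == nil {
		tmpl.IsCA, tmpl.BasicConstraintsValid = true, true
		tmpl.KeyUsage |= x509.KeyUsageCertSign
	} else {
		signerCert, signerKey = parent.Leaf, parent.PrivateKey
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, signerCert, &key.PublicKey, signerKey)
	if err != nil {
		panic(err)
	}
	leaf, err := x509.ParseCertificate(der)
	if err != nil {
		panic(err)
	}
	return tls.Certificate{Certificate: [][]byte{der}, PrivateKey: key, Leaf: leaf}
}

func poolOf(certs ...tls.Certificate) *x509.CertPool {
	p := x509.NewCertPool()
	for _, c := range certs {
		p.AddCert(c.Leaf)
	}
	return p
}

// handshake runs one mutual-TLS handshake over an in-memory pipe and returns the
// error seen by the client. clientCAs is what the server trusts for client certs.
func handshake(server, client tls.Certificate, rootCAs, clientCAs *x509.CertPool) error {
	cConn, sConn := net.Pipe()
	defer cConn.Close()
	go func() {
		defer sConn.Close()
		srv := tls.Server(sConn, &tls.Config{
			Certificates:           []tls.Certificate{server},
			ClientAuth:             tls.RequireAndVerifyClientCert,
			ClientCAs:              clientCAs,
			SessionTicketsDisabled: true,
		})
		if srv.Handshake() == nil {
			srv.Write([]byte("ok"))
		}
	}()
	cfg := &tls.Config{Certificates: []tls.Certificate{client}, RootCAs: rootCAs, ServerName: "server"}
	cli := tls.Client(cConn, cfg)
	if err := cli.Handshake(); err != nil {
		return err
	}
	// With TLS 1.3 a server-side rejection only reaches the client on its next read.
	_, err := cli.Read(make([]byte, 2))
	return err
}

// run compares a server trusting the root CA with one trusting a bundle of leaves.
func run() (withRoot, withLeaves error) {
	root := issue("root", 1, nil)
	server, client := issue("server", 2, &root), issue("querier", 3, &root)
	withRoot = handshake(server, client, poolOf(root), poolOf(root))
	withLeaves = handshake(server, client, poolOf(root), poolOf(server, client))
	return withRoot, withLeaves
}

func main() {
	withRoot, withLeaves := run()
	fmt.Println("server trusts root CA:    ", withRoot)
	fmt.Println("server trusts leaf bundle:", withLeaves)
}
```

In this sketch the handshake succeeds when the server trusts the root and fails when it trusts only the leaf bundle, even though the client is configured with a valid certificate in both runs. That would be consistent with the errors disappearing once the root was used as the CA on both sides.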
