
failed DNS SRV record lookup errors on querier container #1248

Closed
trexx opened this issue Jan 25, 2022 · 11 comments
@trexx

trexx commented Jan 25, 2022

Describe the bug
Getting DNS errors in the querier container.
caller=dns_resolver.go:209 msg="failed DNS SRV record lookup" err="lookup _grpclb._tcp.grafana-tempo-tempo-distributed-query-frontend-discovery on 10.161.128.10:53: no such host"

These errors started upon upgrading to Tempo 1.3.0.

To Reproduce
Deploy the tempo-distributed 0.15.0 Helm chart.
View the logs of the querier container.

Expected behavior
No DNS errors.

Environment:

  • Infrastructure: Kubernetes
  • Deployment tool: Helm

Additional Context
The frontend-discovery service has no such grpclb port defined in the helm chart.
I have verified the rendered configuration to be:

    querier:
      frontend_worker:
        frontend_address: grafana-tempo-tempo-distributed-query-frontend-discovery:9095

I have seen references to a grpclb port in the Loki Helm charts listening on port 9096; however, I see no such references in the tempo-distributed Helm charts.
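As far as I can tell, the SRV name being looked up (_grpclb._tcp.<frontend_address host>) comes from the gRPC DNS resolver code pulled in through dskit, which queries it to discover grpclb load balancers. Kubernetes only publishes SRV records for named Service ports (_<port-name>._<protocol>.<service>.<namespace>.svc.cluster.local), so the lookup can only resolve if the discovery Service actually defines a port named grpclb. A rough sketch of such a port entry, with the 9096 port number borrowed from the Loki charts rather than anything the tempo-distributed chart currently renders:

    # Illustrative only: a named port on the headless query-frontend discovery Service.
    # Kubernetes creates one SRV record per named port, e.g.
    #   _grpclb._tcp.<service>.<namespace>.svc.cluster.local
    # so a port literally named "grpclb" is what would let this lookup succeed.
    ports:
      - name: grpclb
        port: 9096        # number borrowed from the Loki charts; illustrative
        targetPort: grpc  # forwards to the query-frontend's existing gRPC port
        protocol: TCP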

Related:
grafana/helm-charts#803
grafana/helm-charts#801

I am not sure if this is a helm chart issue or an issue with tempo.

@joe-elliott
Member

We are seeing this error logged as well in our internal cluster which is deployed with jsonnet. Does querying work for you?

Querying works in our clusters, and I smoke-tested the Helm chart as well to confirm that basic reads and writes worked. I will slot this for 1.4.

Hopefully it's just cleaning up an innocuous error log?

@joe-elliott joe-elliott added this to the v1.4 milestone Jan 25, 2022
@trexx
Author

trexx commented Jan 26, 2022

Yes, querying works, and for the most part it seems everything is working fine.

@joe-elliott
Member

> Yes, querying works, and for the most part it seems everything is working fine.

Thanks for the issue. We will look at cleaning up the log when we can 👍

@annanay25
Contributor

I had made a note of this somewhere but I guess I lost it. This might be related: grafana/dskit#102

@kasunsjc

kasunsjc commented Jan 27, 2022

Having the same issue when using Tempo with the microservices Helm deployment.

I see the following similar errors as well, from the ingester and distributor:

caller=memberlist_logger.go:74 level=warn msg="Failed to resolve tempo-tempo-distributed-gossip-ring: lookup tempo-tempo-distributed-gossip-ring on 10.0.0.10:53: no such host"
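The memberlist warning is a separate code path: each component that joins the gossip ring resolves the gossip-ring headless Service at startup to find peers, and a headless Service only gets DNS records once at least one member pod is ready, so the "no such host" result is typically transient during a rollout. A minimal sketch of the memberlist settings assumed to be rendered by the chart (the service name is taken from the log line above):

    # Sketch of the assumed memberlist configuration; the join address must
    # resolve to the headless gossip-ring Service, which only has DNS records
    # while at least one member pod is ready.
    memberlist:
      bind_port: 7946
      join_members:
        - tempo-tempo-distributed-gossip-ring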

@ghost

ghost commented Jan 29, 2022

> Having the same issue when using Tempo with the microservices Helm deployment.

> I see the following similar errors as well, from the ingester and distributor:

level=info ts=2022-01-29T11:44:31.027387198Z caller=dns_resolver.go:209 msg="failed DNS SRV record lookup" err="lookup _grpclb._tcp.tempo-tempo-distributed-query-frontend on 10.18.128.10:53: no such host"

@banschikovde

Hello! I encountered a similar error when upgrading to Tempo 1.3 with the tempo-distributed Helm chart:

level=info ts=2022-01-31T07:34:32.789371071Z caller=dns_resolver.go:209 msg="failed DNS SRV record lookup" err="lookup _grpclb._tcp.tempo-tempo-distributed-query-frontend-discovery on 10.210.0.10:53: no such host"

@sherifkayad

Same here! Any plans for when that log can be cleaned up, or clarity on what the actual cause of the error is?

@joe-elliott
Member

This should be removed in 1.4. At the moment we are unsure of the cause. See @annanay25's comment above on the change that may have started the issue.

@joe-elliott
Member

We have not seen this in our clusters for a few weeks. It appears to have been fixed in the r31 release when we upgraded dskit.

https://github.com/grafana/tempo/compare/r30..r31


@tayfourius

tayfourius commented Mar 10, 2022

I fixed the issue by adding the port

    - name: grpclb
      port: 9096
      targetPort: grpc
      protocol: TCP

to the tempo-distributed-query-frontend and tempo-distributed-query-frontend-discovery Services:

apiVersion: v1
kind: Service
metadata:
  name: tempo-distributed-query-frontend-discovery
  namespace: logging
  labels:
    helm.sh/chart: tempo-distributed-0.16.4
    app.kubernetes.io/name: tempo-distributed
    app.kubernetes.io/instance: tempo-distributed
    app.kubernetes.io/version: "1.3.2"
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: query-frontend
spec:
  type: ClusterIP
  clusterIP: None
  ports:
    - name: http
      port: 3100
      targetPort: 3100
    - name: grpc
      port: 9095
      protocol: TCP
      targetPort: 9095
    - name: tempo-query-jaeger-ui
      port: 16686
      targetPort: 16686
    - name: tempo-query-metrics
      port: 16687
      targetPort: jaeger-metrics
    - name: grpclb
      port: 9096
      targetPort: grpc
      protocol: TCP  
  selector:
    app.kubernetes.io/name: tempo-distributed
    app.kubernetes.io/instance: tempo-distributed
    app.kubernetes.io/component: query-frontend

emalihin added a commit to emalihin/helm-charts that referenced this issue Mar 24, 2022
Add missing grpclb port to query-frontend discovery service

Without it, errors such as the following are logged by the querier:
```
.. caller=dns_resolver.go:209 msg="failed DNS SRV record lookup" err="lookup _grpclb._tcp.tempo-distributed-platform-query-frontend-discovery on 10.252.0.10:53: no such host"
```

Same as in this issue grafana/tempo#1248