Metrics double_registration (storage_log_written_bytes) #7983

Closed
BenPope opened this issue Dec 30, 2022 · 6 comments · Fixed by #8548
Labels
area/storage ci-failure kind/bug Something isn't working sev/medium Bugs that do not meet criteria for high or critical, but are more severe than low.

Comments

@BenPope
Member

BenPope commented Dec 30, 2022

FAIL test: RandomNodeOperationsTest.test_node_operations.enable_failures=True (3/153 runs)
failure at 2022-12-24T07:32:38.127Z: <BadLogLines nodes=docker-rp-12(1) example="ERROR 2022-12-24 07:14:49,202 [shard 0] cluster - controller_backend.cc:722 - [{kafka/topic-abletlvgrs/12}] exception while executing partition operation: {type: update, ntp: {kafka/topic-abletlvgrs/12}, offset: 476, new_assignment: { id: 12, group_id: 127, replicas: {{node_id: 1, shard: 0}, {node_id: 2, shard: 1}, {node_id: 4, shard: 1}} }, previous_replica_set: {{{node_id: 5, shard: 0}, {node_id: 2, shard: 1}, {node_id: 4, shard: 1}}}} - seastar::metrics::double_registration (registering metrics twice for metrics: storage_log_written_bytes)">
on (amd64, container) in job https://buildkite.com/redpanda/redpanda/builds/20343#018542d1-a236-4ea4-aba2-6f4e33c128ea

@BenPope BenPope added kind/bug Something isn't working ci-failure area/metrics labels Dec 30, 2022
@dotnwat dotnwat added area/storage sev/medium Bugs that do not meet criteria for high or critical, but are more severe than low. and removed area/metrics labels Dec 30, 2022
@jcsp
Contributor

jcsp commented Jan 3, 2023

Without having inspected the logs, this is probably a case of quickly deleting then recreating the same NTP, such that the new storage log is getting created before the old one is destroyed.

@0xdiba
Contributor

0xdiba commented Jan 4, 2023

If it helps in any way, we've seen this happen in the wild with other metrics too,
e.g.: [shard 0] seastar - Exceptional future ignored: seastar::metrics::double_registration (registering metrics twice for metrics: kafka_consumer_group_consumers)

@BenPope
Member Author

BenPope commented Jan 4, 2023

This is related: #5939

@mmaslankaprv
Member

It looks like the partition shared pointer is being kept alive by the fetch request handler. Even though the partition is removed and all subsequent reads will fail, keeping the pointer alive prevents the metrics from being deregistered. Maybe we should explicitly deregister NTP metrics in disk_log_impl::remove()?
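
To make the suspected sequence concrete, here is a minimal standalone sketch of it. It does not use Seastar or Redpanda code: `toy_registry`, `toy_log`, `recreate_fails` and the metric name format are all hypothetical stand-ins, and the real `disk_log_impl` may well differ. It only models the idea that a log deregisters its metrics in its destructor, so a lingering shared pointer delays deregistration, while an explicit, idempotent deregistration in `remove()` would not depend on the object's lifetime.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <stdexcept>
#include <string>

// Toy stand-in for a per-shard metrics registry (assumption: NOT the Seastar API).
class toy_registry {
public:
    // Throws if the name is already registered, mimicking double_registration.
    void add(const std::string& name) {
        if (!_names.insert(name).second) {
            throw std::runtime_error("registering metrics twice for metrics: " + name);
        }
    }
    void remove(const std::string& name) { _names.erase(name); }

private:
    std::set<std::string> _names;
};

// Toy stand-in for a storage log that owns one metric registration.
class toy_log {
public:
    toy_log(toy_registry& reg, const std::string& ntp)
      : _reg(reg), _metric("storage_log_written_bytes{" + ntp + "}") {
        _reg.add(_metric);
        _registered = true;
    }
    ~toy_log() { deregister_metrics(); }
    toy_log(const toy_log&) = delete;
    toy_log& operator=(const toy_log&) = delete;

    // Hypothetical fix: drop the metrics as part of removal, not destruction.
    void remove() { deregister_metrics(); }

private:
    // Idempotent, so the destructor can safely run after remove().
    void deregister_metrics() {
        if (_registered) {
            _reg.remove(_metric);
            _registered = false;
        }
    }

    toy_registry& _reg;
    std::string _metric;
    bool _registered = false;
};

// Returns true if recreating the log for the same NTP throws while a
// reader (standing in for the fetch handler) still holds the old log.
bool recreate_fails(bool explicit_remove) {
    toy_registry reg;
    auto log = std::make_shared<toy_log>(reg, "kafka/topic/12");
    auto held_by_fetch = log; // the reader keeps the old log alive
    if (explicit_remove) {
        log->remove();
    }
    log.reset(); // the owner drops its reference; the object survives
    try {
        toy_log recreated(reg, "kafka/topic/12");
        return false;
    } catch (const std::runtime_error&) {
        return true;
    }
}

// Small self-check of both paths.
inline void demo() {
    assert(recreate_fails(false)); // destructor-only cleanup: double registration
    assert(!recreate_fails(true)); // explicit remove(): recreation succeeds
}
```

In this toy model, relying on the destructor alone reproduces the double registration, and calling `remove()` first avoids it. Whether the same holds for the real log implementation would need checking against the actual code.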

@piyushredpanda
Contributor

Is that a question for @jcsp?

@mmaslankaprv
Member

@jcsp what do you think?

This issue was closed.

7 participants