Metrics double_registration (storage_log_written_bytes) #7983

BenPope · 2022-12-30T15:44:32Z

FAIL test: RandomNodeOperationsTest.test_node_operations.enable_failures=True (3/153 runs)
failure at 2022-12-24T07:32:38.127Z: <BadLogLines nodes=docker-rp-12(1) example="ERROR 2022-12-24 07:14:49,202 [shard 0] cluster - controller_backend.cc:722 - [{kafka/topic-abletlvgrs/12}] exception while executing partition operation: {type: update, ntp: {kafka/topic-abletlvgrs/12}, offset: 476, new_assignment: { id: 12, group_id: 127, replicas: {{node_id: 1, shard: 0}, {node_id: 2, shard: 1}, {node_id: 4, shard: 1}} }, previous_replica_set: {{{node_id: 5, shard: 0}, {node_id: 2, shard: 1}, {node_id: 4, shard: 1}}}} - seastar::metrics::double_registration (registering metrics twice for metrics: storage_log_written_bytes)">
on (amd64, container) in job https://buildkite.com/redpanda/redpanda/builds/20343#018542d1-a236-4ea4-aba2-6f4e33c128ea

jcsp · 2023-01-03T18:14:50Z

Without having inspected the logs, this is probably a case of quickly deleting then recreating the same NTP, such that the new storage log is getting created before the old one is destroyed.
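
A minimal sketch of that hypothesis, assuming a registry that rejects duplicate names and a log object that registers its metric for its whole lifetime. `toy_registry` and `toy_log` are hypothetical stand-ins for illustration, not the Seastar or Redpanda types:

```cpp
#include <set>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a metrics registry: every name must be unique.
class toy_registry {
public:
    void add(const std::string& name) {
        if (!names_.insert(name).second) {
            throw std::runtime_error(
              "registering metrics twice for metrics: " + name);
        }
    }
    void remove(const std::string& name) { names_.erase(name); }

private:
    std::set<std::string> names_;
};

// Hypothetical log that holds its metric registration for its lifetime (RAII).
class toy_log {
public:
    toy_log(toy_registry& reg, std::string ntp)
      : reg_(reg)
      , name_("storage_log_written_bytes{" + std::move(ntp) + "}") {
        reg_.add(name_);
    }
    ~toy_log() { reg_.remove(name_); }
    toy_log(const toy_log&) = delete;
    toy_log& operator=(const toy_log&) = delete;

private:
    toy_registry& reg_;
    std::string name_;
};

// The new log is created for the same NTP while the old one is still alive.
bool recreate_while_old_alive_throws() {
    toy_registry reg;
    toy_log old_log(reg, "kafka/topic/12");
    try {
        toy_log new_log(reg, "kafka/topic/12");
    } catch (const std::runtime_error&) {
        return true; // duplicate name rejected
    }
    return false;
}

// The old log is destroyed first, so its name is free again.
bool recreate_after_destroy_succeeds() {
    toy_registry reg;
    { toy_log old_log(reg, "kafka/topic/12"); }
    try {
        toy_log new_log(reg, "kafka/topic/12");
    } catch (const std::runtime_error&) {
        return false;
    }
    return true;
}
```

In this toy model the only difference between the two cases is the ordering of destruction and creation, which is the ordering the comment above suspects.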

0xdiba · 2023-01-04T10:37:02Z

If it helps in any way, we've seen this happen in the wild with other metrics too, e.g.:
[shard 0] seastar - Exceptional future ignored: seastar::metrics::double_registration (registering metrics twice for metrics: kafka_consumer_group_consumers)

BenPope · 2023-01-04T11:06:19Z

This is related: #5939

mmaslankaprv · 2023-01-24T09:38:01Z

It looks like the partition shared pointer is being kept alive by the fetch request handler. Even though the partition is removed and all subsequent reads will fail, keeping the pointer alive prevents the metrics from being deregistered. Maybe we should explicitly deregister the ntp metrics in disk_log_impl::remove()?
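
A rough illustration of the lifetime problem and the proposed fix, assuming metrics are otherwise only dropped in the destructor. `name_registry`, `toy_partition`, and its `remove()` are hypothetical names for this sketch; the real `disk_log_impl::remove()` is not reproduced here:

```cpp
#include <memory>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical registry that rejects duplicate metric names.
class name_registry {
public:
    void add(const std::string& name) {
        if (!names_.insert(name).second) {
            throw std::runtime_error(
              "registering metrics twice for metrics: " + name);
        }
    }
    void remove(const std::string& name) { names_.erase(name); }

private:
    std::set<std::string> names_;
};

// Hypothetical partition: registers a metric on construction and drops it
// either on explicit remove() or, at the latest, on destruction.
class toy_partition {
public:
    toy_partition(name_registry& reg, std::string ntp)
      : reg_(reg)
      , name_("storage_log_written_bytes{" + std::move(ntp) + "}") {
        reg_.add(name_);
        registered_ = true;
    }
    ~toy_partition() { deregister_metrics(); }
    toy_partition(const toy_partition&) = delete;
    toy_partition& operator=(const toy_partition&) = delete;

    // Explicit teardown: free the metric name now, without waiting for the
    // last shared_ptr reference to go away.
    void remove() { deregister_metrics(); }

private:
    void deregister_metrics() {
        if (registered_) {
            reg_.remove(name_);
            registered_ = false;
        }
    }

    name_registry& reg_;
    std::string name_;
    bool registered_ = false;
};

// Returns true if the partition can be recreated while a reader still holds
// a reference to the old one.
bool recreate_with_reader_alive(bool explicit_deregister) {
    name_registry reg;
    auto partition = std::make_shared<toy_partition>(reg, "kafka/topic/12");
    // Stand-in for an in-flight fetch handler holding its own reference.
    std::shared_ptr<toy_partition> held_by_fetch = partition;

    if (explicit_deregister) {
        partition->remove();
    }
    partition.reset(); // owner lets go; the object is still alive

    try {
        toy_partition recreated(reg, "kafka/topic/12");
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

Under these assumptions, `recreate_with_reader_alive(false)` fails with the duplicate-name error and `recreate_with_reader_alive(true)` succeeds, because the explicit step decouples metric lifetime from the last outstanding reference.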

piyushredpanda · 2023-01-24T13:18:41Z

Is that a question for @jcsp?

mmaslankaprv · 2023-01-26T16:01:37Z

@jcsp what do you think?

BenPope added the kind/bug, ci-failure, and area/metrics labels Dec 30, 2022

dotnwat added the area/storage and sev/medium labels and removed the area/metrics label Dec 30, 2022

BenPope added the area/metrics label Jan 4, 2023

dotnwat removed the area/metrics label Jan 5, 2023

michael-redpanda self-assigned this Jan 30, 2023

mmaslankaprv assigned mmaslankaprv and unassigned michael-redpanda Feb 1, 2023

mmaslankaprv mentioned this issue Feb 1, 2023

Explicitly deregister log metrics #8548

Merged

6 tasks

jcsp closed this as completed in #8548 Feb 1, 2023

This issue was closed.
