
[backport] [v23.1.x] More detailed partition reconfiguration tracking #10201 #10630

Merged: 7 commits merged into redpanda-data:v23.1.x on May 10, 2023

Conversation

bharathv (Contributor) commented May 9, 2023

Enriched the `/reconfigurations` API with additional information that allows users to track the progress of partition reconfiguration. The API now returns the complete set of information about the partition reconfiguration that is taking place.

The API will now return the following JSON:

{
    "ns": "kafka",
    "topic": "topic-khbikkrzeo",
    "partition": 9,
    "previous_replicas": [
        {
            "node_id": 2,
            "core": 0
        },
        {
            "node_id": 3,
            "core": 0
        },
        {
            "node_id": 1,
            "core": 0
        }
    ],
    "current_replicas": [
        {
            "node_id": 4,
            "core": 0
        },
        {
            "node_id": 3,
            "core": 0
        },
        {
            "node_id": 1,
            "core": 0
        }
    ],
    "bytes_left_to_move": 190,
    "bytes_moved": 0,
    "partition_size": 190,
    "reconciliation_statuses": [
        {
            "node_id": 2,
            "operations": [
                {
                    "type": "update",
                    "core": 0,
                    "retry_number": 7,
                    "revision": 89,
                    "status": "Generic failure occurred during partition operation execution (cluster::errc:52)"
                }
            ]
        },
        {
            "node_id": 1,
            "operations": [
                {
                    "type": "update",
                    "core": 0,
                    "retry_number": 3,
                    "revision": 89,
                    "status": "Current node is not a leader for partition (cluster::errc:17)"
                }
            ]
        },
        {
            "node_id": 4,
            "operations": [
                {
                    "type": "update",
                    "core": 0,
                    "retry_number": 5,
                    "revision": 89,
                    "status": "Current node is not a leader for partition (cluster::errc:17)"
                }
            ]
        },
        {
            "node_id": 3,
            "operations": [
                {
                    "type": "update",
                    "core": 0,
                    "retry_number": 5,
                    "revision": 89,
                    "status": "Current node is not a leader for partition (cluster::errc:17)"
                }
            ]
        }
    ]
}
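As an illustration (not part of this PR), the byte counters in the payload above can be combined into a progress percentage. This is a minimal sketch assuming only the field names shown in the example response; the helper name is hypothetical.

```python
# Hypothetical helper: compute reconfiguration progress from one entry of the
# /reconfigurations response. Field names are taken from the example payload
# in the PR description; this is not an official client.

def reconfiguration_progress(entry: dict) -> float:
    """Return the percentage of bytes already moved for a reconfiguration."""
    total = entry["bytes_moved"] + entry["bytes_left_to_move"]
    if total == 0:
        # Nothing to move: treat as complete.
        return 100.0
    return 100.0 * entry["bytes_moved"] / total

entry = {
    "ns": "kafka",
    "topic": "topic-khbikkrzeo",
    "partition": 9,
    "bytes_moved": 0,
    "bytes_left_to_move": 190,
    "partition_size": 190,
}
print(f"{reconfiguration_progress(entry):.1f}% moved")  # 0.0% moved
```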

Fixes #10434
Backport of #10201
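The `reconciliation_statuses` array can similarly be flattened into a per-replica summary, e.g. to spot replicas that keep retrying. Again a minimal sketch assuming only the field names from the example payload; `pending_operations` is a hypothetical helper, not part of the API.

```python
# Hypothetical helper: summarize per-replica backend operations from the
# reconciliation_statuses field of a /reconfigurations entry. Field names
# follow the example payload in the PR description.

def pending_operations(entry: dict) -> list[str]:
    """List replicas with in-flight backend operations and their last status."""
    lines = []
    for status in entry.get("reconciliation_statuses", []):
        for op in status["operations"]:
            lines.append(
                f"node {status['node_id']}: {op['type']} "
                f"(retry {op['retry_number']}): {op['status']}"
            )
    return lines

entry = {
    "reconciliation_statuses": [
        {"node_id": 2, "operations": [
            {"type": "update", "core": 0, "retry_number": 7,
             "revision": 89, "status": "Generic failure occurred"}]},
    ]
}
for line in pending_operations(entry):
    print(line)  # node 2: update (retry 7): Generic failure occurred
```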

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.1.x
  • v22.3.x
  • v22.2.x

Release Notes

  • none

@bharathv changed the title from "Backport 10201" to "[backport] [v23.1.x] More detailed partition reconfiguration tracking #10201" on May 9, 2023
@piyushredpanda piyushredpanda added this to the v23.1.9 milestone May 9, 2023
@@ -273,6 +273,29 @@ inline bool has_non_replicable_op_type(const topic_table_delta& d) {
}
__builtin_unreachable();
}

inline std::vector<model::broker_shard> union_replica_sets(
Contributor:

this is actually needed by the previous commit

Contributor Author (bharathv):

I thought no one would notice it in a backport :-P

Contributor:

haha 😁

Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit cd45527)
In order to provide a generic error code for errors originating from outside of the cluster module (errors with a different category) or for exceptions occurring in `controller_backend`, we introduce a separate error code.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit 711949d)
Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit 8f83991)
Added revision, last error, and retry count to the backend operation. This information will be used to track partition reconfiguration progress.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit 8e01dd0)
mmaslankaprv and others added 3 commits May 9, 2023 19:37
Added `controller_api` that allows the caller to request the partition reconciliation state from all replicas where the partition is currently hosted. The API returns a data structure containing the operations being executed by `controller_backend` on each of the replicas.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit 1799693)
The `/reconfigurations` endpoint didn't provide insight into the progress of partition reconfigurations.

Added information that allows the user to check the operation's progress and, additionally, the status of reconciliation on all replicas.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit b2467b3)
Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit d83a975)
@bharathv (Contributor Author):

Failure: #10219 (known issue, unrelated).

@bharathv bharathv merged commit c9364d8 into redpanda-data:v23.1.x May 10, 2023
@bharathv bharathv deleted the backport-10201 branch May 10, 2023 17:12
4 participants