
[admin] Fix self leadership transfer #3446

Merged: 3 commits into redpanda-data:dev on Jan 12, 2022

Conversation

@VadimPlh (Contributor) commented on Jan 11, 2022

Cover letter

Problem

I got the following error in raft_availability_test.RaftAvailabilityTest.test_leader_transfers_recovery.acks=-1:

Max retries exceeded with url: /v1/partitions/kafka/topic-xvvoavvzqf/0/transfer_leadership?target=1 (Caused by ResponseError('too many 503 error responses')) 

@jcsp investigated this error and found:

the new leader is node 1, the intended destination. So somehow, the leadership transfer is being done, but the client is receiving the wrong status code.

The old leader is stepping down in the middle of the leadership transfer:
INFO 2022-01-10 11:47:00,964 [shard 1] raft - [group_id:1, {kafka/topic-xvvoavvzqf/0}] consensus.cc:134 - Stepping down as leader in term 15, dirty offset 91754

Subsequently, the client retries and is redirected to the new leader (good), but the new leader also responds with 503, because of this check in consensus.cc:

    if (*target == _self.id()) {
        // The current leader was asked to transfer leadership to itself;
        // it reports not_leader, which the admin API turns into a 503.
        vlog(_ctxlog.warn, "Cannot transfer leadership to self");
        return seastar::make_ready_future<std::error_code>(
          make_error_code(errc::not_leader));
    }

The admin API interprets not_leader as a 503. In this case, not_leader isn't really the right error: we should return some kind of "no-op" indication internally and a 200 to the client. If they ask to transfer leadership to the current leader, that should just be a success.
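As a rough illustration of that mapping (a hypothetical sketch, not the actual admin_server code; the handler shape, the ss alias for seastar, and the transfer_leadership call are all assumed here):

    // Hypothetical admin handler fragment: any not_leader result is
    // surfaced as HTTP 503, which the test's HTTP client treats as
    // retryable, hence the "too many 503 error responses" failure.
    std::error_code ec = co_await transfer_leadership(target);
    if (ec == raft::errc::not_leader) {
        throw ss::httpd::base_exception(
          "not the leader",
          ss::httpd::reply::status_type::service_unavailable);
    }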

Solution:

Return 200 when the user asks to transfer leadership to the node that is already the leader.
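A minimal sketch of the change at the same spot in consensus.cc as the snippet above (illustrative only; the exact errc value and log level in the merged commit may differ):

    if (*target == _self.id()) {
        // Transferring leadership to the node that is already the leader
        // is a no-op; report success so the admin API answers 200, not 503.
        vlog(_ctxlog.debug, "Ignoring leadership transfer to self");
        return seastar::make_ready_future<std::error_code>(
          make_error_code(errc::success));
    }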

@mmaslankaprv (Member) left a comment

lgtm

@dotnwat (Member) commented on Jan 11, 2022

What's the rationale for returning any error when transferring to self? If a natural leadership-movement race results in the target becoming the leader, it's hard to imagine any error handling being anything other than ignoring the error.

@jcsp (Contributor) commented on Jan 11, 2022

> What's the rationale for returning any error when transferring to self?

I think this was my suggestion when chatting with Vadim about this earlier; it's totally nonessential, though, and could go either way. I was thinking that internally our calls aren't generally idempotent, but externally they are.

@dotnwat (Member) commented on Jan 11, 2022

@VadimPlh I think you can merge this now as-is, or change it to return success. Erring on the side of more information is usually a good tie breaker, I suppose. This is an internal interface, so it isn't necessarily a binding decision.

VadimPlh merged commit b0a24b4 into redpanda-data:dev on Jan 12, 2022
mmaslankaprv added a commit that referenced this pull request on Feb 18, 2022