
txn: Error handling for pessimistic locks #332

Merged · 7 commits · Feb 22, 2022

Conversation

@pingyu (Collaborator) commented on Feb 5, 2022

Closes: #313

What's changed

Automatically roll back the successfully acquired locks when pessimistic_lock fails.

How it is done

Add a new error, PessimisticLockError, holding the successful locks returned by the TiKV server.
On failure, pessimistic_lock takes the successfully locked keys from PessimisticLockError and rolls them back.
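
A minimal, self-contained sketch of this flow (not the PR's actual code; the field names and helper functions here are illustrative assumptions):

```rust
// Illustrative sketch only: names are assumptions, not the PR's actual API.
#[derive(Debug)]
struct PessimisticLockError {
    /// Keys the server had already locked before the failure.
    success_keys: Vec<Vec<u8>>,
    message: String,
}

/// Stand-in for the server call: locks the first key, then fails.
fn lock_keys(keys: &[Vec<u8>]) -> Result<(), PessimisticLockError> {
    Err(PessimisticLockError {
        success_keys: keys.iter().take(1).cloned().collect(),
        message: "write conflict".into(),
    })
}

/// Stand-in for the rollback request.
fn rollback_keys(keys: &[Vec<u8>]) {
    println!("rolling back {} pessimistic lock(s)", keys.len());
}

/// On failure, roll back whatever was locked so the call is all-or-nothing.
fn pessimistic_lock(keys: Vec<Vec<u8>>) -> Result<(), PessimisticLockError> {
    lock_keys(&keys).map_err(|e| {
        rollback_keys(&e.success_keys);
        e
    })
}

fn main() {
    let keys = vec![b"k1".to_vec(), b"k2".to_vec()];
    if let Err(e) = pessimistic_lock(keys) {
        eprintln!("pessimistic_lock failed: {}", e.message);
    }
}
```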

Tests

  • Manual
  • Unit test
  • Integrated test

Signed-off-by: pingyu <yuping@pingcap.com>
@pingyu (Collaborator, Author) commented on Feb 5, 2022

/cc @ekexium

@ekexium (Collaborator) left a comment:

Great job! I just have a few suggestions.

Also, I noticed we didn't set MULTI_REGION correctly in GitHub Actions. Please do me a favor and fix it :). Simply change the last command in ci.yml to `MULTI_REGION=1 make integration-test`.

src/transaction/requests.rs (outdated, resolved)
src/request/plan.rs (outdated, resolved)

```rust
/// Rollback pessimistic lock
async fn pessimistic_lock_rollback(
```
Collaborator:

We'd better print some logs here.

Collaborator:

I think we need to appropriately change the buffer state. In the current PR it has no effect at all, but I'm afraid the function might be misused. Or we can add a comment to explicitly warn about such behavior.

Collaborator (Author):

> We'd better print some logs here.

Hmm... I print a debug message "rollback pessimistic lock" at the beginning.
What more logs do we need? (There is also just a simple debug log at the beginning of the other methods.)

Collaborator:

@ekexium Do we allow the user to retry pessimistic_lock? If we do, we should not eagerly roll back the successfully locked keys.

Collaborator:

> @ekexium Do we allow the user to retry pessimistic_lock? If we do, we should not eagerly roll back the successfully locked keys.

Users can retry, but we cannot make assumptions about that. The lock request may already have retried, subject to the backoff option.

Collaborator:

> Hmm... I print a debug message "rollback pessimistic lock" at the beginning. What more logs do we need? (There is also just a simple debug log at the beginning of the other methods.)

I didn't see it 🤦. The logging is a mess for now. Let's just keep it as is in this PR.

Collaborator:

> Users can retry, but we cannot make assumptions about that. The lock request may already have retried, subject to the backoff option.

Let me explain my point again. If we allow the user to retry the pessimistic lock when the client returns errors, the client should remember which keys are locked, but not eagerly roll them back. Besides, the client should roll back the locked keys when client.rollback() is called, or panic if it gets dropped without rolling back or committing.
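
A minimal sketch of the guard semantics suggested here, assuming a panic-on-drop check is what's wanted; all names are hypothetical:

```rust
// Hypothetical sketch of the suggestion above: remember the locked keys and
// panic if the guard is dropped before rollback() (or a commit) resolves it.
struct LockedKeys {
    keys: Vec<Vec<u8>>,
    resolved: bool,
}

impl LockedKeys {
    fn new(keys: Vec<Vec<u8>>) -> Self {
        LockedKeys { keys, resolved: false }
    }

    fn rollback(mut self) {
        println!("rolling back {} key(s)", self.keys.len());
        self.resolved = true; // consume the guard cleanly
    }
}

impl Drop for LockedKeys {
    fn drop(&mut self) {
        if !self.resolved {
            panic!("locked keys dropped without rollback or commit");
        }
    }
}

fn main() {
    let guard = LockedKeys::new(vec![b"k1".to_vec()]);
    guard.rollback(); // skipping this line would panic on drop
}
```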

Collaborator:

I think the major drawback of your idea is that if the user doesn't retry the same set of keys, the locks will remain and may block other transactions for some time.

If users want to retry the same set of keys, they can use a different backoff strategy (and we may consider enhancing the retry of pessimistic locks to use a newer for_update_ts).

Reference: client-go immediately rolls back the successful pessimistic locks, but I think part of the reason is to cooperate with TiDB; client-rust doesn't have to behave in the same way.

Collaborator (Author):

I agree with @ekexium. I also think it would be better to provide an "atomic" method to developers, which does nothing when it returns an error.

tests/integration_tests.rs (outdated, resolved)
@pingyu (Collaborator, Author) commented on Feb 14, 2022

@ekexium
Comments addressed. Sorry for the delay. PTAL, thanks~

Signed-off-by: pingyu <yuping@pingcap.com>
@pingyu (Collaborator, Author) commented on Feb 18, 2022

The integration test case txn_optimistic_heartbeat fails consistently. Let me take a look.

Signed-off-by: pingyu <yuping@pingcap.com>
@pingyu (Collaborator, Author) commented on Feb 19, 2022

> The integration test case txn_optimistic_heartbeat fails consistently. Let me take a look.

The error is raised from clear_tikv in the first tests::common::init(), and should be caused by the 500 ms max delay of DEFAULT_REGION_BACKOFF not being enough for the CI environment.

I fixed it by prolonging the max delay to 10 seconds. Now all checks pass.
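
For illustration, a self-contained sketch of a capped exponential backoff like the one described (the 2 ms base delay and attempt count are assumptions; only the 500 ms to 10 s cap change comes from the comment above):

```rust
// Sketch of a capped exponential backoff; base delay and attempt count are
// assumed values, not necessarily DEFAULT_REGION_BACKOFF's actual parameters.
fn backoff_delays(base_ms: u64, cap_ms: u64, attempts: u32) -> Vec<u64> {
    (0..attempts)
        .map(|i| (base_ms << i).min(cap_ms)) // double each retry, capped
        .collect()
}

fn main() {
    // With a 500 ms cap the total retry budget stays small -- too tight
    // for a slow CI environment.
    println!("{:?}", backoff_delays(2, 500, 10));
    // Prolonging the cap to 10 s lets later retries wait long enough.
    println!("{:?}", backoff_delays(2, 10_000, 10));
}
```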

@andylokandy (Collaborator) commented:

@ekexium PTAL

@andylokandy reopened this on Feb 21, 2022
@ekexium (Collaborator) commented on Feb 21, 2022

LGTM. We should consider supporting optional backoffs for all raw requests.

@andylokandy merged commit 74db41c into tikv:master on Feb 22, 2022
Labels: none yet
Projects: none yet

Development

Successfully merging this pull request may close these issues:

  • Error handling for pessimistic locks

3 participants