🐛 fix: s3: fix bucket object not found #4879

Merged
k8s-ci-robot merged 3 commits into kubernetes-sigs:main on Mar 27, 2024

Conversation

r4f4
Contributor

@r4f4 r4f4 commented Mar 18, 2024

What type of PR is this?
/kind bug

What this PR does / why we need it:
The s3.HeadObject API call can return "NotFound" when either the bucket or the object does not exist (as opposed to the more descriptive s3.ErrCodeNoSuchKey or s3.ErrCodeNoSuchBucket).

This caused the machine controller to loop indefinitely, repeatedly trying and failing to delete an already-deleted object:

    E0316 16:37:08.973942     366 awsmachine_controller.go:307] "unable to delete machine" err=<
            deleting bootstrap data object: deleting S3 object: NotFound: Not Found
                    status code: 404, request id: 5Z101DW1KN380WTY, host id: tYlSi9K38lBkIsr2DNf/xFfgDuFaVfeUmpscXdljiMZC5iRxPIDuXSLwHJwdFnosYCfi7Bih25GaDpVAbSq4ZA==
     >
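
A minimal sketch of the error classification described above, not the code from this PR. To stay self-contained it does not import the AWS SDK: `codedError` and `apiError` are hypothetical stand-ins for the SDK's `awserr.Error`, and `isAlreadyGone` is an illustrative helper name. The string literals `"NoSuchKey"` and `"NoSuchBucket"` are the values of `s3.ErrCodeNoSuchKey` and `s3.ErrCodeNoSuchBucket`; `"NotFound"` is the bare code from the log above.

```go
package main

import (
	"errors"
	"fmt"
)

// codedError is a hypothetical stand-in for the AWS SDK's awserr.Error
// interface, reduced to the one method this sketch needs.
type codedError interface {
	error
	Code() string
}

// apiError is a hypothetical concrete error used only for illustration.
type apiError struct {
	code, msg string
}

func (e apiError) Error() string { return e.code + ": " + e.msg }
func (e apiError) Code() string  { return e.code }

// isAlreadyGone reports whether err means the bucket or object no longer
// exists, including the generic "NotFound" code that HeadObject can return.
func isAlreadyGone(err error) bool {
	var ce codedError
	if !errors.As(err, &ce) {
		return false // not a coded API error, so not a "missing" signal
	}
	switch ce.Code() {
	case "NoSuchKey", "NoSuchBucket", "NotFound":
		return true
	}
	return false
}

func main() {
	// Wrap the error the same way the log message above is layered.
	wrapped := fmt.Errorf("deleting S3 object: %w",
		apiError{code: "NotFound", msg: "Not Found"})
	fmt.Println(isAlreadyGone(wrapped)) // true

	denied := apiError{code: "AccessDenied", msg: "Access Denied"}
	fmt.Println(isAlreadyGone(denied)) // false
}
```

Treating all three codes as "already gone" lets a delete path return success instead of retrying forever, while any other code still surfaces as a failure.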

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #4859

Special notes for your reviewer:

Checklist:

  • squashed commits
  • includes documentation
  • includes emojis
  • adds unit tests
  • adds or updates e2e tests

Release note:

Fix a bug where the machine controller would keep trying to delete an already-deleted S3 object.

Previously, any error that was not of type awserr.Error and occurred while
trying to list a bootstrap data object was silently ignored.
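
The following is a hedged sketch of the pattern this commit describes, not the actual implementation. It assumes a delete path that first checks for the object and then deletes it; `deleteObject`, its `head` and `del` callbacks, `codedError`, `apiError`, and `errNetwork` are all hypothetical names introduced for illustration, with `codedError` standing in for `awserr.Error`. The point is that an error which is not a coded API error falls through to a `return` instead of being dropped.

```go
package main

import (
	"errors"
	"fmt"
)

// codedError is a hypothetical stand-in for awserr.Error.
type codedError interface {
	error
	Code() string
}

// apiError is a hypothetical coded error used only for illustration.
type apiError struct{ code string }

func (e apiError) Error() string { return e.code }
func (e apiError) Code() string  { return e.code }

// errNetwork is a hypothetical plain error that carries no API error code.
var errNetwork = errors.New("connection reset")

// deleteObject checks for the object with head and removes it with del.
// Both callbacks are assumptions standing in for the real S3 calls.
func deleteObject(head, del func() error) error {
	if err := head(); err != nil {
		var ce codedError
		if errors.As(err, &ce) && ce.Code() == "NotFound" {
			return nil // already gone, nothing left to delete
		}
		// Every other error, coded or not, is reported to the caller.
		return fmt.Errorf("checking S3 object: %w", err)
	}
	if err := del(); err != nil {
		return fmt.Errorf("deleting S3 object: %w", err)
	}
	return nil
}

func main() {
	ok := func() error { return nil }

	// A plain (non-coded) error is surfaced rather than ignored.
	fmt.Println(deleteObject(func() error { return errNetwork }, ok))

	// A "NotFound" response counts as success.
	fmt.Println(deleteObject(func() error { return apiError{code: "NotFound"} }, ok))
}
```

Returning the wrapped error keeps the original cause available to callers through `errors.Is` and `errors.As`.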
@k8s-ci-robot added the release-note, kind/bug, and cncf-cla: yes labels on Mar 18, 2024
@k8s-ci-robot
Contributor

Hi @r4f4. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the needs-ok-to-test and size/M labels on Mar 18, 2024
@nrb
Contributor

nrb commented Mar 19, 2024

/ok-to-test

/assign @damdo

@k8s-ci-robot added the ok-to-test label and removed the needs-ok-to-test label on Mar 19, 2024
@nrb
Contributor

nrb commented Mar 21, 2024

/lgtm

@k8s-ci-robot added the lgtm label on Mar 21, 2024
@Ankitasw added the tide/merge-method-squash and approved labels on Mar 27, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 8e04d87 into kubernetes-sigs:main Mar 27, 2024
20 checks passed
@AndiDog
Contributor

AndiDog commented Mar 27, 2024

Does this require a backport to v2.3.x and others?

@damdo
Member

damdo commented Apr 3, 2024

@AndiDog it depends on when the issue was introduced. Would you be able to check?

@richardcase
Member

/cherry-pick release-2.4

@k8s-infra-cherrypick-robot

@richardcase: new pull request created: #4907

In response to this:

/cherry-pick release-2.4


@AndiDog
Contributor

AndiDog commented Jun 5, 2024

No backport is required for v2.3.x, since the HeadObject change isn't in the release-2.3 branch.

fad3t pushed a commit to fad3t/cluster-api-provider-aws that referenced this pull request Jul 25, 2024
* 🐛 fix: s3: do not ignore non-aws errors when deleting object

Previously, any error that was not of type awserr.Error and occurred while
trying to list a bootstrap data object was silently ignored.

* 🐛fix: s3: ignore "NotFound" errors

The `s3.HeadObject` API call can return "NotFound" when either the
bucket or the object does not exist (as opposed to the more descriptive
`s3.ErrCodeNoSuchKey` or `s3.ErrCodeNoSuchBucket`).

This would cause the machine controller to loop indefinitely trying to
delete an already deleted object but failing:

```
E0316 16:37:08.973942     366 awsmachine_controller.go:307] "unable to delete machine" err=<
	deleting bootstrap data object: deleting S3 object: NotFound: Not Found
		status code: 404, request id: 5Z101DW1KN380WTY, host id: tYlSi9K38lBkIsr2DNf/xFfgDuFaVfeUmpscXdljiMZC5iRxPIDuXSLwHJwdFnosYCfi7Bih25GaDpVAbSq4ZA==
 >
```

* 🌱s3: add unit test for already deleted s3 object.
Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
  • kind/bug: Categorizes issue or PR as related to a bug.
  • lgtm: "Looks good to me", indicates that a PR is ready to be merged.
  • needs-priority
  • ok-to-test: Indicates a non-member PR verified by an org member that is safe to test.
  • release-note: Denotes a PR that will be considered when it comes time to generate release notes.
  • size/M: Denotes a PR that changes 30-99 lines, ignoring generated files.
  • tide/merge-method-squash: Denotes a PR that should be squashed by tide when it merges.
Development

Successfully merging this pull request may close these issues.

Deleting s3 object NotFound errors
8 participants