🐛 Improve handling of topology orphaned objects #10277


fabriziopandini
Member

@fabriziopandini fabriziopandini commented Mar 18, 2024

What this PR does / why we need it:
This PR fixes #10275 and improves how the topology controller handles referenced objects when errors occur. Specifically:

  • if the InfrastructureCluster is created but ControlPlane creation fails, the InfrastructureCluster is now tracked instead of being orphaned (this is the reported issue)
  • if the InfrastructureMachineTemplate is created but ControlPlane creation fails, the InfrastructureMachineTemplate is cleaned up
  • if the InfrastructureMachineTemplate is created but an error happens before the MachineDeployment (MD) is created, the InfrastructureMachineTemplate is cleaned up
  • if the BootstrapTemplate is created but an error happens before the MD is created, the BootstrapTemplate is cleaned up
  • if the InfrastructureMachinePool (infrastructureMP) is created but an error happens before the MachinePool (MP) is created, the InfrastructureMachinePool is cleaned up
  • if the BootstrapConfig is created but an error happens before the MP is created, the BootstrapConfig is cleaned up
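The cleanup behavior listed above follows a common best-effort pattern: create the referenced objects first, create the owning object last, and if any step fails, delete whatever was already created. The sketch below is hypothetical and is not the code from this PR; `object`, `fakeClient`, and `createWithCleanup` are illustrative names, and an in-memory client stands in for the real Kubernetes client. It assumes Go 1.20+ for `errors.Join`.

```go
package main

import (
	"errors"
	"fmt"
)

// object is a hypothetical stand-in for a referenced object such as an
// InfrastructureMachineTemplate or a BootstrapTemplate.
type object struct {
	Kind string
	Name string
}

// fakeClient is a hypothetical in-memory client used only for illustration;
// it is not the controller-runtime client.
type fakeClient struct {
	objects map[string]object
	failOn  string // Kind whose creation is forced to fail
}

func key(o object) string { return o.Kind + "/" + o.Name }

func (c *fakeClient) Create(o object) error {
	if o.Kind == c.failOn {
		return fmt.Errorf("failed to create %s", key(o))
	}
	c.objects[key(o)] = o
	return nil
}

func (c *fakeClient) Delete(o object) error {
	delete(c.objects, key(o))
	return nil
}

// createWithCleanup creates the templates first and the owner last. On any
// failure it deletes the objects created so far, in reverse order, on a
// best-effort basis: cleanup errors are joined with the original error
// instead of replacing it.
func createWithCleanup(c *fakeClient, templates []object, owner object) error {
	var created []object
	cleanup := func(cause error) error {
		errs := []error{cause}
		for i := len(created) - 1; i >= 0; i-- {
			if err := c.Delete(created[i]); err != nil {
				errs = append(errs, err)
			}
		}
		return errors.Join(errs...)
	}
	for _, t := range templates {
		if err := c.Create(t); err != nil {
			return cleanup(err)
		}
		created = append(created, t)
	}
	if err := c.Create(owner); err != nil {
		return cleanup(err)
	}
	return nil
}

func main() {
	// Simulate "templates created, but MachineDeployment creation fails".
	c := &fakeClient{objects: map[string]object{}, failOn: "MachineDeployment"}
	templates := []object{
		{Kind: "InfrastructureMachineTemplate", Name: "md-0-infra"},
		{Kind: "BootstrapTemplate", Name: "md-0-bootstrap"},
	}
	err := createWithCleanup(c, templates, object{Kind: "MachineDeployment", Name: "md-0"})
	fmt.Println("error:", err)                // error: failed to create MachineDeployment/md-0
	fmt.Println("objects left:", len(c.objects)) // objects left: 0
}
```

Keeping the original error first means a failed cleanup never hides the reason reconciliation failed; the next reconcile can then start from a clean state instead of leaving orphaned templates behind.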

I will keep the PR in WIP while I run some additional tests.

Which issue(s) this PR fixes:
Fixes #10275

/area clusterclass

/cc @sbueringer @chrischdi

@k8s-ci-robot k8s-ci-robot added area/clusterclass Issues or PRs related to clusterclass do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 18, 2024
@fabriziopandini fabriziopandini force-pushed the improve-handling-of-reconcileCP-errors branch from a446f38 to 42679bf Compare March 18, 2024 20:49
@fabriziopandini
Member Author

/test pull-cluster-api-e2e-main

@mnaser

mnaser commented Mar 18, 2024

@fabriziopandini: would it be handy to use controllerutil.OperationResult instead of bools? (It might be helpful for other things down the line.)

@fabriziopandini
Member Author

would it be handy to perhaps use controllerutil.OperationResult instead of bools (which can maybe be helpful for other things down the line?)

controllerutil.OperationResult is not an exact match for reconcileReferencedTemplate, because one possible outcome is a template rotation.

Also, this is an internal API; we can refactor it again later if we need more outcomes down the line.
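The mismatch discussed above can be illustrated with a small custom outcome type. controller-runtime's `controllerutil.OperationResult` describes create/update outcomes for a single object and has no value for a rotation. The sketch below is hypothetical and not the PR's code: `templateResult`, its constants, and `createsNewObject` are illustrative names, and it assumes that a rotation means creating a new template under a new name and switching the reference to it.

```go
package main

import "fmt"

// templateResult is a hypothetical outcome type for a "reconcile referenced
// template" step. Unlike controllerutil.OperationResult, it has an explicit
// value for template rotation.
type templateResult int

const (
	templateUnchanged templateResult = iota // existing template kept as-is
	templateCreated                         // first template created
	templateRotated                         // new template created under a new name
)

func (r templateResult) String() string {
	switch r {
	case templateUnchanged:
		return "unchanged"
	case templateCreated:
		return "created"
	case templateRotated:
		return "rotated"
	}
	return fmt.Sprintf("templateResult(%d)", int(r))
}

// createsNewObject reports whether the step created a new object that would
// need best-effort cleanup if a later step fails. Under the assumption above,
// both creation and rotation create one.
func (r templateResult) createsNewObject() bool {
	return r == templateCreated || r == templateRotated
}

func main() {
	for _, r := range []templateResult{templateUnchanged, templateCreated, templateRotated} {
		fmt.Printf("%s: creates new object=%t\n", r, r.createsNewObject())
	}
}
```

For cleanup purposes only the "did this step create something new?" question matters, which is why a boolean can be enough for an internal API; a richer type like this one is an option if more outcomes are needed later.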

@fabriziopandini fabriziopandini changed the title [WIP] 🐛 Improve handling of topology orphaned objects 🐛 Improve handling of topology orphaned objects Mar 19, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 19, 2024
@fabriziopandini
Member Author

I reproduced the error with CAPD: with the fix, no duplicate InfrastructureClusters are created.
I also verified that once the ClusterClass patch is fixed, cluster provisioning restarts as expected.

Member

@chrischdi chrischdi left a comment


/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 26, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 72bfbb3466262c0662d681ee36619380c004577d

@fabriziopandini fabriziopandini force-pushed the improve-handling-of-reconcileCP-errors branch from a30e0a1 to 5f141e8 Compare March 26, 2024 19:52
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 26, 2024
@sbueringer
Member

Thx!

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 27, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: ecf3b2db4dda4af199608c9a2863001aab09baea

@sbueringer
Member

Let's cherry-pick (at least into 1.6)

/cherry-pick release-1.6

@k8s-infra-cherrypick-robot

@sbueringer: once the present PR merges, I will cherry-pick it on top of release-1.6 in a new PR and assign it to you.

In response to this:

Let's cherry-pick (at least into 1.6)

/cherry-pick release-1.6

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sbueringer
Member

/cherry-pick release-1.5

@k8s-infra-cherrypick-robot

@sbueringer: once the present PR merges, I will cherry-pick it on top of release-1.5 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.5


@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sbueringer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 27, 2024
@k8s-ci-robot k8s-ci-robot merged commit 3a16912 into kubernetes-sigs:main Mar 27, 2024
20 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.7 milestone Mar 27, 2024
@k8s-infra-cherrypick-robot

@sbueringer: #10277 failed to apply on top of branch "release-1.6":

Applying: Avoid leaving orphaned InfrastructureCluster when create control plane fails
Applying: Best effort cleanup of referenced templates/objects
Using index info to reconstruct a base tree...
M	internal/controllers/topology/cluster/desired_state.go
M	internal/controllers/topology/cluster/desired_state_test.go
M	internal/controllers/topology/cluster/reconcile_state.go
M	internal/controllers/topology/cluster/reconcile_state_test.go
Falling back to patching base and 3-way merge...
Auto-merging internal/controllers/topology/cluster/reconcile_state_test.go
Auto-merging internal/controllers/topology/cluster/reconcile_state.go
CONFLICT (content): Merge conflict in internal/controllers/topology/cluster/reconcile_state.go
Auto-merging internal/controllers/topology/cluster/desired_state_test.go
Auto-merging internal/controllers/topology/cluster/desired_state.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0002 Best effort cleanup of referenced templates/objects
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

Let's cherry-pick (at least into 1.6)

/cherry-pick release-1.6


@k8s-infra-cherrypick-robot

@sbueringer: #10277 failed to apply on top of branch "release-1.5":

Applying: Avoid leaving orphaned InfrastructureCluster when create control plane fails
Using index info to reconstruct a base tree...
M	internal/controllers/topology/cluster/reconcile_state.go
M	internal/controllers/topology/cluster/reconcile_state_test.go
Falling back to patching base and 3-way merge...
Auto-merging internal/controllers/topology/cluster/reconcile_state_test.go
Auto-merging internal/controllers/topology/cluster/reconcile_state.go
CONFLICT (content): Merge conflict in internal/controllers/topology/cluster/reconcile_state.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Avoid leaving orphaned InfrastructureCluster when create control plane fails
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick release-1.5


@sbueringer
Member

@fabriziopandini We should probably cherry-pick manually.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/clusterclass Issues or PRs related to clusterclass cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Unbound resource creation within managed topologies
6 participants