Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ClusterClaim deletion validator #4062

Merged
merged 1 commit into from
Aug 11, 2022

Conversation

luolanzone
Copy link
Contributor

@luolanzone luolanzone commented Jul 29, 2022

  1. Refine ClusterClaim webhook and add deletion validator to deny deletion
    if there is any ClusterSet referring to the ClusterClaim.
  2. Move status data initialization from leader ClusterSet controller to
    MemberClusterAnnounce controller and simplify the status:
  • When the MemberClusterAnnounce is created, the Ready status will be
    True and Reason will be Connected.
  • When the MemberClusterAnnounce is not updated in the agreed period,
    the Ready status will be false and Reason will be Disconnected.
  • When the MemberClusterAnnounce is updated after Disconnected, the Ready
    status will be reset to true and Reason will be Connected, and
    lastUpdateTime will also be reset.
  • When the MemberCLusterAnnounce is deleted, the corresponding status
    will be removed.

Signed-off-by: Lan Luo luola@vmware.com

@luolanzone luolanzone added the area/multi-cluster Issues or PRs related to multi cluster. label Jul 29, 2022
@luolanzone
Copy link
Contributor Author

/test-multicluster-e2e

@codecov
Copy link

codecov bot commented Jul 29, 2022

Codecov Report

Merging #4062 (25c10a9) into main (024254d) will increase coverage by 0.49%.
The diff coverage is 84.39%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #4062      +/-   ##
==========================================
+ Coverage   67.51%   68.00%   +0.49%     
==========================================
  Files         297      310      +13     
  Lines       44988    45477     +489     
==========================================
+ Hits        30373    30928     +555     
+ Misses      12238    12125     -113     
- Partials     2377     2424      +47     
Flag Coverage Δ
e2e-tests 58.46% <59.57%> (?)
integration-tests 35.47% <23.25%> (+0.10%) ⬆️
kind-e2e-tests 50.03% <ø> (-0.74%) ⬇️
unit-tests 44.28% <71.63%> (-0.03%) ⬇️
Impacted Files Coverage Δ
...luster-controller/memberclusterannounce_webhook.go 63.33% <ø> (+8.33%) ⬆️
...llers/multicluster/member_clusterset_controller.go 75.87% <ø> (+2.07%) ⬆️
...md/multicluster-controller/clusterclaim_webhook.go 74.50% <74.50%> (ø)
...s/multicluster/memberclusterannounce_controller.go 74.73% <86.53%> (+10.57%) ⬆️
...llers/multicluster/leader_clusterset_controller.go 77.58% <87.50%> (+12.14%) ⬆️
...lers/multicluster/commonarea/remote_common_area.go 76.01% <96.15%> (+6.22%) ⬆️
...icluster/cmd/multicluster-controller/controller.go 66.66% <100.00%> (+58.80%) ⬆️
.../flowexporter/connections/conntrack_connections.go 66.66% <0.00%> (-12.86%) ⬇️
...kg/controller/networkpolicy/antreanetworkpolicy.go 74.75% <0.00%> (-12.43%) ⬇️
pkg/agent/interfacestore/types.go 61.11% <0.00%> (-8.34%) ⬇️
... and 62 more

@luolanzone luolanzone force-pushed the deletion-webhook-and-cleanup branch from 1584db0 to a3a9658 Compare July 29, 2022 02:35
defer r.mutex.Unlock()
r.mutex.Lock()

if leaderID == common.InvalidClusterID {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The InvalidClusterID will be set only when the ClusterSet is not created and common area is nil or the ClusterSet is deleted and common area is reset to nil. So there is no case to call updateLeaderStatus when the leader ID is InvalidClusterID.

@luolanzone
Copy link
Contributor Author

/test-multicluster-e2e

data.leaderStatus.reason = ReasonConnectedLeader
}
if !apierrors.IsNotFound(err) {
return ctrl.Result{}, err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to unlock. I suggest to set err to nil if apierrors.IsNotFound(err), and break and check err and return after line 135.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, defined a variable err.

// TODO: Add ClusterClaim webhook to make sure it cannot be deleted while ClusterSet is present.
// If it's not found error, ClusterClaims were probably deleted during the processing of MemberClusterAnnounce.
// Nothing to handle in this case since it means no ClusterSet is referring to it or the ClusterSet is deleted
// as well, no status need to be updated.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"and no need to update cluster status"

I actually suggest to remove this sentence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the whole sentence.

// Check whether this local cluster is the leader for this member.
clusterClaim := &multiclusterv1alpha2.ClusterClaim{}
if err := r.Get(context.TODO(), types.NamespacedName{Name: multiclusterv1alpha2.WellKnownClusterClaimID, Namespace: req.Namespace}, clusterClaim); err == nil {
if clusterClaim.Value == memberAnnounce.LeaderClusterID {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When possible the leader clusterID does not match? Again, I suggest to remove laderClusterID from memberAnnounce.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible that user creates a member ClusterSet with right access but incorrect leader ClusterID, if we didn't validate the leader ID and allow it to create the member cluster announce, then member cluster will show a status with wrong cluster ID but connected status.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually feel leader cluster ID in ClusterSet is not very useful. It is not used anywhere, except these validations. But anyway, we can keep the current way, if you prefer.

// If err != nil, probably ClusterClaims were deleted during the processing of MemberClusterAnnounce.
// Nothing to handle in this case and MemberClusterAnnounce will also be deleted soon.
// TODO: Add ClusterClaim webhook to make sure it cannot be deleted while ClusterSet is present.
// If it's not found error, ClusterClaims were probably deleted during the processing of MemberClusterAnnounce.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel we should revisit the design of Add/RemoveMember() after ClusterSet changes. Why that indirect, but not start/stop memberAnnounce periodic check right after the memberAnnounce is created/deleted in MemberClusterAnnounceReconciler?

Pasted below my earlier comments too.

  1. Do we really need leader cluster ID in the MemberAnnounce?
  2. Do we still need all status conditions after we remove election?
  3. Add/RemoveMember() need not to be called by ClusterSet changes, after we change to auto-update ClusterSet, but we can start to maintain the timerdata once MemberClusterAnnounce is created?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, from member cluster's perspective, I feel we don't have to provide leader's ClusterID as long as the access token is correct. We probably need to refactor the whole ClusterSet part including the ClusterSet spec vs ClusterSet status for joined member.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can decide leader ClusterID later. But please consider 3). The current code is unnecessarily complex.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will double check this part.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jianjuns I have refined this part, the MemberClusterAnnounce is responsible to initialize the data now. Could you help to take another look? thanks.

@luolanzone luolanzone force-pushed the deletion-webhook-and-cleanup branch 4 times, most recently from c0241f1 to 5729c74 Compare August 5, 2022 02:55
@@ -114,38 +108,34 @@ func (r *MemberClusterAnnounceReconciler) Reconcile(ctx context.Context, req ctr
klog.V(2).InfoS("Reset lastUpdateTime", "cluster", memberAnnounce.ClusterID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should remove or rephrase the comment: "// If the member is not found in timerData, it means the member is not added to the ClusterSet or was deleted."?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, changed it to If the member is not found in timerData, it means MemberClusterAnnounce is not created or it was deleted.

@@ -114,38 +108,34 @@ func (r *MemberClusterAnnounceReconciler) Reconcile(ctx context.Context, req ctr
klog.V(2).InfoS("Reset lastUpdateTime", "cluster", memberAnnounce.ClusterID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reading the code more, I wonder should we do the following two:

  1. Merge timerData and memberStatus to one map (and call it memberStatusMap)
  2. Move this block to addMember() (and rename the func to sth like addUpdateMemberStatus())

What you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I am thinking it should work, but If we'd like to include this PR to 1.8, I feel it's better to has a follow up PR to rethink about the whole status part, what's your suggestion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine to do another PR, but I do feel we should clean up the code. The current implementation is unnecessarily complex, hard to follow and may include bugs due to that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I feel we need rethink the design including ClusterClaim/ClusterSet and status.

}
// If MemberClusterAnnounce is deleted, no need to process because RemoveMember must already
// be called.
// If MemberClusterAnnounce is deleted, no need to process because the Finalizer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If MemberClusterAnnounce is deleted, no further processing is needed, as cleanup must have been done when the Finalizer was removed.

@@ -114,38 +108,34 @@ func (r *MemberClusterAnnounceReconciler) Reconcile(ctx context.Context, req ctr
klog.V(2).InfoS("Reset lastUpdateTime", "cluster", memberAnnounce.ClusterID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine to do another PR, but I do feel we should clean up the code. The current implementation is unnecessarily complex, hard to follow and may include bugs due to that.

jianjuns
jianjuns previously approved these changes Aug 8, 2022
return ctrl.Result{}, client.IgnoreNotFound(err)
}

r.mapLock.Lock()
// If the member is not found in timerData, it means the member is not added to the ClusterSet or was deleted.
if data, ok := r.timerData[common.ClusterID(memberAnnounce.ClusterID)]; ok {
// If the member is not found in memberStatus, it means MemberClusterAnnounce is not created
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I think this block (from line 94 to 119) should be moved to addMemberStatus (probably rename it to addOrUpdateMemberStatus). Could you check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@@ -163,6 +142,7 @@ func (r *MemberClusterAnnounceReconciler) Reconcile(ctx context.Context, req ctr
klog.ErrorS(err, "Failed to add member cluster to ClusterSet", "cluster", memberAnnounce.ClusterID)
return ctrl.Result{}, err
}
r.addMemberStatus(memberID, v1.ConditionTrue)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we move it to the end of the func after adding the Finalizer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to place it before adding Finalizer since member is actually connected and able to do export/import etc.

@@ -163,6 +142,7 @@ func (r *MemberClusterAnnounceReconciler) Reconcile(ctx context.Context, req ctr
klog.ErrorS(err, "Failed to add member cluster to ClusterSet", "cluster", memberAnnounce.ClusterID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should ignore "already exists" error in addMemberToClusterSet (in my mind better to add comments on addMemberToClusterSet to explain the behavior)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, return nil for already exists error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I did not see that. Then we should move r.addOrUpdateMemberStatus() to line 113 before "if common.StringExistsInSlice(memberAnnounce.Finalizers, finalizer)"?

I assume we can call the func only once, and remove the call at line 95.

@@ -163,6 +142,7 @@ func (r *MemberClusterAnnounceReconciler) Reconcile(ctx context.Context, req ctr
klog.ErrorS(err, "Failed to add member cluster to ClusterSet", "cluster", memberAnnounce.ClusterID)
return ctrl.Result{}, err
}
r.addMemberStatus(memberID, v1.ConditionTrue)
klog.InfoS("Adding finalizer to MemberClusterAnnounce", "MemberClusterAnnounce", klog.KObj(memberAnnounce))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will repeatedly log this message for every memberAnnounce update. Should should check if Finalizer is already added?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the finalizer exists, it will return, code won't reach here.

	if common.StringExistsInSlice(memberAnnounce.Finalizers, finalizer) {
		return ctrl.Result{}, nil
	}

@luolanzone
Copy link
Contributor Author

/test-multicluster-e2e

@@ -163,6 +142,7 @@ func (r *MemberClusterAnnounceReconciler) Reconcile(ctx context.Context, req ctr
klog.ErrorS(err, "Failed to add member cluster to ClusterSet", "cluster", memberAnnounce.ClusterID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I did not see that. Then we should move r.addOrUpdateMemberStatus() to line 113 before "if common.StringExistsInSlice(memberAnnounce.Finalizers, finalizer)"?

I assume we can call the func only once, and remove the call at line 95.

// Initialize the member status with unknown status when controller restarts.
// Let's background process to check and update the connection status.
// v1.ConditionUnknown will be ignored if status already exists in the map.
r.addOrUpdateMemberStatus(memberID, v1.ConditionUnknown)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this call, and keep only a single call of addOrUpdateMemberStatus(), e.g. at line 112?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Contributor

@jianjuns jianjuns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Questions and a few nits.

}
}

if apierrors.IsNotFound(err) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can just do:

localClusterMemberAnnounceExists := err == nil

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return err
}
var localClusterMemberAnnounceExist bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

localClusterMemberAnnounceExist -> localClusterMemberAnnounceExists

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

LastTransitionTime: metav1.Now(),
Message: "Member created",
Reason: ReasonNeverConnected,
Message: "Member Created",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question - what is difference between "Member Created" and "Member Connected"? Should we just set the condition to "ConditionTrue + Member Connected"? Or there is a protocol that member should notify leader after it is ready by updating the Announce?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Member Created" will be used only when controller starts. It will become Member Connected" when member cluster send a MemberClusterAnnounce update. The purpose is handle the case MemberClusterAnnounce aleady exists on leader cluster, but it might be stale and will be cleaned up soon.

Copy link
Contributor

@jianjuns jianjuns Aug 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I meant what is the problem if we set the condition to "ConditionTrue + Member Connected" directly here? If the member cluster does not update the Announce, it will be changed to "Disconnected" anyway?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it will be changed to "Disconnected", I feel both way works. Do you recommend to use Member Connected only?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does member update the Announce right away after the creation? If yes, I am fine with the current way too; if not, I feel "ConditionTrue + Member Connected" works better, while "ConditionUnknown" is misleading.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be updated every 5 seconds if member cluster works well. But you are right, I feel stale MemberClusterAnnounce case will be rare, let me change it to "ConditionTrue + Member Connected".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, do you think 5 seconds may be too frequent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I feel it can be longer, how about 15s?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was thinking about 10 seconds. 15 seconds sound good to me too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I updated it to 10s in this PR, it can be quicker to detect failure.

@luolanzone
Copy link
Contributor Author

/test-multicluster-e2e

jianjuns
jianjuns previously approved these changes Aug 10, 2022

localClusterMemberAnnounceExists := !apierrors.IsNotFound(err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe to pass nil to apierrors.IsNotFound()? I would suggest:

localClusterMemberAnnounceExists := err == nil

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

1. Refine ClusterClaim webhook, add deletion validator to deny deletion
   if there is any ClusterSet referring to it.
2. Move status data initialization from leader ClusterSet controller to
MemberClusterAnnounce controller and simplify the status:
* When the MemberClusterAnnounce is created, the `Ready` status will be
  `True` and `Reason` will be `Connected`.
* When the MemberClusterAnnounce is not updated in the agreed period,
  the `Ready` status will be `false` and `Reason` will be `Disconnected`.
* When the MemberClusterAnnounce is updated after Disconnected, the `Ready`
  status will be reset to `true` and `Reason` will be `Connected`,
  lastUpdateTime will also be reset.
* When the MemberCLusterAnnounce is deleted, the corresponding status
  will be removed.

Signed-off-by: Lan Luo <luola@vmware.com>
@jianjuns
Copy link
Contributor

/test-multicluster-e2e
/test-integration

@jianjuns
Copy link
Contributor

/skip-all

@jianjuns jianjuns merged commit cb70ad1 into antrea-io:main Aug 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/multi-cluster Issues or PRs related to multi cluster.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants