Auto update ClusterSet and MemberClusterAnnounce in leader cluster #3956

Merged
merged 1 commit into from
Jul 28, 2022

Conversation

hjiajing

Auto update ClusterSet and MemberClusterAnnounce in the leader cluster.

  • When a new member cluster joins the ClusterSet, the ClusterSet in the leader cluster adds it to spec.members.
  • When a member cluster leaves the ClusterSet, its MemberClusterAnnounce in the leader cluster is deleted and the member cluster is removed from spec.members.
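
The two cases in the list above can be sketched as idempotent updates to a member list. This is a minimal illustration, not the PR's code: the `MemberCluster` and `ClusterSetSpec` types are hypothetical, trimmed-down stand-ins, and `addMember` / `removeMember` are names invented for the example.

```go
package main

import "fmt"

// MemberCluster and ClusterSetSpec are hypothetical, simplified stand-ins
// for the real API types, which carry more fields.
type MemberCluster struct {
	ClusterID string
}

type ClusterSetSpec struct {
	Members []MemberCluster
}

// addMember appends clusterID to spec.Members unless it is already present.
// It reports whether the spec changed, so a caller can skip a no-op update.
func addMember(spec *ClusterSetSpec, clusterID string) bool {
	for _, m := range spec.Members {
		if m.ClusterID == clusterID {
			return false
		}
	}
	spec.Members = append(spec.Members, MemberCluster{ClusterID: clusterID})
	return true
}

// removeMember drops clusterID from spec.Members and reports whether the
// spec changed.
func removeMember(spec *ClusterSetSpec, clusterID string) bool {
	for i, m := range spec.Members {
		if m.ClusterID == clusterID {
			spec.Members = append(spec.Members[:i], spec.Members[i+1:]...)
			return true
		}
	}
	return false
}

func main() {
	spec := &ClusterSetSpec{}
	fmt.Println(addMember(spec, "cluster-a"), len(spec.Members))
	fmt.Println(addMember(spec, "cluster-a"), len(spec.Members))
	fmt.Println(removeMember(spec, "cluster-a"), len(spec.Members))
}
```

Returning a "changed" flag is one possible design choice: it lets the caller avoid writing the ClusterSet back when a repeated announce brings no new information.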

@hjiajing

@luolanzone Could you please take a quick look at this WIP PR? Thanks.
I added an annotation "antrea.io/is-member-deleted" to MemberClusterAnnounce because the leader cluster controller needs to get the ClusterSetID. As a result, the member cluster does not delete the MemberClusterAnnounce itself.
Once a MemberClusterAnnounce is annotated as deleted, the leader cluster removes the member from the member list and deletes the MemberClusterAnnounce.
In addition, I removed the member check and the SA check from the webhook. I'm concerned about this: with this change, the member will not need an SA any more.
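
The annotation-driven flow described in this comment (the WIP state of the PR) could be sketched roughly as follows. Only the annotation key comes from the comment; the `announce` type, the `action` values, and the `decide` function are hypothetical, and the sketch assumes that the mere presence of the annotation marks a departed member, since the value written is not shown in this thread.

```go
package main

import "fmt"

// Annotation key quoted from the comment above.
const memberDeletedAnnotation = "antrea.io/is-member-deleted"

// announce is a hypothetical, trimmed-down MemberClusterAnnounce.
type announce struct {
	ClusterID   string
	Annotations map[string]string
}

type action int

const (
	ensureMember          action = iota // keep or add the member in spec.members
	removeMemberAndDelete               // drop the member, then delete the announce
)

// decide picks the leader-side action for an announce.
// Assumption: any value of the annotation counts as "deleted".
func decide(a announce) action {
	if _, ok := a.Annotations[memberDeletedAnnotation]; ok {
		return removeMemberAndDelete
	}
	return ensureMember
}

func main() {
	live := announce{ClusterID: "cluster-a"}
	gone := announce{
		ClusterID:   "cluster-a",
		Annotations: map[string]string{memberDeletedAnnotation: "true"},
	}
	fmt.Println(decide(live) == ensureMember, decide(gone) == removeMemberAndDelete)
}
```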

@codecov-commenter

codecov-commenter commented Jun 30, 2022

Codecov Report

Merging #3956 (e9b2670) into main (13f470e) will decrease coverage by 0.04%.
The diff coverage is 54.61%.

@@            Coverage Diff             @@
##             main    #3956      +/-   ##
==========================================
- Coverage   64.50%   64.46%   -0.05%     
==========================================
  Files         294      294              
  Lines       43726    43827     +101     
==========================================
+ Hits        28206    28253      +47     
- Misses      13236    13270      +34     
- Partials     2284     2304      +20     
Flag                                                     Coverage Δ
kind-e2e-tests                                           50.88% <ø> (-0.32%) ⬇️
unit-tests                                               44.23% <54.61%> (+0.01%) ⬆️

Impacted Files                                           Coverage Δ
multicluster/cmd/multicluster-controller/leader.go       0.00% <0.00%> (ø)
...llers/multicluster/member_clusterset_controller.go    16.30% <0.00%> (-0.15%) ⬇️
...luster-controller/memberclusterannounce_webhook.go    55.00% <56.09%> (-0.56%) ⬇️
...s/multicluster/memberclusterannounce_controller.go    64.15% <58.02%> (-7.64%) ⬇️
...lers/multicluster/commonarea/remote_common_area.go    27.75% <100.00%> (+0.29%) ⬆️
...agent/flowexporter/connections/deny_connections.go    65.59% <0.00%> (-19.36%) ⬇️
pkg/agent/flowexporter/connections/connections.go        60.60% <0.00%> (-15.16%) ⬇️
pkg/agent/flowexporter/exporter/exporter.go              67.82% <0.00%> (-10.90%) ⬇️
pkg/agent/controller/trafficcontrol/controller.go        81.08% <0.00%> (-2.00%) ⬇️
pkg/controller/grouping/controller.go                    65.13% <0.00%> (-1.98%) ⬇️
... and 20 more

@luolanzone

@hjiajing Could you please fix the unit test failure? https://github.com/antrea-io/antrea/runs/7126092905?check_suite_focus=true

@hjiajing

@luolanzone Sure. I will fix it now

@hjiajing hjiajing force-pushed the auto-update branch 4 times, most recently from e55252a to 0ea8255 Compare July 4, 2022 02:54
@luolanzone

@hjiajing Could you resolve the conflicts? thanks.

@hjiajing hjiajing force-pushed the auto-update branch 2 times, most recently from 1b35a35 to 26b7eae Compare July 4, 2022 04:04
@hjiajing hjiajing changed the title [WIP] Auto update ClusterSet and MemberClusterAnnounce in leader cluster Auto update ClusterSet and MemberClusterAnnounce in leader cluster Jul 4, 2022
@luolanzone luolanzone added the area/multi-cluster Issues or PRs related to multi cluster. label Jul 4, 2022
}
}
}
if len(clusterSet.Spec.Leaders) != 1 {

Maybe you can remove this check and add schema validation like this: https://github.com/antrea-io/antrea/pull/3964/files#diff-d91c75a71bf454b846eb3998698d615fadf6108177e9624428572d6440887962R42-R43
You can refer to that change and add this refinement to this PR, so that there is no merge dependency between the two commits.
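
The linked change is not reproduced in this thread, so the following is only a guess at what such schema validation might look like: kubebuilder validation markers on the field, which controller-gen turns into minItems/maxItems in the generated CRD schema. The trimmed-down types and JSON tags here are assumptions for illustration.

```go
package main

import "fmt"

// MemberCluster is a hypothetical, trimmed-down element type.
type MemberCluster struct {
	ClusterID string `json:"clusterID,omitempty"`
}

// ClusterSetSpec sketches only the field under discussion.
type ClusterSetSpec struct {
	// Leaders must contain exactly one entry. With the markers below the
	// API server rejects invalid objects at admission time, so a runtime
	// check such as len(Leaders) != 1 would no longer be needed.
	// +kubebuilder:validation:MinItems=1
	// +kubebuilder:validation:MaxItems=1
	Leaders []MemberCluster `json:"leaders"`
}

func main() {
	spec := ClusterSetSpec{Leaders: []MemberCluster{{ClusterID: "leader"}}}
	fmt.Println(len(spec.Leaders))
}
```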

And this one?

}
// If err != nil, probably ClusterClaims were deleted during the processing of MemberClusterAnnounce.

Why did we move the error out of the "if" block?

The err here should be the one returned by r.List() at line 131? If so, I do not think it is relevant to the comment. @luolanzone: could you check?

@jianjuns yes, you are right. I feel the original comment and logic don't handle this case correctly. If all ClusterClaims are deleted, the result should be an empty list rather than an error, and I think we should retry if there is an error. @hjiajing Could you double-check and refine this part? We may leave the ClusterClaim webhook part to a new PR.

}
// If err != nil, probably ClusterClaims were deleted during the processing of MemberClusterAnnounce.

We need to handle MemberClusterAnnounce deletion (comments in line 109 are no longer valid?).

@hjiajing hjiajing force-pushed the auto-update branch 3 times, most recently from a0c5bec to e49b844 Compare July 14, 2022 08:22
@hjiajing hjiajing force-pushed the auto-update branch 3 times, most recently from bbaa605 to bb7b755 Compare July 22, 2022 06:29
@@ -108,6 +111,8 @@ func (r *MemberClusterAnnounceReconciler) Reconcile(ctx context.Context, req ctr
r.mapLock.Lock()
defer r.mapLock.Unlock()

// If !ok, it means member not found. If this happens, the MemberClusterAnnounce should soon be deleted.
// Nothing to do here.
if data, ok := r.timerData[common.ClusterID(memberAnnounce.ClusterID)]; ok {
@jianjuns jianjuns Jul 22, 2022

@luolanzone: I feel we should revisit all of this ClusterSet / cluster status code. I have the following questions:

  1. Do we really need the leader cluster ID in the MemberAnnounce?
  2. Do we still need all the status conditions after we remove election?
  3. After we change to auto-updating the ClusterSet, Add/RemoveMember() need not be called on ClusterSet changes; can we instead start maintaining the timerData once a MemberClusterAnnounce is created?

Could you please look into the code to understand the whole status update logic, and clean up the code?

Maybe you can first summarize which conditions we have and in which cases they are set.

@luolanzone : a reminder for you.

Sure, I will check this part, thanks for reminding.

@hjiajing hjiajing force-pushed the auto-update branch 4 times, most recently from 309af10 to 8fcbbb6 Compare July 24, 2022 11:57

@jianjuns jianjuns left a comment

I do not have further comments on this PR, except for a reminder to add code comments.

@luolanzone: would you take another look before we merge?

@luolanzone luolanzone left a comment

@hjiajing Could you rebase the PR? A few questions in the comments

@@ -63,10 +64,15 @@ func runLeader(o *Options) error {
if err = memberClusterStatusManager.SetupWithManager(mgr); err != nil {
return fmt.Errorf("error creating MemberClusterAnnounce controller: %v", err)
}

noCachedClient, err := client.New(mgr.GetConfig(), client.Options{Scheme: mgr.GetScheme(), Mapper: mgr.GetRESTMapper()})
@luolanzone luolanzone Jul 26, 2022

The leader runs in a Namespace scope, so it is supposed to watch only resources in the Namespace where the leader controller is running. I didn't see you set the Namespace here.
Do we really need a non-cached client in order to remove "list" from the RBAC? I feel it's better to keep this the same as in the other controllers. @jianjuns any comments on this?

@jianjuns jianjuns Jul 26, 2022

I do not know all the implications here, but I would watch only in the same Namespace.

Another question: in the RBAC configuration, do we not create a "Role" to restrict the permissions to only the "multicluster" Namespace? @luolanzone

@jianjuns we defined a 'Role' antrea-mc-controller-role in antrea-multicluster-leader-namespaced.yml, so the Antrea mc-controller will only be able to watch resources in the "antrea-multicluster" Namespace.

@hjiajing hjiajing force-pushed the auto-update branch 3 times, most recently from 2962ce8 to 1e02aa1 Compare July 26, 2022 03:31
@luolanzone luolanzone left a comment

LGTM overall, a few nits

Signed-off-by: hujiajing <hjiajing@vmware.com>

@luolanzone luolanzone left a comment

LGTM overall @jianjuns Could you help to take a look if this is ready to merge? thanks.

@jianjuns

/test-multicluster-e2e
/skip-all

@jianjuns

LGTM overall @jianjuns Could you help to take a look if this is ready to merge? thanks.

No further comment from me.

@jianjuns jianjuns merged commit 2bf6a9a into antrea-io:main Jul 28, 2022