Cleanup ResourceExport if exported Service has no available Endpoints #4056

Merged

Contributor

@luolanzone luolanzone commented Jul 26, 2022

Refine the ServiceExport controller to watch Endpoints events, so that the
Service-kind ResourceExport is removed when the exported Service no longer
has available Endpoints, and no ResourceExport is written when the Service
has no Endpoints to begin with.

Fixes #4055
Signed-off-by: Lan Luo <luola@vmware.com>
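
The behaviour described in this PR can be sketched as a small decision function. This is a minimal, stdlib-only illustration and not the Antrea implementation: `endpointSubset`, `hasAvailableEndpoints`, and `decide` are hypothetical names, and reading "available Endpoints" as "at least one subset with an address" is an assumption.

```go
package main

import "fmt"

// endpointSubset is a simplified, hypothetical stand-in for a Kubernetes
// Endpoints subset; only the ready addresses matter for this sketch.
type endpointSubset struct {
	Addresses []string
}

// hasAvailableEndpoints reports whether any subset has at least one address.
// Assumption: this is what "available Endpoints" means in the description.
func hasAvailableEndpoints(subsets []endpointSubset) bool {
	for _, s := range subsets {
		if len(s.Addresses) > 0 {
			return true
		}
	}
	return false
}

// decide returns "cleanup" when the Service is missing or has no available
// Endpoints, and "export" otherwise. For a Service that was never exported,
// "cleanup" simply has nothing to delete.
func decide(serviceExists bool, subsets []endpointSubset) string {
	if !serviceExists || !hasAvailableEndpoints(subsets) {
		return "cleanup"
	}
	return "export"
}

func main() {
	ready := []endpointSubset{{Addresses: []string{"10.0.0.5"}}}
	fmt.Println(decide(true, ready))  // Service with a ready address
	fmt.Println(decide(true, nil))    // Service without Endpoints
	fmt.Println(decide(false, ready)) // Service deleted
}
```

Keeping the decision in one pure function makes the three cases (Service missing, Service without Endpoints, Service with Endpoints) easy to test without an API server.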

@luolanzone luolanzone changed the title from "Cleanup ResourceExport if Endpoints list empty" to "Cleanup ResourceExport if exported Service has no available Endpoints" Jul 26, 2022
@luolanzone luolanzone added the area/multi-cluster Issues or PRs related to multi cluster. label Jul 26, 2022
@luolanzone
Contributor Author

/test-multicluster-e2e

@codecov

codecov bot commented Jul 26, 2022

Codecov Report

Merging #4056 (fbc89cb) into main (d9c4629) will increase coverage by 4.70%.
The diff coverage is 64.70%.

❗ Current head fbc89cb differs from pull request most recent head 2fd6842. Consider uploading reports for the commit 2fd6842 to get more accurate results

@@            Coverage Diff             @@
##             main    #4056      +/-   ##
==========================================
+ Coverage   61.26%   65.97%   +4.70%     
==========================================
  Files         293      307      +14     
  Lines       43686    43961     +275     
==========================================
+ Hits        26765    29003    +2238     
+ Misses      14693    12630    -2063     
- Partials     2228     2328     +100     
Flag              Coverage Δ                    *Carryforward flag
e2e-tests         60.71% <39.09%> (?)
kind-e2e-tests    50.64% <ø> (+6.40%) ⬆️         Carriedforward from 684dca3
unit-tests        44.20% <52.08%> (-0.02%) ⬇️    Carriedforward from 684dca3

*This pull request uses carry forward flags.

Impacted Files Coverage Δ
...apis/multicluster/v1alpha2/clusterclaim_webhook.go 24.13% <11.11%> (ø)
.../cmd/multicluster-controller/clusterset_webhook.go 59.37% <59.37%> (ø)
...ntrollers/multicluster/serviceexport_controller.go 78.75% <70.42%> (+11.98%) ⬆️
...icluster/cmd/multicluster-controller/controller.go 62.92% <100.00%> (+54.27%) ⬆️
multicluster/cmd/multicluster-controller/main.go 66.66% <100.00%> (+66.66%) ⬆️
pkg/controller/networkpolicy/tier.go 50.00% <0.00%> (-5.00%) ⬇️
pkg/controller/ipam/antrea_ipam_controller.go 76.41% <0.00%> (-2.63%) ⬇️
pkg/agent/flowexporter/utils.go 74.46% <0.00%> (-2.13%) ⬇️
...lowaggregator/clickhouseclient/clickhouseclient.go 81.41% <0.00%> (-1.58%) ⬇️
pkg/agent/util/iptables/iptables.go 43.52% <0.00%> (-1.56%) ⬇️
... and 87 more

@luolanzone luolanzone force-pushed the handle-service-with-empty-endpoints branch from 35beb4a to bafbd66 Compare July 26, 2022 08:36
@luolanzone
Contributor Author

/test-multicluster-e2e

}
return ctrl.Result{}, nil
}

- // if corresponding Service doesn't exist, update ServiceExport's status reason to not_found_service,
+ // If corresponding Service doesn't exist, update ServiceExport's status reason to not_found_service,
Contributor

not_found_service -> service_not_found?

Contributor

the corresponding Service

Contributor Author

Done

@@ -313,15 +348,18 @@ func (r *ServiceExportReconciler) updateSvcExportStatus(ctx context.Context, req
now := metav1.Now()
var res, message *string
switch cause {
- case notFound:
+ case serviceNotFound:
res = getStringPointer("not_found_service")
Contributor

We should change to "service_not_found".

Contributor Author

Done

res = getStringPointer("not_found_service")
message = getStringPointer("the Service does not exist")
case importedService:
case serviceWithoutEndpoints:
res = getStringPointer("no_endpoints_service")
Contributor

"no_endpoints" or "service_without_endpoints"

Contributor Author

Changed to "service_without_endpoints"

@@ -138,15 +141,15 @@ func (r *ServiceExportReconciler) Reconcile(ctx context.Context, req ctrl.Reques
epResExportName := getResourceExportName(r.localClusterID, req, "endpoints")

cleanup := func() error {
err = r.handleServiceDeleteEvent(ctx, req, commonArea)
Contributor

Could you add some comments to explain why we do not check svcInstalled?

Contributor Author

Comment added. I leave it to the caller to check svcInstalled; in some rare cases it might be better to call cleanup anyway.

@@ -172,29 +177,54 @@ func (r *ServiceExportReconciler) Reconcile(ctx context.Context, req ctrl.Reques
if err := cleanup(); err != nil {
return ctrl.Result{}, err
}
- err = r.updateSvcExportStatus(ctx, req, notFound)
+ err = r.updateSvcExportStatus(ctx, req, serviceNotFound)
Contributor

I think the Status can be updated repeatedly when other Service/Endpoints changes happen, even if they do not affect the Status value?

Maybe not a big deal, or should we save the Status to installedSvcs? The same applies to cleanup(): should we keep the Service in installedSvcs once it has been processed, so we do not call cleanup() multiple times unnecessarily?

Contributor Author

I left this part unchanged, considering there is no harm in updating the status since we already skip watching ServiceExport status change events. I did add a ResourceVersionChangedPredicate for Service and Endpoints to limit the mapped events, so the controller is only notified when the Service or Endpoints resource version changes.
When the ServiceExport is not found, the controller checks 'svcInstalled' before calling cleanup(); for Service and Endpoints events, I left it calling cleanup() anyway, considering most events will be skipped if there is no existing ServiceExport.
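
The filtering idea behind a resource-version predicate can be sketched with plain structs. This is an illustrative model only, not controller-runtime code: `object`, `updateEvent`, and `resourceVersionChanged` are hypothetical names standing in for the real event and object types.

```go
package main

import "fmt"

// object is a hypothetical, minimal stand-in for a Kubernetes object.
type object struct {
	Name            string
	ResourceVersion string
}

// updateEvent is a hypothetical stand-in for an update event carrying the
// old and new versions of the same object.
type updateEvent struct {
	Old *object
	New *object
}

// resourceVersionChanged lets an update through only when the resource
// version differs between the old and new object. Events with a missing
// object are dropped.
func resourceVersionChanged(e updateEvent) bool {
	if e.Old == nil || e.New == nil {
		return false
	}
	return e.Old.ResourceVersion != e.New.ResourceVersion
}

func main() {
	resync := updateEvent{
		Old: &object{Name: "nginx", ResourceVersion: "100"},
		New: &object{Name: "nginx", ResourceVersion: "100"},
	}
	change := updateEvent{
		Old: &object{Name: "nginx", ResourceVersion: "100"},
		New: &object{Name: "nginx", ResourceVersion: "101"},
	}
	fmt.Println(resourceVersionChanged(resync)) // periodic resync, same version
	fmt.Println(resourceVersionChanged(change)) // real change on the object
}
```

A filter like this drops periodic resync events, which carry identical old and new objects, so the reconciler only runs when something on the Service or Endpoints actually changed.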

Contributor

There is no big "harm", but it is unnecessary, and in some cases it may look strange that we keep updating the Status for no reason or keep deleting non-existent resources (will it generate strange logs in kube-apiserver or our code?).

Anyway, at least we do not need to fix it in this PR.

Contributor Author

I added a step that compares the existing and new status before updating the ServiceExport, which helps avoid unnecessary updates.
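
A comparison step of this kind might look like the sketch below. It is a simplified illustration under stated assumptions: `condition` is a hypothetical stand-in for the ServiceExport condition type, and `needsStatusUpdate`, `sameReason`, and `strPtr` are names invented here. Unlike a direct pointer dereference, this version tolerates a nil Reason.

```go
package main

import "fmt"

// condition is a hypothetical, reduced stand-in for a ServiceExport
// condition; only the fields needed for the comparison are modelled.
type condition struct {
	Reason  *string
	Message *string
}

func strPtr(s string) *string { return &s }

// sameReason compares two optional reasons without dereferencing nil.
func sameReason(a, b *string) bool {
	if a == nil || b == nil {
		return a == b
	}
	return *a == *b
}

// needsStatusUpdate reports whether writing newCond would change the
// recorded reason. A missing existing condition always needs an update.
func needsStatusUpdate(existing *condition, newCond condition) bool {
	if existing == nil {
		return true
	}
	return !sameReason(existing.Reason, newCond.Reason)
}

func main() {
	existing := &condition{Reason: strPtr("service_not_found")}
	unchanged := condition{Reason: strPtr("service_not_found")}
	changed := condition{Reason: strPtr("service_without_endpoints")}

	fmt.Println(needsStatusUpdate(existing, unchanged)) // same reason
	fmt.Println(needsStatusUpdate(existing, changed))   // reason changed
	fmt.Println(needsStatusUpdate(nil, changed))        // no condition yet
}
```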

@luolanzone luolanzone force-pushed the handle-service-with-empty-endpoints branch from bafbd66 to 78cfd11 Compare July 28, 2022 02:31
@luolanzone
Contributor Author

/test-multicluster-e2e

@@ -138,15 +141,17 @@ func (r *ServiceExportReconciler) Reconcile(ctx context.Context, req ctrl.Reques
epResExportName := getResourceExportName(r.localClusterID, req, "endpoints")

cleanup := func() error {
// There is no side effect to call deletion directly, so leave it to the caller
Contributor

I think this is not the reason. The reason is that when the controller restarts, the Service is not in the cache, but we may still need to remove the ResourceExports.

Contributor Author

Updated
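
The restart scenario raised in this thread is why an unconditional cleanup has to be safe to repeat. The sketch below illustrates one way to get that property, assuming deletions treat "not found" as success; `exportStore`, `remove`, and `cleanup` are hypothetical names, and the in-memory map stands in for the leader cluster's API server.

```go
package main

import (
	"errors"
	"fmt"
)

var errNotFound = errors.New("not found")

// exportStore is a toy stand-in for the set of existing ResourceExports.
type exportStore map[string]struct{}

// remove deletes one entry and reports errNotFound if it does not exist.
func (s exportStore) remove(name string) error {
	if _, ok := s[name]; !ok {
		return errNotFound
	}
	delete(s, name)
	return nil
}

// cleanup deletes every named export and ignores "not found", so it can be
// called without knowing whether the exports were ever created. That is the
// situation right after a restart, when the in-memory record is empty.
func cleanup(s exportStore, names ...string) error {
	for _, n := range names {
		if err := s.remove(n); err != nil && !errors.Is(err, errNotFound) {
			return err
		}
	}
	return nil
}

func main() {
	s := exportStore{"svc-export": {}, "ep-export": {}}
	fmt.Println(cleanup(s, "svc-export", "ep-export"), len(s))
	// A second call finds nothing to delete and still succeeds.
	fmt.Println(cleanup(s, "svc-export", "ep-export"), len(s))
}
```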

@luolanzone luolanzone force-pushed the handle-service-with-empty-endpoints branch from 78cfd11 to 9b27aa1 Compare July 28, 2022 08:44
@tnqn tnqn added this to the Antrea v1.8 release milestone Jul 28, 2022

if existingCondition != (k8smcsv1alpha1.ServiceExportCondition{}) {
if *existingCondition.Reason == *newCondition.Reason {
// No need to update the ServiceExport when there is no status change.
Contributor

As existingCondition comes from the cached client, this check may not be safe? But maybe not a big deal for a status update.

Contributor

This reminds me: why do we need to get and check the existing ResourceExport in updateOrCreateResourceExport()? It may get stale state from the client cache too, right? Shouldn't we know whether it is an update from installedSvcs? Only in the restart case do we not know, as installedSvcs is empty, but we could check whether the ResourceExport exists only in the Create case to cover that?

Contributor Author

The updated ResourceExport is set to use the same ResourceVersion as the existing ResourceExport; if the cache is out of date, the controller will fail to update the ResourceExport and retry.
We compare against the stored installed Service info to determine whether the ResourceExport needs an update or can be skipped. If the controller restarts, the update will happen once without the comparison because installedSvcs is empty.

Contributor

@jianjuns jianjuns Jul 29, 2022

I saw the ResourceVersion, but I am still asking whether it is necessary to get the ResourceExport and compare. In most cases, can we just decide based on installedSvcs, and get the ResourceExport only when installedSvcs does not contain the Service?

If the Service is saved in installedSvcs, we know for sure an update is required, right? In that case, can we just update the ResourceExport without getting it first (and we will not hit update failures due to a stale cache/ResourceVersion)?

Contributor Author

We do compare against installedSvcs rather than the ResourceExport, but the ResourceVersion is required whenever there is an update. That means we either store it in a cache or get it before the update.
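
The trade-off discussed in this thread comes from optimistic concurrency: an update is accepted only if it carries the current ResourceVersion. The sketch below models that with a toy in-memory store, as an illustration rather than the controller's code. `store`, `resourceExport`, and `updateWithRetry` are hypothetical names, and the integer version is a simplification (real resource versions are opaque strings).

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errConflict = errors.New("conflict: stale resource version")
	errNotFound = errors.New("not found")
)

// resourceExport is a reduced, hypothetical model of a ResourceExport.
type resourceExport struct {
	Name            string
	ResourceVersion int // simplification: real versions are opaque strings
	Spec            string
}

// store is a toy stand-in for the API server.
type store struct {
	items map[string]resourceExport
}

func (s *store) get(name string) (resourceExport, bool) {
	re, ok := s.items[name]
	return re, ok
}

// update accepts the write only if the caller's version matches the stored
// one, then bumps the version.
func (s *store) update(re resourceExport) error {
	cur, ok := s.items[re.Name]
	if !ok {
		return errNotFound
	}
	if cur.ResourceVersion != re.ResourceVersion {
		return errConflict
	}
	re.ResourceVersion = cur.ResourceVersion + 1
	s.items[re.Name] = re
	return nil
}

// updateWithRetry starts from a possibly stale copy. On conflict it fetches
// the latest object and tries again, up to the given number of attempts.
func updateWithRetry(s *store, stale resourceExport, spec string, attempts int) error {
	candidate := stale
	for i := 0; i < attempts; i++ {
		candidate.Spec = spec
		err := s.update(candidate)
		if !errors.Is(err, errConflict) {
			return err // success or a non-retryable error
		}
		fresh, ok := s.get(candidate.Name)
		if !ok {
			return errNotFound
		}
		candidate = fresh
	}
	return errConflict
}

func main() {
	s := &store{items: map[string]resourceExport{
		"example-export": {Name: "example-export", ResourceVersion: 2, Spec: "old"},
	}}
	// The cached copy is one version behind the store.
	stale := resourceExport{Name: "example-export", ResourceVersion: 1, Spec: "old"}
	err := updateWithRetry(s, stale, "new", 3)
	got, _ := s.get("example-export")
	fmt.Println(err, got.ResourceVersion, got.Spec)
}
```

Caching the last written ResourceVersion, as suggested in the thread, would make the first attempt succeed more often; getting the object before each update trades an extra read for the same guarantee.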

Contributor

I feel saving the ResourceVersion to avoid possible update failures sounds better, but I am OK with keeping the current approach too.

Refine the ServiceExport controller to watch Endpoints events, so that the
Service-kind ResourceExport is removed when the exported Service no longer
has available Endpoints, and no ResourceExport is written when the Service
has no Endpoints to begin with.

Signed-off-by: Lan Luo <luola@vmware.com>
@luolanzone luolanzone force-pushed the handle-service-with-empty-endpoints branch from 9b27aa1 to 2fd6842 Compare August 1, 2022 01:33
@jianjuns
Contributor

jianjuns commented Aug 1, 2022

/test-multicluster-e2e
/skip-all

@jianjuns
Contributor

jianjuns commented Aug 1, 2022

/test-integration

@jianjuns
Contributor

jianjuns commented Aug 2, 2022

/test-integration

@jianjuns jianjuns merged commit 98967d7 into antrea-io:main Aug 2, 2022
@luolanzone luolanzone deleted the handle-service-with-empty-endpoints branch August 2, 2022 01:05
Labels
area/multi-cluster Issues or PRs related to multi cluster.
Development

Successfully merging this pull request may close these issues.

Remove Service kind of ResourceExport when no available Pod Endpoints
3 participants