fix(trafficrouting): Fix downtime on initial deployment using Istio DestinationRule Subsets. Fixes #2507 #3602

Merged

Conversation

wmuizelaar
Contributor

@wmuizelaar wmuizelaar commented May 28, 2024

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this is a chore.
  • The title of the PR is (a) conventional with a list of types and scopes found here, (b) states what changed, and (c) suffixes the related issues number. E.g. "fix(controller): Updates such and such. Fixes #1234".
  • I've signed my commits with DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My builds are green. Try syncing with master if they are not.
  • My organization is added to USERS.md.

Fixes #2507

@wmuizelaar wmuizelaar changed the title from "fix(istio): Fix downtime on initial deployment using Istio DestinationRule Subsets. Fixes #2507" to "fix(trafficrouting): Fix downtime on initial deployment using Istio DestinationRule Subsets. Fixes #2507" May 28, 2024
@wmuizelaar wmuizelaar force-pushed the fix_istio_desination_rule_readiness branch from bde071e to 25f0dc8 Compare May 28, 2024 10:02
Contributor

github-actions bot commented May 28, 2024

Go Published Test Results

2 166 tests   2 166 ✅  2m 55s ⏱️
  119 suites      0 💤
    1 files        0 ❌

Results for commit 582f087.

♻️ This comment has been updated with latest results.


codecov bot commented May 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.81%. Comparing base (23e186e) to head (1dcdfcd).
Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3602      +/-   ##
==========================================
- Coverage   83.87%   83.81%   -0.07%     
==========================================
  Files         162      162              
  Lines       18524    18529       +5     
==========================================
- Hits        15537    15530       -7     
- Misses       2119     2125       +6     
- Partials      868      874       +6     


Contributor

github-actions bot commented May 28, 2024

E2E Tests Published Test Results

  4 files    4 suites   3h 28m 0s ⏱️
111 tests 101 ✅  6 💤 4 ❌
448 runs  420 ✅ 24 💤 4 ❌

For more details on these failures, see this check.

Results for commit 582f087.

♻️ This comment has been updated with latest results.

@wmuizelaar wmuizelaar changed the title from "fix(trafficrouting): Fix downtime on initial deployment using Istio DestinationRule Subsets. Fixes #2507" to "fix(trafficrouting): Fix downtime on initial deployment using Istio DestinationRule Subsets. Fixes https://github.com/argoproj/argo-rollouts/issues/2507" May 28, 2024
@wmuizelaar wmuizelaar changed the title from "fix(trafficrouting): Fix downtime on initial deployment using Istio DestinationRule Subsets. Fixes https://github.com/argoproj/argo-rollouts/issues/2507" to "fix(trafficrouting): Fix downtime on initial deployment using Istio DestinationRule Subsets. Fixes #2507" May 28, 2024
@wmuizelaar wmuizelaar force-pushed the fix_istio_desination_rule_readiness branch 2 times, most recently from 54a2ab8 to 3704456 Compare May 31, 2024 14:07
Contributor

@newtondev newtondev left a comment


Looks good. I sat with Wietse on a call and we went through it together.


sonarcloud bot commented May 31, 2024

Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
17.0% Duplication on New Code

See analysis details on SonarCloud

@newtondev
Contributor

LGTM

@wmuizelaar
Contributor Author

@zachaller are you able to take a look at this? It would be much appreciated! 🙏

@zachaller
Collaborator

Yeah, at first glance I would like to avoid changing the traffic router interface, since that would force all plugins to update. I think an OK alternative is to pass the replicasetInformer/Lister/k8s client, or just add roCtx.allRSs to the Istio reconciler context and use that to get the replicasets. If this were a plugin, it would have to do something similar as well.
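
A rough sketch of the alternative described above, under stated assumptions: every name here (`TrafficReconciler`, `ReplicaSet`, `IstioReconcilerConfig`, `NewIstioReconciler`) is hypothetical and simplified, not the actual argo-rollouts types. The shared interface stays untouched, and the replica sets reach the Istio reconciler through its own constructor config. The readiness rule (every replica set fully available) is also a simplification chosen for illustration.

```go
package main

import "fmt"

// ReplicaSet is a simplified, hypothetical stand-in for a Kubernetes ReplicaSet.
type ReplicaSet struct {
	Name              string
	Replicas          int32
	AvailableReplicas int32
}

// TrafficReconciler is a hypothetical stand-in for the shared interface that
// plugins implement. It is deliberately left unchanged.
type TrafficReconciler interface {
	SetWeight(desiredWeight int32) error
}

// IstioReconcilerConfig carries the extra data that only the Istio
// reconciler needs, so the interface above does not have to grow.
type IstioReconcilerConfig struct {
	ReplicaSets []ReplicaSet
}

type istioReconciler struct {
	cfg        IstioReconcilerConfig
	lastWeight int32
}

// NewIstioReconciler builds the reconciler with the replica sets injected.
func NewIstioReconciler(cfg IstioReconcilerConfig) *istioReconciler {
	return &istioReconciler{cfg: cfg}
}

// SetWeight refuses to shift traffic while any injected replica set is not
// fully available (assumed readiness rule for this sketch).
func (r *istioReconciler) SetWeight(desiredWeight int32) error {
	for _, rs := range r.cfg.ReplicaSets {
		if rs.AvailableReplicas < rs.Replicas {
			return fmt.Errorf("replica set %q not ready: %d/%d available",
				rs.Name, rs.AvailableReplicas, rs.Replicas)
		}
	}
	r.lastWeight = desiredWeight
	return nil
}

func main() {
	// Callers still see only the unchanged interface.
	var r TrafficReconciler = NewIstioReconciler(IstioReconcilerConfig{
		ReplicaSets: []ReplicaSet{{Name: "canary", Replicas: 2, AvailableReplicas: 1}},
	})
	if err := r.SetWeight(50); err != nil {
		fmt.Println("weight not applied:", err)
	}
}
```

The point of the design is that plugins implementing the interface need no changes; only the code constructing the Istio reconciler has to supply the replica sets.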

@wmuizelaar
Contributor Author

Thanks for your remark. I agree that it would be really nice if the interface could stay the same. Let me experiment to see whether I can pass it to the Istio Reconciler in a different way.

@wmuizelaar
Contributor Author

@zachaller can you take a look again please? This indeed feels like a much simpler and easier implementation.

@newtondev
Contributor

LGTM, working on my test project.

@zachaller zachaller self-assigned this Jul 8, 2024
@Bennett-Lynch

@zachaller Gentle bump. Looking to see if this also resolves #3681.

@zachaller
Collaborator

Sorry, it's been a while. I will take a look at this soon.

Comment on lines +323 to +325
// We need to check if the replicasets are ready here as well if we didn't define any services in the rollout
// See: https://github.com/argoproj/argo-rollouts/issues/2507
if r.rollout.Spec.Strategy.Canary.CanaryService == "" && r.rollout.Spec.Strategy.Canary.StableService == "" {


Do we know for sure this is being properly checked when services are defined? In the example in #3681 I am defining both a canary service and stable service but still experiencing a brief downtime/outage after the rollout ends.

Contributor Author


Aaah, if you explicitly define the services, the existing code should actually be working. The fix I built here is specifically for when you are using the Istio DestinationRule method to update your traffic (https://argo-rollouts.readthedocs.io/en/stable/features/traffic-management/istio/#subset-level-traffic-splitting to be precise).

If you define the services (I did not see those in your example, so that's why I thought it might be related to this), there are other codepaths that actually check service-readiness. There might be a bug/issue in there, but then your issue is at least definitely NOT related to mine.
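
To illustrate the distinction drawn here, a minimal sketch of the guard from the reviewed lines, assuming a simplified `CanaryStrategy` struct that mirrors only the two fields named in that condition. The helper name `needsReplicaSetReadinessCheck` is hypothetical and does not come from the PR.

```go
package main

import "fmt"

// CanaryStrategy mirrors only the two fields referenced in the reviewed
// condition; the real rollout spec has many more (simplified assumption).
type CanaryStrategy struct {
	CanaryService string
	StableService string
}

// needsReplicaSetReadinessCheck (hypothetical helper) reports whether the
// rollout defines neither a canary nor a stable service. In that case
// traffic is split via DestinationRule subsets alone, so the
// service-readiness codepaths do not apply and replica set readiness has
// to be checked directly.
func needsReplicaSetReadinessCheck(c CanaryStrategy) bool {
	return c.CanaryService == "" && c.StableService == ""
}

func main() {
	subsetOnly := CanaryStrategy{}
	withServices := CanaryStrategy{CanaryService: "canary-svc", StableService: "stable-svc"}

	fmt.Println(needsReplicaSetReadinessCheck(subsetOnly))   // true
	fmt.Println(needsReplicaSetReadinessCheck(withServices)) // false
}
```

A rollout that defines both services, as in the #3681 example discussed above, takes the `false` branch and is therefore not affected by this change.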

@zachaller zachaller added this to the v1.8 milestone Aug 23, 2024
…estinationRule Subsets. Fixes argoproj#2507

Signed-off-by: Wietse Muizelaar <wmuizelaar@bol.com>
@zachaller zachaller force-pushed the fix_istio_desination_rule_readiness branch from 582f087 to 1dcdfcd Compare August 27, 2024 13:54

sonarcloud bot commented Aug 27, 2024

Contributor

Published E2E Test Results

  4 files    4 suites   3h 22m 20s ⏱️
113 tests 101 ✅  7 💤 5 ❌
458 runs  424 ✅ 28 💤 6 ❌

For more details on these failures, see this check.

Results for commit 1dcdfcd.

Contributor

Published Unit Test Results

2 264 tests   2 264 ✅  2m 59s ⏱️
  128 suites      0 💤
    1 files        0 ❌

Results for commit 1dcdfcd.

@zachaller zachaller merged commit 39764fb into argoproj:master Aug 27, 2024
25 checks passed

Successfully merging this pull request may close these issues.

Downtime On initial deployment using Istio destination rule
4 participants