Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs(webhook-receivers.md): responsiveness #612

Merged
merged 2 commits into from
Nov 10, 2021

Conversation

lloydchang
Copy link
Contributor

@lloydchang lloydchang commented Oct 31, 2021

  1. responsiveness
  2. fix broken links

@lloydchang
Copy link
Contributor Author

@resouer @wonderflow @christianh814 Please review this #612 when you time permits.

This relates to our conversation in #wg-gitops at https://cloud-native.slack.com/archives/C01G9DEE88M/p1635706634045400?thread_ts=1635251844.014900

Thank you! 🙂

@kingdonb
Copy link
Member

kingdonb commented Oct 31, 2021

This is an incorrect change, in my opinion.

Webhook receivers do not alter Flux's behavior so that it is "push-based" instead of "pull-based" and therefore they are not "counter to GitOps principles" in any way – a webhook receiver only receives events, and it only triggers the same pull-based reconciliation that is considered as central to GitOps. It would be different if the webhook pushed a specific commit hash, or any out-of-band instructions or information at all that were not fully encapsulated in a GitRepository.

One is absolutely still "doing proper GitOps" with any receiving webhooks enabled, as this is done in order to shorten the control loop. The only procedural difference with receivers enabled is that reconcilers no longer need to poll in order to achieve a fast inner-loop and to ensure that changes from the repo are applied within a reasonable amount of time.

With Receivers configured, the "sync" event can no longer depend on spec.interval; instead the request to sync reaches Flux through the webhook. It is the exact same control loop as before, but with receivers we have incorporated a better timing source that obviates the need to set a short interval and poll frequently.

@christianh814
Copy link

This is something that was being discussed by the chairs @scottrigby @todaywasawesome @murillodigital (weak GitOps vs strong GitOps)

IMHO, doing a webhook to shorten the wait time between reconciliation is fine, and can exist within the principles of GitOps.

As long as the reconcile isn't turned off, having web hooks trigger the reconciler earlier is fine.

@lloydchang
Copy link
Contributor Author

Hi @kingdonb, @yebyen, @christianh814, @scottrigby, @todaywasawesome, @murillodigital, @resouer, @wonderflow,

TL;DR:

  • I appreciated your time and feedback 🙂
  • GitOps, without any pushes at all, is designed to be secure.
  • Webhooks, by introducing pushes, compromise our GitOps security posture.
  • Push-based pipelines and event-driven webhooks in event-driven architecture that trigger an automated pull are different from GitOps Principle 3. Pulled Automatically.
  • While the two English words are spelled similarly, there is a significant degree of intentional differences between automated and automatic.

To avoid groupthink in consensus decision-making, I am elaborating on why I created this pull request:


@kingdonb, @yebyen, @christianh814, I re-read your feedback, ...1412 and ...3011, several times.

I empathize and understand that it may be accidental, or maybe even easier, to conflate GitOps Principles 3, 4, 5, holistically as one set. Let's not conflate them in this pull request. Thank you! 🙂

I would like to double-check whether you are conflating, or not conflating, GitOps Principles 3 and 4.


GitOps Principle 3 says "automatically" and not "automated". Since it is a principle, I interpret "Pulled Automatically" to mean fully automatic as a target operating model.

  1. Pulled Automatically
    Software agents automatically pull the desired state declarations from the source.

In fully automatic, there are no intentional events at all — There wouldn't be webhooks driving push-based events intentionally from server-to-server.

  • GitOps, without any pushes at all, is designed to be secure.

  • Webhooks, by introducing pushes, compromise our GitOps security posture.

Webhooks compromise security posture by introducing new risks that need to be assessed. For example:

  • When webhooks push intentionally at high rates, this situation introduces security risks of incoming events needing to be rate limited at the ingress (i.e. ingress-nginx, or load balancer (LB), or web application firewall (WAF), etc.) to prevent denial of service (DOS) attacks.

In the context of Flux CD version 2, rate limiting and security risks were discussed at fluxcd/notification-controller#180 and fluxcd/notification-controller#193


While event-driven webhooks are intentional and automated, they aren't automatic in the context of GitOps.

  • In a happy path, webhooks at intentional low rates trigger Kubernetes controllers to poll earlier.

  • In a threat model, webhooks at intentional high rates could overwhelm Kubernetes controllers with webhook receivers.

Pushed-based pipelines lack the degree of security designed in GitOps.

The intent in webhooks makes push-based pipelines automated with different outcomes between happy path and threat model, thus not GitOps.

The intent in webhooks contrasts with GitOps Principle 3. Pulled Automatically.


  • While the two English words are spelled similarly, there is a significant degree of intentional differences between automated and automatic.

automated
/ˈôdəˌmādəd/


au·to·mat·ic
/ˌôdəˈmadik/


  • A person who manually triggered a webhook, then the polling earlier, then subsequent pulling, triggered an automated process — This is different from GitOps Principle 3. Pulled Automatically.

  • This makes pushed-based pipelines an automated process, but not a fully automatic process.

To play devil's advocate, one can split hairs and say it's semi-automated or semi-automatic within the spectrum:

  • manually
  • semi-automated
  • automated
  • semi-automatic
  • automatic
  • fully automatic
  • autonomic

This pull request is about GitOps Principle 3: Pulled Automatically.


Whereas GitOps Principle 4 is about reconciliation.

  1. Continuously Reconciled
    Software agents continuously observe actual system state and attempt to apply the desired state.

@christianh814 and @yebyen — As for weak GitOps or strong GitOps, I introduced that terminology to @todaywasawesome and @scottrigby, with documented examples at open-gitops/documents#24 (comment). In that context, it is about proposed GitOps Principle 5.

  1. Operated in a closed loop
    Software agents observe desired state and meta-state to take actions based on policy when the desired state cannot be reached. This may include things like notifications, rollbacks, etc.

Separately from this pull request, I feel that autonomic is more applicable to GitOps Principle 5. Operated in a closed loop, also known as strong GitOps.


au·to·nom·ic
/ˌôdəˈnämik/


@christianh814 wrote:

As long as the reconcile isn't turned off, having web hooks trigger the reconciler earlier is fine.

The reconcile can be turned off, e.g. flux suspend ...

❔ Curiously, how does this discussion about Flux compare and contrast with Carvel, kpt, and Argo CD: Webhook notification service?

@christianh814 wrote in #wg-gitops:

I haven't seen carvel but kpt seems event driven so it might not fit. I would have to do some testing around it.

Example: I can turn off reconciliation on Argo CD and use web hooks to do the sync.
^ this isn't GitOps (although a valid workflow).


The push-based pipelines range from intentional event-driven webhooks that need rate-limiting to mitigate security risks to CIOps & CI Ops push-based pipelines.

How secure are the typical CI & CD pipelines?
Let’s see how we can improve the typical CI & CD pipelines with GitOps.

@scottrigby, @todaywasawesome and I had discussed push-based Continuous Integration Operations (CIOps & CI Ops) at open-gitops/documents#8


The following automobile analogy, cruise control, is an homage to CruiseControl open-source software from 2001 (20 years ago). CruiseControl was one of the first of its kind of open-source software which paved a path for continuous integration.


Automated:
🚗 🦶 🔘 👈

Semi-automatic:
🚗 🚙

Fully-automatic:
🏎️ 🏍️ 💨 🛻 🛺 🚘 💥 🚌 🚷 🚶 🚷 🚌 ⚠️ 🚗 🚙

Please let me know if you feel like we are splitting hairs between:
• automated
• semi-automatic
• fully automatic
🙂


Thank you @kingdonb, @yebyen, @christianh814, @scottrigby, @todaywasawesome, @murillodigital, @resouer, @wonderflow for your time and feedback 🙂

@kingdonb
Copy link
Member

kingdonb commented Nov 1, 2021 via email

@lloydchang
Copy link
Contributor Author

lloydchang commented Nov 1, 2021

@kingdonb wrote:

capability of webhooks to verify signatures

❔ Would you have a concrete example? Thank you.

❔ Are you referring to HMAC signature at fluxcd/notification-controller#127 from January?

The above questions are to avoid over-promising and under-delivering.

It might be an unintentional security oversight and over-promising to claim that webhooks signatures are validated without further clarifying that payloads aren't validated.

After reading a report by @aliusmiles in October at fluxcd/notification-controller#256

generic/hmac receivers and found out that payload actually doesn't matter at all for them.

it's worth updating docs on HMAC webhook to add clear statement that payload content is not validated, only signature is verified?

singing [sic] empty string is not a good idea and any random text must be used

My understanding is that payload validation isn't happening at all, which makes sense to me because of the complexity to validate payloads.

In April, when I reviewed fluxcd/notification-controller#180 (comment), I wrote:

My understanding is the applications listed at https://toolkit.fluxcd.io/components/notification/controller/ have different webhook formats. You wouldn't find a standard payload amongst them.

Due to the complexity to validate payloads, fluxcd/notification-controller#180 was closed as Won't Do in May.

It is okay to not validate payloads because of the complexity. The corollary is to be clearer in the documentation
at https://fluxcd.io/docs/components/notification/receiver
and https://github.com/fluxcd/website/blob/main/content/en/docs/guides/webhook-receivers.md
that the payload isn't validated at all.

In particular, the sentence

Notification controller validates the authenticity of the payload using HMAC

is misleading.

In February, perhaps @dholbach unintentionally added that sentence without fully realizing that the payload isn't validated at all.

Thank you, everyone 🙂

@yebyen
Copy link
Contributor

yebyen commented Nov 1, 2021

payload actually doesn't matter at all for them

The reason that it's safe to not validate payloads, is because the payload is actually not used for anything, across all of the Webhook receiver types. (Not only because it would be too difficult)

The Webhook only has one message, which makes the API surface very small and easy to verify its safety. The content does not affect the function. It's like a method that takes zero parameters, the content is discarded in toto. The content is never processed for anything, with the possible exception of selecting which event (push) can trigger the webhook to fire (part of the security posture also.)

The example is every Receiver in the pack, with the exception of only generic type, to account for services like BitBucket Cloud which do not offer webhook secret or signing. The URL itself is randomly generated based on the content of the Receiver and the secret token though.

So even for BitBucket Cloud (which intentionally doesn't permit signing requests, to leave it as an advantage of the paid BitBucket Server?) the webhook can be protected as long as the URL itself is kept secret. Other Receiver types are based on the hmac signing pattern, I haven't read them all but my naive understanding is they all work this way.

@yebyen
Copy link
Contributor

yebyen commented Nov 1, 2021

I wasn't aware that payloads are not validated at all, thanks for adding this @lloydchang
(It might be important for cryptographic purposes, even if the payload is not used for anything; having a payload that varies from one request to the next, could be important for ensuring that the cryptographic integrity of the message is not compromised, and that replay attacks are impossible. I don't know enough about the crypto to say one way or another.)

@stefanprodan
Copy link
Member

stefanprodan commented Nov 1, 2021

The way we designed the webhook integrations doesn't make Flux less GitOpsy. The webhooks don't tell Flux what to pull or from where, Flux doesn't look at the webhook payload. The sole scope of the Flux Receiver API is to request a reconciliation outside of the specified interval, similar to flux reconcile command.

@squaremo
Copy link
Member

squaremo commented Nov 1, 2021

It does say

push-based [GitOps] pipelines

so whether or not you agree that it's doctrinally OK to have webhooks, the text is inaccurate in the first place (not to mention: "GitOps pipeline" is not defined anywhere, is it?). I would just remove that whole sentence.

@kingdonb
Copy link
Member

kingdonb commented Nov 1, 2021

I agree with @squaremo except that:

Flux is by design pull-based. In order to notify the Flux controllers about changes in Git or Helm repositories,
you can setup webhooks and trigger a cluster reconciliation every time a source changes. Using webhook receivers, you can build push-based GitOps pipelines that react to external events.

there was clearly meant to be a "compare" / "contrast" flow to this paragraph. Removing this whole sentence will break that sentence structure and compromise the flow, (leaving a weird intro paragraph that has only ebb left, and no flow.)

Something like this maybe?

every time a source changes. Using webhook receivers, you can build responsive and reactive workflows that appear to behave like push-based pipelines, with a fast inner dev loop experience, and without need for aggressive polling of Git sources.

@squaremo
Copy link
Member

squaremo commented Nov 1, 2021

that appear to behave like push-based pipelines

push-based means the control belongs to the pushing process; e.g., a CI system that tells the cluster to sync, then waits to see if it succeeds. That isn't what is happening here -- the control belongs to the pulling process. On that basis, it's better to say that using webhooks makes it more responsive, and leave it at that.

@kingdonb
Copy link
Member

kingdonb commented Nov 8, 2021

every time a source changes. Using webhook receivers, you can build fast and reactive workflows that happen as quickly as possible, and compete with push-based pipelines in terms of responsiveness to changes, with a fast inner dev loop experience, and without need for aggressive polling of Git sources.

I think we still want to use the words push-based here, if nothing else then for SEO reasons and to draw the attention of people that still think they want something push-based. This version does not apply the descriptor "GitOps" to the category push-based anymore, so I think it satisfies the original concern of the report/PR. It is probably still 50% more words than we want in this paragraph, but captures all of the meaning I think we wanted to convey. Would be happy if there was a shorter way to say as much, (writing terse prose is not my personal strongest point.)

I am fine with anything that reads like this. Can you amend the language of your PR @lloydchang or we can close it and resolve this with a separate PR 👍

@scottrigby scottrigby added the question Further information is requested label Nov 8, 2021
Copy link
Member

@scottrigby scottrigby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm late to the party! Thanks, @lloydchang for raising the question, I think this has been a very interesting discussion. And thanks everyone for participating! I'm guessing there will be more conversations like this for different functionality, configurations, and documentation of Flux and other tools in the GitOps ecosystem.

@lloydchang To move this forward, would you please update this PR wording according to maintainer suggestions, or else close this PR if you don't have time to do that? I believe a few options are given above.

@squaremo I agree with you this is different from traditional CI and/or CD push-based systems for reasons you mentioned. However, I believe @stefanprodan added that wording to make clear the need has been met for end-users who don't want to wait until the next scheduled interval whenever they purposefully make a source change. They are often looking for "push-based" support. We offer this, just architected in a GitOps way.

We understand that while the webhook receiver accepts a "push" message, that's only to let it know it's time to do another pull immediately. We could call this a "poke" informally since it's done differently than the less secure traditional CI & CD "push-only" models 😛 It's just architected to conform to GitOps principles. But I think "push" is accurate here.

End users can read more about this in the docs, but often users are looking for a "push-based" CD because they want this exact behavior. This poke method is literally still a "push" event. Once they learn it's equally fast, more secure, and adheres to GitOps principles too, that's added value for them, not false advertising IMO. Defer to you and the other Flux oversight committee members to decide on whether or not to keep push-based wording in some way. My 2¢ is that there's still value in using the term here for those end-users looking for this, and in terms of accuracy if anything it's chance to educate folks on GitOps while making clear Flux satisfies their legitimate need.

Signed-off-by: lloydchang <lloydchang@gmail.com>
@lloydchang lloydchang changed the title docs(website/content/en/docs/guides/webhook-receivers.md): remove GitOps from push-based pipelines docs(webhook-receivers.md): responsiveness Nov 9, 2021
@lloydchang
Copy link
Contributor Author

amended

@kingdonb
Copy link
Member

kingdonb commented Nov 9, 2021

I think I can approve this, @scottrigby WDYT

@kingdonb kingdonb self-requested a review November 9, 2021 14:35
Copy link
Member

@kingdonb kingdonb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kingdonb
Copy link
Member

kingdonb commented Nov 9, 2021

I re-ran the tests but I am not able to see why lychee link checker test is failing right now.

fix https://github.com/fluxcd/website/runs/4156089344

fix 2 errors reported by lychee Link Checker:
Errors in content/en/docs/guides/webhook-receivers.md
✗ file:///github/workspace/content/en/docs/installation (Cannot find file file:///github/workspace/content/en/docs/installation)
✗ file:///github/workspace/content/en/docs/get-started/index.md (Cannot find file file:///github/workspace/content/en/docs/get-started/index.md)

Signed-off-by: lloydchang <lloydchang@gmail.com>
@kingdonb
Copy link
Member

kingdonb commented Nov 9, 2021

Looks like that did it, thank you @lloydchang

Please approve if you agree @scottrigby then this can be merged 👍

@lloydchang
Copy link
Contributor Author

@kingdonb please re-review. To fix (pre-existing) broken links to pass lychee Link Checker, I added another Git commit. Thank you.

Copy link
Member

@stefanprodan stefanprodan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thanks @lloydchang

@stefanprodan stefanprodan dismissed scottrigby’s stale review November 10, 2021 07:36

The raised issues were addressed

@stefanprodan stefanprodan merged commit 2d80ca4 into fluxcd:main Nov 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants