
Make Prow support multiple build clusters. #5848

Closed
cjwagner opened this issue Dec 6, 2017 · 16 comments
Labels
area/prow: Issues or PRs related to prow
kind/feature: Categorizes issue or PR as related to a new feature.
priority/important-longterm: Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments

@cjwagner
Member

cjwagner commented Dec 6, 2017

If we refactor Prow to support multiple build clusters we can run specific jobs in a trusted cluster where they can be given credentials that are too privileged to host on the current build cluster.

This could potentially be achieved by using multiple plank controllers that partition the jobs they handle with label selectors. One controller could be used for each separate build cluster.
I think it is probably cleaner to allow jobs to specify a build cluster alias in their config when they require a cluster other than the default, and to change the build-cluster config to be a map from cluster aliases to cluster credentials.
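A hypothetical sketch of that shape, assuming a `cluster` alias field on jobs and a credentials map keyed by alias (all names, endpoints, and fields here are illustrative, not a final design):

```yaml
# Illustrative build-cluster config: a map from cluster alias to credentials.
# "default" would be used by jobs that do not specify a cluster.
default:
  endpoint: https://default-build-cluster.example.com
  clusterCaCertificate: <base64-ca-cert>
  clientCertificate: <base64-client-cert>
  clientKey: <base64-client-key>
trusted:
  endpoint: https://trusted-build-cluster.example.com
  clusterCaCertificate: <base64-ca-cert>
  clientCertificate: <base64-client-cert>
  clientKey: <base64-client-key>
---
# A job would then opt into a non-default cluster by alias:
periodics:
- name: label-sync
  interval: 24h
  cluster: trusted
```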

@cjwagner
Member Author

cjwagner commented Dec 6, 2017

/area prow
/kind feature

@k8s-ci-robot added the area/prow and kind/feature labels Dec 6, 2017
@spxtr
Contributor

spxtr commented Dec 6, 2017

Which specific jobs do you have in mind that need such credentials?

@cjwagner
Member Author

cjwagner commented Dec 6, 2017

The main use case is for jobs that we want to implement as trusted periodic jobs, but which currently have to be implemented as k8s CronJobs. The label_sync tool is an example. The inactive_review_handler and needs-rebase mungers could both be rewritten as trusted periodic jobs as well.

Another use case would be adapting the golint plugin to run automatically as a trusted presubmit that has the GitHub OAuth credentials needed to review PRs.

@BenTheElder
Member

There's also been some consideration of automatically publishing images (e.g. for test-infra) as a post-submit.

@0xmichalis
Contributor

@munnerz also wanted to do something similar. It's possible to do it today with plank and label selectors. The only issue that we identified was that deck does not support multiple build cluster configs.

@BenTheElder
Member

/cc @perotinus who could probably take advantage of this to improve security around nightly image push.

@spxtr
Contributor

spxtr commented Dec 9, 2017

While there are valid cases for a separate trusted build cluster, I want to make clear that certain types of programs should not be run as prowjobs. As a general rule, anything that acts as k8s-ci-robot on GitHub should probably not be running as a prowjob.

The golint plugin can be turned into an external plugin running in the main prow cluster. We can add pylint, shellcheck, and so on external plugins as well, or put them all into one lint plugin. We already have infrastructure in place that is a perfect fit for these, so someone just needs to write the plugins.
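For context, hook already supports forwarding webhook events to external plugins via the plugins config, roughly like this (the plugin name and endpoint here are illustrative):

```yaml
external_plugins:
  kubernetes/test-infra:
  - name: lint
    # hook forwards matching webhook events to this in-cluster service
    endpoint: http://lint
    events:
    - issue_comment
```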

The label_sync tool, inactive_review_handler, and needs-rebase mungers should be k8s cronjobs or deployments running a sync loop. I don't see anything wrong with that, especially now that we have rudimentary monitoring stuff set up.
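For comparison, running a tool like label_sync on a schedule in the main cluster is a small amount of CronJob YAML (the image, schedule, and flags are illustrative):

```yaml
apiVersion: batch/v1beta1  # CronJob API version at the time of this thread
kind: CronJob
metadata:
  name: label-sync
spec:
  schedule: "0 8 * * *"  # daily
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: label-sync
            image: gcr.io/example/label_sync:latest  # illustrative image
            args:
            - --config=/etc/config/labels.yaml
```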

That all said, the consideration of automatically publishing artifacts to a trusted location as a postsubmit or a nightly thing does make sense. That is a valid use for a trusted cluster.

This could potentially be achieved by using multiple plank controllers that partition the jobs they handle with label selectors. One controller could be used for each separate build cluster.
I think it is probably cleaner to allow jobs to specify a build cluster alias in their config when they require a cluster other than the default, and to change the build-cluster config to be a map from cluster aliases to cluster credentials.

Agreed that it's better to not use plank label selectors for this.

@cjwagner
Member Author

@spxtr Why do you consider automatically publishing artifacts to a trusted location a valid use of a trusted cluster, but not things like the inactive_review_handler or needs-rebase munger? They are both periodic jobs that need privileged credentials.

I think the downside to using k8s cronjobs instead of periodic jobs is the extra management effort to redeploy them whenever they are updated. If we implemented the proposed additions to the update-config plugin (https://docs.google.com/document/d/1Gmt4wq-QC9gOGvR6P4ls7uMRyoQwzzOuTlKhxHqM8uo/edit#heading=h.x9snb54sjlu9) we wouldn't have to manually redeploy cronjobs.

@spxtr
Contributor

spxtr commented Dec 12, 2017

Why do you consider automatically publishing artifacts to a trusted location a valid use of a trusted cluster, but not things like the inactive_review_handler or needs-rebase munger? They are both periodic jobs that need privileged credentials.

Prow components should all live in the same place and look the same. They should have the same deployment, logging, and monitoring. Automatically publishing artifacts isn't a prow component. It's a general CI/CD step that most people using Jenkins or buildbot or whatever will end up setting up. It makes total sense to have the nightly build, test, and deploy jobs be prow jobs.
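Under the proposal in this issue, such a publish step could be expressed as an ordinary postsubmit that opts into the trusted cluster (the job name, repo, and cluster alias below are hypothetical):

```yaml
postsubmits:
  kubernetes/test-infra:
  - name: post-test-infra-push-images
    branches:
    - master
    # runs where the push credentials live; "trusted" is a hypothetical
    # alias from the proposed build-cluster map
    cluster: trusted
```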

I think the downside to using k8s cronjobs instead of periodic jobs is the extra management effort to redeploy them whenever they are updated.

This is a general downside to all of prow development. This problem is being worked on by mattmoor, fejta et al, and these cronjobs will piggyback on that work.

@BenTheElder
Member

BenTheElder commented Dec 12, 2017

Prow components should all live in the same place and look the same. They should have the same deployment, logging, and monitoring. Automatically publishing artifacts isn't a prow component. It's a general CI/CD step that most people using Jenkins or buildbot or whatever will end up setting up. It makes total sense to have the nightly build, test, and deploy jobs be prow jobs.

This sounds right to me, but I do want to mention that I don't think we should stress "sticking to kubernetes" too strongly... we often need to work around things in Kubernetes that aren't solved yet. We can provide Kubernetes (and hopefully others!) with good infrastructure without 100% dog-fooding.

Also, to be fair:

  • Technically speaking Prow periodic jobs are a bit more flexible than Kubernetes's cron jobs and we're freer to experiment with how they behave.
  • Kubernetes uses the same cron library as far as I can tell (Go has one solid one), and we could very well provide useful feedback from horologium. For example @krzyzacy and I have discussed some interesting ideas around improving the resiliency of cron jobs.
  • Kube cron jobs are also still quite simple and have some real limitations that are unresolved (not to mention that other features we could trivially implement, such as time zone awareness, are still being worked out for k8s).

That said:

This is a general downside to all of prow development. This problem is being worked on by mattmoor, fejta et al, and these cronjobs will piggyback on that work.

This 1000x. We should reach a point where we regularly deploy a rules_k8s bundle representing the entire cluster (modulo secrets). Perhaps a cron job does this every morning PST (disabled during code freeze)? In our current state we've often selectively updated some components while letting other deployments get quite stale without noticing.

If we get to this point it won't really matter if we're pushing a prow config change or applying an updated k8s cron yaml.


Side note: Another fairly valid use-case is that the security repo needs restricted access credentials, and it would be nice to not need to maintain two Prow deployments for this. We could combine this with #5806?


Edit: I also 👍'd your comment directly above @spxtr, I just think these things are more generally worth mentioning as broader points, not too terribly specific to this issue.

@BenTheElder added the priority/important-longterm label Dec 12, 2017
@fejta
Contributor

fejta commented Dec 16, 2017

Honestly I'd like to move whatever we can out of plugins and into prow jobs. I do not see a good reason for things like the lint plugin to be a plugin versus a job. The latter makes more sense and keeps the core part of prow smaller.

@fejta
Contributor

fejta commented Dec 16, 2017

Also has anyone started a design doc for how I will tell which cluster to schedule a job in, as well as how to inform prow which clusters are possible?

@cjwagner
Member Author

has anyone started a design doc for how I will tell which cluster to schedule a job in, as well as how to inform prow which clusters are possible?

I'll make a doc; that's the next thing I want to get done so that we can get the security repo fixed.

@spxtr
Contributor

spxtr commented Dec 17, 2017

Honestly I'd like to move whatever we can out of plugins and into prow jobs. I do not see a good reason for things like the lint plugin to be a plugin versus a job. The latter makes more sense and keeps the core part of prow smaller.

If we want to go all the way in that direction then we can get rid of the plugin architecture entirely and have the only "plugin" be that which triggers prowjobs. This is a reasonable idea, but I'm not sure it's worth it now that we have external plugins. We can already pull the lint plugin out of the hook binary and into its own deployment. Then we can add pylint and friends to the lint image without mixing it up with the hook image.

@BenTheElder
Member

If we want to go all the way in that direction then we can get rid of the plugin architecture entirely and have the only "plugin" be that which triggers prowjobs. This is a reasonable idea, but I'm not sure it's worth it now that we have external plugins. We can already pull the lint plugin out of the hook binary and into its own deployment. Then we can add pylint and friends to the lint image without mixing it up with the hook image.

I do like the idea of not putting linters in the hook image, but surely this only makes sense for a certain class of plugins? Many of them are very small, event-driven things.

@cjwagner
Member Author

This was implemented in #6053
