Cleanup and update releng/test_config.yaml #32340

Open
17 tasks
xmudrii opened this issue Mar 29, 2024 · 14 comments
Assignees
Labels
  • area/release-eng: Issues or PRs related to the Release Engineering subproject
  • priority/critical-urgent: Highest priority. Must be actively worked on as someone's top priority right now.
  • priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
  • sig/release: Categorizes an issue or PR as relevant to SIG Release.

Comments

@xmudrii
Member

xmudrii commented Mar 29, 2024

releng/test_config.yaml contains definitions for kubernetes/kubernetes tests (mainly periodics). We use those definitions to generate jobs for release branches when cutting a new release branch.

Over time, this file has become unmaintained, which carries a serious risk of jobs that don't properly test release branches.

Here are some problems I observed; this is a non-exhaustive list, so there may be other issues I didn't catch (a trimmed sketch of the file's layout follows the list, for orientation):

  • .jobs
    • ci-kubernetes-e2e-gce-cos-.+?-default is inconsistent
      • Different release branches have different args (to confirm whether this is intentional)
        • Only ci-kubernetes-e2e-gce-cos-k8sbeta-default has the --env=ENABLE_CACHE_MUTATION_DETECTOR=true arg
        • Only ci-kubernetes-e2e-gce-cos-k8sstable1-default has --env=ENABLE_POD_SECURITY_POLICY=true
        • Other ci-kubernetes-e2e-gce-cos-.+?-default jobs have no additional args
      • k8sbeta and k8sstable1 jobs are missing the testgridNumFailuresToAlert parameter
    • We're missing betaapis tests for k8sbeta, k8sstable1, and k8sstable2 releases
      • Verify if args for betaapis tests are still valid
  • .images
    • cos1, cos2, ubuntu1, and ubuntu2 images are not present in job names and therefore are likely unused (check if it is safe to remove these images)
  • .testSuites
    • flaky, soak, stackdriver, updown, and nosat suites are not present in job names and therefore are likely unused (check if it is safe to remove these test suites)
  • .nodeImages
    • cos1, cos2, ubuntu1, and ubuntu2 images are not present in job names and therefore are likely unused (check if it is safe to remove these images)
  • .nodeTestSuites
    • gkespec and serial are likely unused
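
For orientation, here is a trimmed sketch of how these sections are laid out in releng/test_config.yaml. The section and field names mirror the ones mentioned above, but the job entries and nested values are placeholders, not the file's actual contents:

```yaml
# Illustrative layout only; check releng/test_config.yaml for the real schema.
jobs:
  ci-kubernetes-e2e-gce-cos-k8sbeta-default:
    args:
      - --env=ENABLE_CACHE_MUTATION_DETECTOR=true    # only this branch carries the flag today
  ci-kubernetes-e2e-gce-cos-k8sstable2-default:
    args: []
    testgridNumFailuresToAlert: 6                    # the parameter missing on k8sbeta/k8sstable1
images:
  cos1: {}          # not referenced by any job name, so likely unused
testSuites:
  flaky: {}         # not referenced by any job name, so likely unused
nodeImages:
  ubuntu1: {}       # not referenced by any job name, so likely unused
nodeTestSuites:
  gkespec: {}       # likely unused
```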

Once these issues are addressed, we should verify that the arguments for the remaining test options are up to date.

cc @kubernetes/release-engineering @dims

@k8s-ci-robot added the needs-sig label (Indicates an issue or PR lacks a `sig/foo` label and requires one.) Mar 29, 2024
@xmudrii
Member Author

xmudrii commented Mar 29, 2024

/sig release
/area release-eng

@k8s-ci-robot added the sig/release and area/release-eng labels and removed the needs-sig label Mar 29, 2024
@xmudrii
Member Author

xmudrii commented Mar 29, 2024

/priority important-soon

@k8s-ci-robot added the priority/important-soon label Mar 29, 2024
@dims
Member

dims commented Apr 2, 2024

/milestone v1.31

@k8s-ci-robot
Contributor

@dims: The provided milestone is not valid for this repository. Milestones in this repository: [someday, v1.24, v1.25]

Use /milestone clear to clear the milestone.

In response to this:

/milestone v1.31

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label (Denotes an issue or PR has remained open with no activity and has become stale.) Jul 1, 2024
@xmudrii
Member Author

xmudrii commented Jul 23, 2024

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Jul 23, 2024
@xmudrii
Member Author

xmudrii commented Jul 31, 2024

We're running into this issue again ahead of the 1.31 jobs cut. These are the high-priority issues at the moment:

/priority critical-urgent
cc @dims @mehabhalodiya

@k8s-ci-robot added the priority/critical-urgent label Jul 31, 2024
@dims
Member

dims commented Jul 31, 2024

ci-kubernetes-e2e-gce-cos-k8sbeta-default (1.30) has --env=ENABLE_CACHE_MUTATION_DETECTOR=true, do we need to keep this for 1.31 as well?

Yes, we should keep it. It helps catch the scenario where anything mutates a shared informer cache (see description here).

ci-kubernetes-e2e-gce-cos-k8sbeta-alphafeatures (1.30) uses a custom testSuite called alphafeatures-ccm-eventedpleg, do we need to keep it for 1.31 as well?

Yes, please. It will help give us signal on an alpha feature: https://github.com/kubernetes/kubernetes/blob/9413cf204ac92711cc8aff472b1ed11ba79760ac/pkg/features/kube_features.go#L261-L267

ci-kubernetes-e2e-gce-cos-k8sstable1-default (1.29) has --env=ENABLE_POD_SECURITY_POLICY=true, however, this env is present only for 1.29. Do we need it for any other version or is it specific to only 1.29?

We can drop it; it does not do much AFAICT, looking through references: https://cs.k8s.io/?q=ENABLE_POD_SECURITY_POLICY%7CEnablePodSecurityPolicy&i=nope&files=&excludeFiles=vendor&repos=
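
Taken together, the answers above would translate into roughly the following shape for the forked default jobs. This is illustrative only, reusing the assumed layout from the sketch in the issue description:

```yaml
# Illustrative only: keep the mutation detector flag, drop the PSP flag.
jobs:
  ci-kubernetes-e2e-gce-cos-k8sbeta-default:
    args:
      - --env=ENABLE_CACHE_MUTATION_DETECTOR=true   # kept, per the answer above
  ci-kubernetes-e2e-gce-cos-k8sstable1-default:
    args: []                                        # --env=ENABLE_POD_SECURITY_POLICY=true dropped
```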

@BenTheElder
Member

Can we please just use config fork annotations on normal prowjobs and stop using this other thing entirely?

I don't think SIG release should be defining these jobs, just forking them when new branches are cut, and I really doubt anyone else in the project is looking at this file. It's hard to even understand what these config fields are doing.

cc @liggitt @tallclair re: PSP

Yes, we want to keep the cache mutation detector enabled.

@liggitt
Member

liggitt commented Aug 1, 2024

ENABLE_POD_SECURITY_POLICY is irrelevant in 1.25+ and can be dropped

@BenTheElder
Member

documented and widely used:

https://github.com/kubernetes/test-infra/tree/master/releng/config-forker#supported-annotations

Versus ~no docs for generate_tests and the out-of-band config.

If the config fork annotations are not powerful enough for some reason, let's fix that.
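
For reference, a rough sketch of what an annotated periodic looks like with config forking. The annotation names come from the config-forker README linked above; the value formats (the interval list, the replacement syntax) and the job and image names here are assumptions for illustration, so double-check them against that doc:

```yaml
# Sketch only; verify annotation keys and value syntax against the
# config-forker README before relying on them.
periodics:
  - name: ci-kubernetes-e2e-gce-cos-default
    interval: 1h
    annotations:
      fork-per-release: "true"                       # ask the forker to copy this job for each new release branch
      fork-per-release-periodic-interval: 2h 6h 24h  # intervals for the successive forked copies (assumed syntax)
      fork-per-release-replacements: "master -> {{.Version}}"  # string rewrites applied to each fork (assumed syntax)
    spec:
      containers:
        - image: gcr.io/k8s-staging-test-infra/kubekins-e2e:latest-master  # illustrative image
          command:
            - runner.sh
```

With that in place, job owners edit only the job definition itself and the forker picks the annotations up at branch-cut time, which is exactly the gap with the out-of-band test_config.yaml described above.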

@xmudrii
Member Author

xmudrii commented Aug 1, 2024

Can we please just use config fork annotations on normal prowjobs and stop using this other thing entirely?

Long term, we can look into it. But short term, and especially for this upcoming release, definitely not. We're happy to look into it, but this will require some time, and we want to get the release branch up and running ASAP.

@BenTheElder
Member

I'm pretty sure https://kubernetes.slack.com/archives/CN0K3TE2C/p1722833073336809 is related

The problem is that job maintainers are going to PR the job definitions and expect auto-forking, unaware that this out-of-band config needs changing as well, since it is only used once per release branch.

I think we really need to switch for 1.32 or we'll keep having regressions.

cc @pohly @aojea, @kubernetes/release-engineering

@xmudrii
Member Author

xmudrii commented Aug 6, 2024

/assign
Let me look into this and see if we can quickly drop this.
