Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[🐛 CI]: Jobs are not getting cancelled properly #13483

Closed
titusfortner opened this issue Jan 22, 2024 · 10 comments
Closed

[🐛 CI]: Jobs are not getting cancelled properly #13483

titusfortner opened this issue Jan 22, 2024 · 10 comments
Labels
C-build Build related issues (bazel and CI) I-defect

Comments

@titusfortner
Copy link
Member

titusfortner commented Jan 22, 2024

What happened?

Putting this in an issue since there is a lot of info and I need help. 😄

The original CI code was:

name: CI

on:
  pull_request:
  push:
    branches:
      - trunk
  schedule:
    - cron: "0 */12 * * *"
  workflow_dispatch:

  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.event_name == 'pull_request' }}

For some reason, this job that was manually triggered (with "workflow_dispatch") got the message: Canceling since a higher priority waiting request for 'CI-refs/heads/trunk' exists based on this job which was triggered by a push.

Since I wanted to run everything on trunk at that time, and the new job only ran JS tests based on our Bazel check job, I tried to redo the concurrency section.

The idea was to differentiate the groups between something that tested everything ("workflow_dispatch" and "schedule") from the ones that do not necessarily do that ("pull_request" and "push") by appending the string "-all" for the first two. Doing this required using a "fake ternary". The current code is:

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}${{ github.event_name == 'workflow_dispatch' && '-all' || '' }}${{ github.event_name == 'schedule' && '-all' || '' }}
  cancel-in-progress: true

except now I'm running into the issue where this job that tests just the Python tests gets canceled because of this job that tests just the Ruby tests.

Looking at this again, I realize that the cancel-in-progress: ${{ github.event_name == 'pull_request' }} should have prevented the original issue I had, so now I'm a little stuck on what this code should be.

The goal:

  • Cancel jobs that re-run tests that have changed in the same branch with a subsequent job
  • Do not cancel jobs that are running tests that might not get run in a subsequent job

If we can trust bazel to cache properly, then maybe we don't need to worry about running more jobs and we shouldn't cancel things in progress? Or is caching not good enough? (there have been periods of active development where the CI was backed up about 4 hours which led me to believe we were redoing a lot of tests)

@p0deje / @diemol Any ideas?

@titusfortner titusfortner added I-defect C-build Build related issues (bazel and CI) labels Jan 22, 2024
@p0deje
Copy link
Member

p0deje commented Jan 22, 2024

It should be fairly straightforward if we use the following group:

group: ${{ github.workflow }}-${{ github.ref }}-${{ github.actor }}-${{ github.event_name }}

Now if you run a job, it should not affect the builds that are started manually by other people (github.action is responsible for that) and different events (github.event_name).

@titusfortner
Copy link
Member Author

Except if I push a Ruby update to trunk then push a JS update to trunk, the Ruby update will get stopped

And I'd be ok with a test run that tests everything stopping something else testing everything (hence combining dispatch & scheduled). But I guess I'd rather run more tests than fewer tests if it comes down to it.

@p0deje
Copy link
Member

p0deje commented Jan 22, 2024

Except if I push a Ruby update to trunk then push a JS update to trunk, the Ruby update will get stopped

That should be handled by cancel-in-progress so in this case the Ruby update will have to finish first.

@titusfortner
Copy link
Member Author

If it matches groups and cancel-in-progress is false, it does one at a time. If cancel-in-progress is true, it will stop the previous running one and start the new one. At least that's how I understand it, which is why I no longer understand why it wasn't working the first time.

@diemol
Copy link
Member

diemol commented Jan 23, 2024

Except if I push a Ruby update to trunk then push a JS update to trunk, the Ruby update will get stopped

That should be handled by cancel-in-progress so in this case the Ruby update will have to finish first.

This.

So, we should disable it, and run a build at a time. Which should be quick enough given bazel caching.

@p0deje
Copy link
Member

p0deje commented Jan 23, 2024

At least that's how I understand it, which is why I no longer understand why it wasn't working the first time.

That's right and I am curious why it didn't work too.

@titusfortner
Copy link
Member Author

I just removed everything until we know what we want — cc85374

I deleted everything because why should we only run one build at a time? Let's run whatever GitHub lets us run.

@titusfortner
Copy link
Member Author

Or does that get in the way of how we are doing caching?

If 2 jobs need to use the same cache... I'm not sure how bazel + github manages that.

@titusfortner
Copy link
Member Author

I'm closing this, we'll know if we want to make changes later, don't need this to track it

Copy link

This issue has been automatically locked since there has not been any recent activity since it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked and limited conversation to collaborators Jun 30, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
C-build Build related issues (bazel and CI) I-defect
Projects
None yet
Development

No branches or pull requests

3 participants