incident: git binary repo is too big (78GB) - node-test-binary-windows timing out #952

Closed
refack opened this issue Oct 28, 2017 · 17 comments

@refack
Contributor

refack commented Oct 28, 2017

I've run a git-delete-branch job, but the local repos are too "fat", so git's automatic GC is timing out.

@Trott
Member

Trott commented Oct 28, 2017

Looks to me like it might only be failing on test-rackspace-win2008r2-x64-4? Going to try taking that one offline and running again...

@Trott
Member

Trott commented Oct 28, 2017

Oh, nope, that's totally wrong, it's a bunch of hosts, but it's strange (to me) that some fail consistently and others don't.

@Trott
Member

Trott commented Oct 28, 2017

Seeing what happens if I take all the failing nodes offline because why not? It's not as if anyone can get a CI run out of them right now.

Took these offline:

  • test-rackspace-win2008r2-x64-4
  • test-azure_msft-win10-x64-3
  • test-azure_msft-win10-x64-1
  • test-azure_msft-win2012r2-x64-3
  • test-azure_msft-win2012r2-x64-2

And here's a run to see if it fixes things or not: https://ci.nodejs.org/job/node-test-commit-windows-fanned/12947/

@refack
Contributor Author

refack commented Oct 28, 2017

Cleaned and brought online:

  • test-rackspace-win2008r2-x64-4
  • test-azure_msft-win10-x64-3
  • test-azure_msft-win10-x64-1
  • test-azure_msft-win2012r2-x64-3
  • test-azure_msft-win2012r2-x64-2

@Trott if you see a stalled job, you can go to its job page (for example
https://ci.nodejs.org/job/node-test-binary-windows/COMPILED_BY=vs2015-x86,RUNNER=win2012r2,RUN_SUBSET=3/ ).
If it's the top job, you can click "Workspace" and wipe it.

As for intermittent failures, I think it's because this is triggered by git's auto-GC logic...
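For context on that auto-GC logic: git invokes automatic GC after commands such as `git fetch` and `git commit`, but it only does real work once a threshold is crossed, either more loose objects than `gc.auto` (default 6700) or more packs than `gc.autoPackLimit` (default 50); setting `gc.auto` to 0 turns it off. The helper below is a hypothetical sketch for checking a workspace clone against those thresholds. It counts every loose object, whereas git estimates the count from a single `objects/` subdirectory, so treat results near the threshold as approximate.

```shell
#!/bin/sh
# would_auto_gc REPO: succeed (exit 0) if `git gc --auto` in REPO would
# probably start a repack. Hypothetical helper: it counts every loose object,
# while git only samples one objects/ subdirectory, so it is approximate.
would_auto_gc() {
  repo=$1
  # Read the thresholds, falling back to git's defaults when they are unset.
  auto=$(git -C "$repo" config --int --get gc.auto) || auto=6700
  pack_limit=$(git -C "$repo" config --int --get gc.autoPackLimit) || pack_limit=50
  # gc.auto <= 0 disables automatic GC entirely.
  if [ "$auto" -le 0 ]; then
    return 1
  fi
  stats=$(git -C "$repo" count-objects -v) || return 1
  loose=$(printf '%s\n' "$stats" | awk '$1 == "count:" { print $2 }')
  packs=$(printf '%s\n' "$stats" | awk '$1 == "packs:" { print $2 }')
  # More loose objects than gc.auto triggers a repack.
  if [ "$loose" -gt "$auto" ]; then
    return 0
  fi
  # More packs than gc.autoPackLimit triggers a consolidating repack
  # (a limit <= 0 disables this check).
  [ "$pack_limit" -gt 0 ] && [ "$packs" -gt "$pack_limit" ]
}
```

Running `git count-objects -vH` directly in a stalled workspace shows the same counters, with human-readable sizes.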

@Trott
Member

Trott commented Oct 28, 2017

Results are better, but still some failures...took this one offline:

  • test-azure_msft-win10-x64-5

There are two more with consistent build failures, but they're in the middle of doing something right now, so I don't want to take them offline until they really fail again.

https://ci.nodejs.org/computer/test-azure_msft-win10-x64-1/builds
https://ci.nodejs.org/computer/test-rackspace-win2008r2-x64-5/builds

@Trott
Member

Trott commented Oct 28, 2017

Build history is too convincing. Took these offline too:

  • test-azure_msft-win10-x64-1
  • test-rackspace-win2008r2-x64-5

@Trott
Member

Trott commented Oct 28, 2017

Trying again to see if we can get a green Windows build now: https://ci.nodejs.org/job/node-test-commit-windows-fanned/12952/

@Trott
Member

Trott commented Oct 28, 2017

Looks like it's gonna be green this time. Not sure how to fix the offline hosts but at least CI isn't perma-red.

@Trott
Member

Trott commented Oct 29, 2017

Took test-azure_msft-win10-x64-5 offline too. Build failures, obviously.

  • test-azure_msft-win10-x64-5

@refack
Contributor Author

refack commented Oct 29, 2017

test-azure_msft-win10-x64-5 was a little stubborn, but should now be OK.
Also all PIs are back online.

refack closed this as completed Oct 29, 2017
refack mentioned this issue Oct 30, 2017
joaocgreis reopened this Nov 1, 2017
@joaocgreis
Member

Some workers were still failing. The node-test-binary-windows job is set up to only fetch the git branch with the binaries, so the problem is not the size of the remote binary repo but the local git repository, where the automatic git gc takes too long.

I created a new job, git-clean-windows, similar to git-clean-rpi and set to run every week, which should prevent this from happening again. I'll close this issue after I confirm it is working as expected.
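The thread does not show what git-clean-windows or git-clean-rpi actually run. As a minimal sketch of a periodic cleanup for one workspace clone, assuming the remote is named `origin` and that losing reflog history on CI workers is acceptable, the hypothetical `clean_repo` below prunes remote-tracking refs for deleted branches, expires the reflog, and runs gc with immediate pruning so unreachable binary commits are actually deleted.

```shell
#!/bin/sh
# clean_repo REPO [REMOTE]: hypothetical periodic cleanup for one CI
# workspace clone. Assumes discarding reflog history is acceptable.
clean_repo() {
  repo=$1
  remote=${2:-origin}
  # Drop remote-tracking refs for branches that were deleted on the remote;
  # otherwise those refs keep the old binary commits reachable.
  git -C "$repo" fetch --prune --quiet "$remote" || return 1
  # Expire every reflog entry, since reflogs also keep old commits reachable.
  git -C "$repo" reflog expire --expire=now --all || return 1
  # Repack and delete unreachable objects now instead of after gc.pruneExpire.
  git -C "$repo" gc --prune=now --quiet
}
```

Running this on a schedule, rather than letting auto GC kick in during a test job, keeps the expensive repack out of the test job's timeout.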

@refack
Contributor Author

refack commented Nov 5, 2017

@joaocgreis it seems like the local git repo does accumulate binaries even after the branch is deleted from the remote. I reran a node-test-binary-windows job and some workers were still able to find the revision:
https://drive.google.com/file/d/0Bz0LZMH4OpErbVZkSm9YSFhCTVU/view?usp=sharing
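One plausible reason the objects survive is that a plain `git fetch` keeps remote-tracking refs such as `refs/remotes/origin/<branch>` after the branch is deleted on the remote, and those refs keep the binary commits reachable, so gc cannot drop them. A hypothetical helper to list such stale refs, assuming the remote is named `origin` by default:

```shell
#!/bin/sh
# stale_tracking_refs REPO [REMOTE]: print remote-tracking refs whose branch
# no longer exists on REMOTE. Hypothetical helper; such refs keep their
# commits (and any binaries in them) reachable, so gc cannot drop them.
stale_tracking_refs() {
  repo=$1
  remote=${2:-origin}
  # Ask the remote which branches it still has.
  heads=$(git -C "$repo" ls-remote --heads "$remote") || return 1
  live=$(printf '%s\n' "$heads" | awk '{ sub("^refs/heads/", "", $2); print $2 }')
  git -C "$repo" for-each-ref --format='%(refname)' "refs/remotes/$remote" |
    while read -r ref; do
      branch=${ref#refs/remotes/"$remote"/}
      # refs/remotes/<remote>/HEAD is a symbolic pointer, not a branch.
      [ "$branch" = HEAD ] && continue
      # Report the ref if its branch name is not in the remote's list.
      printf '%s\n' "$live" | grep -qxF -- "$branch" || printf '%s\n' "$ref"
    done
}
```

`git remote prune --dry-run origin` reports the same kind of stale refs using git's own logic, and `git fetch --prune` removes them.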

@gibfahn
Member

gibfahn commented Nov 5, 2017

> @joaocgreis it seems like the local git repo does accumulate binaries even after the branch is deleted from the remote. I reran a node-test-binary-windows job and some workers were able to find the revision:

Does git fetch --prune help? It removes remote-tracking branches whose branch was deleted on the remote (they are not removed otherwise).

I also run this locally to delete local branches which had an upstream, but the upstream was deleted:

# Delete orphaned local branches.
git fetch -p && git branch -vv | awk '/: gone]/{print $1}' | xargs git branch -D
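A variant of the same cleanup, sketched as a hypothetical `delete_gone_branches` function, reads `%(upstream:track)` from `git for-each-ref` instead of parsing `git branch -vv`. That avoids picking up the `*` marker `git branch -vv` prints before the checked-out branch, and it never calls `git branch -D` with an empty argument list when nothing is gone.

```shell
#!/bin/sh
# delete_gone_branches REPO: hypothetical helper that deletes local branches
# whose upstream branch was deleted on the remote.
delete_gone_branches() {
  repo=$1
  # Prune stale remote-tracking refs first so their upstreams show as gone.
  git -C "$repo" fetch --prune --quiet || return 1
  # %(upstream:track) prints "[gone]" when the configured upstream ref no
  # longer exists; branches without an upstream print nothing there.
  git -C "$repo" for-each-ref --format='%(refname:short) %(upstream:track)' refs/heads |
    awk '$2 == "[gone]" { print $1 }' |
    while read -r branch; do
      # Fails with an error (and the loop continues) for the checked-out branch.
      git -C "$repo" branch -D "$branch"
    done
}
```

As with the one-liner, `-D` force-deletes, so any local commits that exist only on a gone branch are lost.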