
DigitalOcean www server #3424

Open
richardlau opened this issue Jul 14, 2023 · 9 comments

@richardlau
Member

It doesn't look like we have a tracking issue for this, although much has been discussed, spread across several Slack threads. See also nodejs/TSC#1416.

Summary

Our DigitalOcean-hosted droplet for our www server (one of two servers behind a Cloudflare load balancer) has become very unreliable this year, and seemingly worse: over the last few weeks it "works" for about a day (or less) and then runs out of file descriptors (error messages are visible in the nginx error logs). Cloudflare then believes the server to be unhealthy (= it cannot reach the /traffic-manager endpoint?) and switches over to the other server (called Joyent, but now residing on Equinix Metal).

Prior to the last few weeks the "unhealthy" state was temporary -- eventually CF would decide the server was healthy again and switch back to the DO server. Now, however, the DO server remains unhealthy in CF until we restart nginx.
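For context, fd exhaustion typically shows up in the nginx error log as EMFILE ("Too many open files") failures on accept(). A quick way to count the occurrences -- the log path (/var/log/nginx/error.log) and the sample log lines below are illustrative, not copied from the actual server:

```shell
# Sketch: count fd-exhaustion errors in an nginx error log.
# The real log path is an assumption; we demo against a sample file
# so the snippet is self-contained.
log=$(mktemp)
cat > "$log" <<'EOF'
2023/07/14 10:01:02 [crit] 1234#0: accept4() failed (24: Too many open files)
2023/07/14 10:01:03 [crit] 1234#0: accept4() failed (24: Too many open files)
EOF
count=$(grep -c 'Too many open files' "$log")
echo "fd-exhaustion errors: $count"
rm -f "$log"
```

On the real server the same grep over the actual error log would show how often the droplet is hitting the limit between restarts.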

@richardlau
Member Author

Over the last few weeks, due to the DO server not automatically recovering without intervention, we've been predominantly serving from Equinix Metal (Joyent).
[image attachment]

AFAICT the Equinix Metal server is not suffering from the same issues as the DO server. While we do occasionally get load balancer alert emails from CF through to the build alias, it's nowhere near the frequency at which we were getting them for the DO server.

I don't think we've mirrored all of the nginx tweaks made on the DO server over to the Equinix Metal one, so it might be worth looking at the differences there. In particular, I think the connection limits are lower/not set on the Equinix Metal server. Another difference between the two servers is that nightly/v8-canary/release builds are pushed (from our release machines via scp) to the DO server, while an rsyncmirror.service on the Equinix Metal server periodically pulls things from the DO one.
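For comparison purposes, the kind of per-client connection limiting and fd-ceiling tuning we'd want consistent across both servers might look like this in nginx. The zone name and all numeric values here are illustrative assumptions, not the settings actually deployed on either server:

```nginx
# Illustrative values only -- not the actual DO/Equinix configuration.
worker_rlimit_nofile 65536;          # raise the per-worker open-file ceiling

events {
    worker_connections 8192;         # keep well below worker_rlimit_nofile
}

http {
    # Track concurrent connections per client IP (10 MB zone)
    limit_conn_zone $binary_remote_addr zone=peraddr:10m;

    server {
        limit_conn peraddr 20;       # cap concurrent connections per IP
    }
}
```

Diffing the two servers' nginx.conf for these directives (and for worker_rlimit_nofile in particular) would show whether the DO tweaks ever made it to Equinix Metal.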

@richardlau
Member Author

Oh, and while I have no evidence suggesting it would solve/address any of the current issues, we really should plan how we're going to update the server to a later OS (and probably nginx too, as I assume the one in the apt repository is old). It may be worth weighing creating a replacement server from scratch against a risky in-place upgrade of the existing server(s).

@targos
Member

targos commented Jul 14, 2023

It may be worth considering creating a replacement server from scratch vs a risky upgrade of the existing server(s).

Absolutely agree.

@ovflowd
Member

ovflowd commented Jul 15, 2023

(=cannot reach the /traffic-manager endpoint?) and switches over to the other server)

Which is even worse, because that endpoint is a pure HTTP response with no file access -- if the server can't even handle that...

@ovflowd
Member

ovflowd commented Jul 15, 2023

It may be worth considering creating a replacement server from scratch vs a risky upgrade of the existing server(s).

Big +1

@MoLow
Member

MoLow commented Jul 16, 2023

Adding this to the build agenda so we can discuss how to proceed.

@targos
Member

targos commented Jul 17, 2023

(=cannot reach the /traffic-manager endpoint?) and switches over to the other server)

Which is even worse because that endpoint is a pure HTTP response with no file access, and for not being able to handle that...

As I understand it, the problem is that nginx reaches the maximum open files limit and cannot accept new connections (including those that come from the CF load balancer health checks).
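A quick way to sanity-check the limits in play: nginx workers are subject to the ordinary RLIMIT_NOFILE mechanism unless worker_rlimit_nofile overrides it in nginx.conf, so the shell's ulimit values are a reasonable first look (this inspects the current shell, not a running worker; that's an assumption about the droplet's setup):

```shell
# Inspect the soft/hard open-file limits for the current shell.
# nginx workers inherit the same RLIMIT_NOFILE mechanism unless
# worker_rlimit_nofile overrides it.
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "soft fd limit: $soft"
echo "hard fd limit: $hard"
# For a running worker, the effective limit can be read from the standard
# Linux procfs file /proc/<worker-pid>/limits.
```

If the soft limit is low (e.g. the common default of 1024), the droplet would exhaust it quickly under load balancer traffic, which matches the observed failure mode.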

@richardlau
Member Author

Just on the point re. creating a new server -- our existing server was created five years ago and is on the basic plan (perhaps that was all that was available then?). Theoretically it has a 2 Gbps maximum network throughput but I don't think I've seen the droplet hit that, even when we raised the open file limit on the droplet.

"CPU-Optimized Droplets with Premium CPUs" have a higher throughput limit of 10 Gbps but will cost more. I don't have access to (nor do I particularly want access to) billing for our DO account, so I don't know what our current droplet costs versus our credits. I don't know what our credits on the DO account are either, but I do know that we've run out in the past. If we decide to go with a larger droplet then we should loop in the OpenJS Foundation.

@targos
Member

targos commented Nov 29, 2023

I forgot to mention somewhere that when we upgraded the DO server (#3564), we also bumped it to Premium Intel CPUs.
