Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

In the batteries-included apollo-server, ApolloServer.stop() can hang indefinitely waiting for connections to close (just like net.Server.close) #4097

Closed
justinmchase opened this issue May 9, 2020 · 23 comments · Fixed by #4908

Comments

@justinmchase
Copy link

justinmchase commented May 9, 2020

Description

I've been trying to reproduce an issue where the server process appears to be orphaned and continues running despite being killed.

I'm using tsc-watch to watch for code changes and when a file change is detected it sends SIGTERM to the process and then recreates it by invoking node ./build again.

However in this scenario some simple code in my application is unexpectedly causing the process to be orphaned and never exit as multiple processes are created all vying for the same port.

Version Info

apollo-server: 2.13.0

Expected Behavior

The server stops when await server.stop() is called

Actual Behavior

The server does not stop and the call to server never finishes, the process never exits.

Reproduction

https://github.com/justinmchase/apollo-lerna-repro

Steps

npm i
npm start
# open http://localhost:3000 in browser

This will cause a simple react page to continually call the api. If you open this file:
https://github.com/justinmchase/apollo-lerna-repro/blob/master/packages/api/src/index.ts

And simply save it you will see the server restart seamlessly.

Now go and uncomment this line:
https://github.com/justinmchase/apollo-lerna-repro/blob/master/packages/api/src/index.ts#L61

And observe that the call to await server.stop() does not stop the server and never returns and the process is never exited.

@justinmchase
Copy link
Author

justinmchase commented May 9, 2020

Correction, I'm seeing it take 30s-2m to complete the call to stop. I wasn't patient enough but it does seem to eventually stop.

That is still surprisingly long and I feel like there is something wrong but it does eventually stop. I don't mind it actually waiting for current connections to complete but it does appear to keep accepting new connections which seems bad.

EDIT: Correction to the correction, sometimes it stops sometimes even after 5m its still running.

@ohager
Copy link

ohager commented Jul 7, 2020

Same behavior here - just for the records

@justinmchase
Copy link
Author

After testing this extensively I think that all the complexity of this sample can be ignored and simply starting the server and calling await server.stop() is the only problem. The server does not alway stop and the time it can take to stop is highly variable and it can actually continue to accept and process new incoming connections while hung in the stop method.

If I simply close the process then the server does stop (of course) but it will abruptly end all ongoing requests mid stream.

What I expect to happen is for the server to immediately stop accepting incoming connections and once all ongoing connections are completed then the call to stop resolves. If there are no currently processing requests then it would end immediately.

@wutzi15
Copy link

wutzi15 commented Jul 13, 2020

I can reproduce the error too - just for the record

@justinmchase
Copy link
Author

May be related to nodejs/node#34830

@earnubs
Copy link

earnubs commented Oct 28, 2020

May be related to nodejs/node#34830

I have tests for an Apollo server that exit cleanly when run on the host (macOS) but fail to exit in Docker container, I've resolved the issue down to just being about the server starting and not exiting on Docker, so that maybe supports the relation.

@Merott
Copy link

Merott commented Nov 3, 2020

Running into the same issue. Any good workarounds?

@ajotaos
Copy link

ajotaos commented Nov 8, 2020

I've got the same behavior on my repo. Is there someone from the team aware of this issue?

@justinmchase
Copy link
Author

justinmchase commented Nov 10, 2020

No known workarounds or traction from the team.

Well, the only known workaround is to just not try to stop the server and kill the entire process. That will work, of course any sockets connected will be abruptly terminated as well.

@aminlatifi
Copy link

I have the same problem in testing, any updates?

@justinmchase
Copy link
Author

justinmchase commented Nov 30, 2020

I think you have to go and 👍 the main issue and once enough people interact with the issue it will get noticed by the team. As it is they have 469 open issues so I'm assuming they're not seeing these items or using this database actively.

@chancecarey
Copy link

Still running into this issue, noticed it when using ts-node-dev.

[watch:server ] [DEBUG] 16:34:52 Removing all watchers from files
[watch:server ] [DEBUG] 16:34:52 Child is still running, restart upon exit
[watch:server ] [DEBUG] 16:34:52 Disconnecting from child
[watch:server ] [DEBUG] 16:34:52 Sending SIGTERM kill to child pid 31733
[watch:server ] [DEBUG] 16:34:59 Child exited with code 0
[watch:server ] [DEBUG] 16:34:59 Starting child process -r /tmp/ts-node-dev-hook-7320061111056044.js /home/chance/test/api/node_modules/ts-node-dev/lib/wrap.js src/index.ts
[watch:server ] [DEBUG] 16:34:59 /home/chance/test/api/src/index.ts added to watcher
[watch:server ] [DEBUG] 16:34:59 /home/chance/test/api/src/schema.ts added to watcher
[watch:server ] 🚀  Server ready at http://localhost:4000/

It sometimes immediately restarts, and sometimes there is (like above) a random delay between sending SIGTERM and it actually restarting.

@bigman73
Copy link

bigman73 commented Feb 8, 2021

This jest (typescript) integration test reproduces the issue
The apollo server (with express, or maybe it is not relevant) doesn't stop properly and jest is hanging.

import { ApolloServer, gql } from 'apollo-server-express'
import express from 'express'

const typeDefs = gql`
  type Health {
    ok: Boolean!
    version: String!
  }
  type Query {
    health: Health!
  }
`

const resolvers = {
  Query: {
    health: (_root: unknown, _args: unknown) => {
      return {
        ok: true,
        version: process.env['npm_package_version']
      }
    }
  }
}

let server: ApolloServer

beforeAll(async () => {
  server = new ApolloServer({
    typeDefs,
    resolvers
  })
  const app = express()
  server.applyMiddleware({ app })

  await app.listen({ port: 4000 })
}, 20000)

afterAll(async () => {
  // FIXME: The jest process is not existing naturally. Something in apollo-server is holding it up
  await server.stop()
}, 20000)

describe('test server', () => {
  it('should be healthy', async () => {
    // TODO: Execute GraphQL query on the server
  }, 5000)
})

output:

[1]  + done       clear
 PASS  tests/integration/service2.test.ts
  test server
    ✓ should be healthy

Test Suites: 1 passed, 1 total
Tests:       1 passed, 1 total
Snapshots:   0 total
Time:        1.147 s, estimated 5 s
Ran all test suites matching /\/Users\/SOME_FOLDER/tests\/integration\/service2.test.ts/i with tests matching "test server".

All tests passed

tests/integration/service2.test.ts
  test server
    ✓ should be healthy

Jest did not exit one second after the test run has completed.

This usually means that there are asynchronous operations that weren't stopped in your tests. Consider running Jest with `--detectOpenHandles` to troubleshoot this issue.

@bigman73
Copy link

bigman73 commented Feb 8, 2021

The problem is not with jest, removing apollo solves the jest hanging problem

@glasser
Copy link
Member

glasser commented Feb 8, 2021

@bigman73 ApolloServer.stop() only stops the lifecycle of code that's specifically to the apollo/graphql server itself: eg, any plugins, signal handlers, etc. But the way apollo-server-express works is that it installs middleware on your separately-created Express app and you tell express to listen with app.listen. The connection between the ApolloServer object and the Express app is pretty minimal: AS just adds some middleware (request handlers). It's still your job to stop the HTTP server started by app.listen.

I think this is kind of confusing! One problem is that ApolloServer doesn't have a start() method that corresponds to the stop() method. This leads to other issues too; for example, some errors that can occur asynchronously during startup (eg, errors from plugins or from loading a federated schema) aren't easily handled by your program. I am planning to very soon introduce a (optional for now) start API, so basic Express usage would look like:

  const apolloServer = new ApolloServer({
    typeDefs,
    resolvers
  });
  const app = express();
  apolloServer.applyMiddleware({ app });
  await apolloServer.start();
  const httpServer = app.listen({ port: 4000 })

  // later
  await new Promise(resolve => httpServer.close(resolve));  // ... though look a few comments down for some caveats!
  await apolloServer.stop();

I think adding a start() method (which does not start the HTTP server!) will make it more clear that the stop() method does not stop the HTTP server.

(Right now, the equivalent of start() gets called automatically from the ApolloServer constructor and there's no easy way to await its successful result and handle any errors.)

@glasser
Copy link
Member

glasser commented Feb 8, 2021

My previous comment is specific to apollo-server-express as mentioned by @bigman73. On the other hand, apollo-server (the batteries-included not-super-configurable built-in-Express ApolloServer)'s stop method is supposed to stop the HTTP server! That I will look into today, since it relates to some other stop-related work I'm doing.

@bigman73
Copy link

bigman73 commented Feb 8, 2021

Thanks @glasser for the detailed explanation
I followed your idea of stopping http server and it works. Please note that is seems (http) Server class has no stop() method, it does have a close() method

This revised code doesn't hang jest:

let server: ApolloServer
let httpServer: HttpServer

beforeAll(async () => {
  server = new ApolloServer({
    typeDefs,
    resolvers
  })
  const app = express()
  server.applyMiddleware({ app })

  httpServer = await app.listen({ port: 4000 })
}, 20000)

afterAll(async () => {
  httpServer.close()
  await server.stop()
}, 20000)

@glasser
Copy link
Member

glasser commented Feb 8, 2021

I tried to reproduce with the original lerna reproduction, which uses apollo-server.

With apollo-server, ApolloServer.stop() first calls close() on the http.Server and waits for its callback to be invoked. We're talking the Node core http.Server close method here  — actually, it's the net.Server close method.

What does this method do? It stops the server from accepting new connections and waits to invoke its callback until all existing connections are done. But it does nothing to proactively ensure existing connections will ever finish.

So yes, require('apollo-server').ApolloServer.stop() works exactly like Node's net.Server.close: it waits until all connections naturally die.

On the one hand, this is the most "generic" approach — it doesn't enforce the policy of breaking existing connections. On the other hand, it's not super easy to use! It looks like there are a bunch of npm packages out there that try to fix this. http-terminator looks pretty compelling to me. (Its README links to four others; notably, stoppable has more npm downloads, but this comment by the http-terminator author on stoppable seems believable.)

As an immediate workaround: I think people running into this problem should switch from apollo-server to apollo-server-express and install something like http-terminator themselves. This isn't really an Apollo Server problem — it's an http.Server problem.

More broadly: I do think that apollo-server's "out of the box" experience should work better here. Even though it's just an http.Server problem, apollo-server hides the http.Server from you, and so if you choose to use that particular package, you can't work around this issue. I think it probably does makes sense that the "batteries included" apollo-server's server.stop() should use something like http-terminator with a sensibly chosen gracefulTerminationTimeout (10 seconds?); if you want more control (like a different timeout or not installing http-terminator at all), then you're welcome to switch to apollo-server-express and manage the http.Server yourself.

@glasser
Copy link
Member

glasser commented Feb 8, 2021

@bigman73 Thanks for the correction on the function name. I updated my previous comment.

Note that with your code, when httpServer.close returns, your HTTP server, won't answer new incoming connections, but your express server can still try to process requests on existing connections. If any of these rely on the assumption that your Apollo Server hasn't been stopped (eg, if you expect those final requests to be reflected in usage reports) then that assumption won't be true! You'll want to use something like http-terminator as suggested in my previous comment for this.

@glasser glasser changed the title server.stop does not stop In the batteries-included apollo-server, ApolloServer.stop() can hang indefinitely waiting for connections to close (just like net.Server.close) Feb 8, 2021
@glasser
Copy link
Member

glasser commented Feb 8, 2021

(After a bit more research, I think I like stoppable a bit better than http-terminator due to gajus/http-terminator#22 and gajus/http-terminator#16.)

@bigman73
Copy link

bigman73 commented Feb 8, 2021

@glasser I understand the risk, but in automatic integration testing (e.g. using Jest) the chances of someone unexpectedly using apollo-server are zero. The server is created on the fly and needs to die on the fly.

Yes, i do think the experience could be better.
Even with apollo-server-express I have my own helper function createGraphQLServer that takes care of the entire stack (express, optional voyager, etc.)
The stopGraphQLServer function takes care of the tear down and now includes httpServer.close()
I think a future version should provide this sugar coated layer which will make it safe for users and easier to integrate with apollo-server, using one line for creation and one line for destruction.

glasser added a commit that referenced this issue Feb 8, 2021
Previously, `ApolloServer.stop()` functioned like `net.Server.close()` in that
it did not close idle connections or close active connections after a grace
period. This meant that trying to `await ApolloServer.stop()` could hang
indefinitely if there are open connections. Now, this method closes idle
connections, and closes active connections after 10 seconds. The grace period
can be adjusted by passing the new `stopGracePeriodMillis` option to `new
ApolloServer`, or disabled by passing `Infinity` (though it will still close
idle connections).

Note that this only applies to the "batteries-included" `ApolloServer` in the
`apollo-server` package with its own built-in Express and HTTP servers.

Fixes #4097.
glasser added a commit that referenced this issue Feb 8, 2021
Previously, `ApolloServer.stop()` functioned like `net.Server.close()` in that
it did not close idle connections or close active connections after a grace
period. This meant that trying to `await ApolloServer.stop()` could hang
indefinitely if there are open connections. Now, this method closes idle
connections, and closes active connections after 10 seconds. The grace period
can be adjusted by passing the new `stopGracePeriodMillis` option to `new
ApolloServer`, or disabled by passing `Infinity` (though it will still close
idle connections).

The feature is implemented with the `stoppable` package. I audited a few similar
packages including `http-terminator` but stoppable seemed to be the simplest and
most widely used.

Note that this only applies to the "batteries-included" `ApolloServer` in the
`apollo-server` package with its own built-in Express and HTTP servers.

Fixes #4097.
@glasser
Copy link
Member

glasser commented Feb 8, 2021

Check out #4908.

glasser added a commit that referenced this issue Feb 9, 2021
Previously, `ApolloServer.stop()` functioned like `net.Server.close()` in that
it did not close idle connections or close active connections after a grace
period. This meant that trying to `await ApolloServer.stop()` could hang
indefinitely if there are open connections. Now, this method closes idle
connections, and closes active connections after 10 seconds. The grace period
can be adjusted by passing the new `stopGracePeriodMillis` option to `new
ApolloServer`, or disabled by passing `Infinity` (though it will still close
idle connections).

The feature is implemented with the `stoppable` package. I audited a few similar
packages including `http-terminator` but stoppable seemed to be the simplest and
most widely used.

Note that this only applies to the "batteries-included" `ApolloServer` in the
`apollo-server` package with its own built-in Express and HTTP servers.

Fixes #4097.
@glasser
Copy link
Member

glasser commented Feb 9, 2021

Released in v2.20.0.

This was referenced Mar 9, 2021
kodiakhq bot added a commit to ProjectXero/dbds that referenced this issue Mar 17, 2021
Bumps apollo-datasource from 0.7.2 to 0.7.3.

Changelog
Sourced from apollo-datasource's changelog.

CHANGELOG
The version headers in this history reflect the versions of Apollo Server itself.  Versions of other packages (e.g., those which are not actual HTTP integrations; packages not prefixed with "apollo-server", or just supporting packages) may use different versions.
🆕 Please Note!: 🆕 The @apollo/federation and @apollo/gateway packages now live in the apollographql/federation repository.

@apollo/gateway
@apollo/federation

vNEXT

The changes noted within this vNEXT section have not been released yet.  New PRs and commits which introduce changes should include an entry in this vNEXT section as part of their development.  With few exceptions, the format of the entry should follow convention (i.e., prefix with package name, use markdown backtick formatting for package names and code, suffix with a link to the change-set à la [PR #YYY](https://link/pull/YYY), etc.).  When a release is being prepared, a new header will be (manually) created below and the appropriate changes within that release will be moved into the new section.


apollo-server-core: The SIGINT and SIGTERM signal handlers installed by default (when not disabled by stopOnTerminationSignals: false) now stay active (preventing process termination) while the server shuts down, instead of letting a second signal terminate the process. The handlers still re-signal the process after this.stop() concludes. Also, if this.stop() throws, the signal handlers will now log and exit 1 instead of throwing an uncaught exception. [Issue #4931](apollographql/apollo-server#4931)
apollo-server-lambda: (UPDATE THIS MESSAGE BEFORE RELEASE; we are not sure if this actually helps nodejs14 compatibility or if it's just a nice refactor.) Support the nodejs14 runtime by changing the handler to be an async handler. (For backwards compatibility, if the handler receives a callback, it still acts like a non-async handler.) [Issue #1989](apollographql/apollo-server#1989) [PR #5004](apollographql/apollo-server#5004)

v2.21.1

apollo-server-lambda: The onHealthCheck option did not previously work. Additionally, health checks (with onHealthCheck or without) didn't work in all Lambda contexts, such as behind Custom Domains; the path check is now more flexible. [Issue #3999](apollographql/apollo-server#3999) [PR #4969](apollographql/apollo-server#4969) [Issue #4891](apollographql/apollo-server#4891) [PR #4892](apollographql/apollo-server#4892)
The debug option to new ApolloServer (which adds stack traces to errors) now affects errors that come from requests executed with server.executeOperation (and its wrapper apollo-server-testing), instead of just errors that come from requests executed over HTTP. [Issue #4107](apollographql/apollo-server#4107) [PR #4948](apollographql/apollo-server#4948)
Bump version of @apollographql/graphql-playground-html to v1.6.27 and @apollographql/graphql-playground-react to v1.7.39 to resolve incorrectly rendered CDN URL when Playground version was false-y.  [PR #4932](apollographql/apollo-server#4932) [PR #4955](apollographql/apollo-server#4955) [Issue #4937](apollographql/apollo-server#4937)

v2.21.0

Apollo Server can now be installed with graphql@15 without causing peer dependency errors or warnings. (Apollo Server has a file upload feature which was implemented as a wrapper around the graphql-upload package. We have been unable to upgrade our dependency on that package due to backwards-incompatible changes in later versions, and the version we were stuck on did not allow graphql@15 as a peer dependency. We have now switched to a fork of that old version called @apollographql/graphql-upload-8-fork that allows graphql@15.) Also bump the graphql-tools dependency from 4.0.0 to 4.0.8 for graphql@15 support. [Issue #4865](apollographql/apollo-server#4865)

v2.20.0

apollo-server: Previously, ApolloServer.stop() functioned like net.Server.close() in that it did not close idle connections or close active connections after a grace period. This meant that trying to await ApolloServer.stop() could hang indefinitely if there are open connections. Now, this method closes idle connections, and closes active connections after 10 seconds. The grace period can be adjusted by passing the new stopGracePeriodMillis option to new ApolloServer, or disabled by passing Infinity (though it will still close idle connections). Note that this only applies to the "batteries-included" ApolloServer in the apollo-server package with its own built-in Express and HTTP servers. [PR #4908](apollographql/apollo-server#4908) [Issue #4097](apollographql/apollo-server#4097)
apollo-server-core: When used with ApolloGateway, ApolloServer.stop now invokes ApolloGateway.stop. (This makes sense because ApolloServer already invokes ApolloGateway.load which is what starts the behavior stopped by ApolloGateway.stop.) Note that @apollo/gateway 0.23 will expect to be stopped in order for natural program shutdown to occur. [PR #4907](apollographql/apollo-server#4907) [Issue #4428](apollographql/apollo-server#4428)
apollo-server-core: Avoid instrumenting schemas for the old graphql-extensions library unless extensions are provided. [PR #4893](apollographql/apollo-server#4893) [Issue #4889](apollographql/apollo-server#4889)
apollo-server-plugin-response-cache@0.6.0: The shouldReadFromCache and shouldWriteToCache hooks were always documented as returning ValueOrPromise<boolean> (ie, that they could be either sync or async), but they actually only worked if they returned a bool. Now they can be either sync or async as intended. [PR #4890](apollographql/apollo-server#4890) [Issue #4886](apollographql/apollo-server#4886)
apollo-datasource-rest@0.10.0: The RESTDataSource.trace method is now protected instead of private to allow more control over logging and metrics. [PR #3940](apollographql/apollo-server#3940)

v2.19.2

apollo-server-express: types: Export ExpressContext from main module. [PR #4821](apollographql/apollo-server#4821) [Issue #3699](apollographql/apollo-server#3699)
apollo-server-env: types: The first parameter to fetch is now marked as required, as intended and in accordance with the Fetch API specification. [PR #4822](apollographql/apollo-server#4822) [Issue #4741](apollographql/apollo-server#4741)
apollo-server-core: Update graphql-tag package to latest, now with its graphql-js peerDependencies expanded to include ^15.0.0 [PR #4833](apollographql/apollo-server#4833)

v2.19.1

apollo-server-core: The debugPrintReports option to ApolloServerPluginUsageReporting now prints traces as well. [PR #4805](apollographql/apollo-server#4805)

v2.19.0

apollo-server-testing: types: Allow generic variables usage of query and mutate functions. [PR #4383](apollograpqh/apollo-server#4383)
apollo-server-express: Export the GetMiddlewareOptions type. [PR #4599](apollograpqh/apollo-server#4599)
apollo-server-lambda: Fix file uploads - ignore base64 decoding for multipart queries. [PR #4506](apollographql/apollo-server#4506)
apollo-server-core: Do not send  operation documents that cannot be executed to Apollo Studio. Instead, information about these operations will be combined into one "operation" for parse failures, one for validation failures, and one for unknown operation names.



... (truncated)


Commits

c212627 Release
See full diff in compare view




Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR
@dependabot recreate will recreate this PR, overwriting any edits that have been made to it
@dependabot merge will merge this PR after your CI passes on it
@dependabot squash and merge will squash and merge this PR after your CI passes on it
@dependabot cancel merge will cancel a previously requested merge and block automerging
@dependabot reopen will reopen this PR if it is closed
@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 20, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants