
large initialsync may never complete #11611

Open
richvdh opened this issue Dec 20, 2021 · 1 comment
Labels
A-Sync (defects related to /sync) · S-Major (Major functionality / product severely impaired, no satisfactory workaround.) · T-Defect (Bugs, crashes, hangs, security vulnerabilities, or other reported issues.)


richvdh commented Dec 20, 2021

For a sufficiently large account, it's possible that we will never be able to serialise the response before the client times out. This has recently been made a lot worse by #11478 which makes the serialisation process much slower, but it was a potential problem even before that.

The intention is that it shouldn't matter too much if it takes a long time to generate a sync response, because once we finally manage it, we will cache the response, which we can then serve up quickly the next time the client makes a request. However, this oversimplifies the situation.

We can basically break serving a /sync response into four steps:

  1. Figure out which rooms and events should be in the response. The end result of this step is a SyncResult object, which is cached.
  2. "Serialise" each event in the response. This is where we turn the events from the internal python objects (FrozenEvent) into plain JSON dicts which match the format specified by the C-S API; in particular we add the unsigned.age field (which is the main reason this is done after caching); as of Include bundled aggregations in /sync and related fixes #11478 it is also the point at which we add bundled aggregations. The result of this step is a (potentially huge) JsonDict (ie, a dict containing other simple json-serializable values)
  3. JSON-serialise the response dict to a bytes.
  4. Finally, we start streaming the response to the client.
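
To make the shape of this pipeline concrete, here is a heavily simplified sketch in Python. The helpers compute_sync_result and serialize_event are illustrative stand-ins rather than Synapse's actual code paths, but the division of labour follows the four steps above:

```python
# Illustrative sketch only -- names are simplified stand-ins, not Synapse internals.
import json
import time

def serve_sync(request, sync_config):
    # Step 1: work out which rooms and events belong in the response.
    # The resulting SyncResult object is what gets cached.
    sync_result = compute_sync_result(sync_config)  # hypothetical helper

    # Step 2: turn each internal FrozenEvent into a plain C-S API dict.
    # unsigned.age depends on "now", which is why this happens after caching.
    now_ms = int(time.time() * 1000)
    response_dict = {
        "rooms": {
            room_id: [serialize_event(ev, now_ms) for ev in events]
            for room_id, events in sync_result.rooms.items()
        },
        "next_batch": sync_result.next_batch,
    }

    # Step 3: JSON-serialise the whole dict to bytes.
    response_bytes = json.dumps(response_dict).encode("utf-8")

    # Step 4: stream the bytes back to the client.
    request.write(response_bytes)
    request.finish()
```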

Since the result of step 1 is cached (at least on matrix.org), we can effectively neglect it here (though it is important to remember that it is only cached for 2 minutes... more on that later); so the steps of concern to us are steps 2 and 3, which must complete in less than the client's request timeout.
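
For reference, the caching of step 1 behaves roughly like the sketch below: a response cache keyed on the request, with a short expiry. The wiring and the 2-minute constant are written out to illustrate the behaviour described above, not copied from the code.

```python
# Illustrative sketch: step 1's SyncResult is cached with a ~2-minute expiry,
# so an identical request arriving within that window skips the expensive
# "which rooms/events?" computation entirely.
CACHE_TIMEOUT_MS = 2 * 60 * 1000

async def get_sync_result(response_cache, request_key, sync_config):
    # response_cache stands in for the cache of SyncResults; .wrap() either
    # returns the cached result or runs the computation and stores it.
    return await response_cache.wrap(
        request_key,
        compute_sync_result,  # hypothetical helper from the sketch above
        sync_config,
    )
```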

Some notes on timeouts:

  • Cloudflare times out requests if the server doesn't start sending its response within 100 seconds of the request.
  • However, haproxy doesn't notice this until we start trying to send back the response - so Synapse still sees the request as active until our haproxy times out the request, which happens after 180 seconds.
  • Before we start each of steps 2 and 3, we check that the client connection is still up, and skip that step if not. (Step 4 is aborted as soon as the connection drops.)
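
A minimal sketch of the connection-liveness check described in the last bullet, using Twisted's public notifyFinish() API (this is not Synapse's actual implementation, and the helper names are made up):

```python
# Sketch: skip expensive serialisation work once the client has gone away.
from twisted.web.server import Request

class DisconnectTracker:
    """Flips a flag when the client connection is closed early."""

    def __init__(self, request: Request) -> None:
        self.disconnected = False
        # notifyFinish() errbacks if the connection is lost before we finish.
        request.notifyFinish().addErrback(self._on_disconnect)

    def _on_disconnect(self, failure):
        self.disconnected = True

def encode_sync_response(request, sync_result):
    tracker = DisconnectTracker(request)

    if tracker.disconnected:  # check before step 2
        return None
    response_dict = event_serialise(sync_result)  # hypothetical step 2

    if tracker.disconnected:  # check before step 3
        return None
    return json_encode(response_dict)  # hypothetical step 3
```

Note that the flag only updates when the reactor gets a chance to run, which is one reason blocking the reactor (see below) makes these checks less useful.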

In short: we need to serialise the FrozenEvents to dicts (step 2) and JSON-serialise the dicts to bytes within 100 seconds.

That should be plenty though, right? Wrong.


The first problem is that #11478 makes step 2 take 224 seconds (assuming all the relevant aggregations are already cached from the database; the first time through, when we have to hit the database, it takes over 800 seconds):


https://jaeger.int.matrix.org/trace/1a723cd9877a1cc9

(Note that in the above trace, the client connection drops at 180 seconds, so we skip the JSON-serialisation step.)

Still, even with that reverted, we have problems. Consider this example:


https://jaeger.int.matrix.org/trace/2600e970dda6bc65

This request reached haproxy at 12:37:09, but we didn't start processing it until 12:37:56 (a delay of 47 seconds). We then manage to encode the response in 58 seconds, and start streaming the response at 12:38:52, but this doesn't reach haproxy until 12:39:37 (a further 45 seconds). By that time, a total of 148 seconds has elapsed, and Cloudflare has long since given up.

It's hard to trace those 47s and 45s delays precisely, but event serialisation is a synchronous process which blocks the reactor for up to a minute at a time (for example, the preceding request has an encode_response call which blocks the reactor between 12:36:33 and 12:37:29) - after which we have a bunch of other work to catch up on. Prior experience suggests that all bets are off once the reactor starts ticking that slowly.



Ideas on things which might make this better:

  • Counter-intuitively, having a long timeout in haproxy might be making things worse, because it means we're doing work serving responses that will be thrown away at Cloudflare. Possibly we should decrease the timeout in haproxy to (say) 120 seconds.
  • We certainly shouldn't be blocking the reactor for 45 seconds at a time; at the very least, we should stick some sleep(0) calls in sync.encode_response (see the sketch after this list).
  • It would be great to find a way to put most of the event serialisation work inside the SyncResponse cache.
    • Alternatively, maybe we can do it more lazily, so that we can start JSON-encoding without having to event-serialise the entire response.
  • Encode JSON responses on a thread in C, mk2 (#10905) is also part of the problem here: see the discussion at #10905 (comment).
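
A sketch of the sleep(0) idea from the second bullet above (the chunk size and helper names are illustrative, not the actual encode_response implementation): yielding to the reactor every so often lets other requests, and connection-drop notifications, make progress while a huge response is being serialised.

```python
# Illustrative sketch of cooperatively yielding during event serialisation,
# so a huge /sync response doesn't block the reactor for tens of seconds.
YIELD_EVERY = 100  # arbitrary chunk size, for illustration

async def serialise_events(clock, events, now_ms):
    serialised = []
    for i, event in enumerate(events):
        serialised.append(serialize_event(event, now_ms))  # step 2, per event
        if i % YIELD_EVERY == 0:
            # sleep(0) hands control back to the reactor between chunks, so it
            # can service other requests and notice dropped connections.
            await clock.sleep(0)
    return serialised
```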
richvdh added the S-Major and T-Defect labels on Dec 23, 2021

clokep commented Jan 25, 2022

After #11612 / #11659 a bunch of the above is "better" -- the fully calculated data to be serialized is now stored in the ResponseCache. We do not cache the serialized data, however, which means we spend ~1 minute serializing it each time it is requested.

I think the remaining improvements here are:
