Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

INCIDEN-839: Revert ESbuild and AWS SDK updates #268

Merged
merged 2 commits into from
Jul 3, 2024

Conversation

alex9smith
Copy link
Contributor

@alex9smith alex9smith commented Jul 3, 2024

Proposed changes

What changed

Revert ESbuild and AWS SDK updates. This effectively reverts #259 and #266 as the write-activity-log lambda is failing all requests after them.

Why did it change

The update to ESbuild isn't actually a minor update - going to 0.22.0 breaks how packages are built. See their changelog (https://github.com/evanw/esbuild/blob/main/CHANGELOG.md) and related issues (evanw/esbuild#3819) for more details.

We did already update to ESBuild 0.23.0 which should have solved the problem, but we're still seeing all invocations failing. Let's revert back to a known good config while we understand what's happened.

Related links

Checklists

Environment variables or secrets

  • No environment variables or secrets were added or changed

How to review

Please check the version numbers here against the ones in #259

This effectively reverts #259 and #266 as the write-activity-log lambda
is failing all requests after them.

The update to ESbuild isn't actually a minor update - going to 0.22.0
breaks how packages are built. See their changelog
(https://github.com/evanw/esbuild/blob/main/CHANGELOG.md) and related
issues (evanw/esbuild#3819) for more details.

We did already update to ESBuild 0.23.0 which should have solved the
problem, but we're still seeing all invocations failing. Let's revert
back to a known good config while we understand what's happen, but we're
still seeing all invocations failing. Let's revert back to a known good
config while we understand what's happened.
@alex9smith alex9smith merged commit 9d1dc38 into main Jul 3, 2024
4 checks passed
@alex9smith alex9smith deleted the INCIDEN-839-revert-esbuild-sdk branch July 3, 2024 10:37
alex9smith added a commit that referenced this pull request Jul 3, 2024
I'm suspicious that there's a difference here compared to the last PR
(#268) so I've deleted `package-lock.json` and regenerated it with `npm
install`
alex9smith added a commit that referenced this pull request Jul 3, 2024
## Proposed changes

<!-- Provide a summary of your changes in the title above -->
<!-- Include the Jira ticket number in square brackets as prefix, eg
`[OLH-XXXX]: PR Title` -->

### What changed

Revert ESbuild version in actions and also regenerate the
`package-lock.json` file since I'm suspicious that there's a difference
now compared to the last PR.

### Why did it change

#268 didn't fix the lambda
alex9smith added a commit that referenced this pull request Jul 3, 2024
The ESBuild reverts in #268 and #270 did fix the problem building the
lambdas, but the write activity log lambda was still failing with
timeouts.

It turns out this was for a different reason - the input queue for this
function was backed up with ~100k events. We have batch sizes of 10
events per invocation, but under normal operation the batches are much
smaller than this. This lambda does a lot per event because it has to
encrypt and write each to DynamoDB seperately. That meant once it
was processing 10 events per invocation these were legitimately timing
out because all the operations take longer than 5 seconds.

This is the only lambda that does so much on each invocation. Increase
the timeout to 30 seconds. I can see that batches of 10 events are
taking ~9 seconds to process in production now, so this is comfortably
above that.
alex9smith added a commit that referenced this pull request Jul 3, 2024
## Proposed changes

<!-- Provide a summary of your changes in the title above -->
<!-- Include the Jira ticket number in square brackets as prefix, eg
`[OLH-XXXX]: PR Title` -->

### What changed

Increase the timeout to 30 seconds for the `write-activity-log` lambda.

### Why did it change

The ESBuild reverts in #268 and #270 did fix the problem building the
lambdas, but the write activity log lambda was still failing with
timeouts.

It turns out this was for a different reason - the input queue for this
function was backed up with ~100k events. We have batch sizes of 10
events per invocation, but under normal operation the batches are much
smaller than this. This lambda does a lot per event because it has to
encrypt and write each to DynamoDB seperately. That meant once it was
processing 10 events per invocation these were legitimately timing out
because all the operations take longer than 5 seconds.

This is the only lambda that does so much on each invocation. I can see
that batches of 10 events are taking ~9 seconds to process in production
now, so this is comfortably above that. I've manually changed this in
integration and production already and proved that it has solved the
problem.

## Checklists

<!-- Merging this PR deploys to production. Please answer accurately.
-->

### Environment variables or secrets

- [x] No environment variables or secrets were added or changed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants