bug(containerd): process exit event lost #1933

Open
cartermckinnon opened this issue Aug 23, 2024 · 5 comments

cartermckinnon commented Aug 23, 2024

What happened:

EKS has observed failures caused by a containerd bug in which a dropped event leads to containerd losing track of a container's status. You'll see errors like this in your logs:

OCI runtime exec failed: exec failed: unable to start container process: error executing setns process: exit status 1: unknown
OCI runtime exec failed: exec failed: cannot exec in a stopped container: unknown
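
A quick way to check whether a node is hitting this is to search its logs for the two messages above. The following is a minimal sketch, not an official diagnostic: the function name `count_lost_exit_errors` is hypothetical, and the example assumes the messages show up in the journal of a systemd unit named `kubelet`.

```shell
#!/usr/bin/env bash
# Count log lines that contain either error signature quoted in this issue.
# Reads log text on stdin, so it works with any log source.
count_lost_exit_errors() {
  # -F: match fixed strings, -c: print the number of matching lines.
  # grep exits non-zero when nothing matches; "|| true" keeps that from
  # being treated as a failure (the count "0" is still printed).
  grep -c -F \
    -e 'cannot exec in a stopped container' \
    -e 'error executing setns process' || true
}

# Example (assumption: kubelet logs to the systemd journal):
#   journalctl -u kubelet --no-pager | count_lost_exit_errors
```

A non-zero count only indicates that the messages are present; it does not by itself prove that the dropped exit event is the cause.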

This bug seems to have been introduced by containerd/containerd#9828, which is present in containerd 1.7.14 and above: https://github.com/containerd/containerd/releases/tag/v1.7.14

Versions of containerd with this change are included in EKS AMIs:

  • AL2023: v20240807 and later
  • AL2: v20240817 and later
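
To check a specific node rather than rely on the AMI version, the installed containerd version can be compared against 1.7.14. This is a rough sketch under a few assumptions: the function name `has_exit_event_change` is hypothetical, `sort -V` (GNU coreutils) is available, and only the 1.7.14 lower bound stated above is checked.

```shell
#!/usr/bin/env bash
# Return 0 if the given containerd version is 1.7.14 or above, the first
# 1.7.x release this issue names as containing the change.
# Only the lower bound is checked; no fixed version is assumed here.
has_exit_event_change() {
  local version="${1#v}"   # tolerate a leading "v", as in "v1.7.14"
  local first="1.7.14"
  # sort -V orders version strings; if $first sorts first (or ties),
  # then $version >= $first.
  [ "$(printf '%s\n%s\n' "$first" "$version" | sort -V | head -n 1)" = "$first" ]
}

# Example (assumption: containerd is installed from the "containerd" RPM):
#   if has_exit_event_change "$(rpm -q --queryformat '%{VERSION}' containerd)"; then
#     echo "containerd includes the change discussed in this issue"
#   fi
```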

More information is available in containerd/containerd#10589.

A fix is being attempted in containerd/containerd#10603.

cartermckinnon changed the title from "tracking(containerd): process exit event lost" to "bug(containerd): process exit event lost" on Aug 23, 2024

cartermckinnon commented Aug 27, 2024

We've continued to see customers impacted by this bug, and will be downgrading to the previous containerd in the next AMI release (this week), since there is no fix from upstream containerd at this time.

In the meantime, downgrading containerd in your user data will mitigate this issue.

On AL2:

yum versionlock delete containerd
yum downgrade -y containerd-1.7.11
yum versionlock containerd

On AL2023:

dnf downgrade -y containerd-1.7.11
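
If the same user data is shared by AL2 and AL2023 nodes, the two variants above could be selected at boot. The following is only a sketch: `downgrade_commands` is a hypothetical helper, and it assumes `/etc/os-release` reports `VERSION_ID` as `2` on AL2 and `2023` on AL2023. It prints the commands rather than running them, so it is safe to try anywhere.

```shell
#!/usr/bin/env bash
# Print the downgrade commands from this thread for a given Amazon Linux
# VERSION_ID (assumed: "2" for AL2, "2023" for AL2023).
downgrade_commands() {
  case "$1" in
    2)
      printf '%s\n' \
        'yum versionlock delete containerd' \
        'yum downgrade -y containerd-1.7.11' \
        'yum versionlock containerd'
      ;;
    2023)
      printf '%s\n' 'dnf downgrade -y containerd-1.7.11'
      ;;
    *)
      echo "unsupported VERSION_ID: $1" >&2
      return 1
      ;;
  esac
}

# Example use in user data (run as root):
#   . /etc/os-release
#   downgrade_commands "$VERSION_ID" | bash -ex
```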

tehlers320 commented Aug 28, 2024

Are you sure the fix being attempted is the right one? It references containerd/containerd@892dc54 as the cause, but that commit only lists these tags: [v2.0.0-rc.3](https://github.com/containerd/containerd/releases/tag/v2.0.0-rc.3), [v2.0.0-rc.2](https://github.com/containerd/containerd/releases/tag/v2.0.0-rc.2), [v2.0.0-rc.1](https://github.com/containerd/containerd/releases/tag/v2.0.0-rc.1), [v2.0.0-rc.0](https://github.com/containerd/containerd/releases/tag/v2.0.0-rc.0), [api/v1.8.0-rc.3](https://github.com/containerd/containerd/releases/tag/api%2Fv1.8.0-rc.3), [api/v1.8.0-rc.2](https://github.com/containerd/containerd/releases/tag/api%2Fv1.8.0-rc.2), [api/v1.8.0-rc.1](https://github.com/containerd/containerd/releases/tag/api%2Fv1.8.0-rc.1), [api/v1.8.0-rc.0](https://github.com/containerd/containerd/releases/tag/api%2Fv1.8.0-rc.0).

The only thing I can see that touches the shim between 1.7.11 and 1.7.20 seems to be containerd/containerd@2ad2a2e, which is tied to os/exec.

cartermckinnon commented

@tehlers320 that PR (9828) was cherry-picked to both 1.7 (containerd/containerd#9928) and 1.6 (containerd/containerd#9927) release branches. It’s mentioned in the 1.7.14 release notes I’ve linked above.

SamuraiPrinciple commented Sep 10, 2024

Not sure if this is the place to mention it, but the fix has been released with containerd 1.7.22

cartermckinnon commented

Yes, we're in the process of getting an updated containerd in the Amazon Linux repositories. 👍
