Can't run yum update on latest 1.22 gpu-ami #1180

Closed
oe-hbk opened this issue Feb 10, 2023 · 1 comment · Fixed by #1188

oe-hbk commented Feb 10, 2023

What happened:

yum update fails with:

Error: Package tuple ('kernel-devel', 'x86_64', '0', '5.4.228', '132.418.amzn2') could not be found in packagesack

Steps to reproduce:

# uname -r
5.4.228-131.415.amzn2.x86_64
# yum repolist
Loaded plugins: dkms-build-requires, nvidia, priorities, update-motd, versionlock
7 packages excluded due to repository priority protections
repo id                                                                                                          repo name                                                                                                           status
amzn2-core/2/x86_64                                                                                              Amazon Linux 2 core repository                                                                                      29,654+196
amzn2-nvidia/2/x86_64                                                                                            Amazon Linux 2 Nvidia repository                                                                                         565+7
amzn2extra-docker/2/x86_64                                                                                       Amazon Extras repo for docker                                                                                            27+52
amzn2extra-kernel-5.4/2/x86_64                                                                                   Amazon Extras repo for kernel-5.4                                                                                       242+82
!neuron                                                                                                          Neuron YUM Repository                                                                                                   248+22
repolist: 30,736
# yum versionlock list
Loaded plugins: dkms-build-requires, nvidia, priorities, update-motd, versionlock
0:runc-1.1.4-1.amzn2.*
0:containerd-1.6.6-1.amzn2.0.2.*
0:docker-20.10.17-1.amzn2.0.1.*
0:kernel-headers-5.4.228-131.415.amzn2.*
0:kernel-devel-5.4.228-131.415.amzn2.*
0:nvidia-container-runtime-hook-1.4.0-1.amzn2.*
versionlock list done
# yum clean all
Loaded plugins: dkms-build-requires, nvidia, priorities, update-motd, versionlock
Cleaning repos: amzn2-core amzn2-nvidia amzn2extra-docker amzn2extra-kernel-5.4 neuron
Cleaning up everything
Maybe you want: rm -rf /var/cache/yum, to also free up space taken by orphaned data from disabled or removed repos
# rm -rf /var/cache/yum
# yum update
Loaded plugins: dkms-build-requires, nvidia, priorities, update-motd, versionlock
amzn2-core                                                                                                                                                                                                              | 3.7 kB  00:00:00
amzn2-nvidia                                                                                                                                                                                                            | 2.5 kB  00:00:00
amzn2extra-docker                                                                                                                                                                                                       | 3.0 kB  00:00:00
amzn2extra-kernel-5.4                                                                                                                                                                                                   | 3.0 kB  00:00:00
neuron                                                                                                                                                                                                                  | 2.9 kB  00:00:00
(1/9): amzn2-core/2/x86_64/updateinfo                                                                                                                                                                                   | 554 kB  00:00:00
(2/9): amzn2-core/2/x86_64/group_gz                                                                                                                                                                                     | 2.5 kB  00:00:00
(3/9): amzn2extra-kernel-5.4/2/x86_64/updateinfo                                                                                                                                                                        |  31 kB  00:00:00
(4/9): amzn2extra-docker/2/x86_64/updateinfo                                                                                                                                                                            | 8.0 kB  00:00:00
(5/9): amzn2-nvidia/2/x86_64/primary_db                                                                                                                                                                                 | 335 kB  00:00:00
(6/9): amzn2extra-docker/2/x86_64/primary_db                                                                                                                                                                            | 101 kB  00:00:00
(7/9): amzn2extra-kernel-5.4/2/x86_64/primary_db                                                                                                                                                                        |  21 MB  00:00:00
(8/9): neuron/primary_db                                                                                                                                                                                                | 108 kB  00:00:00
(9/9): amzn2-core/2/x86_64/primary_db                                                                                                                                                                                   |  69 MB  00:00:00
7 packages excluded due to repository priority protections
Resolving Dependencies
Error: Package tuple ('kernel-devel', 'x86_64', '0', '5.4.228', '132.418.amzn2') could not be found in packagesack
#
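
To make the mismatch in that error easier to see: yum reports package tuples in the order (name, arch, epoch, version, release). The sketch below converts the tuple from the error into the `version-release.arch` form that `uname -r` prints, so it can be compared with the running kernel. The function name `tuple_to_kernel_release` is hypothetical, and both input strings are copied from the transcript above rather than queried from a live system.

```shell
# Hypothetical helper: convert a yum package tuple
# (name, arch, epoch, version, release) into "version-release.arch".
tuple_to_kernel_release() {
  # Strip parentheses, quotes, and spaces, then split the fields on commas.
  printf '%s\n' "$1" | tr -d "()' " | awk -F, '{ print $4 "-" $5 "." $2 }'
}

# Values copied from the report (assumption: not read from a live node).
tuple="('kernel-devel', 'x86_64', '0', '5.4.228', '132.418.amzn2')"
running="5.4.228-131.415.amzn2.x86_64"

wanted="$(tuple_to_kernel_release "$tuple")"
echo "kernel-devel yum is looking for: $wanted"
echo "running kernel (uname -r):       $running"
[ "$wanted" = "$running" ] || echo "mismatch: yum wants a newer kernel-devel than the locked one"
```

The build yum is looking for (132.418) is newer than both the running kernel and the version-locked kernel-devel (131.415).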

What you expected to happen:
All packages with available updates, other than the version-locked ones, are installed.

Anything else we need to know?:
Trying to build a new AMI for 1.22 based on amazon-eks-gpu-node-1.22-v20230203

Environment:

  • AWS Region: us-east-1
  • Instance Type(s):
  • EKS Platform version (use aws eks describe-cluster --name <name> --query cluster.platformVersion):
  • Kubernetes version (use aws eks describe-cluster --name <name> --query cluster.version):
  • AMI Version: ami-079363bca92a41c98 - amazon-eks-gpu-node-1.22-v20230203
  • Kernel (e.g. uname -a): Linux ip-10-130-72-1.hbk.com 5.4.228-131.415.amzn2.x86_64 #1 SMP Tue Dec 20 12:51:02 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
  • Release information (run cat /etc/eks/release on a node):
BASE_AMI_ID="ami-04e0068781ee1fe9e"
BUILD_TIME="Fri Feb  3 16:47:43 UTC 2023"
BUILD_KERNEL="5.4.228-131.415.amzn2.x86_64"
ARCH="x86_64"

@cartermckinnon (Member) commented:

This happens because this AMI release locks kernel-devel and kernel-headers but not the kernel itself. That will be fixed in the next AMI release. We added locks on the kernel packages for two reasons: to prevent unintentional updates that could introduce instability, and because updating the kernel on a running node requires a reboot, which should generally happen only within a managed upgrade process that avoids disrupting your workloads.
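
One way to check a node for this condition is to look for kernel-related packages that have no entry in the lock list. The following is a minimal sketch, not part of the AMI tooling: the function name `unlocked_kernel_pkgs` is hypothetical, and the sample lock list is the one from the report, so the example runs without yum.

```shell
# Hypothetical helper: reads `yum versionlock list` output on stdin and
# prints each kernel-related package that has no lock entry.
unlocked_kernel_pkgs() {
  locks="$(cat)"
  for pkg in kernel kernel-devel kernel-headers; do
    # Lock entries look like "0:kernel-devel-5.4.228-131.415.amzn2.*":
    # epoch, colon, package name, hyphen, then a version starting with a digit.
    if ! printf '%s\n' "$locks" | grep -Eq "^[0-9]+:${pkg}-[0-9]"; then
      echo "$pkg"
    fi
  done
}

# Lock list copied from the report.
sample_locks='0:runc-1.1.4-1.amzn2.*
0:containerd-1.6.6-1.amzn2.0.2.*
0:docker-20.10.17-1.amzn2.0.1.*
0:kernel-headers-5.4.228-131.415.amzn2.*
0:kernel-devel-5.4.228-131.415.amzn2.*
0:nvidia-container-runtime-hook-1.4.0-1.amzn2.*'

printf '%s\n' "$sample_locks" | unlocked_kernel_pkgs
```

On a node, the same function can read live data with `yum versionlock list | unlocked_kernel_pkgs`. The trailing `-[0-9]` in the pattern keeps `kernel` from matching the `kernel-devel` and `kernel-headers` entries.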

You can always bypass the version locks by disabling yum plugins (this turns off every plugin, not only versionlock):

yum update --noplugins

Be aware, though, that the updated packages this installs (the kernel, containerd, nvidia-driver-latest-dkms, etc.) may introduce instability and version skew across your nodes.
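
A narrower variant of the workaround, offered as an untested sketch rather than maintainer guidance: yum's `--disableplugin` option turns off a single plugin for one invocation, which would leave the other plugins shown in the transcript (priorities, nvidia, and so on) active. The wrapper name `update_without_versionlock` and the `DRY_RUN` variable are hypothetical. The wrapper defaults to printing the command so it can be reviewed before anything is changed, and the same caveat about instability and version skew applies.

```shell
# Dry run by default: print the command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper around `yum update` with only versionlock disabled.
update_without_versionlock() {
  # Prepend the yum command to any extra arguments passed by the caller.
  set -- yum update --disableplugin=versionlock "$@"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

update_without_versionlock -y
```

Setting `DRY_RUN=0` before the call runs the update for real.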
