Add section about image cleaning for disk space (#348)

* Add section about image cleaning for disk space

  Signed-off-by: Zespre Chang <zespre.chang@suse.com>

* Update docs/faq.md

  Adding @starbops feedback

  Co-authored-by: Zespre Chang <zespre.chang@suse.com>

---------

Signed-off-by: Zespre Chang <zespre.chang@suse.com>
Co-authored-by: Lucas Saintarbor <lucas.saintarbor@suse.com>
harvester · Aug 4, 2023 · 2701f23
1 parent 40acb01
commit 2701f23

Showing 2 changed files with 108 additions and 0 deletions.
diff --git a/docs/faq.md b/docs/faq.md
@@ -48,3 +48,57 @@ New password for default administrator (user-xxxxx):
 ### I added an additional disk with partitions. Why is it not getting detected?
 
 As of Harvester v1.0.2, we no longer support adding additional partitioned disks, so be sure to delete all partitions first (e.g., using `fdisk`).
+
+### Why are there some Harvester pods that become ErrImagePull/ImagePullBackOff?
+
+This is likely because your Harvester cluster is air-gapped and some pre-loaded container images are missing. Kubernetes garbage-collects container images when the image store grows too large. When the partition that stores container images is over 85% full, `kubelet` prunes images based on when they were last used, starting with the oldest, until usage drops below 80%. These values (85% and 80%) are the default high and low thresholds in Kubernetes (`imageGCHighThresholdPercent` and `imageGCLowThresholdPercent` in the kubelet configuration).
+
+To recover from this state, do one of the following depending on the cluster's configuration:
+- Pull the missing images from sources outside of the cluster (if it's an air-gapped environment, you might need to set up an HTTP proxy beforehand).
+- Manually import the images from the Harvester ISO image.
+
+:::note
+
+Taking v1.1.2 as an example, download the Harvester ISO image from the official URL. Then extract the image lists from the ISO image to determine which image tarball to import. In this example, the missing container image is `rancher/harvester-upgrade`.
+
+```shell
+$ curl -sfL https://releases.rancher.com/harvester/v1.1.2/harvester-v1.1.2-amd64.iso -o harvester.iso
+
+$ xorriso -osirrox on -indev harvester.iso -extract /bundle/harvester/images-lists images-lists
+
+$ grep -R "rancher/harvester-upgrade" images-lists/
+images-lists/harvester-images-v1.1.2.txt:docker.io/rancher/harvester-upgrade:v1.1.2
+```
+
+Locate the image tarball that corresponds to the matching image list (here, `harvester-images-v1.1.2.tar.zst`), extract it from the ISO image, and then decompress it with `zstd`.
+
+```shell
+$ xorriso -osirrox on -indev harvester.iso -extract /bundle/harvester/images/harvester-images-v1.1.2.tar.zst harvester.tar.zst
+
+$ zstd -d --rm harvester.tar.zst
+```
+
+Upload the image tarball to the Harvester nodes that need to recover. Finally, run the following commands on each of those nodes to import the container images.
+
+```shell
+$ ctr -n k8s.io images import harvester.tar
+$ rm harvester.tar
+```
+
+:::
+
+- Find the missing images on the other nodes, then export them from a node where they still exist and import them on the node where they are missing.
+
+To prevent this from happening, we recommend cleaning up unused container images from the previous version after each successful Harvester upgrade if the image store is running low on disk space. We provide a [harv-purge-images script](https://github.com/harvester/upgrade-helpers/blob/main/bin/harv-purge-images.sh) that makes it easy to free up disk space used by container image storage. The script must be run on each Harvester node. For example, if the cluster was originally on v1.1.2 and has been upgraded to v1.2.0, run the following to discard the container images that are used only in v1.1.2 and are no longer needed in v1.2.0:
+
+```shell
+# on each node
+$ ./harv-purge-images.sh v1.1.2 v1.2.0
+```
+
+:::caution
+
+- The script only downloads the image lists for the two versions and compares them to work out the difference. It does not communicate with the cluster and, as a result, does not know which version the cluster was upgraded from.
+- We publish image lists for each version released since v1.1.0. For versions older than v1.1.0, you have to clean up the old images manually.
+
+:::
diff --git a/versioned_docs/version-v1.1/faq.md b/versioned_docs/version-v1.1/faq.md
@@ -48,3 +48,57 @@ New password for default administrator (user-xxxxx):
 ### I added an additional disk with partitions. Why is it not getting detected?
 
 As of Harvester v1.0.2, we no longer support adding additional partitioned disks, so be sure to delete all partitions first (e.g., using `fdisk`).
+
+### Why are there some Harvester pods that become ErrImagePull/ImagePullBackOff?
+
+This is likely because your Harvester cluster is air-gapped and some pre-loaded container images are missing. Kubernetes garbage-collects container images when the image store grows too large. When the partition that stores container images is over 85% full, `kubelet` prunes images based on when they were last used, starting with the oldest, until usage drops below 80%. These values (85% and 80%) are the default high and low thresholds in Kubernetes (`imageGCHighThresholdPercent` and `imageGCLowThresholdPercent` in the kubelet configuration).
+
+To recover from this state, do one of the following depending on the cluster's configuration:
+- Pull the missing images from sources outside of the cluster (if it's an air-gapped environment, you might need to set up an HTTP proxy beforehand).
+- Manually import the images from the Harvester ISO image.
+
+:::note
+
+Taking v1.1.2 as an example, download the Harvester ISO image from the official URL. Then extract the image lists from the ISO image to determine which image tarball to import. In this example, the missing container image is `rancher/harvester-upgrade`.
+
+```shell
+$ curl -sfL https://releases.rancher.com/harvester/v1.1.2/harvester-v1.1.2-amd64.iso -o harvester.iso
+
+$ xorriso -osirrox on -indev harvester.iso -extract /bundle/harvester/images-lists images-lists
+
+$ grep -R "rancher/harvester-upgrade" images-lists/
+images-lists/harvester-images-v1.1.2.txt:docker.io/rancher/harvester-upgrade:v1.1.2
+```
+
+Locate the image tarball that corresponds to the matching image list (here, `harvester-images-v1.1.2.tar.zst`), extract it from the ISO image, and then decompress it with `zstd`.
+
+```shell
+$ xorriso -osirrox on -indev harvester.iso -extract /bundle/harvester/images/harvester-images-v1.1.2.tar.zst harvester.tar.zst
+
+$ zstd -d --rm harvester.tar.zst
+```
+
+Upload the image tarball to the Harvester nodes that need to recover. Finally, run the following commands on each of those nodes to import the container images.
+
+```shell
+$ ctr -n k8s.io images import harvester.tar
+$ rm harvester.tar
+```
+
+:::
+
+- Find the missing images on the other nodes, then export them from a node where they still exist and import them on the node where they are missing.
+
+To prevent this from happening, we recommend cleaning up unused container images from the previous version after each successful Harvester upgrade if the image store is running low on disk space. We provide a [harv-purge-images script](https://github.com/harvester/upgrade-helpers/blob/main/bin/harv-purge-images.sh) that makes it easy to free up disk space used by container image storage. The script must be run on each Harvester node. For example, if the cluster was originally on v1.1.1 and has been upgraded to v1.1.2, run the following to discard the container images that are used only in v1.1.1 and are no longer needed in v1.1.2:
+
+```shell
+# on each node
+$ ./harv-purge-images.sh v1.1.1 v1.1.2
+```
+
+:::caution
+
+- The script only downloads the image lists for the two versions and compares them to work out the difference. It does not communicate with the cluster and, as a result, does not know which version the cluster was upgraded from.
+- We publish image lists for each version released since v1.1.0. For versions older than v1.1.0, you have to clean up the old images manually.
+
+:::
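
A note on the third recovery option in the new FAQ entry: it describes copying images from a healthy node to the affected one but shows no commands. The following is a minimal sketch of how that might look, assuming `ctr` is available on the nodes (as in the import step of the diff) and that the tarball is copied between nodes by some means not shown here. The helper names `run`, `export_image`, and `import_image`, and the `DRY_RUN` variable, are hypothetical and exist only for illustration; they are not part of Harvester.

```shell
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# On a node that still has the image: export it to a tarball.
# $1 = full image reference, $2 = output tarball path
export_image() {
  image="$1"
  tarball="$2"
  run ctr -n k8s.io images export "$tarball" "$image"
}

# On the node that lost the image: import the tarball.
# $1 = tarball path
import_image() {
  run ctr -n k8s.io images import "$1"
}
```

For example, `DRY_RUN=1 export_image docker.io/rancher/harvester-upgrade:v1.1.2 harvester-upgrade.tar` prints the `ctr` command without running it, which makes it possible to check the image reference before touching the image store. The full image reference can be taken from the image list, as in the `grep` output shown in the diff.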