From 0ac42d9de0b3e00b2bdb4072777b623a7a3da442 Mon Sep 17 00:00:00 2001
From: Zespre Chang
Date: Mon, 10 Jul 2023 14:32:31 +0800
Subject: [PATCH] Add section about image cleaning for disk space

Signed-off-by: Zespre Chang
---
 docs/faq.md                        | 20 ++++++++++++++++++++
 versioned_docs/version-v1.1/faq.md | 20 ++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/docs/faq.md b/docs/faq.md
index f5e5b85fb3f..5291809a42d 100644
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -48,3 +48,23 @@ New password for default administrator (user-xxxxx):
 ### I added an additional disk with partitions. Why is it not getting detected?
 
 As of Harvester v1.0.2, we no longer support adding additional partitioned disks, so be sure to delete all partitions first (e.g., using `fdisk`).
+
+### Why do some Harvester pods become ErrImagePull/ImagePullBackOff?
+
+This usually happens in air-gapped Harvester clusters when some of the pre-loaded container images have gone missing. Kubernetes garbage-collects container images when the image store grows too large: once the partition that stores the images is more than 85% full, `kubelet` starts removing the least recently used images until disk usage drops below 80%. These values (85% and 80%) are the default high and low thresholds that ship with Kubernetes.
+
+To recover from this state, either pull the missing images from the Internet (setting up an HTTP proxy if the environment is air-gapped) or import them manually from the Harvester ISO. In a multi-node cluster, the images missing on one node can often still be found on another node. In that case, you can export the images from a node that still has them and import them on the node where they are missing.
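+
+For example, here is a minimal sketch of copying a single image from a healthy node to an affected one with containerd's `ctr` CLI. The binary and socket paths below assume the usual RKE2 layout on Harvester nodes, and `docker.io/rancher/example:v1.0.0` is only a placeholder image reference:
+
+```shell
+# paths assume RKE2 defaults; the image reference is a placeholder
+# on a node that still has the image: export it to a tarball
+$ /var/lib/rancher/rke2/bin/ctr --address /run/k3s/containerd/containerd.sock \
+    --namespace k8s.io images export /tmp/example.tar docker.io/rancher/example:v1.0.0
+
+# copy the tarball to the affected node (e.g., with scp), then import it there
+$ /var/lib/rancher/rke2/bin/ctr --address /run/k3s/containerd/containerd.sock \
+    --namespace k8s.io images import /tmp/example.tar
+```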
+
+To prevent this from happening, clean up unused container images left over from the previous version after each successful Harvester upgrade whenever the image store is running low on disk space. We provide a script, [harv-purge-images.sh](https://github.com/harvester/upgrade-helpers/blob/main/bin/harv-purge-images.sh), that helps reclaim disk space, especially in the container image store. The script must be run on each Harvester node. For example, if the cluster was originally running v1.1.2 and has been upgraded to v1.2.0, run the following to discard the container images that are used only by v1.1.2 and are no longer needed in v1.2.0:
+
+```shell
+# on each node
+$ ./harv-purge-images.sh v1.1.2 v1.2.0
+```
+
+:::caution
+
+- The script only downloads the image lists of the two versions and compares them to calculate the difference. It does not query the cluster, so it does not know which version the cluster was actually upgraded from.
+- Image lists are published for every version released since v1.1.0. For clusters older than v1.1.0, the old images have to be cleaned up manually.
+
+:::
diff --git a/versioned_docs/version-v1.1/faq.md b/versioned_docs/version-v1.1/faq.md
index 0affe008cca..9956d711778 100644
--- a/versioned_docs/version-v1.1/faq.md
+++ b/versioned_docs/version-v1.1/faq.md
@@ -48,3 +48,23 @@ New password for default administrator (user-xxxxx):
 ### I added an additional disk with partitions. Why is it not getting detected?
 
 As of Harvester v1.0.2, we no longer support adding additional partitioned disks, so be sure to delete all partitions first (e.g., using `fdisk`).
+
+### Why do some Harvester pods become ErrImagePull/ImagePullBackOff?
+
+This usually happens in air-gapped Harvester clusters when some of the pre-loaded container images have gone missing. Kubernetes garbage-collects container images when the image store grows too large: once the partition that stores the images is more than 85% full, `kubelet` starts removing the least recently used images until disk usage drops below 80%. These values (85% and 80%) are the default high and low thresholds that ship with Kubernetes.
+
+To recover from this state, either pull the missing images from the Internet (setting up an HTTP proxy if the environment is air-gapped) or import them manually from the Harvester ISO. In a multi-node cluster, the images missing on one node can often still be found on another node. In that case, you can export the images from a node that still has them and import them on the node where they are missing.
+
+To prevent this from happening, clean up unused container images left over from the previous version after each successful Harvester upgrade whenever the image store is running low on disk space. We provide a script, [harv-purge-images.sh](https://github.com/harvester/upgrade-helpers/blob/main/bin/harv-purge-images.sh), that helps reclaim disk space, especially in the container image store. The script must be run on each Harvester node. For example, if the cluster was originally running v1.1.1 and has been upgraded to v1.1.2, run the following to discard the container images that are used only by v1.1.1 and are no longer needed in v1.1.2:
+
+```shell
+# on each node
+$ ./harv-purge-images.sh v1.1.1 v1.1.2
+```
+
+:::caution
+
+- The script only downloads the image lists of the two versions and compares them to calculate the difference. It does not query the cluster, so it does not know which version the cluster was actually upgraded from.
+- Image lists are published for every version released since v1.1.0. For clusters older than v1.1.0, the old images have to be cleaned up manually; a rough sketch of one way to do this follows below.
+
+:::
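+
+As a minimal sketch, a manual cleanup could look like the following. The paths below assume the usual RKE2 layout on Harvester nodes, and `docker.io/rancher/example:v0.9.0` is only a placeholder image reference; make sure an image is really no longer needed before removing it, because deleting one that is still in use will trigger new image pulls:
+
+```shell
+# paths assume RKE2 defaults; the image reference is a placeholder
+# on each node: list the images known to containerd
+$ /var/lib/rancher/rke2/bin/ctr --address /run/k3s/containerd/containerd.sock \
+    --namespace k8s.io images ls -q
+
+# remove an image you have confirmed is no longer referenced
+$ /var/lib/rancher/rke2/bin/ctr --address /run/k3s/containerd/containerd.sock \
+    --namespace k8s.io images rm docker.io/rancher/example:v0.9.0
+```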