Bug: Timed Out Problem with Edge Cluster Agent Installation #4105

Open
szeyan543 opened this issue Jul 9, 2024 · 4 comments

@szeyan543

Describe the bug.

During the edge cluster agent installation, the process times out while waiting for the agent deployment to complete. The installation proceeds as expected through creating the required resources, but the deployment does not become ready within the allotted 300 seconds, and the install fails with a timeout error.

2024-07-09 11:31:23 cronjob auto-upgrade-cronjob created
2024-07-09 11:31:24 persistentvolumeclaim/openhorizon-agent-pvc created
2024-07-09 11:31:24 persistent volume claim created
2024-07-09 11:31:25 deployment.apps/agent created
2024-07-09 11:31:25 Waiting up to 300 seconds for the agent deployment to complete...
error: timed out waiting for the condition
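
(For anyone hitting the same error, the commands below are a generic sketch of how to see why the deployment never becomes ready; the openhorizon-agent namespace matches the output later in this thread:

kubectl -n openhorizon-agent get pods
kubectl -n openhorizon-agent describe deploy/agent
kubectl -n openhorizon-agent get events --sort-by=.metadata.creationTimestamp

A pod stuck in ImagePullBackOff usually points at the image registry, while a pod stuck in Pending usually points at the persistent volume claim.)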

Describe the steps to reproduce the behavior.

No response

Expected behavior.

No response

Screenshots.

No response

Operating Environment

Linux

Additional Information

No response

@szeyan543 szeyan543 added the bug label Jul 9, 2024
@szeyan543 (Author)

@joewxboy

@dlarson04 (Contributor)

Responded with the following in the LF Edge messaging app:

Hi,
The only time I have seen this is when something goes wrong with the persistent volume claims. Please run the following commands and paste the results:

kubectl get storageclasses

and

kubectl get persistentvolumeclaims -A

and

kubectl -n <namespace_name> get deploy/agent -o=jsonpath='{$.spec.template.spec.containers[*].image}'; echo ""

@szeyan543 (Author)

Hello, here are the results from the commands:

root@k:~# kubectl get storageclasses
NAME                   PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
local-path (default)   rancher.io/local-path   Delete          WaitForFirstConsumer   false                  14h

root@k:~# kubectl get persistentvolumeclaims -A
NAMESPACE           NAME                    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
default             docker-registry-pvc     Bound    pvc-f1e564ad-ee0c-49a7-b688-184a06064550   10Gi       RWO            local-path     <unset>                 14h
openhorizon-agent   openhorizon-agent-pvc   Bound    pvc-dcd879b7-8d93-4025-982e-8f8582cf6eee   10Gi       RWO            local-path     <unset>                 5m31s

root@k:~# kubectl -n openhorizon-agent get deploy/agent -o=jsonpath='{$.spec.template.spec.containers[*].image}'; echo ""
10.43.195.246:5000/openhorizon-agent/amd64_anax_k8s:latest
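
(As a side check, one way to confirm whether that in-cluster registry is being served over plain HTTP rather than HTTPS, assuming it exposes the standard Docker registry v2 API, is:

curl -v http://10.43.195.246:5000/v2/_catalog

If the registry answers over HTTP while the container runtime expects HTTPS, image pulls for the agent deployment will fail and the deployment will never become ready.)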

@dlarson04 (Contributor)

We debugged this and there was a mismatch between HTTP and HTTPS on the k3s cluster. Fixing the cluster resolved the problem.
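
(The exact change is not recorded in this issue. A common way to resolve this kind of HTTP/HTTPS mismatch on k3s is to tell the container runtime that the local registry speaks plain HTTP via /etc/rancher/k3s/registries.yaml; the registry address below is taken from the earlier output, and the rest is a hedged sketch rather than the fix applied here:

cat <<'EOF' | sudo tee /etc/rancher/k3s/registries.yaml
mirrors:
  "10.43.195.246:5000":
    endpoint:
      - "http://10.43.195.246:5000"
EOF
sudo systemctl restart k3s

After k3s restarts with the registry marked as HTTP, re-running the agent installation should allow the deployment to pull its image and become ready.)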
