
"Instance failed to join" status despite the instance actually having joined #719

Closed
a13x5 opened this issue Aug 2, 2021 · 5 comments


@a13x5

a13x5 commented Aug 2, 2021

What happened:
When creating an EKS node group from a launch template with custom user data, it fails with an "Instance failed to join the kubernetes cluster" error. But the kubectl get node command shows the nodes created by the node group.

What you expected to happen:
Node successfully joins the cluster.

How to reproduce it (as minimally and precisely as possible):
Create a node group using a launch template with a custom user data script:

echo 'Custom user-data script'
/etc/eks/bootstrap.sh tst \
--kubelet-extra-args '--max-pods=100' \
--b64-cluster-ca <REDACTED> \
--apiserver-endpoint https://<REDACTED>.us-west-2.eks.amazonaws.com \
--use-max-pods false

Anything else we need to know?:
Without using a launch template it works fine.
All resources were created using the Terraform code available in a gist.
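For reference, the relevant pieces look roughly like the sketch below (a minimal illustration, not the actual gist contents: the IAM role, subnets, and release_version value are assumed):

resource "aws_launch_template" "tst" {
  name = "tst"

  # Custom bootstrap script shown above, MIME multi-part encoded. Note that no image_id is set.
  user_data = base64encode(file("user-data.mime"))
}

resource "aws_eks_node_group" "tst" {
  cluster_name    = "tst"
  node_group_name = "tst"
  node_role_arn   = aws_iam_role.node.arn # assumed to be defined elsewhere in the gist
  subnet_ids      = var.subnet_ids        # assumed to be defined elsewhere in the gist
  release_version = "1.19.6-20210512"     # illustrative value; later removed as part of the fix

  launch_template {
    id      = aws_launch_template.tst.id
    version = aws_launch_template.tst.latest_version
  }

  scaling_config {
    desired_size = 1
    max_size     = 2
    min_size     = 1
  }
}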
Environment:

  • AWS Region: us-west-2
  • Instance Type(s): m5a.large
  • EKS Platform version: eks.5
  • Kubernetes version: 1.19
  • AMI Version: amazon-eks-node-1.19-v20210512
  • Kernel (e.g. uname -a): Linux ip-10-0-5-52.us-west-2.compute.internal 5.4.110-54.189.amzn2.x86_64 #1 SMP Mon Apr 26 21:25:38 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
  • Release information (run cat /etc/eks/release on a node):
BASE_AMI_ID="ami-004a571bc4ab7023a"
BUILD_TIME="Wed May 12 16:45:14 UTC 2021"
BUILD_KERNEL="5.4.110-54.189.amzn2.x86_64"
ARCH="x86_64"

@ravisinha0506
Contributor

Hi @Alex-Sizov,

  • Is this intermittent behavior, or does it happen every time?
  • Could you share the node group ARN, so that we can debug this from our end?
  • Could you share the labels you see on the node?

@a13x5
Author

a13x5 commented Aug 3, 2021

Hi @ravisinha0506 !

  • Yes, this happens every time. I tried 4 times and got the same result each time.
  • Node group ARN is arn:aws:eks:us-west-2:560065381221:nodegroup/tst/tst/72bd78af-c09c-18bb-3796-c35618fa4130
  • The labels on the node are the following (from kubectl get node):
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/instance-type: m5a.large
    beta.kubernetes.io/os: linux
    failure-domain.beta.kubernetes.io/region: us-west-2
    failure-domain.beta.kubernetes.io/zone: us-west-2a
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: ip-10-0-5-52.us-west-2.compute.internal
    kubernetes.io/os: linux
    node.kubernetes.io/instance-type: m5a.large
    topology.kubernetes.io/region: us-west-2
    topology.kubernetes.io/zone: us-west-2a

@deepanverma19

deepanverma19 commented Aug 4, 2021

@ravisinha0506 I am also facing the same issue as @Alex-Sizov +1

@suket22
Member

suket22 commented Aug 10, 2021

@Alex-Sizov

When creating a Managed node group with a launch template, the behavior differs based on whether an AMI has been specified in the launch template or not.

When no AMI is present in the launch template (as is the case for you, if I'm reading your gist correctly), EKS merges an additional MIME multi-part section into the user data contents you've passed in. The part EKS merges in attempts to bootstrap your worker node as well. Since MIME multi-part sections are executed in order, your bootstrapping happens first and the EKS bootstrapping becomes a no-op.

As a result, your worker nodes don't have the required labels for EKS to associate them with a node group.
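To make the ordering concrete, the merged user data ends up looking roughly like the following (the boundary string and the EKS-generated part are paraphrased for illustration, not the exact content):

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==BOUNDARY=="

--==BOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
# Part supplied in your launch template: runs first and bootstraps the node.
echo 'Custom user-data script'
/etc/eks/bootstrap.sh tst --kubelet-extra-args '--max-pods=100' ...

--==BOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
# Part merged in by EKS (paraphrased): runs second, finds an already-bootstrapped
# node, and is effectively a no-op.
/etc/eks/bootstrap.sh tst ...

--==BOUNDARY==--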

You can fix this by specifying the worker AMI you'd like to use within your launch template and passing that to EKS. See this documentation for more details.
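In Terraform terms, the change would look roughly like the sketch below (the SSM parameter lookup is just one common way to resolve the EKS optimized AMI ID; it is an illustration, not something taken from your gist):

data "aws_ssm_parameter" "eks_ami" {
  name = "/aws/service/eks/optimized-ami/1.19/amazon-linux-2/recommended/image_id"
}

resource "aws_launch_template" "tst" {
  name = "tst"

  # With image_id set, EKS no longer merges its own bootstrap part, so the
  # custom user data is solely responsible for running /etc/eks/bootstrap.sh.
  image_id  = data.aws_ssm_parameter.eks_ami.value
  user_data = base64encode(file("user-data.mime")) # unchanged custom bootstrap script
}

The release_version parameter would then be dropped from the aws_eks_node_group, since the AMI is pinned in the launch template instead.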

@a13x5
Author

a13x5 commented Aug 23, 2021

I've removed the release_version parameter from the EKS node group and added image_id to my launch template, and it all works just fine now. I tested both cluster creation and updating.
Thank you very much @suket22 !
