Kubelet no longer listening on read only port 10255 #128

Closed
jesseshieh opened this issue Dec 20, 2018 · 11 comments
@jesseshieh

What happened:
I have an EKS 1.10 cluster with worker nodes running 1.10.3, and everything works fine. Today I created a new worker group with the 1.10.11 AMI. Everything is fine except that the kubelet on the 1.10.11 nodes is no longer listening on the read-only port 10255. I verified this with netstat -l as well as lsof -p $kubelet_pid | grep -i listen. I also verified in journalctl -u kubelet that it does not even attempt to listen on port 10255.

I manually modified /etc/systemd/system/kubelet.service and added --read-only-port 10255, which fixes it, but I thought 10255 was the default and didn't need to be set. I verified that on my 1.10.3 workers --read-only-port is not set, but kubelet is indeed listening on 10255.
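For reference, applying and verifying that manual edit looks roughly like this (a sketch, assuming the standard kubelet.service unit on the EKS-optimized AMI):

# after adding --read-only-port 10255 to the kubelet unit file
sudo systemctl daemon-reload
sudo systemctl restart kubelet

# confirm kubelet is now listening on the read-only port
sudo ss -ltnp | grep 10255
curl -s http://localhost:10255/stats/summary | head -n 3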

I noticed that in 1.10.11 the kubelet loads its configuration from a file instead of entirely from flags. My guess is that this transition somehow changes the default values. I dug around a bit in the kubelet code to figure out how the defaults are determined, but I'm coming up short so far.

What you expected to happen:
kubelet to listen on the read only port 10255

How to reproduce it (as minimally and precisely as possible):
I think if you just start up a 1.10.11 worker node, you can see it by running something like curl localhost:10255/stats/summary
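On an affected 1.10.11 node that check fails with a connection refused, while an older 1.10.3 node returns the JSON stats summary (sketch; run on the node itself or substitute the node's IP):

# run on the worker node (or replace localhost with the node IP)
curl -s http://localhost:10255/stats/summary
# 1.10.3 node:  JSON stats summary
# 1.10.11 node: curl: (7) Failed to connect to localhost port 10255: Connection refused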

Anything else we need to know?:

Environment:

  • AWS Region: us-west-2
  • Instance Type(s): t2.2xlarge
  • EKS Platform version (use aws eks describe-cluster --name <name> --query cluster.platformVersion): eks.3
  • Kubernetes version (use aws eks describe-cluster --name <name> --query cluster.version): 1.10
  • AMI Version: 1.10.11
  • Kernel (e.g. uname -a): Linux REDACTED 4.14.77-81.59.amzn2.x86_64 #1 SMP Mon Nov 12 21:32:48 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • Release information (run cat /tmp/release on a node): file does not exist
@micahhausler
Member

Hey, thanks for the PR and issue. You are welcome to build an AMI with this setting configured, but the feature is deprecated.

@szymonpk

szymonpk commented Dec 31, 2018

@micahhausler Unfortunately, this is a breaking change made without any notice and without a major version bump. It also contradicts the official Kubernetes dashboard documentation, which still suggests setting up heapster with --source=kubernetes:https://kubernetes.default. That no longer works, and as a result we get this error on new nodes:

E1228 12:13:05.074233       1 manager.go:101] Error in scraping containers from kubelet:10.0.30.39:10255: failed to get all container stats from Kubelet URL "http://10.0.30.39:10255/stats/container/": Post http://10.0.30.39:10255/stats/container/: dial tcp 10.0.30.39:10255: getsockopt: connection refused

For anyone who wants to fix dashboard deployment with minimal effort, apply this patch:

spec:
  template:
    spec:
      containers:
      - command:
        - /heapster
        - --source=kubernetes:https://kubernetes.default?useServiceAccount=true&kubeletHttps=true&kubeletPort=10250&insecure=true
        - --sink=influxdb:http://monitoring-influxdb.kube-system.svc:8086
        name: heapster

with the following command:

kubectl patch deployment heapster --namespace kube-system --patch "$(cat <path to patch>.yaml)"

And create a role that allows heapster to read node stats, and bind it to the heapster service account:

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: node-stats-full
rules:
- apiGroups: [""]
  resources: ["nodes/stats"]
  verbs: ["get", "watch", "list", "create"]
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: heapster-node-stats
subjects:
- kind: ServiceAccount
  name: heapster
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: node-stats-full
  apiGroup: rbac.authorization.k8s.io
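Assuming both objects above are saved to a single manifest (the filename here is just an example), they can be applied with:

kubectl apply -f heapster-node-stats.yaml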

The solution was originally provided by @mrwulf.

@bobhenkel

bobhenkel commented Jan 11, 2019

I'm also pretty sure that breaking heapster also breaks the AWS node autoscaler, since it uses heapster's data.

Thanks to @mrwulf and @szymonpk I was able to get our heapster working again.

@tklovett

I wanted to provide a little more color in case anyone else comes by with the same issue we had.

tl;dr Pull Request #90 introduced some unannounced breaking changes which conflicted with the defaults defined by the prometheus-operator helm chart. This resulted in alerts reporting that Kubelet and other core K8s components were down.

As required to avoid deprecation issues, EKS moved from configuring the kubelet via command-line arguments to a config file. However, the default values used in the file-based configuration are actually different. From the kubelet docs:

Note that some default values differ between command-line flags and the Kubelet config file. If --config is provided and the values are not specified via the command line, the defaults for the KubeletConfiguration version apply.

This resulted in a change to the value of kubelet's ReadOnlyPort, from the default of --read-only-port 10255 to the default of readOnlyPort: 0. So in updating from amazon-eks-ami v25 to v20190109, our kubelet ReadOnlyPort was disabled.
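For anyone who does want the old behaviour back despite the deprecation, the file-based equivalent is an explicit readOnlyPort in the kubelet config file (a minimal sketch of just that field; the rest of the EKS AMI's config file is omitted, and the config file path may vary by AMI version):

kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
readOnlyPort: 10255   # left unset, the config-file default is 0 (disabled)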

By default, the Helm chart for Prometheus Operator uses the ReadOnlyPort (aka http-metrics) to scrape various metrics. To fix this, we set kubelet.serviceMonitor.https: true, which switches scraping to the default authenticated port 10250.
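With the Helm chart, that can be set either in your values file or on the command line (the release and chart names below are only examples, adjust to your installation):

helm upgrade prometheus-operator stable/prometheus-operator \
  --set kubelet.serviceMonitor.https=true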

@philwinder

For future googlers, I managed to find a workaround for the prometheus-operator that doesn't require rebuilding the AMI: prometheus-operator/prometheus-operator#867 (comment)

It uses relabelling to request metrics via the Kubernetes API server proxy, rather than going directly to the kubelet. The main benefit is that you don't have to edit the EKS AMI to change the authentication settings. The downside is more load on the API server.
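The linked comment has the exact prometheus-operator relabelings; the same idea expressed as a plain Prometheus scrape config looks roughly like this (a sketch based on the standard Prometheus Kubernetes example, not the literal config from that comment):

- job_name: kubernetes-nodes
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  # point every node target at the API server instead of the node itself
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  # and rewrite the metrics path so the API server proxies to the node's kubelet
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics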

@erez-rabih

@tklovett thanks for the summary of this issue - much appreciated

@gacopl

gacopl commented Feb 14, 2019

So in other words, I understand the deprecation of the kubelet flags; what I don't understand is why whoever was responsible for PR #90 did not make sure the defaults stayed the same, and if they differ, provide them explicitly in the config file.

@gacopl

gacopl commented Feb 15, 2019

For me the easiest option for now was to add the --read-only-port=10255 argument to bootstrap.sh in the Launch Configuration user-data. Shame I have to re-bootstrap all my clusters... again.
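For reference, the user-data change boils down to passing the flag through bootstrap.sh's --kubelet-extra-args option, roughly like this (the cluster name is a placeholder, and keep in mind the flag itself is deprecated upstream):

#!/bin/bash
set -o xtrace
/etc/eks/bootstrap.sh my-cluster --kubelet-extra-args '--read-only-port=10255'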

@cabrinha

> For me the easiest option for now was to add the --read-only-port=10255 argument to bootstrap.sh in the Launch Configuration user-data.

This is what worked for me on the latest EKS optimized AMI.

@davidham

Based on my read of the issues @micahhausler linked to, @tklovett's response above is correct, and we should not try to force a read-only port. It looks like the reason for the change in the first place is that the K8s maintainers wanted to deprecate the read-only port for security reasons. Setting kubelet.serviceMonitor.https: true worked for me with no change to the AMI.

@sdwerwed

sdwerwed commented Mar 9, 2021

Find out what port your kubelet endpoint is using and whether it is HTTPS or HTTP.

For Prometheus on EKS 1.19 it is using HTTPS; I made it work with this change in values.yaml for the Prometheus Helm chart:

    kubelet:
      enabled: true
      serviceMonitor:
        https: true
