[Helm,Nginx,dittoui]: OOMKilled on Ditto 3.3.0 #1663

Closed
Altair-Bueno opened this issue Jun 23, 2023 · 4 comments · Fixed by #1667

@Altair-Bueno
Contributor

# values.yml
global:
  prometheus:
    enabled: true

gateway:
  devopsPassword: foobar

mongodb:
  persistence:
    enabled: false

helm install --create-namespace dt-ditto oci://registry-1.docker.io/eclipse/ditto -f values.yml --version=3.3.0

dt-ditto-connectivity-56f78b486d-8r692   1/1     Running            0          2m10s
dt-ditto-dittoui-9f875458c-fkcq7         0/1     CrashLoopBackOff   4          2m10s
dt-ditto-gateway-565d4f4f6c-4z7tp        1/1     Running            0          2m10s
dt-ditto-mongodb-df4ffd99d-mps8x         1/1     Running            0          2m10s
dt-ditto-nginx-566ff8d696-c75dg          0/1     CrashLoopBackOff   3          2m10s
dt-ditto-policies-657c989997-p88v2       1/1     Running            0          2m10s
dt-ditto-swaggerui-77f4864955-2vfwm      1/1     Running            0          2m10s
dt-ditto-things-5886d99c7b-tgjn8         1/1     Running            0          2m10s
dt-ditto-thingssearch-654556bb8b-ml9hn   1/1     Running            0          2m10s
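
The OOMKilled reason can be confirmed on a crashing pod, for example (pod name taken from the listing above; a generic kubectl check):

kubectl describe pod dt-ditto-dittoui-9f875458c-fkcq7 | grep -A 5 "Last State"
# if the container is killed for exceeding its memory limit, this shows "Reason: OOMKilled"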

Changing nginx.resources.memoryMi to 500 stops the pod from crashing. However, as a user of the Helm chart I would expect a sensible default (currently it is set to only 64 MiB!)
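
For reference, a minimal override applying that workaround could look like this (nginx.resources.memoryMi is the key mentioned above; the corresponding dittoui key is an assumption based on the same naming pattern):

# values-override.yml (sketch)
nginx:
  resources:
    memoryMi: 500
dittoui:
  resources:
    memoryMi: 500   # assumed to follow the same pattern as the nginx block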

Logs nginx

Defaulted container "ditto-nginx" out of: ditto-nginx, wait-for-gateway (init)
/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
10-listen-on-ipv6-by-default.sh: info: Enabled listen on IPv6 in /etc/nginx/conf.d/default.conf
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
/docker-entrypoint.sh: Configuration complete; ready for start up

Logs dittoui

/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
10-listen-on-ipv6-by-default.sh: info: /etc/nginx/conf.d/default.conf differs from the packaged version
/docker-entrypoint.sh: Ignoring /docker-entrypoint.d/15-local-resolvers.envsh, not executable
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
/docker-entrypoint.sh: Configuration complete; ready for start up
2023/06/23 10:18:04 [notice] 1#1: using the "epoll" event method
2023/06/23 10:18:04 [notice] 1#1: nginx/1.25.1
2023/06/23 10:18:04 [notice] 1#1: built by gcc 12.2.1 20220924 (Alpine 12.2.1_git20220924-r4) 
2023/06/23 10:18:04 [notice] 1#1: OS: Linux 5.4.0-113-generic
2023/06/23 10:18:04 [notice] 1#1: getrlimit(RLIMIT_NOFILE): 1048576:1048576
2023/06/23 10:18:04 [notice] 1#1: start worker processes
2023/06/23 10:18:04 [notice] 1#1: start worker process 29
2023/06/23 10:18:04 [notice] 1#1: start worker process 30
2023/06/23 10:18:04 [notice] 1#1: start worker process 31
2023/06/23 10:18:04 [notice] 1#1: start worker process 32
2023/06/23 10:18:04 [notice] 1#1: start worker process 33
2023/06/23 10:18:04 [notice] 1#1: start worker process 34
2023/06/23 10:18:04 [notice] 1#1: start worker process 35
2023/06/23 10:18:04 [notice] 1#1: start worker process 36
2023/06/23 10:18:04 [notice] 1#1: start worker process 37
2023/06/23 10:18:04 [notice] 1#1: start worker process 38
2023/06/23 10:18:04 [notice] 1#1: start worker process 39
2023/06/23 10:18:04 [notice] 1#1: start worker process 40
2023/06/23 10:18:04 [notice] 1#1: start worker process 41
2023/06/23 10:18:04 [notice] 1#1: start worker process 42
2023/06/23 10:18:04 [notice] 1#1: start worker process 43
2023/06/23 10:18:04 [notice] 1#1: start worker process 44
2023/06/23 10:18:04 [notice] 1#1: start worker process 45
2023/06/23 10:18:04 [notice] 1#1: start worker process 46
2023/06/23 10:18:04 [notice] 1#1: start worker process 47
2023/06/23 10:18:04 [notice] 1#1: start worker process 48
2023/06/23 10:18:04 [notice] 1#1: start worker process 49
2023/06/23 10:18:04 [notice] 1#1: start worker process 50
2023/06/23 10:18:04 [notice] 1#1: start worker process 51
2023/06/23 10:18:04 [notice] 1#1: start worker process 52
2023/06/23 10:18:04 [notice] 1#1: start worker process 53
2023/06/23 10:18:04 [notice] 1#1: start worker process 54
2023/06/23 10:18:04 [notice] 1#1: start worker process 55
2023/06/23 10:18:04 [notice] 1#1: start worker process 56
2023/06/23 10:18:04 [notice] 1#1: start worker process 57
2023/06/23 10:18:04 [notice] 1#1: start worker process 58
2023/06/23 10:18:04 [notice] 1#1: start worker process 59
2023/06/23 10:18:04 [notice] 1#1: start worker process 60
2023/06/23 10:18:04 [notice] 1#1: start worker process 61
2023/06/23 10:18:04 [notice] 1#1: start worker process 62
2023/06/23 10:18:04 [notice] 1#1: start worker process 63
2023/06/23 10:18:04 [notice] 1#1: start worker process 64
2023/06/23 10:18:04 [notice] 1#1: start worker process 65
2023/06/23 10:18:04 [notice] 1#1: start worker process 66
2023/06/23 10:18:04 [notice] 1#1: start worker process 67
2023/06/23 10:18:04 [notice] 1#1: start worker process 68
2023/06/23 10:18:04 [notice] 1#1: start worker process 69
2023/06/23 10:18:04 [notice] 1#1: start worker process 70
2023/06/23 10:18:04 [notice] 1#1: start worker process 71
2023/06/23 10:18:04 [notice] 1#1: start worker process 72
2023/06/23 10:18:04 [notice] 1#1: start worker process 73
2023/06/23 10:18:04 [notice] 1#1: start worker process 74
2023/06/23 10:18:04 [notice] 1#1: start worker process 75
2023/06/23 10:18:04 [notice] 1#1: start worker process 76
2023/06/23 10:18:04 [notice] 1#1: start worker process 77
2023/06/23 10:18:04 [notice] 1#1: start worker process 78
2023/06/23 10:18:04 [notice] 1#1: start worker process 79
2023/06/23 10:18:04 [notice] 1#1: start worker process 80
2023/06/23 10:18:04 [notice] 1#1: start worker process 81
2023/06/23 10:18:04 [notice] 1#1: start worker process 82
2023/06/23 10:18:04 [notice] 1#1: start worker process 83
2023/06/23 10:18:04 [notice] 1#1: start worker process 84
2023/06/23 10:18:04 [notice] 1#1: start worker process 85
2023/06/23 10:18:04 [notice] 1#1: start worker process 86
2023/06/23 10:18:04 [notice] 1#1: start worker process 87
2023/06/23 10:18:04 [notice] 1#1: start worker process 88
2023/06/23 10:18:04 [notice] 1#1: start worker process 89
2023/06/23 10:18:04 [notice] 1#1: start worker process 90
2023/06/23 10:18:04 [notice] 1#1: start worker process 91
2023/06/23 10:18:04 [notice] 1#1: start worker process 92
2023/06/23 10:18:04 [notice] 1#1: start worker process 93
2023/06/23 10:18:04 [notice] 1#1: start worker process 94
2023/06/23 10:18:04 [notice] 1#1: start worker process 95
2023/06/23 10:18:04 [notice] 1#1: start worker process 96
2023/06/23 10:18:04 [notice] 1#1: start worker process 97

@thjaeckle
Member

Changing nginx.resources.memoryMi to 500 stops the pod from crashing. However, as a user of the Helm chart I would expect a sensible default (currently it is set to only 64 MiB!)

I think what probably causes this is that worker_processes is configured (by default) to auto in nginx, which takes the number of available CPUs into consideration.
As I always ran Ditto on quite small worker nodes (with at most 4 or 8 CPUs), 64 MiB of memory was sufficient for that.

For the dittoui we did not adjust worker_processes; for the other nginx, however, we did configure it to 1.

Also found this explanation here: kubernetes/ingress-nginx#8166 (comment)

You probably have a lot more CPUs available (which the dittoui logs also suggest).
So you would need to adjust the values to your needs. If you need to configure the used worker_processes, please provide a PR and we will surely add it to the official chart.
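
For context, the relevant nginx directive looks like this; with auto, nginx starts one worker per detected CPU, which is what drives up the memory footprint on large nodes (a minimal illustration, not the chart's actual nginx.conf):

# illustrative nginx.conf fragment
worker_processes auto;   # one worker per detected CPU; ~100 workers on a 100-core node
# vs. a small fixed number of workers:
# worker_processes 1;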

@thjaeckle
Member

What could also be an issue (to be fixed in Ditto's chart) is that the dittoui and the nginx did not configure CPU limits, only requests.
That is probably used for calculating nginx's worker_processes.

Will look into that.
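
For illustration, the difference is between a resources section that only sets a CPU request and one that also sets a limit (a generic sketch with made-up values, not the chart's actual template):

# sketch of a container resources section (values are illustrative)
resources:
  requests:
    cpu: 100m
    memory: 64Mi
  limits:
    memory: 64Mi
    # cpu: "1"   # currently not set; adding a CPU limit is the idea mentioned above,
                 # as it may be what should bound nginx's worker calculation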

@Altair-Bueno
Contributor Author

You probably have a lot more CPUs available (which the dittoui logs also suggest).

Indeed. Each node has more than 100 cores.

That is probably used for calculating nginx's worker_processes.

I think so. Version 3.2.0 worked fine for me on the same cluster, and it used the max requests setting. I also found this

For now I think bumping the memory is the easiest way to patch.
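
For anyone hitting the same problem, the bump can also be applied at upgrade time, e.g. (the dittoui key is an assumption mirroring the nginx one):

helm upgrade dt-ditto oci://registry-1.docker.io/eclipse/ditto --version=3.3.0 \
  -f values.yml \
  --set nginx.resources.memoryMi=500 \
  --set dittoui.resources.memoryMi=500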

@thjaeckle
Member

For now I think bumping the memory is the easiest way to patch.

Yes, however that will not become the default value in the Ditto chart,
as 64 MiB is totally sufficient when running on e.g. a 4-core node.

Another option would be to limit the number of workers by default in the chart, and maybe even make them configurable.
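
A rough sketch of how such an option could look in values.yml (the key name workerProcesses is purely illustrative; the actual option added via #1667 may differ):

nginx:
  workerProcesses: 4   # hypothetical key: fixed worker count instead of nginx's "auto"
dittoui:
  workerProcesses: 1   # the UI nginx only serves static content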

thjaeckle added a commit that referenced this issue Jun 26, 2023
#1663 fix that nginx's worker_processes setting 'auto' causes problems when deploying Helm chart to worker with many CPUs

* configure the "dittoui"'s nginx to use 1 worker process (it only serves static content)
* add configuration option in values.yaml to configure the nginx's used worker_processes and default to 4

Signed-off-by: Thomas Jäckle <thomas.jaeckle@beyonnex.io>
thjaeckle added a commit that referenced this issue Jun 27, 2023
…orkers

#1663 fix that nginx's worker_processes setting 'auto' causes problems
thjaeckle added this to the 3.3.1 milestone Jun 28, 2023
This issue was closed.