Set -XX:MaxRAMPercentage if resource limit set #54
We started noticing that the Spotinst controller was crash-looping on our largest Kubernetes cluster after we added the new m7a.medium instance types (which have 4GB of memory) to our Ocean. The pods were exiting with the message "Terminating due to java.lang.OutOfMemoryError: Java heap space". We switched to the Helm charts to deploy Spotinst with requests/limits set, to avoid node memory contention, but continued to see the errors unless we raised the pod memory limit to the 6GB-8GB range.
Since we deploy Java applications to k8s ourselves, we are very familiar with the memory characteristics of JVM applications running in containers, and we quickly deduced that the JVM was auto-sizing the heap to the default of 25% of available memory (the memory limit if set, otherwise the host's total memory). Because this cluster apparently requires a heap larger than 1GB for the Spotinst controller, it would exhaust heap space and exit even with a 4GB memory limit (25% of 4GB = 1GB heap), despite that memory being only ~50% utilized. To get a 2GB heap we would need to run the controller with an 8GB memory limit, which is incredibly excessive considering actual usage barely exceeds ~2.5GB with heap + non-heap overhead combined.
To prevent such excessive memory consumption, let's add the `-XX:MaxRAMPercentage` argument to the `JAVA_OPTS` environment variable (which is read by the JVM on startup), but only if `resources.limits.memory` is set to a value. Since we want to ensure the JVM has enough memory overhead for off-heap use, let's scale the percentage by the memory limit, so that pods running with smaller memory limits have a larger share of memory set aside for off-heap.

This PR implements that. It sets the heap percentage to 50% for memory limits below 512MB, 60% for 512MB-1GB, 70% for 1GB-2GB, and 80% for 2GB and above. This is working for us now; we have also tested it with some diagnostic arguments to confirm that the value was indeed being picked up by the JVM. The values may need further tuning in a follow-up, but are likely sufficient for now. I tested that the functions are robust against `.requests` being nil: the `dig` function returns `nil` unless the key `.resources.limits.memory` actually exists and is set to something, and the rest of the function is skipped if `dig` returns `nil`.
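The template logic described above could be sketched roughly like this (the helper name, value paths, and unit handling are my own illustration, not the PR's exact code; in particular it assumes the limit is expressed in `Mi`, which a real chart would need to generalize):

```yaml
{{- /* _helpers.tpl (sketch): derive a MaxRAMPercentage tier from the
       configured memory limit. Emits nothing when no limit is set. */}}
{{- define "spotinst.maxRAMPercentage" -}}
{{- $limit := dig "limits" "memory" "" (.Values.resources | default dict) -}}
{{- if $limit -}}
{{- $mib := $limit | trimSuffix "Mi" | int -}}
{{- if lt $mib 512 -}}50{{- else if lt $mib 1024 -}}60{{- else if lt $mib 2048 -}}70{{- else -}}80{{- end -}}
{{- end -}}
{{- end -}}

# deployment.yaml (fragment): the flag is appended only when the helper
# produced a tier, so JAVA_OPTS stays empty when no memory limit is set.
env:
  - name: JAVA_OPTS
    value: "{{ with include "spotinst.maxRAMPercentage" . }}-XX:MaxRAMPercentage={{ . }}{{ end }}"
```

The `with` block makes the flag conditional on the helper returning a non-empty string, which mirrors the "skip the rest of the function if `dig` returns `nil`" behavior described above.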