
Record memory usage after garbage collection #6963

Merged · 5 commits · Nov 8, 2022
Changes from 1 commit
@@ -34,7 +34,8 @@
* process.runtime.jvm.memory.init{type="heap",pool="G1 Eden Space"} 1000000
* process.runtime.jvm.memory.usage{type="heap",pool="G1 Eden Space"} 2500000
* process.runtime.jvm.memory.committed{type="heap",pool="G1 Eden Space"} 3000000
* process.runtime.jvm.memory.max{type="heap",pool="G1 Eden Space"} 4000000
* process.runtime.jvm.memory.limit{type="heap",pool="G1 Eden Space"} 4000000
* process.runtime.jvm.memory.usage_after_gc{type="heap",pool="G1 Eden Space"} 1500000
* process.runtime.jvm.memory.init{type="non_heap",pool="Metaspace"} 200
* process.runtime.jvm.memory.usage{type="non_heap",pool="Metaspace"} 400
* process.runtime.jvm.memory.committed{type="non_heap",pool="Metaspace"} 500
@@ -61,30 +62,41 @@ static void registerObservers(OpenTelemetry openTelemetry, List<MemoryPoolMXBean
.upDownCounterBuilder("process.runtime.jvm.memory.usage")
.setDescription("Measure of memory used")
.setUnit("By")
.buildWithCallback(callback(poolBeans, MemoryUsage::getUsed));
.buildWithCallback(callback(poolBeans, MemoryPoolMXBean::getUsage, MemoryUsage::getUsed));

meter
.upDownCounterBuilder("process.runtime.jvm.memory.init")
.setDescription("Measure of initial memory requested")
.setUnit("By")
.buildWithCallback(callback(poolBeans, MemoryUsage::getInit));
.buildWithCallback(callback(poolBeans, MemoryPoolMXBean::getUsage, MemoryUsage::getInit));

meter
.upDownCounterBuilder("process.runtime.jvm.memory.committed")
.setDescription("Measure of memory committed")
.setUnit("By")
.buildWithCallback(callback(poolBeans, MemoryUsage::getCommitted));
.buildWithCallback(
callback(poolBeans, MemoryPoolMXBean::getUsage, MemoryUsage::getCommitted));

meter
.upDownCounterBuilder("process.runtime.jvm.memory.limit")
.setDescription("Measure of max obtainable memory")
.setUnit("By")
.buildWithCallback(callback(poolBeans, MemoryUsage::getMax));
.buildWithCallback(callback(poolBeans, MemoryPoolMXBean::getUsage, MemoryUsage::getMax));

meter
.upDownCounterBuilder("process.runtime.jvm.memory.usage_after_gc")
Contributor @rapphil commented on Oct 25, 2022:

Reporting this metric as an absolute value might not be helpful when you have a fleet of hosts, each running JVMs with different max heap sizes. I understand that you wanted to align with the JMX metric conventions, but I think there is more value in a normalized metric. Maybe we can add an extra metric, process.runtime.jvm.memory.usage_after_gc_percentage?

Calculating the percentage after the metric is emitted will not be trivial, or even feasible, in all backends when you want to analyze data from multiple hosts.
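For illustration, a minimal sketch of what the suggested normalized metric could look like, reusing the meter and poolBeans from registerObservers in this PR. The metric name usage_after_gc_percentage and this callback are hypothetical, not something the change adds:

```java
// Hypothetical sketch only, not part of this PR. Assumes the `meter` and
// `poolBeans` variables from MemoryPools.registerObservers; only the "pool"
// attribute is set, for brevity.
meter
    .gaugeBuilder("process.runtime.jvm.memory.usage_after_gc_percentage")
    .setDescription("Memory used after the last GC as a percentage of the pool limit")
    .setUnit("%")
    .buildWithCallback(
        measurement -> {
          for (MemoryPoolMXBean pool : poolBeans) {
            MemoryUsage afterGc = pool.getCollectionUsage(); // null if unsupported
            long max = pool.getUsage().getMax(); // -1 if undefined
            if (afterGc != null && max > 0) {
              measurement.record(
                  100.0 * afterGc.getUsed() / max,
                  Attributes.of(AttributeKey.stringKey("pool"), pool.getName()));
            }
          }
        });
```

Whether such a ratio belongs in the instrumentation or in the backend is what the rest of this thread debates.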

Member Author replied:

> Calculating the percentage after the metric is emitted will not be trivial, or even feasible, in all backends when you want to analyze data from multiple hosts.

Is this really a challenge for backends? Feels like doing operations across metrics, like dividing one by another, is pretty standard, and probably table stakes for a metrics backend. For example, if you exported these metrics to Prometheus it's trivial to compute a percentage with:

(process_runtime_jvm_memory_usage_after_gc / process_runtime_jvm_memory_limit) * 100

If we buy this argument, wouldn't we also want to report the percentage for usage and committed as well? And if we report a percentage instead of the absolute value, doesn't that stand to frustrate people on the other side who want to analyze their absolute figures instead of relative percentages?

Another argument against reporting utilization is that we don't actually have the data to report it, because not all memory pools report memory usage after GC, and not all memory pools report a limit. Here are the sets of pool attribute values that are reported for usage, usage_after_gc, and limit for an app running with the G1 garbage collector:

  • process.runtime.jvm.memory.usage reports values for 8 pools: CodeHeap 'non-nmethods', CodeHeap 'non-profiled nmethods', CodeHeap 'profiled nmethods', Compressed Class Space, G1 Eden Space, G1 Old Gen, G1 Survivor Space, Metaspace
  • process.runtime.jvm.memory.usage_after_gc reports values for 3 pools: G1 Eden Space, G1 Old Gen, G1 Survivor Space
  • process.runtime.jvm.memory.limit reports values for 5 pools: CodeHeap 'non-nmethods', CodeHeap 'non-profiled nmethods', CodeHeap 'profiled nmethods', G1 Old Gen

Notice that not all pools have usage_after_gc or limit values. Reporting utilization would limit the set of pools to those that have both usage_after_gc and limit values, which is only G1 Old Gen. The same argument applies to reporting utilization instead of usage.
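As an aside, a standalone check using only standard JMX APIs (not part of this PR) makes it easy to see which pools report a collection usage and a limit on a given JVM:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Standalone check, not part of this PR: prints which memory pools report a
// collection usage (backs usage_after_gc) and a max (backs limit) on this JVM.
public class PoolSupportCheck {
  public static void main(String[] args) {
    for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
      MemoryUsage afterGc = pool.getCollectionUsage(); // null if unsupported
      long max = pool.getUsage().getMax(); // -1 if undefined
      System.out.printf(
          "%-40s usage_after_gc=%-12s limit=%s%n",
          pool.getName(),
          afterGc == null ? "n/a" : String.valueOf(afterGc.getUsed()),
          max == -1 ? "n/a" : String.valueOf(max));
    }
  }
}
```

The output should line up with the pool lists above for the collector in use.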

Contributor replied:

> Is this really a challenge for backends? Feels like doing operations across metrics, like dividing one by another, is pretty standard, and probably table stakes for a metrics backend. For example, if you exported these metrics to Prometheus it's trivial to compute a percentage with:

If you have multiple JVMs (say hundreds) reporting different process_runtime_jvm_memory_limit values because each JVM uses a different -Xmx, how do you do that without manually creating a query that matches every process_runtime_jvm_memory_usage_after_gc to its respective process_runtime_jvm_memory_limit? Moreover, what if the attributes that uniquely identify the metrics are not predictable?

Having said that, you made a good point about the lack of consistency in the memory pools. It does not make sense to report a normalized metric per memory pool.

Member replied:

> Moreover, what if the attributes that uniquely identify the metrics are not predictable?

If it's not possible to recognize which instance emitted particular metrics, then I'd say your data is nonsense anyway - if you can't correlate metric instruments with each other, or with a particular deployment, the telemetry is basically useless.
I think we should operate under the assumption that resource attributes uniquely identify the metrics source - and if this is not true, that is a broader problem that needs to be fixed across the board.

.setDescription(
"Measure of memory used after the most recent garbage collection event on this pool")
.setUnit("By")
.buildWithCallback(
callback(poolBeans, MemoryPoolMXBean::getCollectionUsage, MemoryUsage::getUsed));
}

// Visible for testing
static Consumer<ObservableLongMeasurement> callback(
List<MemoryPoolMXBean> poolBeans, Function<MemoryUsage, Long> extractor) {
List<MemoryPoolMXBean> poolBeans,
Function<MemoryPoolMXBean, MemoryUsage> memoryUsageExtractor,
Function<MemoryUsage, Long> valueExtractor) {
List<Attributes> attributeSets = new ArrayList<>(poolBeans.size());
for (MemoryPoolMXBean pool : poolBeans) {
attributeSets.add(
@@ -97,7 +109,8 @@ static Consumer<ObservableLongMeasurement> callback(
return measurement -> {
for (int i = 0; i < poolBeans.size(); i++) {
Attributes attributes = attributeSets.get(i);
long value = extractor.apply(poolBeans.get(i).getUsage());
MemoryUsage memoryUsage = memoryUsageExtractor.apply(poolBeans.get(i));
long value = valueExtractor.apply(memoryUsage);
if (value != -1) {
measurement.record(value, attributes);
}
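For context, a hedged usage sketch of how these observers get wired up. The registerObservers signature comes from the diff above; obtaining the OpenTelemetry instance from GlobalOpenTelemetry is an assumption about the host application (the test below uses testing.getOpenTelemetry() instead):

```java
// Usage sketch, not part of the diff: register the observers once at startup.
// The SDK then invokes the callbacks on each metric collection cycle.
MemoryPools.registerObservers(
    GlobalOpenTelemetry.get(), ManagementFactory.getMemoryPoolMXBeans());
```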
@@ -31,8 +31,11 @@
import org.mockito.Mock;
import org.mockito.Spy;
import org.mockito.junit.jupiter.MockitoExtension;
import org.mockito.junit.jupiter.MockitoSettings;
import org.mockito.quality.Strictness;

@ExtendWith(MockitoExtension.class)
@MockitoSettings(strictness = Strictness.LENIENT)
class MemoryPoolsTest {

@RegisterExtension
@@ -45,6 +48,8 @@ class MemoryPoolsTest {

@Mock private MemoryUsage heapPoolUsage;
@Mock private MemoryUsage nonHeapUsage;
@Mock private MemoryUsage heapCollectionUsage;
@Mock private MemoryUsage nonHeapCollectionUsage;

private List<MemoryPoolMXBean> beans;

@@ -53,9 +58,11 @@ void setup() {
when(heapPoolBean.getName()).thenReturn("heap_pool");
when(heapPoolBean.getType()).thenReturn(MemoryType.HEAP);
when(heapPoolBean.getUsage()).thenReturn(heapPoolUsage);
when(heapPoolBean.getCollectionUsage()).thenReturn(heapCollectionUsage);
when(nonHeapPoolBean.getName()).thenReturn("non_heap_pool");
when(nonHeapPoolBean.getType()).thenReturn(MemoryType.NON_HEAP);
when(nonHeapPoolBean.getUsage()).thenReturn(nonHeapUsage);
when(nonHeapPoolBean.getCollectionUsage()).thenReturn(nonHeapCollectionUsage);
beans = Arrays.asList(heapPoolBean, nonHeapPoolBean);
}

@@ -69,7 +76,8 @@ void registerObservers() {
when(nonHeapUsage.getUsed()).thenReturn(15L);
when(nonHeapUsage.getCommitted()).thenReturn(16L);
when(nonHeapUsage.getMax()).thenReturn(17L);

when(heapCollectionUsage.getUsed()).thenReturn(18L);
when(nonHeapCollectionUsage.getUsed()).thenReturn(19L);
MemoryPools.registerObservers(testing.getOpenTelemetry(), beans);

testing.waitAndAssertMetrics(
@@ -176,6 +184,33 @@ void registerObservers() {
AttributeKey.stringKey("pool"), "non_heap_pool")
.hasAttribute(
AttributeKey.stringKey("type"), "non_heap")))));
testing.waitAndAssertMetrics(
"io.opentelemetry.runtime-metrics",
"process.runtime.jvm.memory.usage_after_gc",
metrics ->
metrics.anySatisfy(
metricData ->
assertThat(metricData)
.hasInstrumentationScope(EXPECTED_SCOPE)
.hasDescription(
"Measure of memory used after the most recent garbage collection event on this pool")
.hasUnit("By")
.hasLongSumSatisfying(
sum ->
sum.hasPointsSatisfying(
point ->
point
.hasValue(18)
.hasAttribute(
AttributeKey.stringKey("pool"), "heap_pool")
.hasAttribute(AttributeKey.stringKey("type"), "heap"),
point ->
point
.hasValue(19)
.hasAttribute(
AttributeKey.stringKey("pool"), "non_heap_pool")
.hasAttribute(
AttributeKey.stringKey("type"), "non_heap")))));
}

@Test
@@ -184,7 +219,7 @@ void callback_Records() {
when(nonHeapUsage.getUsed()).thenReturn(2L);

Consumer<ObservableLongMeasurement> callback =
MemoryPools.callback(beans, MemoryUsage::getUsed);
MemoryPools.callback(beans, MemoryPoolMXBean::getUsage, MemoryUsage::getUsed);
callback.accept(measurement);

verify(measurement)
@@ -199,7 +234,8 @@ void callback_SkipRecord() {
when(heapPoolUsage.getMax()).thenReturn(1L);
when(nonHeapUsage.getMax()).thenReturn(-1L);

Consumer<ObservableLongMeasurement> callback = MemoryPools.callback(beans, MemoryUsage::getMax);
Consumer<ObservableLongMeasurement> callback =
MemoryPools.callback(beans, MemoryPoolMXBean::getUsage, MemoryUsage::getMax);
callback.accept(measurement);

verify(measurement)