Add tiered stats to request cache response #8

Draft
peteralfonsi wants to merge 10 commits into framework-serialized
Conversation

peteralfonsi
Owner

Description

Modifies the request cache's API to return statistics for additional cache tiers, like the upcoming disk tier, and adds the number of entries to the response. Stats for the existing on-heap tier stay where they were, in the "request_cache" object. That object gains a new "tiers" object; each tier other than the on-heap tier has its stats returned there. If a tier is not enabled, its statistics are still returned, with all values set to 0.

Calling _nodes/stats/indices/request_cache now returns the following:

  "_nodes": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "cluster_name": "runTask",
  "nodes": {
    "3Xx_SnhhQACQu5jW9UJGrQ": {
      "timestamp": 1698099482892,
      "name": "runTask-0",
      "transport_address": "127.0.0.1:9300",
      "host": "127.0.0.1",
      "ip": "127.0.0.1:9300",
      "roles": [
        "cluster_manager",
        "data",
        "ingest",
        "remote_cluster_client"
      ],
      "attributes": {
        "testattr": "test",
        "shard_indexing_pressure_enabled": "true"
      },
      "indices": {
        "request_cache": {
          "memory_size_in_bytes": 0,
          "evictions": 0,
          "hit_count": 0,
          "miss_count": 0,
          "entries": 0,
          "tiers": {
            "disk": {
              "memory_size_in_bytes": 0,
              "evictions": 0,
              "hit_count": 0,
              "miss_count": 0,
              "entries": 0
            }
          }
        }
      }
    }
  }
}

Tested with unit tests for the overhauled RequestCacheStats, an integration test, and manual testing with the API.

Related Issues

Part of the larger tiered caching feature.

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • [N/A] Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@peteralfonsi marked this pull request as draft on October 23, 2023, 22:39
private long evictions;
private long hitCount;
private long missCount;
private Map<String, StatsHolder> map;

Maybe initialize this map inline here. That way you don't need to worry about it not being initialized.
For example:

map = new HashMap<>() {{
    for (TierType tierType : TierType.values()) {
        put(tierType.getStringValue(), new StatsHolder());
    }
}};
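
For comparison, a minimal sketch of the same inline initialization without the double-brace idiom (which creates an anonymous HashMap subclass), assuming TierType and StatsHolder as defined in this PR:

private final Map<String, StatsHolder> map = createInitialMap();

// hypothetical helper, not from the PR: one StatsHolder per tier, keyed by the tier's string value
private static Map<String, StatsHolder> createInitialMap() {
    Map<String, StatsHolder> initial = new HashMap<>();
    for (TierType tierType : TierType.values()) {
        initial.put(tierType.getStringValue(), new StatsHolder());
    }
    return initial;
}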

evictions = in.readVLong();
hitCount = in.readVLong();
missCount = in.readVLong();
this();

We are calling this() to initialize the map, but that looks error prone and logically wrong: everything inside this constructor should be initialized from the StreamInput values. If you initialize the map inline as suggested above, we don't need this call.
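
With the map created inline at the field declaration, a sketch of what the StreamInput constructor could then look like (assuming StatsHolder, which implements Writeable here, also gets a StreamInput constructor; this is an illustration, not the PR's actual code):

public RequestCacheStats(StreamInput in) throws IOException {
    // every value read here comes from the stream; the map itself already exists
    Map<String, StatsHolder> readMap = in.readMap(StreamInput::readString, StatsHolder::new);
    this.map.putAll(readMap);
}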

this.missCount = missCount;
public RequestCacheStats(Map<TierType, StatsHolder> inputMap) {
// Create a RequestCacheStats with multiple tiers' statistics
this();

Again remove this.

import java.io.IOException;
import java.io.Serializable;

public class StatsHolder implements Serializable, Writeable, ToXContentFragment {

Since this is specific to RequestCacheStats, it would be better to move it inside RequestCacheStats itself.

Owner Author

This is also used in ShardRequestCache. I moved it out of that class because I wanted it to be usable from RequestCacheStats as well.


But would this StatsHolder be used anywhere else? It looks fairly generic even though it will only be used for the request cache. Since it is specific to the request cache, maybe we can keep it inside ShardRequestCache as a public class and use it from RequestCacheStats; that should be fine.

// on a node with a maximum request cache size that we set.

@OpenSearchIntegTestCase.ClusterScope(scope = OpenSearchIntegTestCase.Scope.TEST, numDataNodes = 0)
public class IndicesRequestCacheDiskTierIT extends OpenSearchIntegTestCase {

We should ideally add these tests as part of IndicesRequestCacheIT. Let's check what is needed to do that.

Comment on lines 640 to 646

Settings.Builder builder = Settings.builder()
.put(IndicesRequestCache.INDEX_CACHE_REQUEST_ENABLED_SETTING.getKey(), true)
.put(IndexMetadata.SETTING_NUMBER_OF_SHARDS, 1)
.put(IndexMetadata.SETTING_NUMBER_OF_REPLICAS, 0);

assertAcked(client.admin().indices().prepareCreate("index").setMapping("k", "type=keyword").setSettings(builder).get());

I guess this change is not required? Remove?

assertSearchResponse(resp);
IndicesRequestCacheIT.assertCacheState(client, "index", 0, i + 1, TierType.ON_HEAP, false);
IndicesRequestCacheIT.assertCacheState(client, "index", 0, i + 1, TierType.DISK, false);
System.out.println("request number " + i);

Remove

System.out.println("request number " + i);
}

System.out.println("Num requests = " + numRequests);

Remove

public class IndicesRequestCacheDiskTierIT extends OpenSearchIntegTestCase {
public void testDiskTierStats() throws Exception {
int heapSizeBytes = 1800; // enough to fit 2 queries, as each is 687 B
int requestSize = 687; // each request is 687 B

How did we calculate this? Manually, I guess?
Would it be possible to create a request and then estimate its size from it? That way we could generate this value dynamically.
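
One possible way to avoid hard-coding 687 B, sketched with the standard integration-test client calls (the warm-up query and the term value "hello0" are assumptions for illustration):

SearchResponse warmup = client.prepareSearch("index")
    .setRequestCache(true)
    .setQuery(QueryBuilders.termQuery("k", "hello0"))
    .get();
assertSearchResponse(warmup);
// after one cached request, the on-heap request cache size gives a per-entry estimate
long requestSize = client.admin().indices().prepareStats("index")
    .setRequestCache(true)
    .get()
    .getTotal()
    .getRequestCache()
    .getMemorySizeInBytes();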

Comment on lines +134 to +139
double getTimeEWMA = getTimeEWMAIfDisk(cachingTier);
if (value != null) {
tieredCacheEventListener.onHit(key, value, cachingTier.getTierType());
tieredCacheEventListener.onHit(key, value, cachingTier.getTierType(), getTimeEWMA);
return new CacheValue<>(value, cachingTier.getTierType());
}
tieredCacheEventListener.onMiss(key, cachingTier.getTierType());
tieredCacheEventListener.onMiss(key, cachingTier.getTierType(), getTimeEWMA);

This doesn't seem right. We should ideally track these get times inside the disk caching tier itself, so that if we have a different implementation of the tiered service we don't have to duplicate this work.


And, as discussed, having a separate DiskCacheStats should solve this.
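
A rough sketch of what recording the get time inside the disk tier itself could look like (the cache field name and the exact get signature on EhcacheDiskCachingTier are assumptions):

@Override
public BytesReference get(IndicesRequestCache.Key key) {
    long startNanos = System.nanoTime();
    BytesReference value = cache.get(key); // underlying Ehcache lookup
    double tookMillis = (System.nanoTime() - startNanos) / 1_000_000.0;
    getTimeMillisEWMA.addValue(tookMillis); // the stat stays inside the tier, not in the listener
    return value;
}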

@@ -12,11 +12,11 @@

public interface TieredCacheEventListener<K, V> {

void onMiss(K key, TierType tierType);
void onMiss(K key, TierType tierType, double getTimeEWMA);

Adding getTimeEWMA here isn't needed, since it is only used for stats. We need to rethink the low-level design. For stats specific to a particular tier, we can instead create a separate DiskTierStats associated with the disk tier, for example, and keep accumulating the relevant values there in memory. ShardRequestCacheStats can eventually pull those values from it.

There will likely be more values we need to expose as stats later on, so this solution isn't extensible: we can't keep adding parameters here.

Owner Author

This makes sense. I just wasn't sure where to actually fetch the values from the disk tier, and I thought onHit and onMiss would be reasonable since that's when getTimeEWMA will actually change. But I agree it's not extensible to new stats which might change at some other frequency. Should we instead have some sort of background job to periodically gather stats from the disk tier?

@@ -52,7 +56,7 @@ public class EhcacheDiskCachingTier implements DiskCachingTier<IndicesRequestCac
private final String diskCacheFP; // the one to use for this node
private RemovalListener<IndicesRequestCache.Key, BytesReference> removalListener;
private ExponentiallyWeightedMovingAverage getTimeMillisEWMA;
private static final double GET_TIME_EWMA_ALPHA = 0.3; // This is the value used elsewhere in OpenSearch

ExponentiallyWeightedMovingAverage(GET_TIME_EWMA_ALPHA, 10). Keeping 10 as the initialAvg doesn't seem right. I don't know the right value, but wouldn't 0 be better?

Owner Author

I somewhat arbitrarily picked 10 since we expect a disk seek to take ~10 ms on a spinning disk, but 0 might be better, yeah.
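
For what it's worth, with alpha = 0.3 the initial value's weight decays as (1 - 0.3)^n, so after about 10 recorded gets it only contributes roughly 3% of the average; starting at 0 would just bias the first few readings low instead of high. The construction with 0 would simply be:

// same class the PR already uses, with 0 as the initial average
getTimeMillisEWMA = new ExponentiallyWeightedMovingAverage(GET_TIME_EWMA_ALPHA, 0.0);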

@@ -52,7 +56,7 @@ public class EhcacheDiskCachingTier implements DiskCachingTier<IndicesRequestCac
private final String diskCacheFP; // the one to use for this node
private RemovalListener<IndicesRequestCache.Key, BytesReference> removalListener;
private ExponentiallyWeightedMovingAverage getTimeMillisEWMA;

Let's also have a normal average as well. The EWMA is useful for recent behavior, while a normal average gives an overall view of the get time. As discussed, creating a separate DiskStats might be better; we can move such stats there.
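
A sketch of what such a tier-owned stats holder could look like, keeping both the EWMA and a plain running average (the class and method names are illustrative, not from this PR):

import java.util.concurrent.atomic.LongAdder;

import org.opensearch.common.ExponentiallyWeightedMovingAverage;

public class DiskTierStats {
    private static final double GET_TIME_EWMA_ALPHA = 0.3;

    private final ExponentiallyWeightedMovingAverage getTimeMillisEWMA =
        new ExponentiallyWeightedMovingAverage(GET_TIME_EWMA_ALPHA, 0.0);
    private final LongAdder totalGetTimeNanos = new LongAdder();
    private final LongAdder getCount = new LongAdder();

    // called by the disk tier after each get, so no listener changes are needed
    public void recordGetTime(long tookNanos) {
        getTimeMillisEWMA.addValue(tookNanos / 1_000_000.0);
        totalGetTimeNanos.add(tookNanos);
        getCount.increment();
    }

    // recent view of get latency
    public double getTimeEWMAMillis() {
        return getTimeMillisEWMA.getAverage();
    }

    // overall view of get latency since the tier was created
    public double averageGetTimeMillis() {
        long count = getCount.sum();
        return count == 0 ? 0.0 : (totalGetTimeNanos.sum() / 1_000_000.0) / count;
    }
}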

