Skip to content

Commit

Permalink
Enhance unified endpoint discovery proposal (#4070)
Browse files Browse the repository at this point in the history
* Modify unified endpoint discovery proposal

Signed-off-by: Hitanshu Mehta <hitanshu99amehta@gmail.com>

* Run make docs

Signed-off-by: Hitanshu Mehta <hitanshu99amehta@gmail.com>

* Perform suggested changes

Signed-off-by: Hitanshu Mehta <hitanshu99amehta@gmail.com>

* Remove compact and downsample and fix some minor nits

Signed-off-by: Hitanshu Mehta <hitanshu99amehta@gmail.com>

* Be explicit about the use of gRPC reflection

Signed-off-by: Hitanshu Mehta <hitanshu99amehta@gmail.com>

* Apply suggested changes

Signed-off-by: Hitanshu Mehta <hitanshu99amehta@gmail.com>
  • Loading branch information
hitanshu-mehta committed Apr 30, 2021
1 parent 116c4bd commit 45ac573
Showing 1 changed file with 20 additions and 15 deletions.
35 changes: 20 additions & 15 deletions docs/proposals/202101_endpoint_discovery.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,34 +2,34 @@
title: Unified Endpoint Discovery
type: proposal
menu: proposals
status: XXX
status: accepted
owner: lilic
---

### Related Tickets

* https://github.com/thanos-io/thanos/issues/2600
* https://github.com/thanos-io/thanos/issues/3538

### Summary

We want to propose a new flag called `--endpoint=<address>` that will be passed to Thanos query and replace the current `--store=<address>` and `--rule=<address>` flags, and add a discovery mechanism which will be used to determine what types of services this endpoint can serve.

### Motivation

Currently, in Thanos Query, the discovery of rules APIs happens via Store API's Info method. This makes it harder if we ever want to not have a coupling to the Store API (which is already planned for [scalable ruler proposal](https://github.com/thanos-io/thanos/blob/master/docs/proposals/202005_scalable-rule-storage.md)).
Currently, in Thanos Query, the discovery of rules APIs happens via Store API's Info method. This makes it harder if we ever want to not have a coupling to the Store API (which is already planned for [scalable ruler proposal](https://github.com/thanos-io/thanos/blob/main/docs/proposals/202005_scalable-rule-storage.md)).

We also require passing two different flags to the Thanos Query component `--store=<address>` and `--rule=<address>`. If users use both flags, the Query component performs DNS discovery on often the same address multiple times, this can cause occasional DNS issues as this resolving happens so frequently. This is especially confusing when one works and the other doesn't.

Adding new APIs in the future, like
We also require passing two different flags to the Thanos Query component `--store=<address>` and `--rule=<address>`. If users use both flags, the Query component performs DNS discovery on often the same address multiple times, which can cause occasional DNS issues as this resolution happens so frequently. This is especially confusing when one DNS lookup works and the other doesn't. Adding new APIs in the future would exacerbate this issue with DNS requests.

### Goals

Unite discovery of endpoints, which would make code cleaner as well as provide building blocks to provide discover and support for new gRPC services, for example, the Targets API/Exemplar API/Metadata API.
Unite discovery of endpoints, which would make code cleaner as well as provide building blocks to provide discovery and support for new gRPC services, for example, the Targets API/Exemplar API/Metadata API.

### Solution

Add a new flag called `--endpoint` to Thanos query, and auto-discover what services that endpoint is serving based on metadata each gRPC server exposes.

Each component will expose an Info API, that includes various metadata listed below. Discovery of endpoints will happen via this Info API, there might be a case that the discovery of gRPC will also have to happen via [gRPC reflection](https://github.com/grpc/grpc/blob/master/doc/server-reflection.md), this will be shown when implemention starts.
Each component will expose an Info service, that includes various metadata listed below. Discovery of what services an endpoint is exposing will happen via [gRPC reflection](https://github.com/grpc/grpc/blob/master/doc/server-reflection.md). Once we discover what services that endpoint is exposing (e.g. StoreAPI, TargetsAPI) we can retrieve metadata about them using a single `Info` method call from the Info service.

Info API metadata would include the following fields regardless of the type:
```
Expand All @@ -42,20 +42,25 @@ info:
blah: ...
```

Right now this info API would be added to existing components Ruler, Store, Sidecar, Compact, Downsample, Receive. But this would be added in the future for Targets, Exemplers, etc.
Right now this Info service would be added to existing components exposing a gRPC server.

The upgrade path would be possible in this case, as it's strictly additive for endpoints, and there will be a migration path to fall back to current discovery methods, which could be removed in the future.

### Alternatives

The upgrade path would be possible in this case, as it's strictly additive for endpoints, and there will be a migration path to fallback to current discovery methods which could be removed in the future.
Initial discovery of gRPC services available will happen via gRPC reflection. To get further information, we would be adding an Info method to other services for example, Targets, Exemplars, etc. This would contain similar info or metadata information as we get right now from the Store Info method. We can always add more data to it in the future as needed.

#### Alternatives
This alternative solution was also considered, however it was not chosen because of its following drawbacks.

Split Info method out of the Store API into a separate gRPC service called `info`. Each component must implement and expose this. The discovery would happen via this info gRPC service, which each component would need to explicitly state which API it supports. Besides that information, we would also need to include here various info, e.g. LabelSets, Max/Min time, etc. The drawback for this solution is harder upgrades as this would be more complex to implement as we would need to fall back to older APIs.
1. For servers implementing multiple APIs (e.g. Store, Rules, Exemplar), we need to make multiple gRPC calls, e.g. three calls to a component implementing Store, Rules and Exemplars APIs.
2. There is also ambiguity about which Info method should be used for these kinds of servers.

### Upgrade plan

Regardless of the solution we will be not removing any flags for a given period of time and have fallback code in place to be able to discover in the same way as we do right now. This will ensure smooth upgrade path for the users that make use of the current flags for discovery. We will mark the current flags as depracted and after two releases have passed we can remove the existing store and rule flag from query component and remove any migration code.
Regardless of the solution, we will be not removing any flags for a given period and have a fallback in place to be able to discover in the same way as we do right now. This will ensure a smooth upgrade path for the users who make use of the current flags for discovery. We will mark the current flags as deprecated and after two releases have passed we can remove the existing component-specific discovery flags from the query component and remove any migration code.

### Work plan

1. We would be adding the new info APIs and flags needed for the above solution, so this part would be entirely additive.
2. Second step would be the integration of this APIs and flags and migration of existing code.
3. We would not be removing the existing flags (`--store` and `--rule`) for some grace period of time, after that passes we would also have to remove it and any migration code.
1. We would be adding the new Info service and flags needed for the above solution, so this part would be entirely additive.
2. Second step would be the integration of this service, the addition of new flags, and migration of existing code.
3. We would not be removing the existing flags (`--store` and `--rule`) for some grace period, after that passes we would also have to remove it and any migration code.

0 comments on commit 45ac573

Please sign in to comment.