HA handling for store nodes #199
Comments
I think we might need that sooner than later... (: How can we do it easily? Basically we need to tell |
The most basic way would be an option to add, for example, --bucketid="xxx" to the storage command.
For active/passive, this could be done using a leader latch protocol. The leader could announce any newly downloaded bucket via gossip (for a faster failover) and share the data it has downloaded via HTTP/gRPC. This would eliminate the need to fetch the data from the object store directly and would give the query nodes a single source of truth (the current leader).
I'd like to volunteer to take this on. For our use case, downtime caused by the store instance fronting an S3 bucket being rescheduled to another machine is not really palatable. I'm thinking of an active-active solution, since it avoids some of the complexity around deciding which instance is 'active' and would use resources more efficiently. As store nodes are essentially just caches, I think it should be reasonably straightforward to achieve. While thinking about high availability, we should also consider allowing the store nodes to scale horizontally for very large deployments, effectively allowing horizontal scaling of the LRU cache of indices. I propose:
Just an idea: If we have multiple shards, we might simplify the store instances by avoiding persisting the cache to disk, since the amount of data to pull from object storage would be reduced by |
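To make the sharding idea a bit more concrete, here is a minimal sketch of how blocks could be split across store instances. It is only an illustration under assumptions: the names shardFor and ownedBlocks, the FNV hash, and the modulo scheme are all hypothetical and are not taken from Thanos or from the proposal in this thread.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a block ID to one of n shards using an FNV-1a hash.
// Hypothetical helper: the hash function and modulo scheme are assumptions.
func shardFor(blockID string, n uint32) uint32 {
	if n == 0 {
		return 0 // avoid division by zero; treat as a single shard
	}
	h := fnv.New32a()
	h.Write([]byte(blockID))
	return h.Sum32() % n
}

// ownedBlocks returns the subset of blocks that the given shard is
// responsible for, so each store instance only pulls its own share.
func ownedBlocks(blocks []string, shard, n uint32) []string {
	var out []string
	for _, b := range blocks {
		if shardFor(b, n) == shard {
			out = append(out, b)
		}
	}
	return out
}

func main() {
	// Made-up block IDs, for illustration only.
	blocks := []string{"block-a", "block-b", "block-c", "block-d"}
	const shards = 2
	for s := uint32(0); s < shards; s++ {
		fmt.Printf("shard %d owns %v\n", s, ownedBlocks(blocks, s, shards))
	}
}
```

With a scheme like this, every block belongs to exactly one shard, so each instance downloads roughly 1/N of the data. Note that plain modulo hashing reshuffles most blocks when the shard count changes; a consistent-hashing scheme would reduce that, at the cost of more complexity.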
@mattbostock Thanks! It all works under one assumption: the Thanos setup has only one bucket to take data from. Are we OK with that? I have seen some use cases for multiple buckets connected to the same Thanos "cluster/network/setup", because "it is easier to manage", "my object storage is specific", etc. Maybe that's a separate issue, but it is worth being aware of this while implementing HA.
Makes sense, I would just love to hear/see more about the implementation details. As you suggested offline, https://godoc.org/github.com/golang/groupcache sounds nice, but it means that we are talking about sharding fully on the stores (you ask whatever store and it gives you the correct answer 100% of the time, even if it needs to ask its peers). Or maybe we want thanos-query to be aware of store sharding? Also, are we talking about sharding the index cache based on... what? On matchers 0.o?
Totally agree, and thanks for the example 👍 However, I would start with something simple first: just replicating (so true HA), because that is what you need (from what you say). This will enable horizontal scaling (it will offload the single store) and potentially improve performance as well. Sharding alone will ONLY improve availability (and will still have some major disruption time); regarding performance, it is hard to say without #346 (which is in progress).
Added a proposal for high availability for store instances here: #404
This can be solved just by running multiple Store Gateways behind any load balancer (like a Kubernetes Service), without gossip.
Store nodes are currently generally run as a single replica. It's not super critical to have HA in general since several hours or even days of recent data are HA via the Prometheus servers. But for some scenarios it might still be preferable.
Two replicas could simply be deployed, and the query node would take care of deduplication/merging, just as it does for Prometheus HA pairs. But unlike with Prometheus servers, the underlying data is truly the same in this case, and fetching it twice is unnecessary overhead.
Some simple logic could be added to the query node to recognize real duplicates (Prometheus HA pairs are actually different through a replica label) and to only query one of them.
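As a rough illustration of that logic, the sketch below groups store endpoints by their full label set and keeps one endpoint per group. Everything here is an assumption made for the example: the StoreInfo type, the labelKey and PickOnePerGroup functions, and the addresses are hypothetical and do not reflect the actual query node code. It also assumes that two stores advertising identical label sets really do serve identical data.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// StoreInfo is a hypothetical view of a store endpoint as seen by the
// query node: its address and the label set it advertises.
type StoreInfo struct {
	Addr   string
	Labels map[string]string
}

// labelKey builds a canonical string from a label set so that two stores
// with the same labels produce the same key regardless of map order.
func labelKey(labels map[string]string) string {
	names := make([]string, 0, len(labels))
	for name := range labels {
		names = append(names, name)
	}
	sort.Strings(names)
	var b strings.Builder
	for _, name := range names {
		fmt.Fprintf(&b, "%q=%q,", name, labels[name])
	}
	return b.String()
}

// PickOnePerGroup keeps the first store seen for each distinct label set.
// Prometheus HA pairs differ in their replica label, so they land in
// different groups and are both kept; identical store replicas collapse.
func PickOnePerGroup(stores []StoreInfo) []StoreInfo {
	seen := make(map[string]bool)
	var out []StoreInfo
	for _, s := range stores {
		key := labelKey(s.Labels)
		if seen[key] {
			continue
		}
		seen[key] = true
		out = append(out, s)
	}
	return out
}

func main() {
	stores := []StoreInfo{
		{Addr: "store-0:10901", Labels: map[string]string{"cluster": "eu"}},
		{Addr: "store-1:10901", Labels: map[string]string{"cluster": "eu"}},
		{Addr: "prom-a:10901", Labels: map[string]string{"cluster": "eu", "replica": "a"}},
		{Addr: "prom-b:10901", Labels: map[string]string{"cluster": "eu", "replica": "b"}},
	}
	for _, s := range PickOnePerGroup(stores) {
		fmt.Println(s.Addr)
	}
}
```

Picking the first endpoint is the simplest policy; a real implementation would presumably also need health checks and a fallback to the other replica when the chosen one fails.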