Skip to content

Commit

Permalink
Always validate tenant IDs and introduce max tenants setting (#6959)
Browse files Browse the repository at this point in the history
This change updates dskit to a version that does _not_ rely on global
state for tenant ID parsing. Specifically it pulls in grafana/dskit#445.
As part of this there are a few things changing:

* We use multi-tenant parsing logic everywhere which actually enforces
  limits on the length of tenant IDs and the legal characters in them.
* Instead of relying on single tenant parsing logic when tenant federation
  is disabled to reject multi-tenant queries, we add a query middleware that
  validates the number of expected tenants based on configuration.
* We introduce a new setting to limit the max number of tenant IDs that may be
  included in a multi-tenant query.

This change will result in different behavior in a few cases. However, it
brings the actual behavior of Mimir in line with the documented behavior.
Specifically, the following behavior changes (copied from dskit PR):

* SingleResolver did not previously enforce a limit on the length of a tenant
  ID. A limit of 150 characters is now enforced. This has always been the
  documented behavior as far back as Cortex, where this code originated. Not
  enforcing it was an oversight.
* SingleResolver previously allowed tenant IDs to contain the | character.
  This is no longer allowed as part of a tenant ID and instead will be treated
  as a divider between multiple tenant IDs. This has always been the documented
  behavior as far back as Cortex, where this code originated. Not enforcing it
  was an oversight.

See grafana/dskit#445
  • Loading branch information
56quarters committed Jan 2, 2024
1 parent dbf41d6 commit 41fd54e
Show file tree
Hide file tree
Showing 27 changed files with 322 additions and 185 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
### Grafana Mimir

* [CHANGE] Ingester: Increase default value of `-blocks-storage.tsdb.head-postings-for-matchers-cache-max-bytes` and `-blocks-storage.tsdb.block-postings-for-matchers-cache-max-bytes` to 100 MiB (previous default value was 10 MiB). #6764
* [CHANGE] Validate tenant IDs according to [documented behavior](https://grafana.com/docs/mimir/latest/configure/about-tenant-ids/) even when tenant federation is not enabled. Note that this will cause some previously accepted tenant IDs to be rejected such as those longer than 150 bytes or containing `|` characters. #6959
* [ENHANCEMENT] Store-gateway: add no-compact details column on store-gateway tenants admin UI. #6848
* [ENHANCEMENT] PromQL: ignore small errors for bucketQuantile #6766
* [ENHANCEMENT] Distributor: improve efficiency of some errors #6785
Expand All @@ -13,6 +14,7 @@
* [ENHANCEMENT] Store-gateway: include more information about lazy index-header loading in traces. #6922
* [ENHANCEMENT] Distributor: support disabling metric relabel rules per-tenant via the flag `-distributor.metric-relabeling-enabled` or associated YAML. #6970
* [ENHANCEMENT] Distributor: `-distributor.remote-timeout` is now accounted from the first ingester push request being sent. #6972
* [FEATURE] Introduce `-tenant-federation.max-tenants` option to limit the max number of tenants allowed for requests when federation is enabled. #6959
* [BUGFIX] Ingester: don't ignore errors encountered while iterating through chunks or samples in response to a query request. #6451
* [BUGFIX] Fix issue where queries can fail or omit OOO samples if OOO head compaction occurs between creating a querier and reading chunks #6766
* [BUGFIX] Fix issue where concatenatingChunkIterator can obscure errors #6766
Expand Down
11 changes: 11 additions & 0 deletions cmd/mimir/config-descriptor.json
Original file line number Diff line number Diff line change
Expand Up @@ -9953,6 +9953,17 @@
"fieldFlag": "tenant-federation.max-concurrent",
"fieldType": "int",
"fieldCategory": "experimental"
},
{
"kind": "field",
"name": "max_tenants",
"required": false,
"desc": "The max number of tenant IDs that may be supplied for a federated query if enabled. 0 to disable the limit.",
"fieldValue": null,
"fieldDefaultValue": 0,
"fieldFlag": "tenant-federation.max-tenants",
"fieldType": "int",
"fieldCategory": "experimental"
}
],
"fieldValue": null,
Expand Down
2 changes: 2 additions & 0 deletions cmd/mimir/help-all.txt.tmpl
Original file line number Diff line number Diff line change
Expand Up @@ -2733,6 +2733,8 @@ Usage of ./cmd/mimir/mimir:
If enabled on all services, queries can be federated across multiple tenants. The tenant IDs involved need to be specified separated by a '|' character in the 'X-Scope-OrgID' header.
-tenant-federation.max-concurrent int
[experimental] The number of workers used for each tenant federated query. This setting limits the maximum number of per-tenant queries executed at a time for a tenant federated query. (default 16)
-tenant-federation.max-tenants int
[experimental] The max number of tenant IDs that may be supplied for a federated query if enabled. 0 to disable the limit.
-timeseries-unmarshal-caching-optimization-enabled
[experimental] Enables optimized marshaling of timeseries. (default true)
-usage-stats.enabled
Expand Down
1 change: 1 addition & 0 deletions docs/sources/mimir/configure/about-versioning.md
Original file line number Diff line number Diff line change
Expand Up @@ -132,6 +132,7 @@ The following features are currently experimental:
- Use of Redis cache backend (`-query-frontend.results-cache.backend=redis`)
- Query blocking on a per-tenant basis (configured with the limit `blocked_queries`)
- Wait for the query-frontend to complete startup if a query request is received while it is starting up (`-query-frontend.not-running-timeout`)
- Max number of tenants that may be queried at once (`-tenant-federation.max-tenants`)
- Query-scheduler
- `-query-scheduler.querier-forget-delay`
- Store-gateway
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -210,6 +210,11 @@ tenant_federation:
# CLI flag: -tenant-federation.max-concurrent
[max_concurrent: <int> | default = 16]

# (experimental) The max number of tenant IDs that may be supplied for a
# federated query if enabled. 0 to disable the limit.
# CLI flag: -tenant-federation.max-tenants
[max_tenants: <int> | default = 0]

activity_tracker:
# File where ongoing activities are stored. If empty, activity tracking is
# disabled.
Expand Down
2 changes: 1 addition & 1 deletion go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ require (
github.com/golang/snappy v0.0.4
github.com/google/gopacket v1.1.19
github.com/gorilla/mux v1.8.1
github.com/grafana/dskit v0.0.0-20231213223053-84f5540a28dd
github.com/grafana/dskit v0.0.0-20231219164408-2bfd67958535
github.com/grafana/e2e v0.1.1
github.com/hashicorp/golang-lru v1.0.2 // indirect
github.com/json-iterator/go v1.1.12
Expand Down
4 changes: 2 additions & 2 deletions go.sum
Original file line number Diff line number Diff line change
Expand Up @@ -542,8 +542,8 @@ github.com/gosimple/slug v1.1.1 h1:fRu/digW+NMwBIP+RmviTK97Ho/bEj/C9swrCspN3D4=
github.com/gosimple/slug v1.1.1/go.mod h1:ER78kgg1Mv0NQGlXiDe57DpCyfbNywXXZ9mIorhxAf0=
github.com/grafana-tools/sdk v0.0.0-20220919052116-6562121319fc h1:PXZQA2WCxe85Tnn+WEvr8fDpfwibmEPgfgFEaC87G24=
github.com/grafana-tools/sdk v0.0.0-20220919052116-6562121319fc/go.mod h1:AHHlOEv1+GGQ3ktHMlhuTUwo3zljV3QJbC0+8o2kn+4=
github.com/grafana/dskit v0.0.0-20231213223053-84f5540a28dd h1:znSOnhAwMuyWzb1Vx550Q07o5p9EmqLyiDIYuKipiMc=
github.com/grafana/dskit v0.0.0-20231213223053-84f5540a28dd/go.mod h1:kkWM4WUV230bNG3urVRWPBnSJHs64y/0RmWjftnnn0c=
github.com/grafana/dskit v0.0.0-20231219164408-2bfd67958535 h1:qUdSymzUZ9bpNVcE79kTIW6oEB2J3oDLY7q82sHurjU=
github.com/grafana/dskit v0.0.0-20231219164408-2bfd67958535/go.mod h1:kkWM4WUV230bNG3urVRWPBnSJHs64y/0RmWjftnnn0c=
github.com/grafana/e2e v0.1.1 h1:/b6xcv5BtoBnx8cZnCiey9DbjEc8z7gXHO5edoeRYxc=
github.com/grafana/e2e v0.1.1/go.mod h1:RpNLgae5VT+BUHvPE+/zSypmOXKwEu4t+tnEMS1ATaE=
github.com/grafana/goautoneg v0.0.0-20231010094147-47ce5e72a9ae h1:Yxbw9jKGJVC6qAK5Ubzzb/qZwM6rRMMqaDc/d4Vp3pM=
Expand Down
10 changes: 8 additions & 2 deletions pkg/api/api.go
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,7 @@ import (
"github.com/grafana/mimir/pkg/ingester/client"
"github.com/grafana/mimir/pkg/mimirpb"
"github.com/grafana/mimir/pkg/querier"
"github.com/grafana/mimir/pkg/querier/tenantfederation"
"github.com/grafana/mimir/pkg/ruler"
"github.com/grafana/mimir/pkg/scheduler"
"github.com/grafana/mimir/pkg/scheduler/schedulerpb"
Expand Down Expand Up @@ -89,7 +90,7 @@ type API struct {
indexPage *IndexPageContent
}

func New(cfg Config, serverCfg server.Config, s *server.Server, logger log.Logger) (*API, error) {
func New(cfg Config, federationCfg tenantfederation.Config, serverCfg server.Config, s *server.Server, logger log.Logger) (*API, error) {
// Ensure the encoded path is used. Required for the rules API
s.HTTP.UseEncodedPath()

Expand All @@ -113,10 +114,15 @@ func New(cfg Config, serverCfg server.Config, s *server.Server, logger log.Logge
}

// If no authentication middleware is present in the config, use the default authentication middleware.
if cfg.HTTPAuthMiddleware == nil {
if api.AuthMiddleware == nil {
api.AuthMiddleware = middleware.AuthenticateUser
}

// Unconditionally add middleware that ensures we only accept requests with an expected number of tenants
// that is applied after any existing auth middleware has run. Only a single tenant is allowed when federation
// is disabled. If federation is enabled, there is optionally a max number of tenants that is supported.
api.AuthMiddleware = middleware.Merge(api.AuthMiddleware, newTenantValidationMiddleware(federationCfg.Enabled, federationCfg.MaxTenants))

return api, nil
}

Expand Down
13 changes: 9 additions & 4 deletions pkg/api/api_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ import (
"github.com/grafana/dskit/server"
"github.com/stretchr/testify/require"

"github.com/grafana/mimir/pkg/querier/tenantfederation"
"github.com/grafana/mimir/pkg/util/gziphandler"
)

Expand All @@ -33,11 +34,12 @@ func TestNewApiWithoutSourceIPExtractor(t *testing.T) {
GRPCListenAddress: "localhost",
MetricsNamespace: "without_source_ip_extractor",
}
federationCfg := tenantfederation.Config{}
require.NoError(t, serverCfg.LogLevel.Set("info"))
srv, err := server.New(serverCfg)
require.NoError(t, err)

api, err := New(cfg, serverCfg, srv, &FakeLogger{})
api, err := New(cfg, federationCfg, serverCfg, srv, &FakeLogger{})
require.NoError(t, err)
require.Nil(t, api.sourceIPs)
}
Expand All @@ -50,11 +52,12 @@ func TestNewApiWithSourceIPExtractor(t *testing.T) {
GRPCListenAddress: "localhost",
MetricsNamespace: "with_source_ip_extractor",
}
federationCfg := tenantfederation.Config{}
require.NoError(t, serverCfg.LogLevel.Set("info"))
srv, err := server.New(serverCfg)
require.NoError(t, err)

api, err := New(cfg, serverCfg, srv, &FakeLogger{})
api, err := New(cfg, federationCfg, serverCfg, srv, &FakeLogger{})
require.NoError(t, err)
require.NotNil(t, api.sourceIPs)
}
Expand All @@ -70,21 +73,23 @@ func TestNewApiWithInvalidSourceIPExtractor(t *testing.T) {
LogSourceIPsRegex: "[*",
MetricsNamespace: "with_invalid_source_ip_extractor",
}
federationCfg := tenantfederation.Config{}

api, err := New(cfg, serverCfg, &s, &FakeLogger{})
api, err := New(cfg, federationCfg, serverCfg, &s, &FakeLogger{})
require.Error(t, err)
require.Nil(t, api)
}

func TestApiGzip(t *testing.T) {
cfg := Config{}
serverCfg := getServerConfig(t)
federationCfg := tenantfederation.Config{}
srv, err := server.New(serverCfg)
require.NoError(t, err)
go func() { _ = srv.Run() }()
t.Cleanup(srv.Stop)

api, err := New(cfg, serverCfg, srv, log.NewNopLogger())
api, err := New(cfg, federationCfg, serverCfg, srv, log.NewNopLogger())
require.NoError(t, err)

handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
Expand Down
46 changes: 46 additions & 0 deletions pkg/api/tenant.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
// SPDX-License-Identifier: AGPL-3.0-only

package api

import (
"fmt"
"net/http"

"github.com/grafana/dskit/middleware"
"github.com/grafana/dskit/tenant"
)

const (
tooManyTenantsTemplate = "too many tenant IDs present in the request. max: %d actual: %d"
)

// newTenantValidationMiddleware creates a new middleware that validates the number of tenants
// being accessed in a particular request is allowed given the current tenant federation configuration.
// Note that this middleware requires that tenant ID has been set on the request context by something
// like middleware.AuthenticateUser.
func newTenantValidationMiddleware(federation bool, maxTenants int) middleware.Interface {
return middleware.Func(func(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
ctx := r.Context()

ids, err := tenant.TenantIDs(ctx)
if err != nil {
http.Error(w, err.Error(), http.StatusUnauthorized)
return
}

numIds := len(ids)
if !federation && numIds > 1 {
http.Error(w, fmt.Sprintf(tooManyTenantsTemplate, 1, numIds), http.StatusUnprocessableEntity)
return
}

if federation && maxTenants > 0 && numIds > maxTenants {
http.Error(w, fmt.Sprintf(tooManyTenantsTemplate, maxTenants, numIds), http.StatusUnprocessableEntity)
return
}

next.ServeHTTP(w, r.WithContext(ctx))
})
})
}
113 changes: 113 additions & 0 deletions pkg/api/tenant_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,113 @@
// SPDX-License-Identifier: AGPL-3.0-only

package api

import (
"io"
"net/http"
"net/http/httptest"
"strings"
"testing"

"github.com/grafana/dskit/middleware"
"github.com/grafana/dskit/tenant"
"github.com/grafana/dskit/user"
"github.com/stretchr/testify/require"
)

func TestNewTenantValidationMiddleware(t *testing.T) {
for _, tc := range []struct {
name string
federation bool
maxTenants int
header string
expectedHTTPStatus int
expectedBodyText string
}{
{
name: "federation disabled, invalid tenant header",
federation: false,
maxTenants: 0,
header: strings.Repeat("123", tenant.MaxTenantIDLength),
expectedHTTPStatus: 401,
expectedBodyText: "tenant ID is too long",
},
{
name: "federation disabled, single tenant",
federation: false,
maxTenants: 0,
header: "tenant-a",
expectedHTTPStatus: 200,
expectedBodyText: "",
},
{
name: "federation disabled, multiple tenants",
federation: false,
maxTenants: 0,
header: "tenant-a|tenant-b",
expectedHTTPStatus: 422,
expectedBodyText: "too many tenant IDs present",
},
{
name: "federation enabled, invalid tenant header",
federation: true,
maxTenants: 0,
header: strings.Repeat("123", tenant.MaxTenantIDLength),
expectedHTTPStatus: 401,
expectedBodyText: "tenant ID is too long",
},
{
name: "federation enabled, single tenant no limit",
federation: true,
maxTenants: 0,
header: "tenant-a",
expectedHTTPStatus: 200,
expectedBodyText: "",
},
{
name: "federation enabled, multiple tenants no limit",
federation: true,
maxTenants: 0,
header: "tenant-a|tenant-b|tenant-c",
expectedHTTPStatus: 200,
expectedBodyText: "",
},
{
name: "federation enabled, multiple tenants under limit",
federation: true,
maxTenants: 2,
header: "tenant-a|tenant-b",
expectedHTTPStatus: 200,
expectedBodyText: "",
},
{
name: "federation enabled, multiple tenants over limit",
federation: true,
maxTenants: 2,
header: "tenant-a|tenant-b|tenant-c",
expectedHTTPStatus: 422,
expectedBodyText: "too many tenant IDs present",
},
} {
t.Run(tc.name, func(t *testing.T) {
nop := http.HandlerFunc(func(writer http.ResponseWriter, request *http.Request) {})
// Note that we add the authentication middleware since the tenant validation middleware relies
// on tenant ID being set in the context associated with the request.
handler := middleware.Merge(middleware.AuthenticateUser, newTenantValidationMiddleware(tc.federation, tc.maxTenants)).Wrap(nop)

req := httptest.NewRequest("GET", "/", nil)
req.Header.Set(user.OrgIDHeaderName, tc.header)
resp := httptest.NewRecorder()

handler.ServeHTTP(resp, req)
body, err := io.ReadAll(resp.Body)

require.NoError(t, err)
require.Equal(t, tc.expectedHTTPStatus, resp.Code)

if tc.expectedBodyText != "" {
require.Contains(t, string(body), tc.expectedBodyText)
}
})
}
}
6 changes: 0 additions & 6 deletions pkg/frontend/querymiddleware/cardinality_query_cache_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -20,12 +20,6 @@ import (
)

func TestCardinalityQueryCache_RoundTrip_WithTenantFederation(t *testing.T) {
// Enable tenant ID resolve used when tenant federation is enabled.
tenant.WithDefaultResolver(tenant.NewMultiResolver())
t.Cleanup(func() {
tenant.WithDefaultResolver(tenant.NewSingleResolver())
})

tests := map[string]struct {
tenantIDs []string
limits map[string]mockLimits
Expand Down
2 changes: 0 additions & 2 deletions pkg/frontend/querymiddleware/limits_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,6 @@ import (
"time"

"github.com/go-kit/log"
"github.com/grafana/dskit/tenant"
"github.com/grafana/dskit/user"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/mock"
Expand Down Expand Up @@ -173,7 +172,6 @@ func TestLimitsMiddleware_MaxQueryExpressionSizeBytes(t *testing.T) {
End: util.TimeToMillis(now.Add(-time.Hour)),
}

tenant.WithDefaultResolver(tenant.NewMultiResolver())
limits := multiTenantMockLimits{
byTenant: map[string]mockLimits{
"test1": {maxQueryExpressionSizeBytes: testData.queryLimits["test1"]},
Expand Down
2 changes: 1 addition & 1 deletion pkg/frontend/querymiddleware/roundtrip.go
Original file line number Diff line number Diff line change
Expand Up @@ -220,7 +220,7 @@ func newQueryTripperware(
newLimitsMiddleware(limits, log),
queryBlockerMiddleware,
newInstrumentMiddleware("step_align", metrics),
newStepAlignMiddleware(limits, tenant.NewMultiResolver(), log, registerer),
newStepAlignMiddleware(limits, log, registerer),
}

var c cache.Cache
Expand Down
6 changes: 4 additions & 2 deletions pkg/frontend/querymiddleware/roundtrip_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -65,7 +65,8 @@ func TestRangeTripperware(t *testing.T) {
next: http.DefaultTransport,
}

tw, err := NewTripperware(Config{},
tw, err := NewTripperware(
Config{},
log.NewNopLogger(),
mockLimits{},
newTestPrometheusCodec(),
Expand Down Expand Up @@ -289,7 +290,8 @@ func TestTripperware_Metrics(t *testing.T) {
for testName, testData := range tests {
t.Run(testName, func(t *testing.T) {
reg := prometheus.NewPedanticRegistry()
tw, err := NewTripperware(Config{DeprecatedAlignQueriesWithStep: testData.stepAlignEnabled},
tw, err := NewTripperware(
Config{DeprecatedAlignQueriesWithStep: testData.stepAlignEnabled},
log.NewNopLogger(),
mockLimits{},
newTestPrometheusCodec(),
Expand Down
Loading

0 comments on commit 41fd54e

Please sign in to comment.