Clamp GOMAXPROCS when higher than runtime.NumCPU (#8201)
* Clamp GOMAXPROCS when higher than runtime.NumCPU

#### Background

We are trying to automatically set GOMAXPROCS based on the number of CPUs that an ingester pod requests in Kubernetes. We're going with 2x the requested cores. The reason is that GOMAXPROCS defaults to NumCPU, so running on a large node while using only a small percentage of its cores results in high scheduling overhead.
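
As a rough illustration of the "2x the requested cores" rule, deployment tooling could derive the value like this. This is a minimal sketch, not code from this PR: the helper name `gomaxprocsForRequest`, the millicore input, and the round-up behaviour are all assumptions made for the example.

```go
package main

import "fmt"

// gomaxprocsForRequest is a hypothetical helper (not part of this PR).
// It takes a pod's CPU request in millicores and returns twice the
// requested cores, rounded up, with a floor of 1.
func gomaxprocsForRequest(requestMilliCPU int) int {
	// Integer ceiling of (2 * requestMilliCPU) / 1000.
	procs := (2*requestMilliCPU + 999) / 1000
	if procs < 1 {
		return 1
	}
	return procs
}

func main() {
	for _, milli := range []int{250, 1500, 4000} {
		fmt.Printf("request=%dm GOMAXPROCS=%d\n", milli, gomaxprocsForRequest(milli))
	}
}
```

The result would typically be passed to the process through the `GOMAXPROCS` environment variable, which the Go runtime reads at startup.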

#### Problem

Sometimes the computed GOMAXPROCS value exceeds the number of cores on the node, and we don't want to restrict the nodes on which pods can run. In those cases setting GOMAXPROCS to a higher value than NumCPU has the opposite effect: it increases scheduling overhead instead of reducing it.

The idea of this PR is to make it easier to automate the GOMAXPROCS setting in deployment tooling by adding some support for it in the code.
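
The clamping rule itself can be sketched as a pure function, which makes the behaviour easy to check in isolation. This is only an illustration: `clampProcs` is a hypothetical name, and the PR inlines the same comparison directly in `clampGOMAXPROCS` instead of factoring it out.

```go
package main

import (
	"fmt"
	"runtime"
)

// clampProcs is a hypothetical helper (the PR does not define it).
// It returns requested unless it exceeds numCPU, in which case it
// returns numCPU.
func clampProcs(requested, numCPU int) int {
	if requested > numCPU {
		return numCPU
	}
	return requested
}

func main() {
	// Pretend the tooling asked for more threads than most nodes have.
	requested := 128
	effective := clampProcs(requested, runtime.NumCPU())

	// runtime.GOMAXPROCS returns the previous setting when it applies a new one.
	previous := runtime.GOMAXPROCS(effective)
	fmt.Printf("requested=%d NumCPU=%d previous=%d effective=%d\n",
		requested, runtime.NumCPU(), previous, runtime.GOMAXPROCS(0))
}
```

Values at or below `NumCPU` pass through unchanged, so a small pod on a large node keeps the lower setting chosen by the tooling.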

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

* Add CHANGELOG.md entry

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

---------

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
dimitarvdimitrov committed May 29, 2024
1 parent 6db3385 commit 3803a60
Showing 2 changed files with 14 additions and 0 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -11,6 +11,7 @@
* Query results caching should be more stable as all equivalent queries receive the same cache key, but there may be cache churn on first deploy with the updated format
* Query blocking can no longer be circumvented with an equivalent query in a different format; see [Configure queries to block](https://grafana.com/docs/mimir/latest/configure/configure-blocked-queries/)
* [CHANGE] Query-frontend: stop using `-validation.create-grace-period` to clamp how far into the future a query can span.
* [CHANGE] Clamp [`GOMAXPROCS`](https://pkg.go.dev/runtime#GOMAXPROCS) to [`runtime.NumCPU`](https://pkg.go.dev/runtime#NumCPU). #8201
* [FEATURE] Continuous-test: now runable as a module with `mimir -target=continuous-test`. #7747
* [FEATURE] Store-gateway: Allow specific tenants to be enabled or disabled via `-store-gateway.enabled-tenants` or `-store-gateway.disabled-tenants` CLI flags or their corresponding YAML settings. #7653
* [FEATURE] New `-<prefix>.s3.bucket-lookup-type` flag configures lookup style type, used to access bucket in s3 compatible providers. #7684
13 changes: 13 additions & 0 deletions cmd/mimir/main.go
@@ -171,6 +171,7 @@ func main() {
	if mainFlags.blockProfileRate > 0 {
		runtime.SetBlockProfileRate(mainFlags.blockProfileRate)
	}
	clampGOMAXPROCS()

	reg := prometheus.DefaultRegisterer
	cfg.Server.Log = util_log.InitLogger(cfg.Server.LogFormat, cfg.Server.LogLevel, mainFlags.useBufferedLogger, util_log.RateLimitedLoggerCfg{
@@ -231,6 +232,18 @@ func main() {
	util_log.CheckFatal("running application", err)
}

func clampGOMAXPROCS() {
	if runtime.GOMAXPROCS(0) <= runtime.NumCPU() {
		return
	}
	level.Warn(util_log.Logger).Log(
		"msg", "GOMAXPROCS is higher than the number of CPUs; clamping it to NumCPU; please report if this doesn't fit your use case",
		"GOMAXPROCS", runtime.GOMAXPROCS(0),
		"NumCPU", runtime.NumCPU(),
	)
	runtime.GOMAXPROCS(runtime.NumCPU())
}

func exit(code int) {
	if err := util_log.Flush(); err != nil {
		fmt.Fprintln(os.Stderr, "Could not flush logger", err)