runtime: performance degradation on tip on high core count machines #67858

Open
JacobOaks opened this issue Jun 6, 2024 · 8 comments

JacobOaks commented Jun 6, 2024

Go version

go version go1.22.4 linux/amd64

Output of go env in your module/workspace:

GO111MODULE='on'
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/user/.cache/go-build'
GOENV='/home/user/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/user/go-repos/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/user/go-repos:/opt/go/path:/home/user/go-code'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/user/go/src/github.com/golang/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/home/user/go/src/github.com/golang/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.22.4'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/home/user/go/src/github.com/uber-go/zap/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1740765925=/tmp/go-build -gno-record-gcc-switches'

What did you do?

We have been doing some performance testing of Go tip at Uber in preparation for Go 1.23.

What did you see happen?

We have noticed a degradation of around 8% on Linux machines with a high core count (96) across all of Zap’s Field logging benchmarks. These benchmarks look something like this:

logger := New(
	zapcore.NewCore(
		zapcore.NewJSONEncoder(NewProductionConfig().EncoderConfig),
		&ztest.Discarder{}, // No actual i/o, logs get discarded.
		DebugLevel,
	),
)
b.ResetTimer()
b.RunParallel(func(pb *testing.PB) {
	for pb.Next() {
		logger.Info("Boolean.", Bool("foo", true))
	}
})

We don’t have an isolated Linux environment available to us, so these results are susceptible to some noisy-neighbor effects, but we have consistently seen some amount of degradation on these benchmarks:

$ go version
go version go1.22.4 linux/amd64
$ go test -bench Field -run nounittests -count 25 . | tee go1224.log
$ ~/go/src/github.com/golang/go4/bin/go version
go version devel go1.23-93bbf719a6 Wed Jun 5 17:30:16 2024 +0000 linux/amd64
$ go test -bench Field -run nounittests -count 25 . | tee 93bbf719a6.log
$ benchstat go1224.log 93bbf719a6.log

goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
                   │ go1.22.4.log │           93bbf719a6.log            │
                   │    sec/op    │   sec/op     vs base                │
BoolField-96          110.6n ± 6%   127.9n ± 8%  +15.64% (p=0.001 n=25)
ByteStringField-96    139.7n ± 1%   149.9n ± 2%   +7.30% (p=0.000 n=25)
Float64Field-96       112.2n ± 4%   125.2n ± 4%  +11.59% (p=0.000 n=25)
IntField-96           108.7n ± 3%   116.2n ± 2%   +6.90% (p=0.000 n=25)
Int64Field-96         105.9n ± 4%   113.2n ± 2%   +6.89% (p=0.009 n=25)
StringField-96        104.4n ± 2%   115.4n ± 4%  +10.54% (p=0.000 n=25)
StringerField-96      105.4n ± 3%   115.5n ± 4%   +9.58% (p=0.000 n=25)
TimeField-96          109.6n ± 2%   117.4n ± 2%   +7.12% (p=0.000 n=25)
DurationField-96      111.6n ± 3%   121.9n ± 3%   +9.23% (p=0.000 n=25)
ErrorField-96         108.4n ± 2%   115.7n ± 4%   +6.73% (p=0.000 n=25)
ErrorsField-96        184.1n ± 2%   205.1n ± 4%  +11.41% (p=0.000 n=25)
StackField-96         713.0n ± 3%   813.3n ± 3%  +14.07% (p=0.000 n=25)
ObjectField-96        117.2n ± 2%   130.9n ± 3%  +11.69% (p=0.000 n=25)
ReflectField-96       317.6n ± 2%   346.0n ± 3%   +8.94% (p=0.000 n=25)
10Fields-96           584.7n ± 2%   622.4n ± 4%   +6.45% (p=0.000 n=25)
100Fields-96          5.919µ ± 3%   5.630µ ± 5%        ~ (p=0.073 n=25)
geomean               196.5n        213.4n        +8.61%

We fiddled with GOMAXPROCS a bit and noticed the degradation is definitely related to parallelism.
[Screenshot (2024-06-06): benchmark results at varying GOMAXPROCS values]
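
Roughly, the sweep looked like this (a sketch only; the GOMAXPROCS values and counts shown here are illustrative, not the exact script we ran):

$ for procs in 1 8 16 32 48 96; do
    GOMAXPROCS=$procs go test -bench Field -run nounittests -count 10 . | tee "gomaxprocs_${procs}.log"
  done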

We didn’t see a whole lot in CPU profiles other than a general increase of about 2-4% of samples taken in the runtime package.

We were able to use git bisect to identify e995aa95cb5f379c1df5d5511ee09970261d877f as one cause. Specifically, the added calls to nanotime() seem to cause degradation in these highly parallelized benchmarks. However, this commit alone does not seem to account for the entire degradation:

$ ~/go/src/github.com/golang/go3/bin/go version
go version devel go1.23-e995aa95cb Mon Apr 8 21:43:16 2024 +0000 linux/amd64
$ ~/go/src/github.com/golang/go3/bin/go test -bench Field -run nounittests -count 25 . | tee e995aa95cb.log
$ benchstat go1224.log e995aa95cb.log
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
                   │ go1.22.4.log │            e995aa95cb.log            │
                   │    sec/op    │    sec/op     vs base                │
BoolField-96          110.6n ± 6%   121.1n ±  6%   +9.49% (p=0.004 n=25)
ByteStringField-96    139.7n ± 1%   145.9n ±  2%   +4.44% (p=0.002 n=25)
Float64Field-96       112.2n ± 4%   121.1n ±  1%   +7.93% (p=0.000 n=25)
IntField-96           108.7n ± 3%   112.5n ±  2%   +3.50% (p=0.009 n=25)
Int64Field-96         105.9n ± 4%   111.4n ±  3%        ~ (p=0.200 n=25)
StringField-96        104.4n ± 2%   111.5n ±  2%   +6.80% (p=0.000 n=25)
StringerField-96      105.4n ± 3%   113.4n ±  3%   +7.59% (p=0.000 n=25)
TimeField-96          109.6n ± 2%   117.6n ±  2%   +7.30% (p=0.000 n=25)
DurationField-96      111.6n ± 3%   116.8n ±  2%   +4.66% (p=0.000 n=25)
ErrorField-96         108.4n ± 2%   113.7n ±  2%   +4.89% (p=0.002 n=25)
ErrorsField-96        184.1n ± 2%   201.7n ±  4%   +9.56% (p=0.000 n=25)
StackField-96         713.0n ± 3%   770.9n ±  2%   +8.12% (p=0.000 n=25)
ObjectField-96        117.2n ± 2%   127.2n ±  3%   +8.53% (p=0.000 n=25)
ReflectField-96       317.6n ± 2%   349.4n ±  5%  +10.01% (p=0.000 n=25)
10Fields-96           584.7n ± 2%   620.5n ±  5%   +6.12% (p=0.005 n=25)
100Fields-96          5.919µ ± 3%   6.046µ ± 25%        ~ (p=0.064 n=25)
geomean               196.5n        209.5n         +6.62%

We weren’t able to reliably identify any additional commits beyond this one that accounted for more of the degradation.
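
For reference, the bisection was roughly of this shape (a sketch under assumptions: the paths, counts, and known-good commit are illustrative, and each step rebuilds the candidate toolchain before re-running the Zap benchmarks):

$ cd ~/go/src/github.com/golang/go
$ git bisect start
$ git bisect bad                          # HEAD at tip, which shows the regression
$ git bisect good <known-good master commit near the go1.22 branch point>
$ (cd src && ./make.bash)                 # rebuild the toolchain at the candidate commit
$ cd ~/go/src/github.com/uber-go/zap
$ ~/go/src/github.com/golang/go/bin/go test -bench Field -run nounittests -count 10 . | tee step.log
$ benchstat go1224.log step.log           # then: git bisect good / git bisect bad, repeat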

Note: this is not a duplicate of #67857, but rather an investigation of different Zap benchmark degradations.

What did you expect to see?

No practical degradation.

@gopherbot added the compiler/runtime label Jun 6, 2024
ianlancetaylor (Contributor) commented:

CC @golang/runtime


mknyszek commented Jun 7, 2024

These logging benchmarks appear to involve incredibly tight loops, which leads me to believe that it's possibly due to unfortunate microarchitectural effects -- see my comment on #67857.

And my next question would be: does this in any way correspond to production-level regressions? Unless the world is getting stopped a lot, I would not expect 15*GOMAXPROCS cpu-ns of CPU time to have any meaningful effect on performance in any real program (referencing your bisection result). I could see it having an effect in a microbenchmark that stops the world a lot, though.

One thing I might suggest trying is setting GOMEMLIMIT=<something big> and GOGC=off in your benchmarks to try and eliminate or minimize the GC cost of your benchmark (allocation cost will still be measured, though, which I assume you care about). The reason I say this is because microbenchmarks can end up being a torture test for the GC in a way that makes the benchmark less useful. In other words, the costs dominating your benchmark do not reflect the actual costs of your code in real-world contexts.
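
Concretely, something like this (the limit value is arbitrary; just pick something comfortably above the benchmark's peak memory use):

$ GOGC=off GOMEMLIMIT=32GiB go test -bench Field -run nounittests -count 25 .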

For example, a benchmark that allocates in a loop, immediately drops the memory, and has a teeny tiny live heap (a lot like most microbenchmarks!), is going to see a very, very high number of GC cycles because it is going to sit at the minimum total heap size of 4 MiB. This means tons of STWs and CPU time spent on mark work that does not translate at all to real production systems with sizable live heaps. The tricky part with the GC is that its frequency (and thus its cost) is proportional to your live heap via GOGC, so considering the GC as "part of the benchmark" is really hard to do in a useful way. It's much easier if the benchmark measures the performance of some larger end-to-end system.

I'm curious to know if setting GOMEMLIMIT to something big and GOGC=off makes the problem go away completely. If it does, I am less concerned about these regressions. If it doesn't, then I think this becomes much more important to fix. Also, if this regression shows up in more end-to-end benchmarks, or in production, that's much more cause for alarm.

I'll also ask: are you able to compare profiles before and after (pprof -diff_base=old.prof new.prof)? Even an 8% regression should stick out like a sore thumb.
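
Something along these lines, for example (a sketch; the profile file names are placeholders, and you'd run the first command with go1.22.4 and the second with tip):

$ go test -bench Field -run nounittests -count 5 -cpuprofile old.prof .   # with go1.22.4
$ go test -bench Field -run nounittests -count 5 -cpuprofile new.prof .   # with tip
$ go tool pprof -top -diff_base=old.prof new.prof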

@mknyszek self-assigned this Jun 7, 2024
@mknyszek added the NeedsInvestigation label Jun 7, 2024
@mknyszek added this to the Go1.23 milestone Jun 7, 2024
JacobOaks (Author) commented:

Hey @mknyszek - thanks for the quick response & ideas!

Your explanation w.r.t microbenchmarks being a GC torture test makes sense - these tests are allocating memory that quickly becomes dead - which leads to a situation with consistent allocations but low target memory, causing GC to get triggered often. I suppose this would exacerbate any GC performance degradations that are otherwise minuscule in larger benchmarks/applications.

It's unfortunately not realistic for us to actually test Go tip in production - but we may be able to do this once an rc version is tagged.

I did turn off GC and re-ran these benchmarks, and as you predicted, this does seem to result in no degradation between the versions. Like I mentioned before, we had a hard time discerning any differences in profiles between the two versions, but I tried to make the issue worse by forcing more GC with GOGC=50 in hopes that differences between the profiles would surface better, and did find that gcBgMarkWorker saw about a 3% increase in CPU time with this - which seems in line with your explanation. There weren't any other significant differences that I could tell (happy to share the profiles if it helps).
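
(For reference, the runs were roughly of this shape; the counts and file names are approximate:)

$ GOGC=off go test -bench Field -run nounittests -count 25 . | tee nogc.log
$ GOGC=50 go test -bench Field -run nounittests -count 10 -cpuprofile gogc50.prof .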


mknyszek commented Jun 7, 2024

> Hey @mknyszek - thanks for the quick response & ideas!

No problem. These reports are important. :) I hope my response did not discourage future reports. Thanks for the thorough issue you filed -- it's genuinely helpful that y'all do performance testing against tip.

> Your explanation w.r.t microbenchmarks being a GC torture test makes sense - these tests are allocating memory that quickly becomes dead - which leads to a situation with consistent allocations but low target memory, causing GC to get triggered often. I suppose this would exacerbate any GC performance degradations that are otherwise minuscule in larger benchmarks/applications.

Yeah, it's a bit unfortunate how easy it is to generate benchmarks that do this.

> It's unfortunately not realistic for us to actually test Go tip in production - but we may be able to do this once an rc version is tagged.

Understandable. Hopefully the RC gives us some more feedback.

> I did turn off GC and re-ran these benchmarks, and as you predicted, this does seem to result in no degradation between the versions. Like I mentioned before, we had a hard time discerning any differences in profiles between the two versions, but I tried to make the issue worse by forcing more GC with GOGC=50 in hopes that differences between the profiles would surface better, and did find that gcBgMarkWorker saw about a 3% increase in CPU time with this - which seems in line with your explanation. There weren't any other significant differences that I could tell (happy to share the profiles if it helps).

Got it, that's good to know! Thanks for checking.

> we had a hard time discerning any differences in profiles between the two versions

That's unfortunate; just to clarify, that's even with the automated diffing? The only other thing I might suggest is doing the same kind of differential profile with Linux perf if you can, which tends to produce more accurate results overall.
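
For example (a sketch; the test binary names are placeholders, built once with each toolchain via go test -c):

$ go test -c -o zap_old.test .   # built with go1.22.4
$ go test -c -o zap_new.test .   # built with tip
$ perf record -o old.data -- ./zap_old.test -test.bench=Field -test.run=nounittests -test.count=5
$ perf record -o new.data -- ./zap_new.test -test.bench=Field -test.run=nounittests -test.count=5
$ perf diff old.data new.data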

I'm going to leave this issue open for now while I investigate #67822, in case it becomes relevant.


thanm commented Jun 12, 2024

I ran these benchmarks on my high-core 2-socket machine, with and without link-time randomization (as described in this note: #67822 (comment)).

Here's what I see with no randomization; rock steady:

goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: Intel(R) Xeon(R) CPU @ 2.00GHz
                   │ out.base.txt │             out.tip.txt             │
                   │    sec/op    │   sec/op     vs base                │
BoolField-96         116.1n ±  5%   127.7n ± 3%   +9.99% (p=0.000 n=30)
ByteStringField-96   135.2n ±  5%   143.5n ± 8%   +6.18% (p=0.001 n=30)
Float64Field-96      118.6n ±  4%   128.1n ± 5%   +8.06% (p=0.000 n=30)
IntField-96          104.6n ±  3%   114.9n ± 2%   +9.85% (p=0.000 n=30)
Int64Field-96        107.0n ±  2%   116.7n ± 2%   +9.12% (p=0.000 n=30)
StringField-96       104.7n ±  1%   115.4n ± 4%  +10.22% (p=0.000 n=30)
StringerField-96     107.5n ±  2%   114.1n ± 5%   +6.19% (p=0.000 n=30)
TimeField-96         107.6n ±  4%   118.3n ± 3%   +9.90% (p=0.000 n=30)
DurationField-96     110.9n ±  2%   120.4n ± 3%   +8.66% (p=0.000 n=30)
ErrorField-96        104.2n ±  2%   115.6n ± 2%  +10.94% (p=0.000 n=30)
ErrorsField-96       198.9n ±  3%   211.7n ± 2%   +6.41% (p=0.000 n=30)
StackField-96        673.3n ±  7%   747.2n ± 3%  +10.97% (p=0.000 n=30)
ObjectField-96       131.1n ±  1%   145.5n ± 2%  +10.98% (p=0.000 n=30)
ReflectField-96      265.9n ±  1%   297.7n ± 5%  +11.98% (p=0.000 n=30)
10Fields-96          560.2n ± 11%   576.4n ± 4%        ~ (p=0.090 n=30)
100Fields-96         8.545µ ±  1%   8.635µ ± 1%   +1.05% (p=0.003 n=30)
geomean              199.9n         216.5n        +8.29%

Now here's what happens when I add in text layout randomization (10 instances of -count=3 runs, each with a random seed):

goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: Intel(R) Xeon(R) CPU @ 2.00GHz
                   │ rout.base.txt │             rout.tip.txt             │
                   │    sec/op     │    sec/op     vs base                │
BoolField-96          127.3n ±  8%   131.7n ±  7%        ~ (p=0.301 n=30)
ByteStringField-96    155.5n ±  8%   146.8n ±  5%        ~ (p=0.506 n=30)
Float64Field-96       131.2n ±  9%   131.4n ±  3%        ~ (p=0.762 n=30)
IntField-96           119.5n ±  6%   114.9n ±  4%        ~ (p=0.636 n=30)
Int64Field-96         120.1n ±  6%   119.7n ±  4%        ~ (p=0.530 n=30)
StringField-96        115.0n ±  9%   120.1n ±  3%        ~ (p=0.124 n=30)
StringerField-96      112.2n ±  9%   117.9n ±  3%   +5.13% (p=0.044 n=30)
TimeField-96          115.8n ± 10%   128.4n ±  7%  +10.88% (p=0.004 n=30)
DurationField-96      124.2n ±  4%   123.0n ±  3%        ~ (p=0.605 n=30)
ErrorField-96         120.5n ±  8%   116.8n ± 10%        ~ (p=0.231 n=30)
ErrorsField-96        210.8n ±  4%   224.1n ±  3%   +6.36% (p=0.001 n=30)
StackField-96         748.4n ±  6%   779.4n ±  4%        ~ (p=0.156 n=30)
ObjectField-96        144.7n ±  3%   146.7n ±  5%        ~ (p=0.067 n=30)
ReflectField-96       324.1n ±  4%   314.8n ±  6%        ~ (p=0.807 n=30)
10Fields-96           578.1n ±  5%   580.1n ±  5%        ~ (p=0.894 n=30)
100Fields-96          8.917µ ±  1%   8.771µ ±  1%   -1.64% (p=0.002 n=30)
geomean               220.7n         223.0n         +1.03%

Note the p values, which are now all over the map. So I strongly suspect that in addition to the GC artifacts mentioned by @mknyszek, some additional fraction of the apparent degradation is due to loop alignment (e.g. just happening to get the "right" alignment in one case and the "wrong" alignment in another).
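
For reference, the randomized runs were along these lines (a sketch; the seed values here are arbitrary):

$ for seed in 101 202 303 404 505 606 707 808 909 1010; do
    go test -ldflags=-randlayout=$seed -count=3 -run=nothing -bench=Field | tee -a rout.tip.txt
  done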


JacobOaks commented Jun 13, 2024

> some additional fraction of the apparent degradation is due to loop alignment

Hmm... FWIW I wasn't able to reproduce this using the script from #67822 (comment), including the same 10 seeds. Perhaps I'm doing something wrong.

No linker randomization:

goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
                   │ out.go1.22.txt │             out.tip.txt             │
                   │     sec/op     │   sec/op     vs base                │
BoolField-96            112.3n ± 3%   126.5n ± 4%  +12.60% (p=0.000 n=30)
ByteStringField-96      141.2n ± 3%   150.8n ± 2%   +6.83% (p=0.000 n=30)
Float64Field-96         115.6n ± 2%   127.2n ± 2%  +10.03% (p=0.000 n=30)
IntField-96             107.4n ± 2%   116.2n ± 1%   +8.14% (p=0.000 n=30)
Int64Field-96           106.2n ± 2%   116.0n ± 2%   +9.33% (p=0.000 n=30)
StringField-96          105.6n ± 1%   115.5n ± 1%   +9.38% (p=0.000 n=30)
StringerField-96        107.4n ± 2%   113.5n ± 2%   +5.68% (p=0.000 n=30)
TimeField-96            110.0n ± 2%   119.6n ± 1%   +8.73% (p=0.000 n=30)
DurationField-96        110.5n ± 1%   122.2n ± 1%  +10.54% (p=0.000 n=30)
ErrorField-96           107.8n ± 3%   115.9n ± 2%   +7.56% (p=0.000 n=30)
ErrorsField-96          188.1n ± 2%   209.0n ± 3%  +11.17% (p=0.000 n=30)
StackField-96           725.4n ± 2%   788.0n ± 1%   +8.64% (p=0.000 n=30)
ObjectField-96          120.1n ± 1%   129.1n ± 3%   +7.54% (p=0.000 n=30)
ReflectField-96         326.8n ± 1%   350.2n ± 1%   +7.16% (p=0.000 n=30)
10Fields-96             597.5n ± 2%   594.0n ± 4%        ~ (p=0.906 n=30)
100Fields-96            5.597µ ± 1%   5.667µ ± 5%        ~ (p=0.623 n=30)
geomean                 198.0n        213.2n        +7.70%

Linker randomization:

goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
                   │ rout.go1.22.txt │            rout.tip.txt             │
                   │     sec/op      │   sec/op     vs base                │
BoolField-96             116.4n ± 2%   132.6n ± 3%  +13.87% (p=0.000 n=30)
ByteStringField-96       143.0n ± 2%   152.6n ± 2%   +6.64% (p=0.000 n=30)
Float64Field-96          115.5n ± 1%   128.0n ± 2%  +10.77% (p=0.000 n=30)
IntField-96              108.3n ± 1%   116.9n ± 3%   +7.94% (p=0.000 n=30)
Int64Field-96            107.4n ± 2%   117.6n ± 2%   +9.45% (p=0.000 n=30)
StringField-96           107.2n ± 2%   117.0n ± 2%   +9.14% (p=0.000 n=30)
StringerField-96         108.4n ± 2%   117.8n ± 2%   +8.68% (p=0.000 n=30)
TimeField-96             111.2n ± 1%   121.4n ± 1%   +9.22% (p=0.000 n=30)
DurationField-96         113.8n ± 3%   124.6n ± 2%   +9.44% (p=0.000 n=30)
ErrorField-96            108.5n ± 3%   119.7n ± 1%  +10.32% (p=0.000 n=30)
ErrorsField-96           186.7n ± 2%   208.5n ± 1%  +11.70% (p=0.000 n=30)
StackField-96            720.4n ± 1%   805.6n ± 3%  +11.82% (p=0.000 n=30)
ObjectField-96           120.2n ± 2%   132.0n ± 2%   +9.78% (p=0.000 n=30)
ReflectField-96          329.4n ± 2%   357.0n ± 3%   +8.39% (p=0.000 n=30)
10Fields-96              593.3n ± 2%   618.2n ± 2%   +4.20% (p=0.001 n=30)
100Fields-96             5.690µ ± 4%   5.744µ ± 2%        ~ (p=0.935 n=30)
geomean                  199.7n        217.4n        +8.85%

I did sanity-check that go was being invoked correctly:

+ go version
go version go1.22.4 linux/amd64
+ go test -ldflags=-randlayout=17017 -count=3 -run=nothing -bench=Field
...
+ go version
go version devel go1.23-97bc577812 Wed Jun 12 18:56:34 2024 +0000 linux/amd64
+ go test -ldflags=-randlayout=15015 -count=3 -run=nothing -bench=Field

Very strange.


thanm commented Jun 13, 2024

Interesting -- looks like machine type (microarchitecture) matters a lot here, which I suppose I should have expected all along. I don't have access to a 2-socket (96-core) AMD EPYC 7B13, but I do have a single-socket AMD EPYC 7B13, and I can see the degradation hold up just as you mention. I get roughly this:

goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
                   │ out.base.txt │            out.tip.txt             │
                   │    sec/op    │   sec/op     vs base               │
BoolField-48          110.4n ± 1%   115.5n ± 2%  +4.57% (p=0.000 n=30)
ByteStringField-48    136.9n ± 1%   145.3n ± 2%  +6.17% (p=0.000 n=30)
Float64Field-48       119.8n ± 1%   130.6n ± 0%  +8.97% (p=0.000 n=30)
IntField-48           110.4n ± 1%   118.3n ± 1%  +7.16% (p=0.000 n=30)
Int64Field-48         112.3n ± 1%   119.8n ± 1%  +6.73% (p=0.000 n=30)
StringField-48        108.5n ± 1%   117.4n ± 1%  +8.25% (p=0.000 n=30)
StringerField-48      109.8n ± 0%   117.9n ± 2%  +7.38% (p=0.000 n=30)
TimeField-48          112.2n ± 1%   121.5n ± 1%  +8.34% (p=0.000 n=30)
DurationField-48      116.5n ± 1%   125.4n ± 1%  +7.64% (p=0.000 n=30)
ErrorField-48         111.8n ± 1%   119.1n ± 2%  +6.58% (p=0.000 n=30)
ErrorsField-48        188.0n ± 1%   199.3n ± 1%  +6.04% (p=0.000 n=30)
StackField-48         681.9n ± 1%   735.7n ± 1%  +7.90% (p=0.000 n=30)
ObjectField-48        131.0n ± 1%   139.2n ± 1%  +6.26% (p=0.000 n=30)
ReflectField-48       299.4n ± 0%   319.5n ± 1%  +6.75% (p=0.000 n=30)
10Fields-48           459.0n ± 1%   488.5n ± 1%  +6.44% (p=0.000 n=30)
100Fields-48          5.114µ ± 0%   5.183µ ± 0%  +1.35% (p=0.000 n=30)
geomean               195.6n        208.6n       +6.64%

both with and without layout randomization. Thanks.
