proto: performance degradation with v1.1.0 #624
Heya -- Is there any way I could help you with this? 😃

Could you add a LICENSE file?

Alternatively, could you run both benchmarks with

@LMMilewski ✔️ Added a LICENSE.
It seems that the problem is with processing maps (see profiles below). Your proto contains a large, deeply nested map (the "node" key in https://github.com/srenatus/pb-bench/blob/master/data/converge-success-report.json is a google.protobuf.Struct, which is a map that can contain nested maps: https://github.com/golang/protobuf/blob/master/ptypes/struct/struct.proto#L52-L55). Map processing is expected to be slow: that path was considered rare, so it uses reflection extensively. That was true in 1.0.0 too, but that version didn't reflectively iterate over maps to calculate their sizes (which is what shows up in the profile). Also, calculating size is at the top of the v1.1.0 profile, accounting for 65.36%. Perhaps it is quadratic in some case (probably maps), which would happen if the size is not cached. v1.0.0 top
v1.1.0 top
@cherryyz Any idea if we can make Size faster in this case?
FWIW, I created a small benchmark (code below) with nested maps (depth 1, 2, 4, 8, 16). Every map has a single key, and the value is either a nested map or a number (the leaf). I got these results:
So it grows faster than it should. The web graph shows that size is driven by map and oneof. I'm not sure how to attach an SVG here, though. Code:
Yeah, map is the rare slow case. It all uses reflection, and it allocates a lot. I just tried to write a specialized version just for string-message maps. There is a good speedup, but in practice we would need NxM key-value type combinations. Given that maps are considered rare, I'm not sure we want to go this way. While writing it, I did find a quadratic behavior in the reflection-based code. We call
I guess it depends on the speedup you can get, but (without pushing for/against the specialization) I want to note that string->message is most likely the only case where "maps are rare" doesn't hold. It's popular because the protobuf spec provides a way to represent arbitrary JSON as protos. That leads to arbitrary string->anything maps ("anything" is represented as a message with a single oneof). Apparently it's used even in Google Cloud APIs (I don't remember which ones, though). That's also what the OP's benchmark does.
Here's a crude histogram of map types across a large corpus:
Since this was done purely by parsing, I couldn't tell if
This is comparing a specialized string-message map (new) vs. the current implementation with my CL above applied (old). I haven't implemented deterministic marshal; in the deterministic case the speedup would be smaller, as we need to allocate a slice of keys and sort it. Specialization only works with the unsafe-based implementation; we can't specialize arbitrary message types in pure reflection code. We could specialize only string-message maps, especially if this is documented as "special" in some way (I just don't want to draw the line too arbitrarily). Or we could specialize all string-T maps (if there is a good reason).
If the value of a map is a message with nested maps, calling valSizer in marshal may be quadratic -- it will call size at each level. Fix by using the cached size if the value type is message. Benchmark results with @LMMilewski's test case from #624:

name   old time/op    new time/op    delta
1-12   1.96µs ±19%    1.76µs ±18%      ~     (p=0.075 n=10+10)
2-12   5.10µs ± 6%    3.84µs ± 8%   -24.69%  (p=0.000 n=9+10)
4-12   14.2µs ±12%     8.2µs ±15%   -42.77%  (p=0.000 n=10+10)
8-12   43.0µs ±10%    15.6µs ±11%   -63.81%  (p=0.000 n=9+10)
16-12   138µs ±12%      33µs ±17%   -76.32%  (p=0.000 n=10+10)

name   old alloc/op   new alloc/op   delta
1-12    376B ± 0%      376B ± 0%       ~     (all equal)
2-12    928B ± 0%      752B ± 0%    -18.97%  (p=0.000 n=10+10)
4-12   2.54kB ± 0%    1.49kB ± 0%   -41.51%  (p=0.000 n=10+10)
8-12   7.89kB ± 0%    2.96kB ± 0%   -62.47%  (p=0.000 n=10+10)
16-12  27.0kB ± 0%     5.9kB ± 0%   -78.11%  (p=0.000 n=10+10)

name   old allocs/op  new allocs/op  delta
1-12    12.0 ± 0%      12.0 ± 0%       ~     (all equal)
2-12    28.0 ± 0%      23.0 ± 0%    -17.86%  (p=0.000 n=10+10)
4-12    75.0 ± 0%      45.0 ± 0%    -40.00%  (p=0.000 n=10+10)
8-12     229 ± 0%        89 ± 0%    -61.14%  (p=0.000 n=10+10)
16-12    777 ± 0%       177 ± 0%    -77.22%  (p=0.000 n=10+10)

Fixes #624.
Thanks for addressing this. I've tried to re-run my aforementioned benchmark with the latest revision (1325a0), but it now panics at runtime:

$ make -f ../Makefile bench
go test -v -count=10 -benchmem -bench .
panic: reflect.Set: value of type map[string]*structpb.Value is not assignable to type map[string]*structpb.Value
goroutine 49 [running]:
reflect.Value.assignTo(0x11e8c60, 0xc420092d50, 0x15, 0x1226774, 0xb, 0x11e8d20, 0x0, 0x100728d, 0x11e8c60, 0xc420092d50)
/usr/local/Cellar/go/1.10/libexec/src/reflect/value.go:2235 +0x427
reflect.Value.Set(0x11e8d20, 0xc420092cf0, 0x195, 0x11e8c60, 0xc420092d50, 0x15)
/usr/local/Cellar/go/1.10/libexec/src/reflect/value.go:1373 +0xa4
github.com/srenatus/pb-bench/r-1325a0/vendor/github.com/golang/protobuf/jsonpb.(*Unmarshaler).unmarshalValue(0xc4200aa020, 0x12056e0, 0xc420092cf0, 0x199, 0xc4201f0000, 0x63, 0x70, 0xc4201c7740, 0x1119abb, 0xc4200aa8a0)
/Users/stephan/go/src/github.com/srenatus/pb-bench/r-1325a0/vendor/github.com/golang/protobuf/jsonpb/jsonpb.go:822 +0x2461
github.com/srenatus/pb-bench/r-1325a0/vendor/github.com/golang/protobuf/jsonpb.(*Unmarshaler).unmarshalValue(0xc4200aa020, 0x120cfc0, 0xc4201ee030, 0x196, 0xc4201f0000, 0x63, 0x70, 0xc4201c7740, 0x11d60f4, 0x38)
/Users/stephan/go/src/github.com/srenatus/pb-bench/r-1325a0/vendor/github.com/golang/protobuf/jsonpb/jsonpb.go:713 +0x2a6
github.com/srenatus/pb-bench/r-1325a0/vendor/github.com/golang/protobuf/jsonpb.(*Unmarshaler).unmarshalValue(0xc4200aa020, 0x121b860, 0xc4201ee000, 0x199, 0xc4201e2000, 0x1a7, 0x1c0, 0xc4201c7440, 0x1001d98, 0xc4201de578)
/Users/stephan/go/src/github.com/srenatus/pb-bench/r-1325a0/vendor/github.com/golang/protobuf/jsonpb/jsonpb.go:934 +0x7d3
github.com/srenatus/pb-bench/r-1325a0/vendor/github.com/golang/protobuf/jsonpb.(*Unmarshaler).unmarshalValue(0xc4200aa020, 0x12197c0, 0xc4200a6910, 0x196, 0xc4201e2000, 0x1a7, 0x1c0, 0xc4201c7440, 0x0, 0x0)
/Users/stephan/go/src/github.com/srenatus/pb-bench/r-1325a0/vendor/github.com/golang/protobuf/jsonpb/jsonpb.go:713 +0x2a6
github.com/srenatus/pb-bench/r-1325a0/vendor/github.com/golang/protobuf/jsonpb.(*Unmarshaler).unmarshalValue(0xc4200aa020, 0x11d03a0, 0xc4200fecd0, 0x197, 0xc4201ba000, 0x15ac, 0x1800, 0xc4201c7440, 0x11e1e09, 0x41)
/Users/stephan/go/src/github.com/srenatus/pb-bench/r-1325a0/vendor/github.com/golang/protobuf/jsonpb/jsonpb.go:994 +0x1d54
github.com/srenatus/pb-bench/r-1325a0/vendor/github.com/golang/protobuf/jsonpb.(*Unmarshaler).unmarshalValue(0xc4200aa020, 0x1220660, 0xc4200fec00, 0x199, 0xc42017c000, 0x19565, 0x1a000, 0x0, 0x1215460, 0x1050f01)
/Users/stephan/go/src/github.com/srenatus/pb-bench/r-1325a0/vendor/github.com/golang/protobuf/jsonpb/jsonpb.go:934 +0x7d3
github.com/srenatus/pb-bench/r-1325a0/vendor/github.com/golang/protobuf/jsonpb.(*Unmarshaler).UnmarshalNext(0xc4200aa020, 0xc4200b8000, 0x124b880, 0xc4200fec00, 0x19501, 0xc4200aa080)
/Users/stephan/go/src/github.com/srenatus/pb-bench/r-1325a0/vendor/github.com/golang/protobuf/jsonpb/jsonpb.go:663 +0x13d
github.com/srenatus/pb-bench/r-1325a0/vendor/github.com/golang/protobuf/jsonpb.(*Unmarshaler).Unmarshal(0xc4200aa020, 0x124ad00, 0xc4200aa080, 0x124b880, 0xc4200fec00, 0x19566, 0x0)
/Users/stephan/go/src/github.com/srenatus/pb-bench/r-1325a0/vendor/github.com/golang/protobuf/jsonpb/jsonpb.go:674 +0x7c
github.com/srenatus/pb-bench/r-1325a0.getRunMsg(0xc4200fe900, 0x102e8bd)
/Users/stephan/go/src/github.com/srenatus/pb-bench/r-1325a0/benchmark_test.go:111 +0x1ea
github.com/srenatus/pb-bench/r-1325a0.BenchmarkRunMsgMarshal(0xc4200fe900)
/Users/stephan/go/src/github.com/srenatus/pb-bench/r-1325a0/benchmark_test.go:25 +0x32
testing.(*B).runN(0xc4200fe900, 0x1)
/usr/local/Cellar/go/1.10/libexec/src/testing/benchmark.go:141 +0xb2
testing.(*B).run1.func1(0xc4200fe900)
/usr/local/Cellar/go/1.10/libexec/src/testing/benchmark.go:214 +0x5a
created by testing.(*B).run1
/usr/local/Cellar/go/1.10/libexec/src/testing/benchmark.go:207 +0x80
exit status 2
FAIL github.com/srenatus/pb-bench/r-1325a0 0.015s
make: *** [bench] Error 1

This was running on this branch. Am I missing something?
Oh, this kind of error. I've run into this sort of thing before. Typically, the problem is that the two types, despite appearing to come from the same package, are actually imported from different packages (often due to weird interactions with vendored code).
@puellanivis Thanks, good point. While this line seems relevant, the qualified import there was merely for the sake of clarity (since the path didn't match the package name in it)... Also, after changing that line to remove the "rename", the error persists -- it's got to be somewhere else 🤔
Found the problem: you are still including the API from the
Oh dear -- I'm an idiot. Must have had too little coffee back then. Thank you so much 😄

Thanks again @puellanivis ✅ this has resolved my issue, sorry for the noise.
Hi there -- updating our dependency on golang/protobuf to v1.1.0, I've noticed an unexpected dip in performance when running some microbenchmarks.

👉 I've put up a reproduction case here: https://github.com/srenatus/pb-bench -- you can also look at the output on Travis.

The gist is that marshaling (of a rather big message) is getting a lot worse in all three metrics, ranging from +165.77% (alloc/op) to +751.93% (allocs/op).

I've brought this up in #protobuf of the Gopher Slack, where @dsnet asked for further input -- here it is (👋 @dsnet 😃).