Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize DecodeString and Decode methods. #1

Merged
merged 3 commits into from
Nov 22, 2016

Conversation

kevina
Copy link
Contributor

@kevina kevina commented Nov 18, 2016

In the DecodeString case it avoids unnecessary intermediate strings.
In the Decode case it just avoids bytes.Map, which is likely slow due
to having to make a function call for each character.

In the benchmark in ipfs/kubo#3376 is was able to save about 40 ms of cpu time.

In the DecodeString case it avoids unnecessary intermediate strings.
In the Decode case it just avoids bytes.Map, which is likely slow due
to having to make a function call for each character.
@kevina
Copy link
Contributor Author

kevina commented Nov 18, 2016

Note: I manually ran go test ./... and everything passed.

n, _, err = enc.decode(dst, src)
func (enc *Encoding) Decode(dst, s []byte) (n int, err error) {
stripped := make([]byte, 0, len(s))
for i := 0; i < len(s); i++ {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use for _, c := range s {. Its faster because it optimizes out bounds checks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can fix this one. But see below.

return
}

// DecodeString returns the bytes represented by the base32 string s.
func (enc *Encoding) DecodeString(s string) ([]byte, error) {
s = strings.Map(removeNewlinesMapper, s)
stripped := make([]byte, 0, len(s))
for i := 0; i < len(s); i++ {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is it returns a rune and not a byte when using range for a string. Is there a way to do use range on the raw bytes or a string.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You would have to basically cast it to a []byte first. At which point you might as well just call Decode([]byte(s)) in the first place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. So I was trying to avoid any necessary allocation so is this one okay as is?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heh, one thing you could do is convert the string to bytes, then overwrite that same array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could do that. It will require an extra pass and seams like an overkill. Eventually the go compiler should be smart enough to eliminate the bounds checks. Most other optimizing compilers will in simple cases like this.

Do you really think it could make that much of a difference? If so I will benchmark it and let you know.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wont take an extra pass:

strb := []byte(s)
off := 0
for i, b := range strb {
  if b == '\n' || b == '\r' {
    continue
  }
  strb[off] = b
  off++
}

n, _, err := enc.decode(strb, strb[:off])
if err != nil {
  return nil, err
}
return strb[:n], nil

Copy link
Contributor Author

@kevina kevina Nov 19, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it will. The extra pass is in the conversion from the string to a []byte.

I think this is just adding complexity but since you seam to really want it I will run some benchmarks and report back. It may be tomorrow through before I do it.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

converting a string to a byte array is pretty well optimized. If youre going to count that as an extra pass then you have to count allocating a byte buffer as an extra pass as go zeros every buffer you allocate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like I said, I will run some benchmarks. In many cases you can allocate zeroed memory without require a pass to zero it.

@kevina
Copy link
Contributor Author

kevina commented Nov 19, 2016

@whyrusleeping Okay, I used "range" in Decode() but left DecodeString() alone.

If you are still not happy with this i can run some more tests to see if the index checking is really costing anything and if it would be better to convert to a []byte first.

Thanks

kevina added a commit to kevina/base32 that referenced this pull request Nov 19, 2016
@kevina
Copy link
Contributor Author

kevina commented Nov 19, 2016

@whyrusleeping I ran some benchmarks. It turns out the index check is either optimized out or otherwise does not matter. Your version is faster because it only allocates once instead of twice (something that I missed yesterday, sorry), not because it avoids the index check. It also turns out that for i,c := range []byte(s) {} is optimized to avoid allocating an array (see https://github.com/golang/go/wiki/CompilerOptimizations#range-over-bytes ) Here are some numbers from the benchmark branch at https://github.com/kevina/base32 (commit 84ef217)

$ go test -bench 'BenchmarkKeyDecode.*'
BenchmarkKeyDecodeKevOrig-4      3000000               513 ns/op
BenchmarkKeyDecodeKevNew-4       3000000               449 ns/op
BenchmarkKeyDecodeOpt-4          3000000               452 ns/op
BenchmarkKeyDecodeDeOpt-4        3000000               553 ns/op
BenchmarkKeyDecodeWhy-4          3000000               470 ns/op
BenchmarkKeyDecodeFinal-4        3000000               482 ns/op

KevOrig is my orignal version, KevNew avoids an extra allocation, 'Opt' takes advantage of the for i,c := range []byte(s) {}optimization. DeOpt uses b := byte(s); for i,c := range b {} instead. Why is your original version. Final is a slightly refactored version of yours.

Do this by calling the internal decode using the same byte array for
the source and dest.
@kevina
Copy link
Contributor Author

kevina commented Nov 19, 2016

I just pushed a commit that uses the version of DecodeString uses by the DecodeFinal benchmark tests.

I'm fine if we just go with that (basically your version). The time between the different versions (that only use a single alloc) is very small and various by 20ns each time I run it.

kevina added a commit to kevina/base32 that referenced this pull request Nov 21, 2016
@kevina
Copy link
Contributor Author

kevina commented Nov 21, 2016

For compassion I added the original implementation in the benchmarking code (a5bba06). Here are the numbers from another run

BenchmarkKeyDecodeOrig-4         2000000               892 ns/op
BenchmarkKeyDecodeKevOrig-4      3000000               498 ns/op
BenchmarkKeyDecodeKevNew-4       3000000               434 ns/op
BenchmarkKeyDecodeOpt-4          3000000               433 ns/op
BenchmarkKeyDecodeDeOpt-4        3000000               528 ns/op
BenchmarkKeyDecodeWhy-4          3000000               462 ns/op
BenchmarkKeyDecodeFinal-4        3000000               464 ns/op

Orig is the original implementation.

@whyrusleeping
Copy link
Owner

This LGTM, thanks @kevina!

@whyrusleeping whyrusleeping merged commit 38ece81 into whyrusleeping:master Nov 22, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants