Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Anticipate differential updates to applications #99

Closed
jfbastien opened this issue Jun 1, 2015 · 12 comments
Closed

Anticipate differential updates to applications #99

jfbastien opened this issue Jun 1, 2015 · 12 comments
Milestone

Comments

@jfbastien
Copy link
Member

Dynamic linking #53 will help reduce transfer size, but won't fix the differential update problem, e.g.: application fetches wasm.google.com/libm.1.2.so and already has wasm.google.com/libm.1.1.so, and could receive a differential update from 1.1 to 1.2 instead of downloading all of 1.2.

Web Assembly probably shouldn't be the one fixing this issue, maybe HTTP should, but it's a great usecase to move the web platform forward.

EDITS:

  • @kg points out later that differential updates may interact with small binary size. We should collect data on this to avoid regretting design decisions later.
  • @slightlyoff suggests that ServiceWorkers could help with differential updates.

As a reference, asm.js can do this. Trendy Entertainment does this by caching the string to the asm.js module in indexdb and computing their own patches.

It's possible that the Streams API may also help with this.

h/t @kg for raising this issue!

@lukewagner
Copy link
Member

Agreed. In the case of wasm, the patch could be a binary patch, applied to a (binary) Blob stored in IDB/Cache before being script.src = URL.createObjectURL(patchedBlob)'d. Happy to have any PR point this out where relevant in the docs.

@kg
Copy link
Contributor

kg commented Jun 2, 2015

This is one case where I think different performance priorities will be at odds with each other. If you want to be able to efficiently do delta patches, that probably imposes some requirements on how you order functions within the executable, and perhaps even what the bodies of the functions look like in order to make it easy to encode deltas. This sort of optimization would benefit delta patching but potentially throw away some wire size optimizations.

A naive compiler may deterministically produce compact executables that are nearly impossible to perform diffing/deltas on because everything shifts around and moves and is renumbered when the executable changes. In that scenario you have to do a ton of work to be able to do incremental updates. To be fair, x86 has shown that people are willing to do all that work, so maybe that's okay here.

Techniques like using relative indexes (instead of absolute ones) or local index spaces can help, but could also hurt depending on how things work out. For example, let's say you ship a new build of your game that adds a tiny new math routine. The new math routine hits a SIMD opcode you never used before, so your compiler adds that opcode to the opcode table. Whatever policy the compiler uses causes it to reshuffle the opcode table to insert the opcode, and now all the instructions in your executable have changed and your delta is enormous. I think we have a ton of assumptions baked into our format, probably unintentionally, that could make things nasty here in a way that they aren't for javascript.

It would be cool to have a sense of how important incremental updates are and whether we think it's realistic to punt all the problem-solving over to user-space. I suspect we can let userspace do it, but I'm nervous that we'd be throwing away a huge portion of our load-time/decode speed advantages if doing this requires a bunch of complex user-space code, vs some work on our end to make deltas work way better (not implementing delta updates as a whole, mind you).

@jfbastien
Copy link
Member Author

@kg totally agree with your point, this is an unknown and it would be great to have data to understand it better.

@kg
Copy link
Contributor

kg commented Jun 3, 2015

I will try to do some investigation into this, but it might be worthwhile to reach out to people who have done differential updates on large production applications. I think Chrome has a rather elaborate solution for this? I don't know anything about it, though.

EDIT: And obviously, we want to be better than applying diffs/patches to JS text... if we're not careful we can easily be 5-10x worse than that.

@lukewagner
Copy link
Member

Ah, I see, so you're saying that if, e.g., the new version introduces a single function in the middle, it'll end up renumbering all the function indices. That makes sense.

Applying our general layering strategy would suggest solving this problem at a higher layer (initially in user-space, maybe one day specified) since all the schemes I can think of to mitigate this at the bottom layer would add complexity. Remember that if we have content-keyed compiled code caching (something we'd need anyway for diffing to have fast warm-start time), we can cache/diff at any level before lowering to the binary encoding that is fed to the wasm decoder.

@kg
Copy link
Contributor

kg commented Jun 3, 2015

Yeah, I think we want to solve this through layering. I think it implies that we will potentially need 'smart compilers' or a smart tool in the compile toolchain that reorders executables to make them easier to apply deltas to, as one hypothetical example. Or it means that things like the instruction tables and function tables are sorted deterministically, etc. None of this needs to be specced, I think, but we want to make sure the spec doesn't rule any of these techniques out. Some of our earlier spec drafts do poorly on this criteria, and my prototyping was ignorant of this so I was experimenting with things that would totally degrade delta updates. Our push to simplify will help us avoid any landmines here for sure.

@jfbastien
Copy link
Member Author

Chrome uses Courgette to do small differential updates, and the Syzygy folks may also have insights.

I'm OK with leaving this issue open for now, and volunteer to contact them after the public release to solicit input. I don't think it blocks going public, but it is important to consider early as @kg pointed out.

@jfbastien jfbastien added this to the v.1 milestone Jun 3, 2015
@jfbastien
Copy link
Member Author

@slightlyoff suggests that ServiceWorkers could help with differential updates. Updating the bug description accordingly.

@kg
Copy link
Contributor

kg commented Jun 4, 2015

I'm going to try and gather some data on this during my experiments by making it possible to toggle each of my optimizations on/off. I'll come up with one or two test scenarios with a base application + update so I can measure the resulting differences using some off-the-shelf patch/diff tools and hopefully that will give us enough information to decide how much this matters while we're planning the MVP.

jfbastien added a commit that referenced this issue Jun 9, 2015
This refactoring clarifies points made in #53, #99, #81, and overall tries to make the text more self-coherent, less bullet-pointy.
@luser
Copy link

luser commented Jun 19, 2015

For reference, Mozilla uses bsdiff for partial updates. It does a pretty good job but it doesn't do so well with our PGO-compiled binaries. Testing it against some wasm prototype data would be nice.

@slightlyoff
Copy link

bsdiff is pretty poor on sectioned binaries. Chrome winds up using
something called courgette instead:
https://www.chromium.org/developers/design-documents/software-updates-courgette

Would love to see data against that as well.

On Fri, Jun 19, 2015 at 8:03 AM, Ted Mielczarek notifications@github.com
wrote:

For reference, Mozilla uses bsdiff http://www.daemonology.net/bsdiff/
for partial updates. It does a pretty good job but it doesn't do so well
with our PGO-compiled binaries. Testing it against some wasm prototype data
would be nice.


Reply to this email directly or view it on GitHub
#99 (comment).

@jfbastien
Copy link
Member Author

We have dynamic linking.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants