Emscripten and the Producers Section #93

Open
kripken opened this issue Jan 29, 2019 · 11 comments

@kripken
Member

kripken commented Jan 29, 2019

After a lot of consideration, I don't think we want to change Emscripten to emit the producers section by default in release builds. Posting this issue to note that and explain why.

  • It would be surprising if emcc started one day to emit information like what compiler and tools our users use - most people would probably be surprised by that, and some might be concerned about it, for privacy or security reasons.
  • There is no precedent for this in the Web space: minifiers don't emit a // minified by $MINIFERNAME comment in your minified code. I can't even find a flag to do this optionally in any minifiers.
    • Oddly there is precedent in the native space, as clang and gcc do emit the compiler name and version, but after much digging I never found a good reason for this (just a 20 year old comment about SVR4 compatibility). Most links online about this are users that find out about it and are surprised and sometimes unhappy (about code size, privacy, various bugs that it causes), and many projects end up stripping it out.
  • When users ask emcc to emit the smallest binary, we should do that, unless there's a very strong reason, and I'm not sure users would agree that the metrics we are talking about here are reason enough, based on discussions I had with some of them. The metrics are mostly of interest to browser vendors and tool creators, but it's the users that ship the extra bytes. Many users would probably flip a flag to remove those bytes if they heard about that flag's existence, which suggests that removal should be on by default.

These are reasons for specifically not emitting the producers section in emcc by default. We could add an option for users that do want to do so, and of course other tools may have different factors to consider (in particular, Emscripten is used by ordinary developers, while tools like LLVM, wabt, or binaryen are used by toolchain developers, so the considerations might be different).

@lukewagner
Member

I can see the logic behind these reasons and I wish they had been brought up in earlier discussions, where I was specifically hoping to hear from tooling people whether the producer section would actually show up in release builds.

With Emscripten setting the precedent of stripping the section for these reasons, I can imagine the convention not being widely adopted which makes me question whether it's worth it to add browser telemetry. At the very least, it seems like this warrants an update to ProducersSection.md to re-set expectations, or maybe just removing ProducersSection.md entirely if there are no other consumers.

kripken added a commit to WebAssembly/binaryen that referenced this issue Jan 31, 2019
WebAssembly/tool-conventions#93 has a summary of emscripten's current thinking on this. For Binaryen, we don't want to do anything to the producers section by default, but do want it to be possible to optionally remove it. To achieve that, this PR

 * creates a --strip-producers pass that removes that section.
 * creates a --strip-debug pass that removes debug info, same as the old --strip, which is still around but deprecated.

A followup in emscripten will use this pass by default.
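
For illustration only, here is a minimal Python sketch of what removing the section amounts to at the binary level. It is not Binaryen's implementation; the function names (`strip_custom_section`, `custom_section`, and the LEB helpers) are hypothetical. It relies only on the standard wasm binary layout: an 8-byte header, then sections that each consist of an id byte, a LEB128 size, and a payload, where custom sections (id 0) begin their payload with a length-prefixed name.

```python
def read_u32_leb(data, pos):
    """Decode an unsigned LEB128 integer; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def write_u32_leb(value):
    """Encode an unsigned integer as LEB128 bytes."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def custom_section(name, payload):
    """Build a custom section (id 0) with the given name and raw payload."""
    encoded = name.encode("utf-8")
    body = write_u32_leb(len(encoded)) + encoded + payload
    return b"\x00" + write_u32_leb(len(body)) + body


def strip_custom_section(module, name):
    """Return a copy of the module without custom sections called `name`."""
    if module[:4] != b"\0asm":
        raise ValueError("not a wasm binary")
    wanted = name.encode("utf-8")
    out = bytearray(module[:8])  # magic number and version
    pos = 8
    while pos < len(module):
        start = pos
        section_id = module[pos]
        size, body = read_u32_leb(module, pos + 1)
        end = body + size
        keep = True
        if section_id == 0:
            # Custom sections start with a length-prefixed UTF-8 name.
            name_len, name_pos = read_u32_leb(module, body)
            if module[name_pos:name_pos + name_len] == wanted:
                keep = False
        if keep:
            out += module[start:end]
        pos = end
    return bytes(out)
```

Calling `strip_custom_section(wasm_bytes, "producers")` leaves every other section byte-for-byte untouched, which is why a post-processing tool can drop the section cheaply.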
@jgravelle-google

My thoughts on this have essentially flipped from my original: I think the Producers Section is a good thing to have in general, but probably doesn't make sense for Emscripten to emit.

The general principle I'm following here is one of incentives: what pressures exist on the various parties?
Toolchain authors are competing on quality of implementation. Imagine an emcc-but-smaller fork of Emscripten, which provides the same experience to a developer, but ships binaries that are 100 bytes smaller across the board.
Wasm-targeting developers don't inherently care about the ecosystem, and want to deliver the best possible experience to their users, which means minimizing code size. Assume they build with toolchain X, which may or may not include a producers section. At some point they may find it profitable to run a wasm-minifier over the resulting wasm module, which could easily be assumed to strip the producers section rather than add to it.

From the other side of the table though, what benefits can people get from the data provided by the producers section?
Wasm developers (i.e. us) get good metrics on what tools are used where, and can make better decisions as to how to steer the platform, which is good. We aren't in full control of what users ship, however. (We do have a great deal of influence in Emscripten, mind)
The way I see it though, the developers/users who would be most incentivized to annotate this data are those from smaller toolchains who want to see how their work spreads.

So it makes sense for toolchains like the Rust compiler to annotate their wasms with producers sections, because being able to definitively say "Rust is used in 10% of the wasm modules on the web today" would be a huge victory. Whereas for established toolchains (Emscripten being practically unique in this regard), tracking proliferation of use is less important. I believe the background assumption we're all implicitly making is that Emscripten is used for 99+% of the wasm that's built today, which makes quantifying that less appealing than saving the ~100 bytes.

All that is to say, I don't think precedent matters as much as incentives here, because people will ultimately do what benefits themselves the most anyway. For Emscripten I believe this means to strip it, but for other toolchains I think the visibility incentive is sufficient for this to see use.

(Alternatively, my wild counter-proposal would be to not have any producers section at all, and run analysis based on "this wasm looks like it was built with tool ___" heuristics after-the-fact, which is imprecise and compute-heavy but requires no opt-in. We could always just do both, or use heuristic analysis only on the non-annotated modules, depending on how things actually shake out down the line)

@kripken
Member Author

kripken commented Jan 31, 2019

(Alternatively, my wild counter-proposal would be to not have any producers section at all, and run analysis based on "this wasm looks like it was built with tool ___" heuristics after-the-fact, which is imprecise and compute-heavy but requires no opt-in.

I think this is really the way to go, actually. The main benefit is that it would not be a statistically biased sample: with the producers section, size-conscious websites (most of them?) will strip it out, and the remainder may well be different in what tools they use. The analysis approach would actually be sampling the real population of production wasms.

Analysis is not easy, obviously. But it would just need to run on a tiny random sample of the wasms on the Web (so it's not compute-heavy, necessarily; also validating it on that set would be enough). If we all collaborated on this it might not be that hard. I'd volunteer to help if that's relevant.
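
To make the bias argument concrete, here is a small simulation sketch. Every number in it is a made-up assumption, not measured data: two hypothetical toolchains, `toolchain_a` with 90% of modules whose users strip the section 90% of the time, and `toolchain_b` with 10% of modules whose users strip it 10% of the time. It compares the share of `toolchain_a` seen among annotated modules with the share seen in a small random sample of all modules.

```python
import random


def simulate(population_size=100_000, sample_size=1_000, seed=0):
    """Return (share of toolchain_a among annotated modules,
    share of toolchain_a in a random sample of all modules)."""
    rng = random.Random(seed)

    # Hypothetical market shares and strip rates, chosen only for illustration.
    share_a = 0.9
    strip_probability = {"toolchain_a": 0.9, "toolchain_b": 0.1}

    population = []
    for _ in range(population_size):
        tool = "toolchain_a" if rng.random() < share_a else "toolchain_b"
        annotated = rng.random() >= strip_probability[tool]
        population.append((tool, annotated))

    # What producers-section telemetry would see: annotated modules only.
    annotated_tools = [tool for tool, annotated in population if annotated]
    # What analysis of a small random sample would see: any module.
    sampled_tools = [tool for tool, _ in rng.sample(population, sample_size)]

    def share_of_a(tools):
        return tools.count("toolchain_a") / len(tools)

    return share_of_a(annotated_tools), share_of_a(sampled_tools)


annotated_share, sampled_share = simulate()
print(f"annotated modules: {annotated_share:.2f}, random sample: {sampled_share:.2f}")
```

Under these assumed rates, the annotated modules put `toolchain_a` near half of usage, while the random sample stays close to the true 90%, even though it looks at only 1% of the population.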

@aardappel

How about.. making the producers section really small (1 LEB per producer) so there is no incentive to strip it for size? :)

@binji
Member

binji commented Jan 31, 2019

It's not too hard to eyeball and spot an asm2wasm, wasm backend, rust, go etc. But this won't provide nearly as much info as the producer section will.

@binji
Member

binji commented Jan 31, 2019

How about.. making the producers section really small (1 LEB per producer) so there is no incentive to strip it for size? :)

This still has the section overhead...
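
A rough sketch of that overhead, assuming the section stays a custom section named "producers" and that the whole payload shrinks to a single one-byte LEB (a hypothetical producer ID, which is not part of any existing format). The helper names are illustrative only.

```python
def write_u32_leb(value):
    """Encode an unsigned integer as LEB128 bytes."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def custom_section(name, payload):
    """Build a custom section: id 0, LEB size, length-prefixed name, payload."""
    encoded = name.encode("utf-8")
    body = write_u32_leb(len(encoded)) + encoded + payload
    return b"\x00" + write_u32_leb(len(body)) + body


# Hypothetical payload: one producer encoded as a single LEB byte.
payload = write_u32_leb(1)
section = custom_section("producers", payload)
overhead = len(section) - len(payload)
print(len(section), overhead)  # prints: 13 12
```

So even with a one-byte payload, the section id, size field, and name add 12 bytes; a shorter section name would reduce that but not eliminate it.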

@jgravelle-google

Reducing the incentive doesn't remove it. If I could choose between $100 disappearing out of my bank account vs. $1, I'd still be mad that the bank is losing my money for no reason. I don't mind when my balance goes down when I buy something, however, so the question is how can we get people to feel the cost is worthwhile?

Actually wait, second wild proposal: don't put the bytes inside foo.wasm, publish the section separately as foo.producers. End-users don't need to download the bytes every time, we don't lose any information, there's no incentive to strip it. Only problem is website developers won't feel any pressure to upload that file to their actual sites as part of their build process.

@lukewagner
Member

So for ProducersSection.md, I get the impression maybe we shouldn't remove it entirely, but file a PR to:

  1. Rewrite the intro section to just say "If you want to annotate what tools were used to produce this .wasm, here's an interoperable format. Note that tools in the pipeline may strip this section."
  2. Remove the "Known list" since there's not much point in maintaining a centralized repository anymore.

Yes?

@xtuc
Contributor

xtuc commented Feb 4, 2019

Should we discuss that during the WebAssembly CG meeting tomorrow?

@lukewagner
Member

Sorry, I missed this comment in time to add it to the agenda, but yes, that would've made sense.

lukewagner added a commit to WebAssembly/meetings that referenced this issue Feb 5, 2019
binji pushed a commit to WebAssembly/meetings that referenced this issue Feb 5, 2019
@xtuc
Contributor

xtuc commented Feb 20, 2019

My impression is that the meeting didn't clarify what we should do here.

I think it's unfortunate, but I agree with @lukewagner's #93 (comment).
