Only link res_init() on GNU/*nix #47334

Merged
merged 1 commit into from
Jan 22, 2018

Conversation

etaoins
Contributor

@etaoins etaoins commented Jan 10, 2018

To work around a bug in glibc <= 2.26, lookup_host() calls res_init() based on the glibc version detected at runtime. While this avoids calling res_init() on platforms where it's not required, we still end up linking against the symbol.

This causes an issue on macOS, where res_init() is implemented in a separate library (libresolv.9.dylib) from the main libc. While this is harmless for standalone programs, it becomes a problem if Rust code is statically linked into another program. If the linked program doesn't already specify -lresolv, the link will fail. This is captured in issue #46797.

Fix this by hooking the glibc workaround into cvt_gai and only activating it for the "gnu" environment on Unix. This should include all glibc platforms while excluding musl, windows-gnu, macOS, FreeBSD, etc.

This has the side benefit of removing the #[cfg] in sys_common; only unix.rs has code related to the workaround now.
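
As a rough illustration of the gating described above, here is a minimal sketch. The helper name `resolver_workaround_enabled` is hypothetical (it is not a libstd function), and the body only reports whether the gate is active instead of calling res_init(), so the sketch links on every target.

```rust
/// Hypothetical helper: compiled only for Unix targets whose environment is "gnu".
#[cfg(all(unix, target_env = "gnu"))]
fn resolver_workaround_enabled() -> bool {
    // The real workaround would call res_init() on this path.
    // This sketch only reports that the gate is active.
    true
}

/// Every other target (musl, windows-gnu, macOS, FreeBSD, ...) compiles this
/// version instead, so no reference to res_init() is emitted there.
#[cfg(not(all(unix, target_env = "gnu")))]
fn resolver_workaround_enabled() -> bool {
    false
}
```

Because the choice is made at compile time, targets outside the gate never see the symbol at all, which is the property the link fix relies on.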

Before this commit:

```shell
> cat main.rs
use std::net::ToSocketAddrs;

#[no_mangle]
pub extern "C" fn resolve_test() -> () {
    let addr_list = ("google.com.au", 0).to_socket_addrs().unwrap();
    println!("{:?}", addr_list);
}
> rustc --crate-type=staticlib main.rs
> clang libmain.a test.c -o combined
Undefined symbols for architecture x86_64:
  "_res_9_init", referenced from:
      std::net::lookup_host::h93c17fe9ad38464a in libmain.a(std-826c8d3b356e180c.std0.rcgu.o)
ld: symbol(s) not found for architecture x86_64
clang-5.0: error: linker command failed with exit code 1 (use -v to see invocation)
```

Afterwards:

```shell
> rustc --crate-type=staticlib main.rs
> clang libmain.a test.c -o combined
> ./combined
IntoIter([V4(172.217.25.131:0)])
```

Fixes #46797

@rust-highfive
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @alexcrichton (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@etaoins
Contributor Author

etaoins commented Jan 10, 2018

I'm not sure if this is the right approach; just throwing it out there to get ideas. I'm especially interested in feedback from @jonhoo and @dtolnay as they were involved in #44965

@jonhoo
Contributor

jonhoo commented Jan 10, 2018

This seems like a perfectly reasonable fix to me.

@alexcrichton
Member

Thanks for the PR! Could this perhaps be refactored a little differently though to avoid #[cfg] in the sys_common directory? Basically delegate to a function which is constant on most platforms, but on gnu/linux it is filled out?

@jonhoo
Contributor

jonhoo commented Jan 10, 2018

@alexcrichton the code that's there currently already uses #[cfg], so that'd make this a more major change than it needs to be. Though perhaps it's worthwhile to clean it up if we're modifying this code anyway. Is there a strong reason to move it? There are already several other #[cfg] items in that file.

If we do factor out the refresh into its own file, I propose adding

fn on_resolver_failure(e: io::Error) -> io::Error { e }

to each of the sys::nets (well, except gnu/linux), and then

use sys::net::on_resolver_failure;
// ...
match ... {
    Ok(..) => ...,
    Err(e) => ::sys::net::on_resolver_failure(e),
}
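
A self-contained sketch of the proposal above, for illustration only. The nested `sys::net` module here is a stand-in for libstd's per-platform module, `finish_lookup` is a hypothetical caller, and the actual res_init() call is left out so the code builds without extra libraries.

```rust
use std::io;

// Hypothetical stand-in for libstd's per-platform `sys::net` module.
mod sys {
    pub mod net {
        use std::io;

        /// glibc targets: this is where the resolver refresh would go.
        #[cfg(all(unix, target_env = "gnu"))]
        pub fn on_resolver_failure(e: io::Error) -> io::Error {
            // (the res_init() call is omitted in this sketch)
            e
        }

        /// Everywhere else the hook just hands the error back.
        #[cfg(not(all(unix, target_env = "gnu")))]
        pub fn on_resolver_failure(e: io::Error) -> io::Error {
            e
        }
    }
}

/// Platform-independent caller, analogous to code in `sys_common`:
/// it contains no #[cfg] of its own.
fn finish_lookup(result: io::Result<Vec<String>>) -> io::Result<Vec<String>> {
    match result {
        Ok(addrs) => Ok(addrs),
        Err(e) => Err(sys::net::on_resolver_failure(e)),
    }
}
```

The common code calls the hook unconditionally; only the platform module decides whether anything happens.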

@alexcrichton
Member

Yeah I just find it's personally best to cut down on this sort of cfg traffic if possible, and what you've suggested should do the trick!

@etaoins
Contributor Author

etaoins commented Jan 10, 2018

@alexcrichton @jonhoo

From what I can see that would only remove the #[cfg] from sys_common/net.rs. sys/unix/net.rs is shared between all Unix-likes so it would still need #[cfg] to detect GNU. Am I understanding correctly? Happy to make the change if so.

@jonhoo
Contributor

jonhoo commented Jan 10, 2018

@etaoins yes, I think the concern is specifically with #[cfg] in sys_common, as it is supposed to be, well, common :)

@alexcrichton
Member

Ah yeah this'd just be moving the cfg, we'd still have it for sure in the unix/net.rs file

@etaoins
Contributor Author

etaoins commented Jan 11, 2018

@jonhoo @alexcrichton Updated with on_resolver_failure

if ret != 0 {
return Err(io::Error::last_os_error());
}
unsafe { libc::res_init() };
Contributor

Any particular reason why you chose to prefer the original resolver error over any potential res_init error?

Contributor Author

I considered this but:

  1. That's what the code in master is doing. The error from res_init_if_glibc_before_2_26 is discarded and the original error from getaddrinfo() is returned.

  2. Callers are probably expecting getaddrinfo()-type errors from lookup_host() (e.g. EAI_FAIL). This is especially true since this will only be happening on an increasingly small fraction of platforms in obscure cases, so it won't get much testing. I think returning the res_init() result here would be surprising.

  3. res_init() does not return a well-defined set of error codes. It's not clear that it's setting errno at all from reading the glibc man page.

  4. I'm not convinced that res_init() will always be the most proximate error. It's possible that res_init() is returning a useful report on why the resolution system isn't working (but I have my doubts due to point 3). However, it's also possible that the getaddrinfo() is returning more useful information and the res_init() is failing for some inconsequential reason.

I kept the error code return from on_resolver_failure in case some platform wants to modify the error on return but it's just directly returned at the moment.

If there's a strong opinion in the other direction I can change the error behaviour but I want to reiterate that would be a functional change from master.
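
To make the behaviour being defended here concrete, a minimal sketch follows. `report_lookup_failure` and its `refresh` parameter are hypothetical names used only for illustration; the closure stands in for the res_init() call.

```rust
use std::io;

/// Hypothetical helper mirroring the behaviour described above: run a resolver
/// refresh after a failed lookup, but always report the original error.
fn report_lookup_failure<F>(original: io::Error, refresh: F) -> io::Error
where
    F: FnOnce() -> io::Result<()>,
{
    // The refresh outcome is deliberately ignored, so callers keep seeing
    // the getaddrinfo()-style error they expect.
    let _ = refresh();
    original
}
```

Swapping the two errors would be a one-line change, but as noted above it would also change what callers observe compared to master.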

@alexcrichton
Member

@bors: r+

Thanks @etaoins!

@bors
Contributor

bors commented Jan 11, 2018

📌 Commit 6289732 has been approved by alexcrichton

@kennytm
Member

kennytm commented Jan 12, 2018

@bors r-

Compilation failed on Windows.

error[E0425]: cannot find function `on_resolver_failure` in module `sys::net`
   --> libstd\sys_common\net.rs:174:33
    |
174 |                 Err(::sys::net::on_resolver_failure(e))
    |                                 ^^^^^^^^^^^^^^^^^^^ not found in `sys::net`

You need to define on_resolver_failure in src\libstd\sys\windows\net.rs too. I'm not sure if you need to distinguish between #[cfg(target_env = "gnu")] and "msvc" here.

@kennytm kennytm added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Jan 12, 2018
@jonhoo
Contributor

jonhoo commented Jan 12, 2018

I don't think distinguishing on Windows should be needed. @etaoins make sure you define the on_resolver_failure in every platform under sys :)

@etaoins
Contributor Author

etaoins commented Jan 12, 2018

@kennytm
windows-gnu refers to the ABI. Everything still ends up using Winsock on Windows so the glibc workaround doesn't apply.

@jonhoo
I added the Windows hook. I believe everything else was already covered.

Unrelated to Windows but it seems unfortunate that every platform needs to define on_resolver_failure() even if they're not using the lookup_host() implementation in sys_common.

@etaoins etaoins changed the title Only call res_init() on GNU/*nix Only link res_init() on GNU/*nix Jan 12, 2018
bors added a commit that referenced this pull request Jan 12, 2018
@alexcrichton
Member

@bors: r+

@bors
Contributor

bors commented Jan 12, 2018

📌 Commit de391b8 has been approved by alexcrichton

@kennytm kennytm added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jan 13, 2018
@kennytm
Member

kennytm commented Jan 15, 2018

@bors r=alexcrichton

There's no simple way to trigger a full rebuild without merging, but since this has broken rollup twice I'd no longer consider this when creating a rollup 😃

@bors
Contributor

bors commented Jan 15, 2018

📌 Commit 5b6aa8b has been approved by alexcrichton

@kennytm
Member

kennytm commented Jan 15, 2018

@bors r-

@etaoins Er please use git rebase to remove the merge commit ad3b2621d2712476e6a7e80a123033b52adfbf07.

@jonhoo
Contributor

jonhoo commented Jan 15, 2018

Hmm.. I don't think 5b6aa8bac94a12329ef78e7c1636a4c197fbb0db will work. For the wasm change, removing the function will (presumably) make linking fail? I think you'd instead have to annotate it with #[allow(dead_code)].

@etaoins
Contributor Author

etaoins commented Jan 15, 2018

@jonhoo @alexcrichton
As you can see the strategy of adding a platform on_resolver_failure hook was causing endless build troubles (sorry @kennytm!) and scattered code. I ended up redoing this PR to take a different approach.

sys_common/net.rs already calls cvt_gai to convert the result of getaddrinfo into an io::Error. We can use that as a point to call into our glibc workaround. This isolates all of the #[cfg] logic to unix.rs and leaves the other platforms alone. This has been tested on Linux and macOS. My one concern is that cvt_gai doesn't sound like it should have side effects; maybe it should be renamed to something like process_gai_result?

Let me know what you think
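
For readers following along, the approach described in this comment might look roughly like the sketch below. It is a simplified stand-in, not the libstd code: the `cvt_gai` here takes a plain status integer, the error message is made up, and a hypothetical counter (`WORKAROUND_RUNS`) replaces the real res_init() call so the side effect can be observed without linking anything extra.

```rust
use std::io;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts how often the simulated workaround ran (hypothetical, for observation).
static WORKAROUND_RUNS: AtomicUsize = AtomicUsize::new(0);

/// Reads the counter above.
fn workaround_runs() -> usize {
    WORKAROUND_RUNS.load(Ordering::SeqCst)
}

/// glibc targets only: the real code would refresh the resolver here.
#[cfg(all(unix, target_env = "gnu"))]
fn on_resolver_failure() {
    WORKAROUND_RUNS.fetch_add(1, Ordering::SeqCst);
}

/// All other targets: nothing to do, and nothing extra to link.
#[cfg(not(all(unix, target_env = "gnu")))]
fn on_resolver_failure() {}

/// Simplified stand-in for `cvt_gai`: turn a getaddrinfo()-style status
/// into a Result, running the workaround on the failure path.
fn cvt_gai(status: i32) -> io::Result<()> {
    if status == 0 {
        return Ok(());
    }
    // Side effect on the failure path, before the error is built.
    on_resolver_failure();
    Err(io::Error::new(
        io::ErrorKind::Other,
        format!("failed to lookup address information (status {})", status),
    ))
}
```

The sketch also shows why the naming concern is fair: a function that reads like a pure conversion now does extra work whenever the status is non-zero.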

@alexcrichton
Member

@bors: r+

Looks good to me!

@bors
Contributor

bors commented Jan 16, 2018

📌 Commit 090a968 has been approved by alexcrichton

@kennytm kennytm added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jan 16, 2018
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Jan 19, 2018
bors added a commit that referenced this pull request Jan 19, 2018
Rollup of 8 pull requests

- Successful merges: #46938, #47334, #47420, #47508, #47510, #47512, #47535, #47559
- Failed merges:
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Jan 21, 2018
bors added a commit that referenced this pull request Jan 21, 2018
Rollup of 9 pull requests

- Successful merges: #47247, #47334, #47512, #47582, #47595, #47625, #47632, #47633, #47637
- Failed merges:
@bors bors merged commit 090a968 into rust-lang:master Jan 22, 2018
@jonhoo
Contributor

jonhoo commented Jan 22, 2018

@etaoins 🎉

@oconnor663
Contributor

I'm late to the party, but I'm curious whether anyone tried using weak! linkage to call res_init, since we're already using it anyway to call gnu_get_libc_version. The target_env = "gnu" fix seems simpler, but in case that breaks in the future for whatever reason, maybe we have a second option.

oconnor663 added a commit to oconnor663/rust that referenced this pull request Aug 27, 2018
This typo was introduced in rust-lang#47334.
A couple tests bitrotted as a result, so we fix those too, and move them
to a more sensible place.
kennytm added a commit to kennytm/rust that referenced this pull request Aug 28, 2018
fix a typo: taget_env -> target_env

This typo was introduced in rust-lang#47334. A couple tests bitrotted as a result, so we fix those too, and move them to a more sensible place.

Is there some lint we could turn on that would've caught this? It's a drag that cfg typos can silently pass through the compiler.
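
To illustrate why the typo went unnoticed, here is a small sketch with hypothetical function names. A misspelled cfg key is not a compile error; it is simply a predicate that nothing ever sets. The `#[allow(unexpected_cfgs)]` attribute is an assumption about newer toolchains, which may warn about unknown cfg names; on older compilers it is at worst an unknown-lint warning.

```rust
/// Correctly spelled predicate: true on targets whose environment is "gnu".
fn gnu_env() -> bool {
    cfg!(target_env = "gnu")
}

/// Misspelled predicate (the typo from the commit above). Nothing sets a
/// `taget_env` key, so this quietly evaluates to false on every target.
#[allow(unexpected_cfgs)] // newer toolchains may warn about unknown cfg names
fn gnu_env_with_typo() -> bool {
    cfg!(taget_env = "gnu")
}
```

With the typo, any code gated this way is silently compiled out, even on glibc targets, which matches the bitrotted tests mentioned in the commit message.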
Mark-Simulacrum added a commit to Mark-Simulacrum/rust that referenced this pull request Aug 28, 2018
pietroalbini added a commit to pietroalbini/rust that referenced this pull request Aug 29, 2018
pietroalbini added a commit to pietroalbini/rust that referenced this pull request Aug 29, 2018
pietroalbini added a commit to pietroalbini/rust that referenced this pull request Aug 30, 2018
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Development

Successfully merging this pull request may close these issues.

macOS executables unnecessarily depend on libresolv.dylib
7 participants