Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

impl Index<RangeFrom> for CStr #74021

Merged
merged 2 commits into from
Jul 19, 2020
Merged

impl Index<RangeFrom> for CStr #74021

merged 2 commits into from
Jul 19, 2020

Conversation

1011X
Copy link
Contributor

@1011X 1011X commented Jul 4, 2020

This change implements (partial) slicing for CStr.

Since a CStr must end in a null byte, it's not possible to trim from the right and still have a valid CStr. But, it is possible to trim from the left. This lets us be a bit more flexible and treat them more like strings.

let string = CStr::from_bytes_with_nul(b"Hello World!\0");
let result = CStr::from_bytes_with_nul(b"World!\0");
assert_eq!(&string[6..], result);

@rust-highfive
Copy link
Collaborator

r? @hanna-kruppe

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 4, 2020
@jonas-schievink jonas-schievink added needs-fcp This change is insta-stable, so needs a completed FCP to proceed. relnotes Marks issues that should be documented in the release notes of the next release. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Jul 4, 2020
@jonas-schievink jonas-schievink added this to the 1.46 milestone Jul 4, 2020
@hanna-kruppe
Copy link
Contributor

What's your use case for this method?

I'm a bit hesitant to add this API because its performance is impacted by the theoretically-planned change from representing &CStr as (ptr, len) pair to representing it like a plain char * (in C terms) with length only available by calling strlen. The same is true for to_bytes[_with_nul] but that's a rather more central operation and its documentation warns about the potential performance pitfall. Adding more APIs with the same pitfall -- especially a trait impl whose docs are hidden by default -- seems to increase the risk that code becomes accidentally quadratic if the planned change ever does happen.

@hanna-kruppe
Copy link
Contributor

In any case, cc @rust-lang/libs because this would be insta-stable and hence needs an FCP.

@hanna-kruppe hanna-kruppe reopened this Jul 4, 2020
@Amanieu
Copy link
Member

Amanieu commented Jul 4, 2020

I think this is very useful functionality!

@rfcbot fcp merge

@rfcbot
Copy link

rfcbot commented Jul 4, 2020

Team member @Amanieu has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Jul 4, 2020
src/libstd/ffi/c_str.rs Outdated Show resolved Hide resolved
@1011X
Copy link
Contributor Author

1011X commented Jul 5, 2020

@hanna-kruppe Sorry for taking so long to reply. I don't have a use-case, just thought it'd be nice to have, since it's currently kinda difficult to get CStrs from other CStrs.

I see the desire to treat &CStr the same as a *const c_char, since it means you can just use &CStr in FFI code to seemlessly handle C strings between both C and Rust, while also allowing some conveniences like, e.g., adding methods to it.

But tbh I'm not sure how I feel about that, since this breaks a couple of assumptions I have about references:

  • a reference is semantically different from a pointer
  • a reference to an unsized type always has a length

Say we have a C function that returns a char const * and we want to write a Rust wrapper for it. We could use &CStr, but then what happens if the function returns a null pointer? It's a perfectly valid value to return for a char const *, but now it seems Rust would run into undefined behavior.

For this feature, the only way I can think of that still allows for it without making the potential changes you mentioned more difficult is something like this:

// assume `self` is the same as `*const c_char`
let len = sys::strlen(self);
if index.start < len {
    self.offset(index.start)
} else {
    // emulate an out-of-bounds panic
    panic!("index out of bounds: the len is {} but the index is {}", len, index.start");
}

But of course this means that it'll need to calculate the length every time to make sure it's not going over the end, rather than using a cached length.

src/libstd/ffi/c_str.rs Outdated Show resolved Hide resolved
@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented Jul 6, 2020

@hanna-kruppe Sorry for taking so long to reply. I don't have a use-case, just thought it'd be nice to have, since it's currently kinda difficult to get CStrs from other CStrs.

I see the desire to treat &CStr the same as a *const c_char, since it means you can just use &CStr in FFI code to seemlessly handle C strings between both C and Rust, while also allowing some conveniences like, e.g., adding methods to it.

But tbh I'm not sure how I feel about that, since this breaks a couple of assumptions I have about references:

* a reference is semantically different from a pointer

* a reference to an unsized type always has a length

Frankly, I also have my doubts about the usefulness of that change at this point, but for different reasons. The way dynamically-sized types (DST) in Rust work, it's a bit sloppy to talk about &CStr specifically because all other pointers-to-CStr would work the same way, including raw pointers. And there's a more general long-standing desire to generalize DSTs from "always pointer + length or pointer + vtable" to allow almost arbitrary metadata bundled to the pointer, including no metadata at all. For example, to store the vtable pointer inside (or next to) the object (as in C++) -- such a trait object is still unsized, but the pointer to it is thin.

And yet, generalized DSTs would still have a baked-in assumption that the size is rather cheap to fetch, which would not be true for "thin CStr". And while non-null-ness of references is not necessarily an obstacle to FFI (you can just use Option<&CStr> if null is allowed), the aliasing implications of references are sometimes subtly inappropriate when interacting with C. Furthermore, many uses of C strings do benefit from a cached length, and maintaining that length manually is error-prone and unsafe. Thus, I believe there will always be room for a "C string + cached length" type (what CStr currently is) even if we add a newtype for "just the pointer" in the future -- and in that case, we might as well let CStr remain the "cached strlen" type and call the "just the pointer" type something different.

Still, despite my misgivings, the plan of record is to to make CStr lose the length (or at least keep open the possibility), so my preferred options would be giving up on those plans or not adding APIs that conflict with those plans.

Say we have a C function that returns a char const * and we want to write a Rust wrapper for it. We could use &CStr, but then what happens if the function returns a null pointer? It's a perfectly valid value to return for a char const *, but now it seems Rust would run into undefined behavior.

For this feature, the only way I can think of that still allows for it without making the potential changes you mentioned more difficult is something like this:

// assume `self` is the same as `*const c_char`
let len = sys::strlen(self);
if index.start < len {
    self.offset(index.start)
} else {
    // emulate an out-of-bounds panic
    panic!("index out of bounds: the len is {} but the index is {}", len, index.start");
}

But of course this means that it'll need to calculate the length every time to make sure it's not going over the end, rather than using a cached length.

Right, something like that. An improvement is possible when index.start is much smaller than strlen(self) (only check that there's no 0 byte before index.start, stop searching beyond that) though it's possible that this is a regression when index.start is close to the end of the string, if strlen is more efficient than strnlen_s or its open-coded equivalent.

@1011X
Copy link
Contributor Author

1011X commented Jul 13, 2020

are there any other changes needed on my part? Github is showing a red "changes requested" from @dtolnay, but it's already been fixed. also @RalfJung's review doesn't show the "outdated" mark like the others. is there something else i need to do?

@dtolnay
Copy link
Member

dtolnay commented Jul 13, 2020

I think this is waiting on one other libs member to sign off in #74021 (comment) or to raise objections.

@rfcbot rfcbot added final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. and removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. labels Jul 14, 2020
@rfcbot
Copy link

rfcbot commented Jul 14, 2020

🔔 This is now entering its final comment period, as per the review above. 🔔

@Dylan-DPC-zz Dylan-DPC-zz modified the milestones: 1.46, 1.47 Jul 18, 2020
@dtolnay
Copy link
Member

dtolnay commented Jul 18, 2020

@bors r+

@bors
Copy link
Contributor

bors commented Jul 18, 2020

📌 Commit 30b8835 has been approved by dtolnay

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 18, 2020
bors added a commit to rust-lang-ci/rust that referenced this pull request Jul 19, 2020
…arth

Rollup of 7 pull requests

Successful merges:

 - rust-lang#70817 (Add core::task::ready! macro)
 - rust-lang#73762 (Document the trait keyword)
 - rust-lang#74021 (impl Index<RangeFrom> for CStr)
 - rust-lang#74071 (rustc_metadata: Make crate loading fully speculative)
 - rust-lang#74445 (add test for rust-lang#62878)
 - rust-lang#74459 (Make unreachable_unchecked a const fn)
 - rust-lang#74478 (Revert "Use an UTF-8 locale for the linker.")

Failed merges:

r? @ghost
@bors bors merged commit f305b20 into rust-lang:master Jul 19, 2020
@Mark-Simulacrum
Copy link
Member

@rust-timer make-pr-for f305b20

rust-timer added a commit to rust-timer/rust that referenced this pull request Jul 22, 2020
Original message:
Rollup merge of rust-lang#74021 - 1011X:master, r=dtolnay

impl Index<RangeFrom> for CStr

This change implements (partial) slicing for `CStr`.

Since a `CStr` must end in a null byte, it's not possible to trim from the right and still have a valid `CStr`. But, it *is* possible to trim from the left. This lets us be a bit more flexible and treat them more like strings.

```rust
let string = CStr::from_bytes_with_nul(b"Hello World!\0");
let result = CStr::from_bytes_with_nul(b"World!\0");
assert_eq!(&string[6..], result);
```
@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this PR / Issue. and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels Jul 24, 2020
@BurntSushi
Copy link
Member

@rust-lang/libs Did we consider the impact of this change on the possible future implementation of choice of not caching the length of the C string, as mentioned by @hanna-kruppe above? In particular, with this API stabilized, I feel it is very unlikely that we would ever stop caching the length.

Note that this is about to be on the stable release in a couple days.

@dtolnay
Copy link
Member

dtolnay commented Oct 6, 2020

I considered it. I don't think this is incompatible with using a ptr-only representation for &CStr. Indexing with n.. will verify no \0 in the first n bytes and produce a &CStr beginning n bytes higher.

@BurntSushi
Copy link
Member

BurntSushi commented Oct 6, 2020

@dtolnay Interesting. I think that would be our first impl of Index in std that is not constant time? That's what I was thinking would make it very unlikely to stop caching its length. (Not that it's necessarily incompatible.)

@Mark-Simulacrum
Copy link
Member

BTreeMap Index is log n, at least, though that's "better" than n

@pietroalbini
Copy link
Member

Release is tomorrow, we can back the stabilization out for a release if that gives y'all more time to think.

@BurntSushi
Copy link
Member

BurntSushi commented Oct 7, 2020

cc @rust-lang/libs Does anyone want to second reconsidering this? If it's just me then happy to let it rest. We should at least carefully document this. Because it's going to be a potentially surprising performance footgun if we ever do decide to stop caching the length.

@sfackler
Copy link
Member

sfackler commented Oct 7, 2020

I also feel a bit uneasy.

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Nov 13, 2020
Pkgsrc changes:
 * Remove patches now integrated upstream, many related to SunOS / Illumos.
 * The LLVM fix for powerpc is also now integrated upstream.
 * Adapt those patches where the source has moved or parts are integrated.
 * The randomness patches no longer applies, and I could not find
   where those files went...
 * Provide a separate bootstrap for NetBSD/powerpc 9.0, since apparently
   the C++ ABI is different from 8.0.  Yes, this appears to be specific to
   the NetBSD powerpc ports.

Upstream changes:

Version 1.47.0 (2020-10-08)
==========================

Language
--------
- [Closures will now warn when not used.][74869]

Compiler
--------
- [Stabilized the `-C control-flow-guard` codegen option][73893], which enables
  [Control Flow Guard][1.47.0-cfg] for Windows platforms, and is ignored on
  other platforms.
- [Upgraded to LLVM 11.][73526]
- [Added tier 3\* support for the `thumbv4t-none-eabi` target.][74419]
- [Upgrade the FreeBSD toolchain to version 11.4][75204]
- [`RUST_BACKTRACE`'s output is now more compact.][75048]

\* Refer to Rust's [platform support page][forge-platform-support] for more
information on Rust's tiered platform support.

Libraries
---------
- [`CStr` now implements `Index<RangeFrom<usize>>`.][74021]
- [Traits in `std`/`core` are now implemented for arrays of any length, not just
  those of length less than 33.][74060]
- [`ops::RangeFull` and `ops::Range` now implement Default.][73197]
- [`panic::Location` now implements `Copy`, `Clone`, `Eq`, `Hash`, `Ord`,
  `PartialEq`, and `PartialOrd`.][73583]

Stabilized APIs
---------------
- [`Ident::new_raw`]
- [`Range::is_empty`]
- [`RangeInclusive::is_empty`]
- [`Result::as_deref`]
- [`Result::as_deref_mut`]
- [`Vec::leak`]
- [`pointer::offset_from`]
- [`f32::TAU`]
- [`f64::TAU`]

The following previously stable APIs have now been made const.

- [The `new` method for all `NonZero` integers.][73858]
- [The `checked_add`,`checked_sub`,`checked_mul`,`checked_neg`, `checked_shl`,
  `checked_shr`, `saturating_add`, `saturating_sub`, and `saturating_mul`
  methods for all integers.][73858]
- [The `checked_abs`, `saturating_abs`, `saturating_neg`, and `signum`  for all
  signed integers.][73858]
- [The `is_ascii_alphabetic`, `is_ascii_uppercase`, `is_ascii_lowercase`,
  `is_ascii_alphanumeric`, `is_ascii_digit`, `is_ascii_hexdigit`,
  `is_ascii_punctuation`, `is_ascii_graphic`, `is_ascii_whitespace`, and
  `is_ascii_control` methods for `char` and `u8`.][73858]

Cargo
-----
- [`build-dependencies` are now built with opt-level 0 by default.][cargo/8500]
  You can override this by setting the following in your `Cargo.toml`.
  ```toml
  [profile.release.build-override]
  opt-level = 3
  ```
- [`cargo-help` will now display man pages for commands rather just the
  `--help` text.][cargo/8456]
- [`cargo-metadata` now emits a `test` field indicating if a target has
  tests enabled.][cargo/8478]
- [`workspace.default-members` now respects `workspace.exclude`.][cargo/8485]
- [`cargo-publish` will now use an alternative registry by default if it's the
  only registry specified in `package.publish`.][cargo/8571]

Misc
----
- [Added a help button beside Rustdoc's searchbar that explains rustdoc's
  type based search.][75366]
- [Added the Ayu theme to rustdoc.][71237]

Compatibility Notes
-------------------
- [Bumped the minimum supported Emscripten version to 1.39.20.][75716]
- [Fixed a regression parsing `{} && false` in tail expressions.][74650]
- [Added changes to how proc-macros are expanded in `macro_rules!` that should
  help to preserve more span information.][73084] These changes may cause
  compiliation errors if your macro was unhygenic or didn't correctly handle
  `Delimiter::None`.
- [Moved support for the CloudABI target to tier 3.][75568]
- [`linux-gnu` targets now require minimum kernel 2.6.32 and glibc 2.11.][74163]

Internal Only
--------
- [Improved default settings for bootstrapping in `x.py`.][73964]
  You can read details about this change in the ["Changes to `x.py`
  defaults"](https://blog.rust-lang.org/inside-rust/2020/08/30/changes-to-x-py-defaults.html)
  post on the Inside Rust blog.

- [Added the `rustc-docs` component.][75560] This allows you to install
  and read the documentation for the compiler internal APIs. (Currently only
  available for `x86_64-unknown-linux-gnu`.)

[1.47.0-cfg]: https://docs.microsoft.com/en-us/windows/win32/secbp/control-flow-guard
[76980]: rust-lang/rust#76980
[75048]: rust-lang/rust#75048
[74163]: rust-lang/rust#74163
[71237]: rust-lang/rust#71237
[74869]: rust-lang/rust#74869
[73858]: rust-lang/rust#73858
[75716]: rust-lang/rust#75716
[75908]: rust-lang/rust#75908
[75516]: rust-lang/rust#75516
[75560]: rust-lang/rust#75560
[75568]: rust-lang/rust#75568
[75366]: rust-lang/rust#75366
[75204]: rust-lang/rust#75204
[74650]: rust-lang/rust#74650
[74419]: rust-lang/rust#74419
[73964]: rust-lang/rust#73964
[74021]: rust-lang/rust#74021
[74060]: rust-lang/rust#74060
[73893]: rust-lang/rust#73893
[73526]: rust-lang/rust#73526
[73583]: rust-lang/rust#73583
[73084]: rust-lang/rust#73084
[73197]: rust-lang/rust#73197
[72488]: rust-lang/rust#72488
[cargo/8456]: rust-lang/cargo#8456
[cargo/8478]: rust-lang/cargo#8478
[cargo/8485]: rust-lang/cargo#8485
[cargo/8500]: rust-lang/cargo#8500
[cargo/8571]: rust-lang/cargo#8571
[`Ident::new_raw`]:  https://doc.rust-lang.org/nightly/proc_macro/struct.Ident.html#method.new_raw
[`Range::is_empty`]: https://doc.rust-lang.org/nightly/std/ops/struct.Range.html#method.is_empty
[`RangeInclusive::is_empty`]: https://doc.rust-lang.org/nightly/std/ops/struct.RangeInclusive.html#method.is_empty
[`Result::as_deref_mut`]: https://doc.rust-lang.org/nightly/std/result/enum.Result.html#method.as_deref_mut
[`Result::as_deref`]: https://doc.rust-lang.org/nightly/std/result/enum.Result.html#method.as_deref
[`TypeId::of`]: https://doc.rust-lang.org/nightly/std/any/struct.TypeId.html#method.of
[`Vec::leak`]: https://doc.rust-lang.org/nightly/std/vec/struct.Vec.html#method.leak
[`f32::TAU`]: https://doc.rust-lang.org/nightly/std/f32/consts/constant.TAU.html
[`f64::TAU`]: https://doc.rust-lang.org/nightly/std/f64/consts/constant.TAU.html
[`pointer::offset_from`]: https://doc.rust-lang.org/nightly/std/primitive.pointer.html#method.offset_from
@dtolnay dtolnay self-assigned this Mar 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. needs-fcp This change is insta-stable, so needs a completed FCP to proceed. relnotes Marks issues that should be documented in the release notes of the next release. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.