Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unsafetyify From<Vec<char>> #35098

Closed
wants to merge 3 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 18 additions & 3 deletions src/libcollections/string.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1895,9 +1895,24 @@ impl<'a> From<&'a [char]> for String {

#[stable(feature = "stringfromchars", since = "1.12.0")]
impl From<Vec<char>> for String {
#[inline]
fn from(v: Vec<char>) -> String {
String::from(v.as_slice())
fn from(mut v: Vec<char>) -> String {
unsafe {
let cap = v.capacity();
let ptr = v.as_mut_ptr() as *mut u8;
let mut bytes = 0usize;

for chr in v.iter() {
// Regular loop instead of copy_nonoverlapping, because LLVM insists on having a
// memcpy for slices shorter than 4 bytes.
for b in chr.encode_utf8().as_slice() {
// Neither bytes nor ptr can overflow.
*ptr.offset(bytes as isize) = *b;
bytes = bytes.wrapping_add(1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrapping_add?

Also would it be better to increase bytes outside this loop?

Copy link
Member Author

@nagisa nagisa Jul 28, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable is used to offset the pointer as well, so there must be some sort of counter here anyway.

EDIT: wrapping_add because bytes cannot overflow, saving overflow check. In debug code, that is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I misread what bytes was used for.

Why does debug mode matter? IMO + is the operator to use if you know it can't overflow, the wrapping ops are for when you know it can overflow and that's what you want.

}
}
mem::forget(v);
String::from_raw_parts(ptr, bytes, cap * mem::size_of::<char>())
}
}
}

Expand Down
16 changes: 16 additions & 0 deletions src/libcollectionstest/string.rs
Original file line number Diff line number Diff line change
Expand Up @@ -392,6 +392,22 @@ fn test_into_boxed_str() {
assert_eq!(&*ys, "hello my name is bob");
}

#[test]
fn test_string_from_vec_char() {
let str1 = String::from(vec!['a', 'b', '😃', 'a', 'b']);
let str2 = String::from(vec!['a', 'ą', 'あ', '🞎']);
let str3 = String::from(vec!['🞎', 'あ', 'ą', 'a']);
assert_eq!("ab😃ab", str1);
assert_eq!(str1.len(), 8);
assert!(str1.capacity() >= 20);
assert_eq!("aąあ🞎", str2);
assert_eq!(str2.len(), 10);
assert!(str2.capacity() >= 16);
assert_eq!("🞎あąa", str3);
assert_eq!(str3.len(), 10);
assert!(str3.capacity() >= 16);
}

#[bench]
fn bench_with_capacity(b: &mut Bencher) {
b.iter(|| String::with_capacity(100));
Expand Down
17 changes: 15 additions & 2 deletions src/libcore/char.rs
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@ const TAG_FOUR_B: u8 = 0b1111_0000;
const MAX_ONE_B: u32 = 0x80;
const MAX_TWO_B: u32 = 0x800;
const MAX_THREE_B: u32 = 0x10000;
const UTF8_BUF_SIZE: usize = 4;

/*
Lu Uppercase_Letter an uppercase letter
Expand Down Expand Up @@ -644,15 +645,27 @@ impl ExactSizeIterator for EscapeDebug { }
#[unstable(feature = "unicode", issue = "27784")]
#[derive(Debug)]
pub struct EncodeUtf8 {
buf: [u8; 4],
buf: [u8; UTF8_BUF_SIZE],
pos: usize,
}

impl EncodeUtf8 {
/// Returns the remaining bytes of this iterator as a slice.
#[unstable(feature = "unicode", issue = "27784")]
#[inline(always)]
pub fn as_slice(&self) -> &[u8] {
&self.buf[self.pos..]
// We know for sure this method cannot slice out-of-bounds because:
// * 0 ≤ self.pos ≤ 3
// * self.buf.len() = 4
//
// This way the slicing will always succeed, but LLVM is incapable of figuring out both
// these conditions hold, resulting in suboptimal code, especially after inlining.
// Ideally there would be a `slice_unchecked` method for slices, but there isn’t any,
// therefore we construct the slice manually.
unsafe {
::slice::from_raw_parts(self.buf.as_ptr().offset(self.pos as isize),
UTF8_BUF_SIZE.wrapping_sub(self.pos))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrapping_sub?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoids the unnecessary overflow check.

}
}
}

Expand Down