Unsafetyify From<Vec<char>> #35098

nagisa · 2016-07-28T20:38:53Z

Follow-up PR for #35054

rust-highfive · 2016-07-28T20:39:06Z

(rust_highfive has picked a reviewer for you, use r? to override)

petrochenkov · 2016-07-28T21:28:57Z

src/libcollections/string.rs

+impl<'a> From<&'a [char]> for String {
+    #[inline]
+    fn from(v: &'a [char]) -> String {
+        let mut s = String::with_capacity(v.len());


What do you think about the suggestion from #35054 (comment)?
char usually implies that we are working with human language, so we can use domain knowledge.
A capacity of len + len / 8 (ratio 1.125) or len + len / 16 (ratio 1.0625) covers most European languages.
For reference,
ratio(en) ≈ 1.002
ratio(fr) ≈ 1.040
ratio(de) ≈ 1.016
ratio(hu) ≈ 1.091 (lots of diacritics)
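To make the ratio and the proposed capacity estimate concrete, here is a minimal sketch. The names utf8_ratio, estimated_capacity and string_from_chars are hypothetical and are not part of this PR or of libcollections; the len + len / 8 factor is only one of the candidates mentioned above.

```rust
/// Average number of UTF-8 bytes per `char` in `text`.
/// Pure ASCII gives 1.0; an empty string is treated as 1.0.
fn utf8_ratio(text: &str) -> f64 {
    let chars = text.chars().count();
    if chars == 0 {
        return 1.0;
    }
    // `str::len` is the length in bytes, not in chars.
    text.len() as f64 / chars as f64
}

/// Hypothetical capacity estimate: `len + len / 8`, i.e. a ratio of 1.125.
fn estimated_capacity(len: usize) -> usize {
    len + len / 8
}

/// Safe conversion that preallocates using the estimate above.
/// If the estimate is too small, `String::push` grows the buffer as usual.
fn string_from_chars(v: &[char]) -> String {
    let mut s = String::with_capacity(estimated_capacity(v.len()));
    for &c in v {
        s.push(c);
    }
    s
}
```

The estimate only affects how many reallocations happen; the resulting String is the same for any starting capacity.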

nagisa · 2016-07-28

I’m really uninterested in tackling this implementation in this PR.

But since you’ve asked, I ought to at least state my opinion on the topic. I agree that nobody would store ASCII-only text as UTF-32 (there are bytestrings, after all), so any ratio > 1 is better than ratio = 1. A ratio of 1.5 to 1.6, coupled with the fact that reallocation doubles capacity, could be a good choice, especially given that ratio(jp), ratio(cn) and ratio(ko) are all somewhere between 2 and a bit over 3.

That being said, my gut tells me that nobody would be using this conversion with any serious expectations towards its performance, thus thinking about this problem is not very productive.

We know for sure this method cannot slice out-of-bounds because:

* 0 ≤ self.pos ≤ 3
* self.buf.len() = 4

This way the slicing will always succeed, but LLVM is incapable of figuring out both these conditions hold, resulting in suboptimal code, especially after inlining.
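The two invariants above can be illustrated with a small sketch. EncodedChar is a hypothetical stand-in, not the actual EncodeUtf8 type from libcore; it assumes the encoded bytes are stored at the end of a 4-byte buffer and that pos marks where they start.

```rust
/// Hypothetical stand-in for an encode-to-UTF-8 buffer:
/// the encoded bytes occupy `buf[pos..]`.
struct EncodedChar {
    buf: [u8; 4],
    pos: usize,
}

impl EncodedChar {
    fn new(c: char) -> EncodedChar {
        let mut tmp = [0u8; 4];
        // A `char` always encodes to between 1 and 4 bytes.
        let len = c.encode_utf8(&mut tmp).len();
        // Hence 0 <= pos <= 3.
        let pos = 4 - len;
        let mut buf = [0u8; 4];
        buf[pos..].copy_from_slice(&tmp[..len]);
        EncodedChar { buf, pos }
    }

    fn as_slice(&self) -> &[u8] {
        debug_assert!(self.pos <= 3);
        // SAFETY: `new` is the only constructor and guarantees
        // 0 <= pos <= 3, while `buf.len()` is always 4, so the range
        // `pos..` is in bounds. This skips the bounds check that the
        // optimizer could not remove on its own.
        unsafe { self.buf.get_unchecked(self.pos..) }
    }
}
```

Keeping both fields private to the type is what makes the unchecked slice defensible: no outside code can set pos to a value that breaks the invariant.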

Machine code turned out pretty nicely.

sfackler · 2016-08-05T04:02:17Z

We discussed this at the libs triage, and @alexcrichton raised some soundness concerns. The allocator is provided with alignment information when deallocating the memory backing the String, which will be wrong in this case.
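The mismatch can be sketched with today's std::alloc::Layout, which did not exist in this form when the PR was written; it is used here only as an illustration. The helper names are hypothetical, and the alignment of 4 for char is what common targets use.

```rust
use std::alloc::Layout;
use std::mem;

/// Layout a `Vec<char>` with capacity `cap` uses for its buffer.
fn char_buffer_layout(cap: usize) -> Layout {
    Layout::array::<char>(cap).expect("capacity overflow")
}

/// Layout a `String` would report when freeing the same number of bytes:
/// identical size, but the alignment of `u8`.
fn byte_view_layout(cap: usize) -> Layout {
    let bytes = cap
        .checked_mul(mem::size_of::<char>())
        .expect("capacity overflow");
    Layout::array::<u8>(bytes).expect("capacity overflow")
}

/// True if the buffer could be handed from `Vec<char>` to `String`
/// without changing the layout passed to the allocator on deallocation.
fn layouts_match(cap: usize) -> bool {
    char_buffer_layout(cap) == byte_view_layout(cap)
}
```

For any non-zero capacity the sizes agree but the alignments do not, so a String built on top of the Vec<char> buffer would deallocate with a different layout than the one used to allocate it.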

nagisa · 2016-08-05T10:59:02Z

Fair point. I feel like something like cap = __rust_reallocate_inplace(ptr, cap, cap, new_align) should be enough here, but I’m having serious trouble finding any documentation on either jemalloc or the Rust allocation functions.

@alexcrichton do you think calling that function is a correct way to “realign” memory?

alexcrichton · 2016-08-05T15:47:46Z

As far as I know I don't think we have a way to realign memory, unfortunately :(

alexcrichton · 2016-08-08T16:04:05Z

Closing for now due to the unsafety concerns (and lack of knowledge of a solution to them), but feel free to reopen if a solution is thought of!

rust-highfive assigned alexcrichton Jul 28, 2016

nagisa mentioned this pull request Jul 28, 2016

EncodeUtf8 iterator implementation results in bad code #35099

Closed

nagisa force-pushed the unsafetyify-from-vec-char branch 2 times, most recently from 8c2706d to 8a5d5a2 July 28, 2016 21:26

petrochenkov reviewed Jul 28, 2016

alexcrichton added the T-libs-api label Jul 29, 2016

nagisa added 3 commits August 2, 2016 14:06

Make From<Vec<char>> for String in-place

20c84cd

Machine code turned out pretty nicely.

Add a test for the unsafe conversion

0ce9323

nagisa force-pushed the unsafetyify-from-vec-char branch from 8a5d5a2 to 0ce9323 August 2, 2016 11:06

alexcrichton closed this Aug 8, 2016
