Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unsafetyify From<Vec<char>> #35098

Closed
wants to merge 3 commits into from

Conversation

nagisa
Copy link
Member

@nagisa nagisa commented Jul 28, 2016

Follow-up PR for #35054

@rust-highfive
Copy link
Collaborator

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

@nagisa nagisa force-pushed the unsafetyify-from-vec-char branch 2 times, most recently from 8c2706d to 8a5d5a2 Compare July 28, 2016 21:26
impl<'a> From<&'a [char]> for String {
#[inline]
fn from(v: &'a [char]) -> String {
let mut s = String::with_capacity(v.len());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about the suggestion from #35054 (comment)?
char usually implies that we work with human language so we can use domain knowledge.
len + len / 8(=1.125) or len + len / 16(=1.0625) cover most of European languages.
For reference,
ratio(en) ≈ 1.002
ratio(fr) ≈ 1.040
ratio(de) ≈ 1.016
ratio(hu) ≈ 1.091 (lots of diacritics)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m really uninterested in tackling this implementation in this PR.

But since you’ve asked, I ought to state my opinion on the topic, at least. I agree that nobody would store ASCII-only text as UTF-32 (there’s bytestrings, after all) and any ratio > 1 is therefore better than ratio = 1. ratio = 1.5 to 1.6 coupled with the fact that reallocation doubles capacity could be a good choice, especially given the fact that ratio(jp), ratio(cn) and ratio(ko) are all somewhere between 2 and a bit over 3.

That being said, my gut tells me that nobody would be using this conversion with any serious expectations towards its performance, thus thinking about this problem is not very productive.

@alexcrichton alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Jul 29, 2016
We know for sure this method cannot slice out-of-bounds because:

* 0 ≤ self.pos ≤ 3
* self.buf.len() = 4

This way the slicing will always succeed, but LLVM is incapable of figuring out both these
conditions hold, resulting in suboptimal code, especially after inlining.
Machine code turned out pretty nicely.
@sfackler
Copy link
Member

sfackler commented Aug 5, 2016

We discussed this at the libs triage, and @alexcrichton raised some soundness concerns. The allocator is provided with alignment information when deallocating the memory backing the String, which will be wrong in this case.

@nagisa
Copy link
Member Author

nagisa commented Aug 5, 2016

Fair point. I feel like something like cap = __rust_reallocate_inplace(ptr, cap, cap, new_align) should be enough here, but I’m having serious trouble finding out any documentation on either jemalloc or rust allocation functions.

@alexcrichton do you think calling that function is a correct way to “realign” memory?

@alexcrichton
Copy link
Member

As far as I know I don't think we have a way to realign memory, unfortunately :(

@alexcrichton
Copy link
Member

Closing for now due to the unsafety concerns (and lack of knowledge of a solution to them), but feel free to reopen if a solution is thought of!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants