implement From<Vec<char>> and From<&'a [char]> for String #35054

Merged Aug 2, 2016 (1 commit)


pwoolcoc
Contributor

Though there are ways to convert a slice or vec of chars into a string,
it would be nice to be able to just do String::from(&['a', 'b', 'c']),
so this PR implements From<Vec<char>> and From<&'a [char]> for
String.

@rust-highfive
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @brson (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@pwoolcoc
Contributor Author

I know that String::with_capacity(v.len() * size_of::<char>()) will probably over-allocate in most cases, but I thought that it would be better to do that than to just use ::new() and have to reallocate a lot in the loop. Let me know if there is something smarter I could do here.

@tbu-
Contributor

tbu- commented Jul 26, 2016

String::with_capacity(v.len()) is probably more reasonable: it will allocate the minimum amount, and the allocation strategy of String should deal with the non-minimal case quite well (it doesn't allocate for each character added).
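
To make the two capacity choices concrete, here is a minimal sketch. The helper names `string_from_chars` and `worst_case_capacity` are hypothetical and are not the PR's actual code; the sketch only assumes the push loop visible in the review diff.

```rust
use std::mem::size_of;

/// Hypothetical helper (not the PR's actual code): builds a `String` from a
/// slice of chars, reserving one byte per char up front.
fn string_from_chars(chars: &[char]) -> String {
    // `chars.len()` bytes is the minimum the result can need (all-ASCII input).
    let mut s = String::with_capacity(chars.len());
    for &c in chars {
        // `push` grows the buffer (amortized) if a char needs more than 1 byte.
        s.push(c);
    }
    s
}

/// The upper bound originally proposed: `size_of::<char>()` bytes per char.
fn worst_case_capacity(chars: &[char]) -> usize {
    chars.len() * size_of::<char>()
}
```

For a three-char input the first approach reserves at least 3 bytes, while the worst-case bound asks for 12.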

@brson
Contributor

brson commented Jul 26, 2016

Agree that v.len() is probably a better initial capacity. @tbu- can you update it? Edit: Sorry, I meant @pwoolcoc.

This patch makes sense to me. f? @rust-lang/libs

@brson added the T-libs-api and I-nominated labels Jul 26, 2016
            s.push(c);
        }
        s
    }
Contributor

It probably makes sense to delegate to the &[char] impl here instead of duplicating the code.

It's possible we could do something clever to reuse the Vec's buffer, though I don't think it's worth thinking about in this PR.
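
The delegation suggested here could look roughly like the sketch below. Because `From` impls for `String` can only be written inside the standard library, the sketch uses a hypothetical newtype `Text` as a stand-in; the shape of the two impls is the point, not the type.

```rust
/// Hypothetical stand-in for `String`, used only so the `From` impls below
/// are legal outside the standard library.
#[derive(Debug, PartialEq)]
struct Text(String);

impl<'a> From<&'a [char]> for Text {
    fn from(chars: &'a [char]) -> Text {
        let mut s = String::with_capacity(chars.len());
        for &c in chars {
            s.push(c);
        }
        Text(s)
    }
}

impl From<Vec<char>> for Text {
    fn from(v: Vec<char>) -> Text {
        // Delegate to the slice impl rather than repeating the loop.
        Text::from(&v[..])
    }
}
```

This keeps a single copy of the loop, at the cost of not reusing the `Vec`'s allocation.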

@petrochenkov
Contributor

petrochenkov commented Jul 26, 2016

> Agree that v.len() is probably a better initial capacity.

That's an unstable optimization for purely ASCII text. A single non-ASCII character in a string of any size (think of © or «) and the buffer is guaranteed to be reallocated. It may be reasonable to use some small correction factor (1.1 or so, needs estimation) to make the buffer resistant to small amounts of non-ASCII noise.
(The coefficient is still text-dependent; for example, 2.0 is perfect for Cyrillic texts. I have actually used this as an optimization a couple of times.)
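
A small sketch of the arithmetic behind this point. `utf8_len` and `scaled_capacity` are hypothetical helpers, and the factor values are the estimates mentioned in the comment, not measured constants.

```rust
/// Exact number of UTF-8 bytes needed to encode `chars`.
fn utf8_len(chars: &[char]) -> usize {
    chars.iter().map(|c| c.len_utf8()).sum()
}

/// Hypothetical capacity guess: char count scaled by a text-dependent factor.
/// The factors discussed in the thread (about 1.1, or 2.0 for Cyrillic) are
/// estimates, not measured constants.
fn scaled_capacity(chars: &[char], factor: f64) -> usize {
    (chars.len() as f64 * factor).ceil() as usize
}
```

For all-ASCII input `utf8_len` equals the char count, one `©` adds a single extra byte, and Cyrillic letters take two bytes each, which is why a factor of 2.0 fits such text exactly.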

@tbu-
Contributor

tbu- commented Jul 26, 2016

@petrochenkov Yes, a single non-ASCII character will change that. However, this is what we do everywhere: allocations are always based on the minimum capacity that will be necessary (it's always the Iterator::size_hint().0 that is used for initial capacity).
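
As an illustration of the lower-bound convention described here, the sketch below reserves `size_hint().0` before collecting. `collect_with_lower_bound` is a hypothetical helper written for this example; it is not how the standard library is implemented internally.

```rust
/// Hypothetical helper: collects any char iterator into a `String`,
/// reserving only the iterator's guaranteed lower bound.
fn collect_with_lower_bound<I: Iterator<Item = char>>(iter: I) -> String {
    // `size_hint` returns (lower bound, optional upper bound) on item count.
    let (lower, _upper) = iter.size_hint();
    let mut s = String::with_capacity(lower);
    for c in iter {
        s.push(c);
    }
    s
}
```

A slice iterator reports its exact length as the lower bound, so for `&[char]` input this matches `with_capacity(v.len())`; a filtered iterator reports 0, so nothing is reserved up front.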

@pwoolcoc
Contributor Author

Thanks for the suggestions everyone. I'll be back online in a couple hours & I'll push some changes.

@alexcrichton
Member

Sounds reasonable to me!

@pwoolcoc
Contributor Author

Is it ok to leave the impl as impl<'a> From<&'a [char]> or should it be something more flexible like impl<T: AsRef<[char]>> From<T>?

@pwoolcoc
Contributor Author

r? @brson

@nagisa
Member

nagisa commented Jul 27, 2016

I feel like the From<Vec<char>> implementation is very naive. At the very least it could be made to do no allocations at all:

impl From<Vec<char>> for String {
    fn from(mut v: Vec<char>) -> String {
        unsafe {
            let ptr = v.as_mut_ptr() as *mut u8;
            let mut bytes = 0;
            {
                let mut rest = v.as_mut_slice();
                while let Some((chr, rest_)) = {rest}.split_first_mut() {
                    for byte in chr.encode_utf8() {
                        *ptr.offset(bytes) = byte;
                        bytes += 1;
                    }
                    rest = rest_;
                }
            }
            let cap = v.capacity();
            ::std::mem::forget(v);
            String::from_raw_parts(ptr, bytes as usize, cap)
        }
    }
} 
// Perhaps this code could be made better, I didn’t ponder much on it.
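
For comparison, a safe sketch of the per-char encoding step using `char::encode_utf8` with a caller-supplied buffer. This is not the snippet above and does not reuse the `Vec<char>` allocation; `encode_chars` is a hypothetical helper. It only illustrates the size relation an in-place variant would depend on: the encoded bytes never outnumber the bytes of the source chars.

```rust
use std::mem::size_of;

/// Safe sketch (not the snippet above): encodes each char into a fresh byte
/// buffer. It does not reuse the `Vec<char>` allocation.
fn encode_chars(chars: &[char]) -> String {
    let mut bytes: Vec<u8> = Vec::with_capacity(chars.len());
    let mut buf = [0u8; 4]; // scratch space; a char needs at most 4 bytes
    for &c in chars {
        bytes.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
    }
    // The encoded form never exceeds the size of the source `[char]`,
    // which is the property an in-place variant would rely on.
    debug_assert!(bytes.len() <= chars.len() * size_of::<char>());
    String::from_utf8(bytes).expect("encode_utf8 always yields valid UTF-8")
}
```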

@alexcrichton
Member

@nagisa, as @brson mentioned earlier, we could indeed do things like reuse the buffer, but for now it doesn't seem worth the unsafe complexity when no one's clamoring for it.

@pwoolcoc yeah I think it's best to stay concrete and avoid generics for From impls where conflicts are sometimes difficult to avoid.
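
One way to get the flexibility asked about without a blanket `From` impl is a generic free function. The sketch below is an assumption for illustration, not something proposed in the thread; `chars_to_string` is a hypothetical name.

```rust
/// Hypothetical free function (not part of the PR): accepts anything that can
/// be viewed as `&[char]`, without adding a blanket `From` impl.
fn chars_to_string<T: AsRef<[char]>>(input: T) -> String {
    let chars = input.as_ref();
    let mut s = String::with_capacity(chars.len());
    for &c in chars {
        s.push(c);
    }
    s
}
```

It accepts a `Vec<char>`, a `[char; N]` array, or a `&[char]` slice, and leaves the set of `From` impls on the target type untouched.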

@pwoolcoc
Contributor Author

Thanks @alexcrichton. I think it is ready to go, but I am unable to replicate the test failure that Travis is reporting.

@alexcrichton
Member

Ah yeah, that's ok. If you rebase on master it should fix it, as the PR to solve that problem went in a few hours ago.

@nagisa
Member

nagisa commented Jul 27, 2016

> but for now it doesn't seem worth the unsafe complexity when no one's clamoring for it.

To me, that seems to be at odds with the philosophy of From conversions being cheap.

I’d like to point out that I have already implemented the code I pasted above manually two or three times in various locations, and so far the desire to reuse the allocation was pretty strong in each use case. The implementation as proposed by the PR is plain useless as far as I and my code are concerned, which is exactly why I am complaining.

Shall I send a PR against this PR?

@brson
Contributor

brson commented Jul 27, 2016

@pwoolcoc The next step is to wait for the libs team to approve the new APIs. Typically this takes until next Tuesday, though if enough of them chime in here it could go faster.

@nagisa I recognize that optimization is desirable, but I still prefer to do it as a follow-up, for a few reasons: unsafe optimizations require a different set of eyes and more thorough review; I want to lower barriers to contribution and make landing small contributions faster, not frustrate contributors by expanding the scope of PRs; and, generally, I'd like to hold up fewer issues by waiting for perfection and be more willing to settle for incremental progress.

(On the subject of making small / first-time contributions faster - it is quite frustrating that any minor lib feature enhancements have a week turnaround waiting for the libs team to meet face-to-face. Very bad contributor experience.)

@pwoolcoc
Contributor Author

@brson ok, thanks!

@brson
Contributor

brson commented Aug 1, 2016

@bors r+ libs team is happy

@bors
Contributor

bors commented Aug 1, 2016

📌 Commit ac73335 has been approved by brson

@bors