Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String methods for replacing a sub-string with another one in place #2462

Open
gnzlbg opened this issue Jun 7, 2018 · 11 comments
Open

String methods for replacing a sub-string with another one in place #2462

gnzlbg opened this issue Jun 7, 2018 · 11 comments
Labels
T-libs-api Relevant to the library API team, which will review and decide on the RFC.

Comments

@gnzlbg
Copy link
Contributor

gnzlbg commented Jun 7, 2018

I've often needed to replace a sub-string within a String in place with a new sub-string. If the new sub-string is smaller than the sub-string its replacing, that can be done in place without any allocations.

I don't know if this is the right place to fill in a "feature request", but there is only one correct efficient way to implement such a method, and an RFC feels overkill for that.

The only thing I don't know is how to name it. We could name it String::replace_in_place and String::replacen_in_place but... if the new string is larger than the one it is replacing then an allocation will happen, so "in place" isn't 100% accurate, unless we consider "in place" to mean that it happens on this String, independently of whether it is reallocated or not.

@golddranks
Copy link

golddranks commented Jun 7, 2018

I have had this need often when having to replace some fixed with date/time format or a fixed width (zero-padded) serial code in some string. It always feels really bad to allocate for that, because Rust makes it explicit and you know it would be possible since you know the exact length of your string.

@scottmcm
Copy link
Member

scottmcm commented Jun 8, 2018

@gnzlbg So sortof a combination find+splice?

@SimonSapin
Copy link
Contributor

I feel that this is niche enough that it can be developed on crates.io, at least at first.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Jun 8, 2018

@scottmcm kind of but doing this without allocating if the replacement is smaller, and allocating only once if it is larger, and in one pass, is tricky.

@frewsxcv
Copy link
Member

frewsxcv commented Jun 8, 2018

I have a sorta similar issue for non-allocating string pattern removal: rust-lang/rust#50206

@warlord500
Copy link

warlord500 commented Jun 8, 2018

here's the thing, you can convert String to Vec via into then operate on bytes directly and convert it back to
String with from_utf8 call with no allocations already using no unsafe code.
however the from_utf8 error needs to be handled in which case you can possibly panic if your byte manipulations are wrong!

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Jun 9, 2018

I wasn't suggesting that it was impossible to achieve this by building something on top of String. My point was only that this is a conceptual common operation, that is hard to do yourself both correctly and efficiently if you are going to start manipulating bytes.

Because it is not "easily" available it's often better to say "let's do an allocation and call it a day" than trying to implement this yourself unless this is your bottle neck. But often I have to replace patterns of equal length (one letter with another) or larger patterns with shorter patters, and if this would be available, I'll definitely prefer it over doing an allocation.

@SimonSapin
Copy link
Contributor

SimonSapin commented Jun 11, 2018

I agree that this problem, like many problems, is better by having a shared solution that is well thought-out and that people can reuse, so that every potential user doesn’t need to solve it again every time. My point is only that crates.io is a fine way to do such reuse, not everything needs to be in the standard library.

My subjective feeling that this use case has enough combined constraints that it’s not that common, but maybe I’m wrong. Once an implementation exists and gets some use, we can see if it becomes popular and consider inclusion then.

@burdges
Copy link

burdges commented Jun 12, 2018

I forget: Why is String : Borrow<str>+Deref<Target=str>+DerefMut but not BorrowMut<str>?

In any case, this sounds largely like &mut str problem, not just String, so a str method avoids callers needing .as_mut_str(), thanks to DerefMut. We could use an external crate more cleanly if we either impl BorrowMut<str> for &mut String, which maybe does not exist for some reason, or else use DerefMut bounds like BorrowMut bounds.

fn replace_at<T,S>(target: T, source: S, offset: usize)
  where
    T: Deref<Target=str>+DerefMut,
    S: Borrow<str>

Anyways, there are several issues that need fixing here:

We've no StrErr type and panic on UTF-8 violations, which we need to make adding methods to &mut str worth it. We could panic at either split_at_mut in this function:

fn replace_at<S: Borrow<str>>(self: &mut str, source: S, offset: usize) {
    let s = source.borrow();
    self.split_at_mut(offset).1
          .split_at_mut(s.len()).0
          .as_bytes_mut().copy_from_slice(s.as_bytes_mut());
}

We've no fn shift(self: &mut &mut str, num: usize) -> Result<(),StrErr> method to shift a &mut str forward. In general, we've no tool to iterate over arrays, much less str, looking at several distinct positions mutably, so like an immutable iterator running ahead and possibly faster than a mutable iterator, not sure if iter-tools has this. We might want this generalization of shift to do roughly

pub fn replace_mut<'a, P>(&'a mut self, from: P, to: &str) -> Result<(),StrErr> where P: Pattern<'a>, 

We need a method to smartly borrow a String as a &mut &mut str, and copy back out the modified length, so the above operations can make the String shorter.

I think allocating only once if the result is larger sounds kinda tricky, maybe you even want xi-rope

@SimonSapin
Copy link
Contributor

I don’t see a reason not to add BorrowMut.

However you cannot change the length of &mut str, so you probably still want String for an in-place replace with a smaller replacement.

@burdges
Copy link

burdges commented Jun 12, 2018

Oops, I'd imagined changing the length of the inner &mut str inside a &mut &mut str, but this permits retargetting the inner &mut str completely, which sounds bad. In fact, you could maybe forbid retargetting outside itself with lifetime tricks, but you'd then need Drop to shift the data back into place for String. So..

Appropriate generality for array and string operations that avoid reallocations might resemble:

pub trait TrimBorrowedSlice : Deref<Target=T> + DerefMut + ?Sized {
    fn trim_left(&mut self);
    fn trim_right(&mut self);
}
pub struct BorrowedSliceGuard<'a,T,A: 'a>
  where A: Deref<Target=T> + DerefMut + ?Sized {
    __original: *mut A,
    __borrow: &mut T,
}

impl<'a> TrimBorrowSlice for BorrowedSliceGuard<'a,str,&'a mut str>(&'a mut &'a mut str) { ... }
impl<'a> TrimBorrowSlice for BorrowedSliceGuard<'a,str,String>(&'a mut str) { ... }

impl<'a> Drop for BorrowedSliceGuard<'a,str,&'a mut str>(&'a mut &'a mut str) { ... }
impl<'a> Drop for BorrowedSliceGuard<'a,str,String>(&'a mut str) { ... }

// Delegate dereferencing
impl<'a,T,A: 'a> Deref<Target=T> for BorrowedSliceGuard<'a,T,A: ?Sized + 'a> {
    type Target=T;
    fn deref(&self) -> &T { self.0.deref() }
}
impl<'a,T,A: 'a> DerefMut for BorrowedSliceGuard<'a,T,A: ?Sized + 'a> {
    fn deref_mut(&self) -> &T { self.0.deref_mut() }
}

@Centril Centril added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Oct 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

No branches or pull requests

8 participants