Expose raw Stdout/err/in #58326

Open
ParadoxSpiral opened this issue Feb 9, 2019 · 16 comments
Labels
C-feature-request Category: A feature request, i.e: not implemented / a PR. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@ParadoxSpiral

Currently there is no easy/obvious way to get an unbuffered Stdout/err/in. The types do exist in stdio, however they are not public for reasons not noted.

For example these types would be useful for CLI applications that write a lot of data at once without it getting unnecessarily flushed.

One can use platform-specific extensions, such as from_raw_fd on Unix and from_raw_handle on Windows, as a workaround.

@jonas-schievink jonas-schievink added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. C-feature-request Category: A feature request, i.e: not implemented / a PR. labels Feb 9, 2019
@josephlr
Contributor

This is similar to #23818, where having more control over the buffering/flushing behavior of stdio/stdin would be helpful.

@pitdicker
Contributor

So I've been thinking about the same thing while working on #58454.

I do wonder how useful they would be though.

> For example these types would be useful for CLI applications that write a lot of data at once without it getting unnecessarily flushed.

I don't think Stdout flushes unnecessarily? It should flush just as often or less than StdoutRaw, because Stdout lets large writes (greater than 8k) pass through directly. And Stderr isn't buffered at all.
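For readers unfamiliar with how LineWriter decides when to hand bytes to the underlying writer, here is a minimal sketch. `SharedSink` and `line_writer_snapshots` are hypothetical names made up for this illustration, and the sink is an in-memory stand-in rather than the real StdoutRaw, so it only demonstrates the general line-buffering behaviour, not the exact internals of Stdout.

```rust
use std::cell::RefCell;
use std::io::{self, LineWriter, Write};
use std::rc::Rc;

/// Hypothetical in-memory sink that records every byte it receives.
#[derive(Clone, Default)]
pub struct SharedSink(Rc<RefCell<Vec<u8>>>);

impl SharedSink {
    /// Copy of everything that has reached the sink so far.
    pub fn contents(&self) -> Vec<u8> {
        self.0.borrow().clone()
    }
}

impl Write for SharedSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.0.borrow_mut().extend_from_slice(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Writes three fragments through a LineWriter and records what the
/// sink has seen after each step, plus once more after an explicit flush.
pub fn line_writer_snapshots() -> io::Result<Vec<Vec<u8>>> {
    let sink = SharedSink::default();
    let mut out = LineWriter::new(sink.clone());
    let mut snapshots = Vec::new();

    out.write_all(b"abc")?; // no newline yet: stays in the buffer
    snapshots.push(sink.contents());

    out.write_all(b"def\n")?; // newline: the completed line is written out
    snapshots.push(sink.contents());

    out.write_all(b"ghi")?; // partial line again: buffered
    snapshots.push(sink.contents());

    out.flush()?; // explicit flush pushes the partial line out
    snapshots.push(sink.contents());

    Ok(snapshots)
}
```

The partial line only reaches the sink after the explicit flush, which is the behaviour discussed in #23818.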

Stdout does have an interesting comment:

pub struct Stdout {
    // FIXME: this should be LineWriter or BufWriter depending on the state of
    //        stdout (tty or not). Note that if this is not line buffered it
    //        should also flush-on-panic or some form of flush-on-abort.
    inner: Arc<ReentrantMutex<RefCell<LineWriter<Maybe<StdoutRaw>>>>>,
}

Working on this would change flushing behavior for piped buffered stdout, but is not at all relevant for StdoutRaw.

Stdout and Stderr do synchronize writes across threads however. I suppose that is why the panic code in the standard library writes to StderrRaw directly.

Another point: the Windows stdio implementation turns out to be quite tricky because it converts from UTF-8 to UTF-16 when writing to the console, but accepts arbitrary bytes when writing to a pipe. Ideally StdinRaw holds some state to deal with sliced UTF-16 buffers. And the buffered LineWriter in Stdout helps to give valid UTF-8 slices, because it slices at line boundaries, preventing incomplete UTF-8 code points.
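To make the "incomplete UTF-8 code point" problem concrete, the sketch below splits a byte buffer into its longest valid UTF-8 prefix and the leftover bytes. `split_valid_utf8` is a hypothetical helper written for this illustration; it is not how the standard library's Windows console code is implemented, only an example of the state an unbuffered writer would have to carry between calls.

```rust
/// Hypothetical helper: returns the longest valid UTF-8 prefix of `buf`
/// together with the remaining bytes (for example the first half of a
/// multi-byte code point that was cut off by an arbitrary slice).
pub fn split_valid_utf8(buf: &[u8]) -> (&str, &[u8]) {
    match std::str::from_utf8(buf) {
        // The whole buffer is valid; nothing is left over.
        Ok(s) => (s, &buf[buf.len()..]),
        Err(e) => {
            let (valid, rest) = buf.split_at(e.valid_up_to());
            // `valid_up_to` guarantees that this prefix is valid UTF-8.
            let valid = std::str::from_utf8(valid).expect("prefix was validated");
            (valid, rest)
        }
    }
}
```

A caller converting to UTF-16 for the console could convert the valid prefix immediately and keep the leftover bytes until the next write supplies the rest of the code point.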

Finally the non-raw types have a Maybe in-between, to gracefully handle the case where stdin/stdout/stderr is not available, see RFC 1014.

Some advantages the *Raw types could have:

  • StderrRaw does not do synchronization, I suppose there are cases where that is necessary to prevent deadlocks.
  • StdoutRaw will write everything out directly, no need for flushing. Bad for performance, and code that gradually builds up a line or message can give very messy output when multi-threaded. Still it can be used well and is desired by some (print! macro should flush stdout #23818).
  • Having the raw methods allows for building different abstractions.

A single synchronized Stdin with its own buffer seems pretty solid to me. I see no real advantage in StdinRaw.

@ParadoxSpiral
Author

> I don't think Stdout flushes unnecessarily? It should flush just as often or less than StdoutRaw, because Stdout lets large writes (greater than 8k) pass through directly. And Stderr isn't buffered at all.

Oh, I wasn't aware that LineWriter hands big writes through. However it would still be useful if you could wrap StdoutRaw in a BufWriter that you have more control over without also still having the LineWriter beneath.

> Some advantages the *Raw types could have:

A slight advantage is that you can more easily check if any of them are not present, since the raw functions return Results.

> A single synchronized Stdin with its own buffer seems pretty solid to me. I see no real advantage in StdinRaw.

I agree a StdinRaw is probably not desirable.

@pitdicker
Contributor

> A slight advantage is that you can more easily check if any of them are not present, since the raw functions return Results.

Sorry, just made a PR that makes them no longer return a Result: #58768. But it didn't work, the raw types on all systems already only return Ok.


I am still trying to make up my mind whether writing up a pre-RFC or PR to expose the *Raw types brings something useful.

StdoutRaw can in some sense break the synchronization promise of Stdout: when Stdout is locked by one thread it can write multiple lines without having lines from another thread 'interrupt'. So any custom implementation that wraps StdoutRaw and wants to play nice with the standard library should lock Stdout before using StdoutRaw. That seems to make the idea of using different synchronization primitives not really interesting anymore.
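The synchronization promise mentioned here can be sketched with the public API: while one thread holds the StdoutLock, its lines are not interleaved with output from other threads that also go through Stdout. `write_block` and `print_block` are hypothetical helper names for this illustration; nothing below touches the private StdoutRaw type.

```rust
use std::io::{self, Write};

/// Hypothetical helper: writes each line followed by a newline, then flushes.
pub fn write_block<W: Write>(out: &mut W, lines: &[&str]) -> io::Result<()> {
    for line in lines {
        out.write_all(line.as_bytes())?;
        out.write_all(b"\n")?;
    }
    out.flush()
}

/// Holds the stdout lock for the whole block, so other threads writing
/// through `io::stdout()` have to wait until the block is complete.
pub fn print_block(lines: &[&str]) -> io::Result<()> {
    let mut guard = io::stdout().lock();
    write_block(&mut guard, lines)
}
```

A writer that bypasses Stdout, for example through a raw file descriptor, does not take part in this lock, which is the concern raised above.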

> Oh, I wasn't aware that LineWriter hands big writes through. However it would still be useful if you could wrap StdoutRaw in a BufWriter that you have more control over without also still having the LineWriter beneath.

I am preparing a PR that switches the buffering mode between LineWriter and BufWriter depending on whether a terminal is connected, but expect it to not land easily... Would that fit your needs, or some method on Stdout that gives more control over the buffering?

@retep998
Member

One big question is should we have StdoutRaw and StdinRaw for Windows consoles that allow the user to read and write [u16] directly? Should we also allow the user to write arbitrary [u8] bytes through the narrow codepage?

@ParadoxSpiral
Author

> I am preparing a PR that switches the buffering mode between LineWriter and BufWriter depending on whether a terminal is connected, but expect it to not land easily... Would that fit your needs, or some method on Stdout that gives more control over the buffering?

I would like to be able to set the capacity of the inner BufWriter.
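With the public API, the closest approximation is probably an outer BufWriter with a chosen capacity wrapped around a locked Stdout. This is only a sketch: `buffered_stdout` and `write_numbers` are hypothetical helpers, the 64 KiB capacity is an arbitrary choice, and the LineWriter inside Stdout is still there underneath, so this reduces how often it is invoked rather than removing it.

```rust
use std::io::{self, BufWriter, StdoutLock, Write};

/// Hypothetical helper: a BufWriter of the requested capacity on top of a
/// locked stdout. Stdout's own LineWriter still sits beneath this buffer.
pub fn buffered_stdout(capacity: usize) -> BufWriter<StdoutLock<'static>> {
    BufWriter::with_capacity(capacity, io::stdout().lock())
}

/// Example use: emit `n` lines through the outer buffer, then flush once.
pub fn write_numbers(n: u32) -> io::Result<()> {
    let mut out = buffered_stdout(64 * 1024);
    for i in 0..n {
        writeln!(out, "{i}")?;
    }
    out.flush()
}
```

Taking the lock once up front also avoids re-locking Stdout on every write.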

@BartMassey
Contributor

BartMassey commented Oct 29, 2019

I wrote up some relevant stuff on Reddit here. My repo is here if you want to play with it. It all amounts to a pretty good argument that exposing StdoutRaw isn't going to buy you much performance in typical cases, I think.

@BartMassey
Contributor

BartMassey commented Nov 10, 2019

After further investigation it looks like writing directly to the underlying File with custom buffering is much faster than anything that I've been able to figure out for using stdout(). I'm playing with this right now; see http://github.com/BartMassey/rust-nonstdio for a very early preview of a thing. The Background section of the README has some information that is relevant here.

@jgoerzen

jgoerzen commented Jun 8, 2022

This led to an otherwise-unnecessary use of std::io::copy for me. I have data that I want to pipe to a command. This data may come from stdin, or it may come from some other Read. I can't just call .stdin() on this with a handle that comes from io::stdin() if I've read anything from that handle, because the first read, even if read_exact on 1 byte, will have read 8K and the remainder of the 8K block will be discarded.

This is an unfortunate data loss bug that is entirely non-obvious in the library and not blocked by the type system.
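The read-ahead effect can be reproduced without touching the real stdin. The sketch below uses a BufReader over an in-memory Cursor as a stand-in; `consumed_after_one_byte` is a hypothetical helper, and the small capacity is chosen only to keep the example readable. It reports how far the underlying source has advanced after a single one-byte read_exact.

```rust
use std::io::{self, BufReader, Cursor, Read};

/// Hypothetical helper: reads exactly one byte through a BufReader and
/// returns how many bytes were actually consumed from the underlying source.
pub fn consumed_after_one_byte(data: Vec<u8>, capacity: usize) -> io::Result<u64> {
    let mut reader = BufReader::with_capacity(capacity, Cursor::new(data));

    let mut one = [0u8; 1];
    reader.read_exact(&mut one)?;

    // Dropping the BufReader discards whatever is still in its buffer;
    // the underlying source has already moved past those bytes.
    Ok(reader.into_inner().position())
}
```

Anything that later reads from the underlying source directly, such as a child process inheriting the descriptor, starts after the bytes that were pulled into the buffer.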

@agausmann
Contributor

agausmann commented Jul 27, 2022

In my application I need to forward I/O from a serial port to stdio, and the serial port is an interactive console where the remote device echoes the characters back if they should be printed on the terminal, and controls various other aspects of the terminal. (Come to think of it, it doesn't have to be a serial port; another similar example with this behavior is an SSH client.)

Line-buffered stdin simply does not work for this case; action needs to be taken for every input byte, not for every line.

@tbu-
Contributor

tbu- commented Oct 5, 2022

Having line-buffered stdout is also an unnecessary performance overhead when outputting binary data. First the code scans for newlines to only partially write the bytes to stdout and copy the remaining bytes into a buffer, only for me to flush the remaining bytes out.

One syscall too many for every write that I do and unnecessary buffer copying.

@SUPERCILEX
Contributor

Created a proposal for this: rust-lang/libs-team#148

@WieeRd

WieeRd commented Nov 8, 2023

If anyone else came across this issue while looking for a workaround, this is what I'm using right now.

use std::{fs::File, io};

#[cfg(unix)]
pub fn stdout_raw() -> File {
    use std::os::fd::{AsRawFd, FromRawFd};

    let stdout = io::stdout();
    let raw_fd = stdout.as_raw_fd(); // or just use `1`
    unsafe { File::from_raw_fd(raw_fd) }
}

#[cfg(windows)]
pub fn stdout_raw() -> File {
    use std::os::windows::io::{AsRawHandle, FromRawHandle};

    let stdout = io::stdout();
    let raw_handle = stdout.as_raw_handle();
    unsafe { File::from_raw_handle(raw_handle) }
}

#[cfg(test)]
mod test {
    use super::*;
    use std::io::{self, Write};

    #[test]
    fn rawwwwww() -> io::Result<()> {
        let mut stdout = stdout_raw();
        stdout.write_all(b"This stdout... is RAWWWWWW!!!")?;

        Ok(())
    }
}

@jgoerzen

jgoerzen commented Nov 8, 2023

In Filespooler, for Unix, I am using:

/// stdin is buffered by default on Rust, and you can't change it.  Since
/// we need to precisely read the header before letting a subprocess
/// handle the payload in stdin-process, we have to use trickery.  Bleh.
///
/// Take care not to let this value drop before spawning, because that would
/// cause stdin to be closed.
///
/// See: https://github.com/rust-lang/rust/issues/97855
pub fn get_unbuffered_stdin() -> File {
    let s = stdin();
    let locked = s.lock();
    let file = unsafe { File::from_raw_fd(locked.as_raw_fd()) };
    file
}

FWIW

@tbu-
Contributor

tbu- commented Nov 8, 2023

AFAICT, both of these will close the actual stdin/stdout FD once the returned file gets dropped. This is probably not intended.

Additionally, the solution posted by @jgoerzen looks like it's trying to lock stdin, but the lock is immediately dropped after the function is returned.

@WieeRd

WieeRd commented Nov 10, 2023

Definitely wouldn't recommend using my snippet for any serious/larger code, it's just suggested as a quick and dirty workaround for trivial but IO intensive scenario. In my case I was solving an algorithm problem when I encountered this.
