Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discrepancy between docs and implementation of BufReader::fill_buf #48022

Closed
jonhoo opened this issue Feb 5, 2018 · 12 comments · Fixed by #54046
Closed

Discrepancy between docs and implementation of BufReader::fill_buf #48022

jonhoo opened this issue Feb 5, 2018 · 12 comments · Fixed by #54046
Labels
A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools C-enhancement Category: An issue proposing an enhancement or a PR with one. E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. P-medium Medium priority

Comments

@jonhoo
Copy link
Contributor

jonhoo commented Feb 5, 2018

Would you like to fix this bug? If so, here's how.


The documentation currently states that BufRead::fill_buf:

Fills the internal buffer of this object, returning the buffer contents.

However, this is not what BufReader's implementation of that method does. It will only fill the buffer if no data is currently buffered. If data is already present in the buffer, no filling takes place:

impl<R: Read> BufRead for BufReader<R> {
    fn fill_buf(&mut self) -> io::Result<&[u8]> {
        if self.pos >= self.cap {
            self.cap = self.inner.read(&mut self.buf)?;
            self.pos = 0;
        }
        Ok(&self.buf[self.pos..self.cap])
    }
}

I would argue that the current behavior of BufReader's implementation is usually what users would want to do, but if does not match the documentation of, nor the name of, fill_buf. I don't know what the right solution here is, but I suspect the only safe thing to do would be to update BufReader's implementation to match the docs (i.e., to make it actually fill its buffer), and then to add another method to BufRead (perhaps buf_bytes?) that gives the currently buffered contents, regardless of the amount of buffered data, and without reading from the underlying Read.

@TimNN TimNN added C-enhancement Category: An issue proposing an enhancement or a PR with one. A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools labels Feb 6, 2018
@abonander
Copy link
Contributor

abonander commented Feb 7, 2018

Adding a method to BufRead would be a breaking change unless you have a suitable default implementation in mind.

I would love it if we can make "every time this is called, the implementation should fetch more data" part of the fill_buf() API contract because it's easy to get burned when you make that assumption with BufReader. The only one impl I know of where this guarantee isn't really feasible is io::Chain which can only return buffers from one source at a time.

@jonhoo
Copy link
Contributor Author

jonhoo commented Feb 7, 2018

Well, even that is tricky, right? What if the buffer is already full? I also think there's value in being able to get access to the bytes currently in the buffer without doing a syscall (something like get_buf).

@abonander
Copy link
Contributor

abonander commented Feb 7, 2018

The issue is that you can't really do anything to the structure of the BufRead trait at this point without breaking changes, though this is now feasible with epochs. Adding extra functionality directly to BufReader is quite a bit more reasonable though. I have an old RFC that I couldn't be bothered to champion and instead just reimplemented it in my own crate.

@jonhoo
Copy link
Contributor Author

jonhoo commented Feb 8, 2018

I guess one question is where we decide to not expand io::BufReader and say that external crates should provide features instead. I'd advocate for this being something io::BufReader should support (though as you say, possibly not io::BufRead).

@twimpiex
Copy link

I also ran into this 'faulty' documentation issue.
I have a program which needed pushing back data on the reader and by reading the documentation I thought I could use fill_buf with consume for that, but found out the hard way that fill_buf isn't actually filling the buffer but returning the existing buffer if something is still in there.
Hardly appreciated if the fill_buf acts as documented.

@frewsxcv frewsxcv added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Mar 3, 2018
@sfackler
Copy link
Member

sfackler commented Mar 3, 2018

The only one impl I know of where this guarantee isn't really feasible is io::Chain which can only return buffers from one source at a time.

The proposed behavior is fundamentally broken. Imagine a BufReader<TcpStream> being used to parse some line-oriented protocol. The BufReader's buffer capacity is 4096 bytes, and the peer sends a single TCP packet with 2 lines in 100 bytes. It then expects our end to respond, so it's not doing any more writes.

You call read_line, which calls into fill_buf. Since this is a new BufReader, the buffer is empty so it calls down into the TcpStream and reads all 100 bytes into the buffer. read_line then returns the first line, and there's the one remaining line in the buffer.

Now you call read_line again to read the last line the peer sent over. With the current behavior, the BufReader's buffer is nonempty, so fill_buf immediately returns it. read_line returns the last line our end responds to the peer, and everything is happy. With the proposed behavior, fill_buf calls back down into the TcpStream since its buffer isn't full and the protocol deadlocks.

@frewsxcv
Copy link
Member

frewsxcv commented Mar 3, 2018

Just to clarify, is the idea here to update the BufRead docs from:

Fills the internal buffer of this object, returning the buffer contents.

...into something like this?

Fills the internal buffer of this object, returning the buffer contents. If the buffer already has data prior to fill_buf being called, the buffer will not grow and the buffer contents will be returned.

@frewsxcv
Copy link
Member

frewsxcv commented Mar 3, 2018

Ah, @sfackler commented while I was writing my comment and answered a couple of the questions I had 👍

@sfackler
Copy link
Member

sfackler commented Mar 3, 2018

I'd probably phrase it as

returns the contents of the internal buffer, filling it with more data from the inner reader if it is empty

@jonhoo
Copy link
Contributor Author

jonhoo commented Mar 4, 2018

@sfackler I agree with you that changing the behavior of fill_buf seems wrong. As pointed out in the original post: "the current behavior of BufReader's implementation is usually what users would want to do, but if does not match the documentation of, nor the name of, fill_buf".

I do think there's an argument for providing a method that gives the buffer contents without ever doing a syscall, but that's probably an argument for another issue.

@steveklabnik
Copy link
Member

I'm happy to mentor anyone who wants to fix this issue! Here's how:

The line in question is here:

/// Fills the internal buffer of this object, returning the buffer contents.

This should be changed to:

Returns the contents of the internal buffer, filling it with more data from the inner reader if it is empty.

I'm happy to answer any additional questions!

@steveklabnik steveklabnik added E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. P-medium Medium priority E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. and removed T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels May 28, 2018
@jonhoo
Copy link
Contributor Author

jonhoo commented May 28, 2018

Perhaps we should also include a reference to the newly added BufReader::buffer (#49139)? Though I suppose that is (sadly) only available for BufReader and not all BufRead.

kennytm added a commit to kennytm/rust that referenced this issue Sep 12, 2018
Update documentation for fill_buf in std::io::BufRead

Brings the documentation in line with the BufReader implementation.

Fixes rust-lang#48022.

This is my first PR, and I think the `E-easy` label is very cool, as so is the practice of describing the fix but leaving it for someone else; it really makes it a lot less intimidating to get started with something!
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools C-enhancement Category: An issue proposing an enhancement or a PR with one. E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. P-medium Medium priority
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants