Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

doc: clarify the behavior of readFile with fd #10853

Closed
wants to merge 1 commit into from
Closed

doc: clarify the behavior of readFile with fd #10853

wants to merge 1 commit into from

Conversation

seishun
Copy link
Contributor

@seishun seishun commented Jan 17, 2017

Makes it explicit that readFile doesn't seek to the beginning.

This is related to #9671, but that issue is actually a combination of two issues.

  1. On Windows, fs.read/write always updates the current file position, but on other OSes it doesn't when an explicit offset is specified.
  2. fs.readFile reads from the current offset when called with an fd instead of from the beginning.

Fixing at least one of the above would fix @jorangreef's issue. I tried fixing the second one in #9699, but that broke Linux and Mac OS tests (see #10809). This PR clarifies that it's not a bug but a feature.

Not sure which of the above mentioned issues should be mentioned in the commit message, if any.

Checklist
  • documentation is changed or added
  • commit message follows commit guidelines
Affected core subsystem(s)

doc

Makes it explicit that readFile doesn't seek to the beginning.
@nodejs-github-bot nodejs-github-bot added doc Issues and PRs related to the documentations. fs Issues and PRs related to the fs subsystem / file system. dont-land-on-v7.x and removed dont-land-on-v7.x labels Jan 17, 2017
@addaleax
Copy link
Member

Not sure which of the above mentioned issues should be mentioned in the commit message, if any.

We probably need to get a feeling for user expectations on this, and if the current behaviour is unexpected, weigh that against the breakage of changing it?

@seishun
Copy link
Contributor Author

seishun commented Jan 17, 2017

Are you sure you quoted the right sentence? I don't see the connection.

We probably need to get a feeling for user expectations on this, and if the current behaviour is unexpected, weigh that against the breakage of changing it?

Well we have the following two facts:

  1. No one has ever complained about this behavior as far as I know.
  2. Changing this would make the behavior inconsistent between regular fd's and non-seekable fd's.

@addaleax
Copy link
Member

Oh, sorry. What I basically meant to say was “I’m not sure whether this fixes #9699 or not because we haven’t made any decision on behavioural changes yet.”

@sam-github
Copy link
Contributor

No one has ever complained about this behavior as far as I know.

A bug was reported against fs because when reading from a fd, it did not read the whole file.

I think its weird to not read the whole file. I also think the seek could be wrapped with try/catch and ignored on ESPIPE - basically, special cased to do what it does now for pipes and sockets because they obviously can't be read "from the beginning".

@seishun
Copy link
Contributor Author

seishun commented Jan 17, 2017

A bug was reported against fs because when reading from a fd, it did not read the whole file.

Can you link to it?

@sam-github
Copy link
Contributor

@seishun #9699

@seishun
Copy link
Contributor Author

seishun commented Jan 17, 2017

@sam-github Um, yeah, that's my PR. Have you read my original comment?

@sam-github
Copy link
Contributor

@jorangreef reported a bug in #9671 and you reference it.

You also claim that "no one has complained about this behaviour as far as I know" in #10853 (comment).

And in this PR you just document the weird behaviour based on "no one has complained".

@seishun Seems contradictory to me, perhaps you can clear up my confusion?

@seishun
Copy link
Contributor Author

seishun commented Jan 17, 2017

@sam-github @jorangreef's issue isn't specifically about fs.readFile. His issue can be fixed either by changing the behavior of fs.readFile, or by fixing fs.read. And it seems fixing the latter is going to be less breaking.

@sam-github
Copy link
Contributor

And it seems fixing the latter is going to be less breaking.

Can you link to the PR or issue where this is happening?

@seishun
Copy link
Contributor Author

seishun commented Jan 17, 2017

I was planning to look into it after this PR is merged.

@sam-github
Copy link
Contributor

So here are some of my thoughts: rants?

The history of this is a tangled mess, reaching from here, back to #10809, to #3163, to nodejs/node-v0.x-archive#8471 and nodejs/node-v0.x-archive#8522, and seems to have had some fd leaks and other quality problems on the way.

Adding fd support was thought of as a "sure, why not?", but we didn't adequately consider what that means when the fd is not currently seeked to start-of-file, or what to do on an unseekable fd, or the maintenance cost, which is clearly >>> 0.

At this point, we have tests running on stdin, which fail, but the tests are probably invalid, looks like they were written agains stdin without documenting what the behaviour is with stdin, or what it even means to "read file" from a fd, when the fd is a pipe.

And now we have an API that is complex to describe, and doesn't really do the same thing when called with a fd and a path.

This is a classic example of an API that should never have been implemented in node, it could perfectly reasonably be a npm module, or two... or three, one for every variant of thought on how it should work, but since its a node API, we have to state a definitive position, and possibly break backwards compat in the process.

I'm personally tempted to revert the entire "allow fd" feature, then let people debate how it should work with fds adequately before it gets merged in again. And explain why a feature to "read a file" should be able to read a pipe, or read any fd, really.

Same for writeFile. What does it even do when given a fd? The docs say only:

Asynchronously writes data to a file, replacing the file if it already exists.

Clearly, they weren't updated when the fd feature was added, I very much doubt that when passing a fd to writeFile, it replaces the file if it already exists, even if the fd was to a file, which it doesn't have to be.

So, at this point, what do we do?

I don't know. If the best we can do is doc how it works now, then so it goes.

@seishun Thanks for taking a shot at cleaning this up. I look forward to seeing what you can do. Consider my suggestion for reverting the "fd" feature, please, but if you go the doc route, look at writeFile, too, it doesn't document what it does with an fd (and may not test it, either).

Also, you mentioned positional reads on Windows above.

I happen to know that pread/pwrite don't change the fd position on Unix, and that fs.read/write with positions are implemented in their terms, but the docs for fs.read and fs.write say absolutely nothing about what happens to the current fd position, and in particular, don't say that what happens depends on Windows vs. Unix. So there is plenty to doc there, too. And possibly to test.

@jorangreef
Copy link
Contributor

@jorangreef's issue isn't specifically about fs.readFile

My issue #9671 was specifically about fs.readFile. When called with an fd it was not reading the whole file.

@seishun
Copy link
Contributor Author

seishun commented Jan 18, 2017

My issue #9671 was specifically about fs.readFile. When called with an fd it was not reading the whole file.

If it was specifically about fs.readFile then it would fail on Linux too. fs.readFile starts reading from the current file position on all platforms.

@addaleax
Copy link
Member

I'm personally tempted to revert the entire "allow fd" feature

I can certainly see the point behind “this should never have been introduced in the first place” but I doubt that removing it is feasible.

@jorangreef
Copy link
Contributor

@sam-github I agree with your summary of how we got here and the history of readFile() and writeFile(). I would like to add a few points to yours:

  1. readFile and writeFile were initially created for the purpose of working with regular files.
  2. This understanding may also be reflected in actual usage patterns. Do any npm packages use readFile to read special files?
  3. readFile(fd) and readFile(path) should always give the same result.

readFile(fd) is useful for working with long-lived files, where the open call can be amortized across the life of the program. For example, a key-value store might open several files at startup, then use read() and write() calls on the file at various positions, and then a compaction job might use readFile(fd) to read in the entire file, which should be read from offset 0.

I think readFile(fd) for regular files is worth keeping, since such a program would otherwise have to implement the same itself, and the implementation has proved difficult enough. Perhaps we are nearly there now.

If anything should be deprecated, I think it should be readFile(fd) for special files. Alternatively, this could be special-cased.

@jorangreef
Copy link
Contributor

jorangreef commented Jan 18, 2017

If it was specifically about fs.readFile then it would fail on Linux too. fs.readFile starts reading from the current file position on all platforms.

Yes, I have an issue with that. Sure, I discovered it when working with sparse files on Windows. Would it have made a difference if I found the problem when working with sparse files on Linux? The issue is with readFile specifically not reading the whole file, regardless of platform. Or what did you think my issue was about?

@seishun
Copy link
Contributor Author

seishun commented Jan 18, 2017

readFile and writeFile were initially created for the purpose of working with regular files.

readFile('/dev/stdin') has always worked afaik. fd vs path has nothing to do with regularness.

Yes, I have an issue with that. Sure, I discovered it when working with sparse files on Windows. Would it have made a difference if I found the problem when working with sparse files on Linux? The issue is with readFile specifically not reading the whole file, regardless of platform.

Perhaps you would like to update the issue description then. As it stands now, the issue in fs.readFile wouldn't manifest in your test case if fs.read/write worked correctly on Windows.

@jorangreef
Copy link
Contributor

fd vs path has nothing to do with regularness.

I was referring to a regular file (which can be accessed by fd or path) vs a special file (stdin, socket etc.).

Perhaps you would like to update the issue description then.

If it helps, sure.

As it stands now, the issue in fs.readFile wouldn't manifest in your test case if fs.read/write worked correctly on Windows.

That's besides the point. My test case showed that readFile does not read the whole file at least on Windows. I care that readFile reads the whole file on all platforms. The culprit may or may not be fs.read but that's an implementation detail. I do not mind too much how readFile works, so long as it does what it claims to do.

@seishun
Copy link
Contributor Author

seishun commented Jan 18, 2017

The culprit may or may not be fs.read but that's an implementation detail.

No, that's not an implementation detail, but a precondition in your test case. Your test case basically uses fs.read to read from an fd at a specific offset, then uses fs.readFile to read the rest of the file. It doesn't read the entire file on Windows because fs.read wrongly incremented the current file position.

If your issue indeed doesn't have anything to do with fs.read wrongly updating the current file position on Windows, then I suggest that you update the test case as well. Otherwise, someone might fix fs.read and close your issue since your test case would no longer reproduce the issue.

@jorangreef
Copy link
Contributor

It doesn't read the entire file on Windows because fs.read wrongly incremented the current file position.

fs.readFile still needs to work regardless of whether any previous independent fs.read calls updated the fd's seek position or not. Or would you say that fs.readFile is correct to assume that an fd's seek position is always 0?

@seishun
Copy link
Contributor Author

seishun commented Jan 18, 2017

fs.readFile still needs to work regardless of whether any previous independent fs.read calls updated the fd's seek position or not. Or would you say that fs.readFile is correct to assume that an fd's seek position is always 0?

It doesn't assume it's 0, it just starts reading from the current position. The docs are unclear on whether this behavior is correct, so this PR fixes the docs. If you think the behavior needs to be changed instead, then please take a look at the options I listed in #10809 (comment).

@jorangreef
Copy link
Contributor

It doesn't assume it's 0, it just starts reading from the current position.

That's an understanding of readFile contributed by this PR.

According to the present docs, readFile is supposed to "asynchronously read the entire contents of a file". If "it just starts reading from the current position" then readFile must be assuming that the current position is 0.

The docs are unclear on whether this behavior is correct

The docs are clear as to "asynchronously read the entire contents of a file". Where the docs are not clear is that readFile and writeFile were designed for regular files which can be seeked. Support for special files happened to work while readFile made assumptions about the fd's seek position.

If you think the behavior needs to be changed instead, then please take a look at the options I listed in #10809 (comment).

I would go for option 2, adding special casing for special files, or option 3, deprecating support for special files, pending a review of whether readFile(fd)/writeFile(fd) is in fact ever used for these.

This PR will likely not avert corruption issues for programs such as the key-value store use-case mentioned above, since readFile has been available for a while in Node and I would wager that most people expect readFile(path) and readFile(fd) to return the same data when working with regular files.

@seishun
Copy link
Contributor Author

seishun commented Jan 18, 2017

The docs are clear as to "asynchronously read the entire contents of a file". Where the docs are not clear is that readFile and writeFile were designed for regular files which can be seeked.

Well apparently opinions differ, see for instance @addaleax's comment in #10809 (comment).

I would go for option 2, adding special casing for special files, or option 3, deprecating support for special files, pending a review of whether readFile(fd)/writeFile(fd) is in fact ever used for these.

I'm not a huge fan of either of these options. Option 2 would make the behavior inconsistent between seekable and non-seekable files. Option 3 would introduce an arbitrary limitation by making fs.readFileSync(path) no longer equivalent to fs.readFileSync(fs.openSync(path, 'r')) for non-seekable files.

@jorangreef
Copy link
Contributor

Option 2 would make the behavior inconsistent between seekable and non-seekable files.

Not really, Option 2 would read from the beginning of a seekable file, and read from the current position in a non-seekable file. That would match user expectations in both cases.

@seishun
Copy link
Contributor Author

seishun commented Jan 18, 2017

That would match user expectations in both cases.

Are you sure this matches user expectations? As far as I know, yours was the first issue that brought it up, and you'd never have noticed it if you used Linux. Which shows that either most people are fine with the current behavior, or no one really uses fs.readFile like that.

@jorangreef
Copy link
Contributor

jorangreef commented Jan 18, 2017

Are you sure this matches user expectations?

Of course. Or do you think reading half a regular file would match user expectations better?

@sam-github
Copy link
Contributor

you'd never have noticed it if you used Linux

Why do you say that? Problem is clearly reproduceable on Linux:

% node -e 'fs=require("fs");fd=fs.openSync(".mailmap","r");console.log(fs.readFileSync(fd).length); console.log(fs.readFileSync(fd).length);'
9052
0

The file was read the first time. The file was not read completely the second time, clearly in contradiction of the docs, and I'm having a hard time docing this as "expected". Sometimes we find old node APIs, that have done something for a long time that isn't what we would want if we could do it over, and we have to bite the bullet and just document the behaviour. This isn't one of those cases, IMO, its a new API, and its not very common to call readFile with a fd. Allowing fds is not well justified for this API, as I've stated, but one possible justification is because you do intend to call readFile repeatedly, and don't want to pay the open price every time. This use pattern is currently not supported.

I agree with #10853 (comment)

At this point, reading entire files from position 0 for seekable fds, and doing best we can, reading from current position for unseekable fds is least surprising.

The API is documented and called "readFile"/"read entire file", not "readRemaing"/"read rest of file".

Under what conditions, given a seekable fd, would it be useful to read from current position to end of file?

@seishun
Copy link
Contributor Author

seishun commented Jan 18, 2017

Why do you say that? Problem is clearly reproduceable on Linux:

His test case in #9671 wasn't reproducible on Linux.

Under what conditions, given a seekable fd, would it be useful to read from current position to end of file?

Okay, maybe you're right, but I'd like some other people to chime in. @addaleax and @cjihrig approved this PR so perhaps they have some arguments against your proposal.

I assume you want to basically do the same as in 4444e73, but fall back to the old behavior with -1 if the binding.read call fails?

@sam-github
Copy link
Contributor

I liked 4444e73, that's why I approved it :-). I gather the problem is it adds a requirement that the fd be seekable/be preadable?

If it fails on unseekable streams, I'm OK with that. What does it even mean to "Asynchronously reads the entire contents of a file." on an unseekable fd?

But if we decide to define "entire contents" as "everything we can get" for unseekable streams, I'm OK with that, too, as long as we document it.

@addaleax
Copy link
Member

Support for special files happened to work while readFile made assumptions about the fd's seek position.

I find this whole concept of “support” for different types of files a bit hard to grasp. Maybe it’s my UNIX background speaking (I have no idea how Windows handles things, but I believe it’s kind of same?), but in my mind one Great Thing about file descriptors is that they are universal; you can use the same code on all kinds of files and you’ll get about the same result.

That’s also part of why I find the current behaviour to be the expected one; when I read the API description of readFile, I imagine its implementation as “read repeatedly until EOF is hit”, not “seek to the beginning of a file, then read until EOF is hit”, because the latter would not be universally applicable to all FDs.

Also, “Supporting” special files isn’t superfluous; I can definitely see applications using fs.readFile[Sync](0) as the easiest way to read, well, the entire content of stdin.

Option 2 would make the behavior inconsistent between seekable and non-seekable files.

Not really, Option 2 would read from the beginning of a seekable file, and read from the current position in a non-seekable file. That would match user expectations in both cases.

Even if it matches user expectations, your description of that behaviour does sound like the definition of an inconsistency to me ;)

Allowing fds is not well justified for this API, as I've stated, but one possible justification is because you do intend to call readFile repeatedly, and don't want to pay the open price every time.

That sounds more like a good reason to implement fs.seek?

Under what conditions, given a seekable fd, would it be useful to read from current position to end of file?

I’m not sure this is something that happens in The Real World a lot (then again nothing in this discussion happens a lot, I’d guess), but I can see an application reading the beginning of a file, to inspect a file header and the like, then deciding whether it wants to read the rest or not.

@@ -1347,7 +1347,8 @@ added: v0.1.29
* `flag` {String} default = `'r'`
* `callback` {Function}

Asynchronously reads the entire contents of a file. Example:
Asynchronously reads the entire contents of a file. If a file descriptor is
specified as the `file`, it will read from the current file position. Example:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make it more explicit:

If a file descriptor is specified as the `file`, reading will begin from the current
file position and not from the beginning of the file.

@sam-github
Copy link
Contributor

your description of that behaviour does sound like the definition of an inconsistency to me ;)

To me, too. There is no definition of this API that is consistent with path and fd, since using args that are so different do different things, but you were -1 on removing the API.

OK, I'm abstaining from voting on whether we just doc this or fix it. The pros and cons seem of equal weight.

This kind of API probilem is one more example of why node should be decreasing its API surface, though.

@seishun
Copy link
Contributor Author

seishun commented Jan 18, 2017

To me, too. There is no definition of this API that is consistent with path and fd, since using args that are so different do different things, but you were -1 on removing the API.

I don't think the difference between path and fd is the problem here. Imagine if readFile never accepted a path, only fd. We would have the same problem we do now.

OK, I'm abstaining from voting on whether we just doc this or fix it. The pros and cons seem of equal weight.

Me too. Should this be brought to CTC?

@sam-github
Copy link
Contributor

If readFile only took a fd, it wouldn't have been called readFile :-)

I don't have an opinion on whether it should hit the CTC. I'm OK with it going either way.

@nodejs/ctc

@jasnell
Copy link
Member

jasnell commented Mar 24, 2017

Updates on this one?

@jasnell jasnell added the stalled Issues and PRs that are stalled. label Mar 24, 2017
@seishun
Copy link
Contributor Author

seishun commented Mar 25, 2017

@jasnell Still unclear whether the current behavior should be considered a bug or a feature.

@fhinkel
Copy link
Member

fhinkel commented May 26, 2017

There hasn't been any activity here. I'm closing this. Feel free to reopen (or ping a collaborator) if closed in error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc Issues and PRs related to the documentations. fs Issues and PRs related to the fs subsystem / file system. stalled Issues and PRs that are stalled.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants