Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Command/environment variable interaction is confusing and error-prone #28975

Closed
Diggsey opened this issue Oct 11, 2015 · 7 comments
Closed

Command/environment variable interaction is confusing and error-prone #28975

Diggsey opened this issue Oct 11, 2015 · 7 comments
Labels
C-feature-accepted Category: A feature request that has been accepted pending implementation. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@Diggsey
Copy link
Contributor

Diggsey commented Oct 11, 2015

The API which Command exposes implies that environment variables are captured from the parent process at the point where it is constructed, and then may be subsequently modified through the use of env/env_remove/env_clear.

However, in reality the environment variables are captured lazily, when the first environment modifier is applied to the command. There's also no way to explicitly trigger the environment capture without actually modifying the environment.

This results in unpredictable behaviour if environment variables are modified between Command construction, and Command execution. Perhaps this is done as an optimization. However, spawning a new process far outweighs the cost of copying a few strings.

Environment variables are global state: it's a bad idea for accesses to global state to happen at an unpredictable time.

Finally, regardless of whether the behaviour is changed, it should be clearly documented what the behaviour is.

@steveklabnik steveklabnik added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Mar 8, 2017
@steveklabnik
Copy link
Member

Tagging with the libs team here, I don't know if they want to make a change or not.

@Mark-Simulacrum Mark-Simulacrum added the C-feature-request Category: A feature request, i.e: not implemented / a PR. label Jul 24, 2017
@dtolnay
Copy link
Member

dtolnay commented Nov 16, 2017

I agree, the current behavior seems broken to me. Would you expect the environment to be captured when new is called or when spawn is called? I believe #33183 wanted the latter, with the sequence of env / env_remove / env_clear changes replayed against whatever the parent process environment contains at that time. @oconnor663

My only concern would be that if we provide getters on the Command builder as requested in #44434, the user may call a getter to get the child environment to validate, which would need to dynamically compute the environment based on the current parent process environment.

let mut cmd = Command::new("...");
fill_in_the_environment(&mut cmd);

let child_env = cmd.get_the_environment();
validate_the_environment(child_env)?;

/* Environment is global state, suppose it changes in between. Now the environment
   we validated is no longer the one that the process will run with. */

cmd.spawn()?;

@oconnor663
Copy link
Contributor

oconnor663 commented Nov 16, 2017

@dtolnay yes, personally I'd prefer capturing it during spawn. A few thoughts:

  • Most callers probably don't modify the child's environment at all. In that case, spawn wouldn't actually need to do anything, as the child would inherit the parent's environment by default. Capturing the environment in new on the other hand, would need to unconditionally copy it for every caller.
    • A counterpoint might be that, if we capture in spawn and the caller is using env functions, every call to spawn on the same Command instance will need to repeat the work to construct the final child environment. Probably at the end of the day, though, performance probably doesn't really matter on either side :)
  • My opinion is that capturing in spawn is "less surprising" than capturing in new would be, since it's the same behavior a simple C program would have using fork + execv. I like that there's just "less state in the world".
  • If we capture the current environment in new, we might also ask whether we're supposed to capture the current working directory in new. Probably no one wants to do that?
    • Or the current stdin/stdout/stderr pipes? Or even more obscure and platform-specific global state?
    • We might also think about the corner case where the caller sets some environment variables in before_exec. In that case those would be reflected in the child, even if the parent environment was otherwise captured in new. Though maybe people who do interesting things in before_exec are already signing up for trouble?
  • Whether we decide to capture in new or capture in spawn, either is a potentially breaking change. But switching to spawn is maybe less risky; it can only affect callers who are already using the env methods (because the current capture is lazy).

@dtolnay dtolnay added C-feature-accepted Category: A feature request that has been accepted pending implementation. and removed C-feature-request Category: A feature request, i.e: not implemented / a PR. labels Nov 19, 2017
@dtolnay
Copy link
Member

dtolnay commented Nov 19, 2017

I would be prepared to consider a PR that delays capturing environment variables from the parent process until spawn.

@Diggsey
Copy link
Contributor Author

Diggsey commented Dec 16, 2017

@dtolnay I started implementing this, and found the reason for the odd behaviour: at the moment, the linux implementation provides the method exec() on CommandExt, and there is a comment indicating that this function should not allocate:

    // Currently we try hard to ensure that the call to `.exec()` doesn't
    // actually allocate any memory. While many platforms try to ensure that
    // memory allocation works after a fork in a multithreaded process, it's
    // been observed to be buggy and somewhat unreliable, so we do our best to
    // just not do it at all!
    //
    // Along those lines, the `argv` and `envp` raw pointers here are exactly
    // what's gonna get passed to `execvp`. The `argv` array starts with the
    // `program` and ends with a NULL, and the `envp` pointer, if present, is
    // also null-terminated.
    //
    // Right now we don't support removing arguments, so there's no much fancy
    // support there, but we support adding and removing environment variables,
    // so a side table is used to track where in the `envp` array each key is
    // located. Whenever we add a key we update it in place if it's already
    // present, and whenever we remove a key we update the locations of all
    // other keys.

I don't see a way to retain this (undocumented) guarantee if environment variables are captured at the point of spawn.

However, my suspicion is that the comment may be referring to do_exec() rather than exec() - in that case it would be fine for exec() to allocate as only do_exec() is called after a fork. I'm going to implement the changes under this assumption.

@oconnor663
Copy link
Contributor

@Diggsey I interpreted that comment to mean that no allocations should happen after fork is called. But spawn is able to do as much work as it wants before it eventually calls fork, so I think we can still do what we want?

When I was playing with this, I was surprised that the implementations for env were completely duplicated across the different platform implementations, rather than having some sort of cross-platform HashMap<OsString, OsString> in the Command object itself. If I get back to this, the first thing I'll want to try is to putting more responsibility for args and env storage into Command, and making the platform-specific code deal only in temporaries. (Would that use an excessive amount of extra memory, to make copies? I'm not sure.)

@Diggsey
Copy link
Contributor Author

Diggsey commented Dec 18, 2017

@oconnor0 That's basically what I implemented in #46789

bors added a commit that referenced this issue Dec 23, 2017
Capture `Command` environment at spawn

Fixes #28975

This tracks a set of changes to the environment and then replays them at spawn time.
bors added a commit that referenced this issue Dec 24, 2017
Capture `Command` environment at spawn

Fixes #28975

This tracks a set of changes to the environment and then replays them at spawn time.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-feature-accepted Category: A feature request that has been accepted pending implementation. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants