Command/environment variable interaction is confusing and error-prone #28975

Diggsey · 2015-10-11T21:37:36Z

The API which Command exposes implies that environment variables are captured from the parent process at the point where it is constructed, and then may be subsequently modified through the use of env/env_remove/env_clear.

However, in reality the environment variables are captured lazily, when the first environment modifier is applied to the command. There's also no way to explicitly trigger the environment capture without actually modifying the environment.

This results in unpredictable behaviour if environment variables are modified between Command construction, and Command execution. Perhaps this is done as an optimization. However, spawning a new process far outweighs the cost of copying a few strings.

Environment variables are global state: it's a bad idea for accesses to global state to happen at an unpredictable time.

Finally, regardless of whether the behaviour is changed, it should be clearly documented what the behaviour is.

The text was updated successfully, but these errors were encountered:

steveklabnik · 2017-03-08T17:59:47Z

Tagging with the libs team here, I don't know if they want to make a change or not.

dtolnay · 2017-11-16T06:48:17Z

I agree, the current behavior seems broken to me. Would you expect the environment to be captured when new is called or when spawn is called? I believe #33183 wanted the latter, with the sequence of env / env_remove / env_clear changes replayed against whatever the parent process environment contains at that time. @oconnor663

My only concern would be that if we provide getters on the Command builder as requested in #44434, the user may call a getter to get the child environment to validate, which would need to dynamically compute the environment based on the current parent process environment.

let mut cmd = Command::new("...");
fill_in_the_environment(&mut cmd);

let child_env = cmd.get_the_environment();
validate_the_environment(child_env)?;

/* Environment is global state, suppose it changes in between. Now the environment
   we validated is no longer the one that the process will run with. */

cmd.spawn()?;

oconnor663 · 2017-11-16T16:33:31Z

@dtolnay yes, personally I'd prefer capturing it during spawn. A few thoughts:

Most callers probably don't modify the child's environment at all. In that case, spawn wouldn't actually need to do anything, as the child would inherit the parent's environment by default. Capturing the environment in new on the other hand, would need to unconditionally copy it for every caller.
- A counterpoint might be that, if we capture in spawn and the caller is using env functions, every call to spawn on the same Command instance will need to repeat the work to construct the final child environment. Probably at the end of the day, though, performance probably doesn't really matter on either side :)
My opinion is that capturing in spawn is "less surprising" than capturing in new would be, since it's the same behavior a simple C program would have using fork + execv. I like that there's just "less state in the world".
If we capture the current environment in new, we might also ask whether we're supposed to capture the current working directory in new. Probably no one wants to do that?
- Or the current stdin/stdout/stderr pipes? Or even more obscure and platform-specific global state?
- We might also think about the corner case where the caller sets some environment variables in before_exec. In that case those would be reflected in the child, even if the parent environment was otherwise captured in new. Though maybe people who do interesting things in before_exec are already signing up for trouble?
Whether we decide to capture in new or capture in spawn, either is a potentially breaking change. But switching to spawn is maybe less risky; it can only affect callers who are already using the env methods (because the current capture is lazy).

dtolnay · 2017-11-19T06:58:27Z

I would be prepared to consider a PR that delays capturing environment variables from the parent process until spawn.

Diggsey · 2017-12-16T17:33:08Z

@dtolnay I started implementing this, and found the reason for the odd behaviour: at the moment, the linux implementation provides the method exec() on CommandExt, and there is a comment indicating that this function should not allocate:

    // Currently we try hard to ensure that the call to `.exec()` doesn't
    // actually allocate any memory. While many platforms try to ensure that
    // memory allocation works after a fork in a multithreaded process, it's
    // been observed to be buggy and somewhat unreliable, so we do our best to
    // just not do it at all!
    //
    // Along those lines, the `argv` and `envp` raw pointers here are exactly
    // what's gonna get passed to `execvp`. The `argv` array starts with the
    // `program` and ends with a NULL, and the `envp` pointer, if present, is
    // also null-terminated.
    //
    // Right now we don't support removing arguments, so there's no much fancy
    // support there, but we support adding and removing environment variables,
    // so a side table is used to track where in the `envp` array each key is
    // located. Whenever we add a key we update it in place if it's already
    // present, and whenever we remove a key we update the locations of all
    // other keys.

I don't see a way to retain this (undocumented) guarantee if environment variables are captured at the point of spawn.

However, my suspicion is that the comment may be referring to do_exec() rather than exec() - in that case it would be fine for exec() to allocate as only do_exec() is called after a fork. I'm going to implement the changes under this assumption.

oconnor663 · 2017-12-18T15:21:39Z

@Diggsey I interpreted that comment to mean that no allocations should happen after fork is called. But spawn is able to do as much work as it wants before it eventually calls fork, so I think we can still do what we want?

When I was playing with this, I was surprised that the implementations for env were completely duplicated across the different platform implementations, rather than having some sort of cross-platform HashMap<OsString, OsString> in the Command object itself. If I get back to this, the first thing I'll want to try is to putting more responsibility for args and env storage into Command, and making the platform-specific code deal only in temporaries. (Would that use an excessive amount of extra memory, to make copies? I'm not sure.)

Diggsey · 2017-12-18T15:25:15Z

@oconnor0 That's basically what I implemented in #46789

Capture `Command` environment at spawn Fixes #28975 This tracks a set of changes to the environment and then replays them at spawn time.

steveklabnik added the A-libs label Oct 11, 2015

alexcrichton mentioned this issue Apr 25, 2016

Command::env()/env_remove() freeze the parent's environment #33183

Closed

steveklabnik added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Mar 8, 2017

steveklabnik removed the A-libs label Mar 24, 2017

Mark-Simulacrum added the C-feature-request Category: A feature request, i.e: not implemented / a PR. label Jul 24, 2017

dtolnay added C-feature-accepted Category: A feature request that has been accepted pending implementation. and removed C-feature-request Category: A feature request, i.e: not implemented / a PR. labels Nov 19, 2017

Diggsey mentioned this issue Dec 17, 2017

Capture Command environment at spawn #46789

Merged

bors added a commit that referenced this issue Dec 23, 2017

Auto merge of #46789 - Diggsey:command-env-capture, r=dtolnay

faecf24

Capture `Command` environment at spawn Fixes #28975 This tracks a set of changes to the environment and then replays them at spawn time.

bors added a commit that referenced this issue Dec 24, 2017

Auto merge of #46789 - Diggsey:command-env-capture, r=dtolnay

c284f88

Capture `Command` environment at spawn Fixes #28975 This tracks a set of changes to the environment and then replays them at spawn time.

bors closed this as completed in #46789 Dec 24, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Command/environment variable interaction is confusing and error-prone #28975

Command/environment variable interaction is confusing and error-prone #28975

Diggsey commented Oct 11, 2015

steveklabnik commented Mar 8, 2017

dtolnay commented Nov 16, 2017

oconnor663 commented Nov 16, 2017 •

edited

Loading

dtolnay commented Nov 19, 2017

Diggsey commented Dec 16, 2017

oconnor663 commented Dec 18, 2017

Diggsey commented Dec 18, 2017

Command/environment variable interaction is confusing and error-prone #28975

Command/environment variable interaction is confusing and error-prone #28975

Comments

Diggsey commented Oct 11, 2015

steveklabnik commented Mar 8, 2017

dtolnay commented Nov 16, 2017

oconnor663 commented Nov 16, 2017 • edited Loading

dtolnay commented Nov 19, 2017

Diggsey commented Dec 16, 2017

oconnor663 commented Dec 18, 2017

Diggsey commented Dec 18, 2017

oconnor663 commented Nov 16, 2017 •

edited

Loading