[Tools] In `tools/launch.py`, correctly pass all DGL client/server env vars if udf is a multi-command #3245

erickim555 · 2021-08-11T19:16:54Z

Description

This PR modifies tools/launch.py so that a "multi command" udf_command is properly supported.
Specifically, this ensures that the DGL client/server env vars are:
(1) Persisted throughout all commands within the "multi command" udf_command
(2) Don't pollute the environment after udf_command is complete

Details:

# suppose udf_command is a "multi command"
udf_command = "pwd && python some/trainer.py"

# Before this change, tools/launch.py would execute the udf on each cluster host as (simplified):
cmd_to_run = "DGL_ROLE=server DGL_NUM_SAMPLER=2 pwd && python some/trainer.py"

# This is problematic, because the env-vars only apply to the first `pwd` command, and DGL breaks in
# difficult to debug ways when env vars aren't set inside some/trainer.py

# This PR fixes things so that it becomes:
cmd_to_run = "(export DGL_ROLE=server DGL_NUM_SAMPLER=2; pwd && python some/trainer.py)"

# Now, we satisfy both properties (1) and (2)

This does not break any backwards compatibility: existing single-cmd udf_command should still work exactly as before.

Addresses Issue: #3160

Testing:

Unit tests pass:

(dgl_py3) ~/code/dgl (distlaunch_chainedudf_envvars) $ PYTHONPATH=. python tests/tools/test_launch.py
......
----------------------------------------------------------------------
Ran 6 tests in 0.000s

OK

Checklist

Please feel free to remove inapplicable items for your PR.

[x ] The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature]])
[x ] Changes are complete (i.e. I finished coding on this PR)
[x ] All changes have test coverage
[x ] Code is well-documented
[x ] To the best of my knowledge, examples are either not affected by this change,
or have been fixed to be compatible with this change
[x ] Related issue is referred in this PR

Changes

dgl-bot · 2021-08-11T19:18:33Z

To trigger regression tests:

@dgl-bot run [instance-type] [which tests] [compare-with-branch];
For example: @dgl-bot run g4dn.4xlarge all dmlc/master or @dgl-bot run c5.9xlarge kernel,api dmlc/master

VoVAllen

Looks perfect. Appreciate it! @zheng-da Do you want to take a look?

erickim555 · 2021-08-16T22:47:39Z

Hi @VoVAllen @zheng-da , how does this PR look? Is it good to be merged into master? Thanks!

…v vars if udf is a multi-command (#3245) * Correctly pass all DGL client/server env vars if udf is a multi-command * Refactor to use wrap_cmd_with_local_envvars() helper fn

Correctly pass all DGL client/server env vars if udf is a multi-command

ad689d5

Refactor to use wrap_cmd_with_local_envvars() helper fn

35c7af2

VoVAllen approved these changes Aug 12, 2021

View reviewed changes

erickim555 added 2 commits August 12, 2021 03:50

Merge branch 'master' into distlaunch_chainedudf_envvars

ff495ee

Merge branch 'master' into distlaunch_chainedudf_envvars

8628884

zheng-da merged commit ac01e88 into dmlc:master Aug 17, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Tools] In `tools/launch.py`, correctly pass all DGL client/server env vars if udf is a multi-command #3245

[Tools] In `tools/launch.py`, correctly pass all DGL client/server env vars if udf is a multi-command #3245

erickim555 commented Aug 11, 2021

dgl-bot commented Aug 11, 2021

VoVAllen left a comment

erickim555 commented Aug 16, 2021

[Tools] In tools/launch.py, correctly pass all DGL client/server env vars if udf is a multi-command #3245

[Tools] In tools/launch.py, correctly pass all DGL client/server env vars if udf is a multi-command #3245

Conversation

erickim555 commented Aug 11, 2021

Description

Checklist

Changes

dgl-bot commented Aug 11, 2021

VoVAllen left a comment

Choose a reason for hiding this comment

erickim555 commented Aug 16, 2021

[Tools] In `tools/launch.py`, correctly pass all DGL client/server env vars if udf is a multi-command #3245

[Tools] In `tools/launch.py`, correctly pass all DGL client/server env vars if udf is a multi-command #3245