Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Tools] In tools/launch.py, correctly pass all DGL client/server env vars if udf is a multi-command #3245

Merged
merged 4 commits into from
Aug 17, 2021

Conversation

erickim555
Copy link
Contributor

Description

This PR modifies tools/launch.py so that a "multi command" udf_command is properly supported.
Specifically, this ensures that the DGL client/server env vars are:
(1) Persisted throughout all commands within the "multi command" udf_command
(2) Don't pollute the environment after udf_command is complete

Details:

# suppose udf_command is a "multi command"
udf_command = "pwd && python some/trainer.py"

# Before this change, tools/launch.py would execute the udf on each cluster host as (simplified):
cmd_to_run = "DGL_ROLE=server DGL_NUM_SAMPLER=2 pwd && python some/trainer.py"

# This is problematic, because the env-vars only apply to the first `pwd` command, and DGL breaks in
# difficult to debug ways when env vars aren't set inside some/trainer.py

# This PR fixes things so that it becomes:
cmd_to_run = "(export DGL_ROLE=server DGL_NUM_SAMPLER=2; pwd && python some/trainer.py)"

# Now, we satisfy both properties (1) and (2)

This does not break any backwards compatibility: existing single-cmd udf_command should still work exactly as before.

Addresses Issue: #3160

Testing:

Unit tests pass:

(dgl_py3) ~/code/dgl (distlaunch_chainedudf_envvars) $ PYTHONPATH=. python tests/tools/test_launch.py
......
----------------------------------------------------------------------
Ran 6 tests in 0.000s

OK

Checklist

Please feel free to remove inapplicable items for your PR.

  • [x ] The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature]])
  • [x ] Changes are complete (i.e. I finished coding on this PR)
  • [x ] All changes have test coverage
  • [x ] Code is well-documented
  • [x ] To the best of my knowledge, examples are either not affected by this change,
    or have been fixed to be compatible with this change
  • [x ] Related issue is referred in this PR

Changes

@dgl-bot
Copy link
Collaborator

dgl-bot commented Aug 11, 2021

To trigger regression tests:

  • @dgl-bot run [instance-type] [which tests] [compare-with-branch];
    For example: @dgl-bot run g4dn.4xlarge all dmlc/master or @dgl-bot run c5.9xlarge kernel,api dmlc/master

Copy link
Collaborator

@VoVAllen VoVAllen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks perfect. Appreciate it! @zheng-da Do you want to take a look?

@erickim555
Copy link
Contributor Author

Hi @VoVAllen @zheng-da , how does this PR look? Is it good to be merged into master? Thanks!

@zheng-da zheng-da merged commit ac01e88 into dmlc:master Aug 17, 2021
BarclayII pushed a commit that referenced this pull request Aug 26, 2021
…v vars if udf is a multi-command (#3245)

* Correctly pass all DGL client/server env vars if udf is a multi-command

* Refactor to use wrap_cmd_with_local_envvars() helper fn
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants