Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sky init and cloud credentials for storage #102

Merged
merged 52 commits into from
Feb 11, 2022
Merged
Show file tree
Hide file tree
Changes from 42 commits
Commits
Show all changes
52 commits
Select commit Hold shift + click to select a range
3627ba1
sky init
franklsf95 Dec 9, 2021
6bf331d
fixes and lint
franklsf95 Dec 9, 2021
5722950
Create test_enabled_clouds.py
franklsf95 Dec 9, 2021
0055be5
Optimizer is now aware of enabled_clouds
franklsf95 Dec 10, 2021
9c0d9de
Fix pytest
franklsf95 Dec 10, 2021
716893f
Update registry.py
franklsf95 Dec 10, 2021
d32c8e8
Merge
michaelzhiluo Dec 10, 2021
cbb1ac7
Merge
michaelzhiluo Dec 10, 2021
0feca9c
Merge branch 'master' of github.com:concretevitamin/sky-experiments i…
franklsf95 Dec 24, 2021
b0c98e1
Support GCS buckets
franklsf95 Dec 24, 2021
f2c28e3
Make GCS on GCP work
franklsf95 Dec 24, 2021
20bd57f
yapf behaves differently across versions...
franklsf95 Dec 24, 2021
12e7069
yapf pls
franklsf95 Dec 24, 2021
0d8c5f5
Fix Azure
franklsf95 Dec 29, 2021
d40de79
Merge branch 'master' of github.com:concretevitamin/sky-experiments i…
franklsf95 Dec 29, 2021
e8035a3
tweak messages
franklsf95 Dec 29, 2021
8fe4f82
tweak
franklsf95 Dec 29, 2021
7e839b7
Apply hotfix from #127
franklsf95 Dec 29, 2021
6d25abd
Simple fixes
franklsf95 Dec 29, 2021
b43dfd5
Use monkeypatch
franklsf95 Dec 29, 2021
d190562
Address comments
franklsf95 Dec 29, 2021
7386753
get rid of Task.enabled_clouds
franklsf95 Dec 30, 2021
f13fc34
fix test
franklsf95 Dec 30, 2021
a4630b1
Always install aws and gcloud utils
franklsf95 Dec 30, 2021
122625b
Address comments
franklsf95 Dec 30, 2021
6c85f87
oops
franklsf95 Dec 30, 2021
81c9c4b
Revert "Always install aws and gcloud utils"
franklsf95 Dec 30, 2021
d468056
Refactor and trigger `sky init` automatically
franklsf95 Dec 30, 2021
90744db
Merge branch 'master' of github.com:concretevitamin/sky-experiments i…
franklsf95 Dec 30, 2021
0f14428
Reverted to `is_directory`
franklsf95 Dec 30, 2021
b10dcbe
nits
franklsf95 Dec 30, 2021
ccbad66
better check
franklsf95 Dec 30, 2021
f124d3e
usability
franklsf95 Dec 30, 2021
096d0c8
fix tests
franklsf95 Dec 31, 2021
80546a9
nits
franklsf95 Dec 31, 2021
a0e78f6
Address latest comments
franklsf95 Dec 31, 2021
b7afe19
Update init.py
franklsf95 Dec 31, 2021
737c6f4
Merge branch 'master' of github.com:concretevitamin/sky-experiments i…
franklsf95 Jan 6, 2022
7f8de10
Fix CLI messages
franklsf95 Jan 6, 2022
5828176
Sync credentials regardless of storage mounts
franklsf95 Jan 7, 2022
6df4934
Fix cpunode and more docs
franklsf95 Jan 7, 2022
c690de1
Merge branch 'master' of github.com:concretevitamin/sky-experiments i…
franklsf95 Jan 7, 2022
15dcc9b
Merge branch 'master' of github.com:concretevitamin/sky-experiments i…
franklsf95 Jan 9, 2022
9bbfd30
Apply changes to requirements
franklsf95 Jan 9, 2022
9319dd2
Merge branch 'master' of github.com:concretevitamin/sky-experiments i…
franklsf95 Jan 10, 2022
a687e51
Update setup.py
franklsf95 Jan 10, 2022
5013f5b
Merge branch 'master' of github.com:concretevitamin/sky-experiments i…
franklsf95 Jan 12, 2022
285e59b
Fix tests
franklsf95 Jan 12, 2022
e46027f
Merge branch 'master' into sky-init
concretevitamin Feb 11, 2022
88f92df
Fixes
concretevitamin Feb 11, 2022
33fe806
Merge branch 'master' into sky-init
concretevitamin Feb 11, 2022
9a1d8ea
Add links, fix test
concretevitamin Feb 11, 2022
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion prototype/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -55,7 +55,7 @@ ray attach config/gcp.yml
ray down config/gcp.yml
```

**Azure**. Install the Azure CLI (`pip install azure-cli`) then login using `az login`. Set the subscription to use from the command line (`az account set -s <subscription_id>`) or by modifying the provider section of the Azure template (`config/azure.yml.j2`). Ray Autoscaler does not work with the latest version of `azure-cli`. Hotfix: `pip install azure-cli-core==2.22.0` (this will make Ray work but at the cost of making the `az` CLI tool unusable).
**Azure**. Install the Azure CLI (`pip install azure-cli==2.22.0`) then login using `az login`. Set the subscription to use from the command line (`az account set -s <subscription_id>`). Ray Autoscaler does not work with the latest version of `azure-cli` as of 1.9.1, hence the fixed Azure version.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ray Autoscaler does not work with the latest version of azure-cli as of 1.9.1, hence the fixed Azure version.

What is the meaning of this sentence?


## Open issues

Expand Down
2 changes: 1 addition & 1 deletion prototype/requirements.txt
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,6 @@ tabulate
docker

awscli==1.22.17
azure-cli
azure-cli==2.22.0
google-api-python-client
google-cloud-storage
2 changes: 0 additions & 2 deletions prototype/sky/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,6 @@
from sky.execution import execute
from sky.resources import Resources
from sky.task import ParTask, Task
from sky.registry import fill_in_launchable_resources
from sky.optimizer import Optimizer, OptimizeTarget
from sky.data import Storage, StorageType

Expand All @@ -34,7 +33,6 @@
'Task',
'backends',
'execute',
'fill_in_launchable_resources',
'list_accelerators',
'__root_dir__',
'Storage',
Expand Down
13 changes: 13 additions & 0 deletions prototype/sky/cli.py
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,7 @@
import sky
from sky import backends
from sky import global_user_state
from sky import init as sky_init
from sky import task as task_lib
from sky.backends import backend as backend_lib
from sky.backends import backend_utils
Expand Down Expand Up @@ -208,6 +209,7 @@ def _create_and_ssh_into_node(
dag = sky.optimize(dag)
task = dag.tasks[0]
backend.register_info(dag=dag)
task.update_file_mounts(sky_init.get_cloud_credential_file_mounts())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe move this to task.py so it will be done independent of Sky being ran via yaml or python script.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both yaml or python will go to execution.py, so that part is fine. This is indeed duplicated in execution.py and cli.py. I'm not sure where best to put it in task.py - it might also be surprising to user to, for example, automatically add the three file mounts when any task is constructed. It feels to me it belongs more with the execution/backend logic. Open to suggestions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just have it once in execution.py in that case? The cli.py one is duplicated right

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea but cli.py is currently a separate code path than execution.py. @gmittal is this deliberately kept separate, or is there plan to merge this with execution.py?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our setup.py/click require an entry script for the CLI, which is what cli.py is doing right now. If we were to merge with execution.py there would be many things we could not support, such as passing dags to launch. I think it makes sense for cli.py to call functions in execution.py.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand we need cli.py as a CLI entry point. I think Michael's question is why we have a function here in cli.py that seems to do similar things as in execution.py:_execute(), and whether we can merge these two code paths.


handle = global_user_state.get_handle_from_cluster_name(cluster_name)
if handle is None or handle.head_ip is None:
Expand Down Expand Up @@ -793,6 +795,17 @@ def tpunode(cluster: str, port_forward: Optional[List[int]],
)


@cli.command()
def init():
"""Determines a set of clouds that Sky will use.

It checks access credentials for AWS, Azure and GCP. Sky tasks will only
run in clouds that you have access to. After configuring access for a
cloud, rerun `sky init` to reflect the changes.
"""
sky_init.init()
franklsf95 marked this conversation as resolved.
Show resolved Hide resolved


def main():
return cli()

Expand Down
34 changes: 16 additions & 18 deletions prototype/sky/cloud_stores.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,8 +6,6 @@
TODO:
* Better interface.
* Better implementation (e.g., fsspec, smart_open, using each cloud's SDK).
The full-blown impl should handle authentication so each user's private
datasets can be accessed.
"""
import subprocess
import urllib.parse
Expand Down Expand Up @@ -97,15 +95,18 @@ class GcsCloudStorage(CloudStorage):
# multi-threaded download is nice, which frees us from implementing
# parellel workers on our end.
_GET_GSUTIL = [
# Skip if gsutil already exists.
'pushd /tmp &>/dev/null',
# Skip if /tmp/gsutil already exists.
'(test -f /tmp/gsutil/gsutil || (wget --quiet '
'https://storage.googleapis.com/pub/gsutil.tar.gz && '
'tar xzf gsutil.tar.gz))',
'(test -f ~/google-cloud-sdk/bin/gsutil || (wget --quiet '
'https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/'
franklsf95 marked this conversation as resolved.
Show resolved Hide resolved
'google-cloud-sdk-367.0.0-linux-x86_64.tar.gz && '
'tar xzf google-cloud-sdk-367.0.0-linux-x86_64.tar.gz && '
'mv google-cloud-sdk ~/ && '
'~/google-cloud-sdk/install.sh -q ))',
'popd &>/dev/null',
]

_GSUTIL = '/tmp/gsutil/gsutil'
_GSUTIL = '~/google-cloud-sdk/bin/gsutil'

def is_directory(self, url: str) -> bool:
"""Returns whether 'url' is a directory.
Expand All @@ -118,7 +119,12 @@ def is_directory(self, url: str) -> bool:
command = ' && '.join(commands)
p = backend_utils.run(command, stdout=subprocess.PIPE)
out = p.stdout.decode().strip()
# gsutil ls -d url
# If <url> is a bucket root, then we only need `gsutil` to succeed
# to make sure the bucket exists. It is already a directory.
_, key = data_utils.split_gcs_path(url)
if len(key) == 0:
return True
# Otherwise, gsutil ls -d url will return:
# --> url.rstrip('/') if url is not a directory
# --> url with an ending '/' if url is a directory
if not out.endswith('/'):
Expand All @@ -129,23 +135,15 @@ def is_directory(self, url: str) -> bool:
return True

def make_sync_dir_command(self, source: str, destination: str) -> str:
"""Downloads a directory using gsutil.

Limitation: no authentication support; 'source' is assumed to in a
publicly accessible bucket.
"""
"""Downloads a directory using gsutil."""
download_via_gsutil = (
f'{self._GSUTIL} -m rsync -d -r {source} {destination}')
all_commands = list(self._GET_GSUTIL)
all_commands.append(download_via_gsutil)
return ' && '.join(all_commands)

def make_sync_file_command(self, source: str, destination: str) -> str:
"""Downloads a file using gsutil.

Limitation: no authentication support; 'source' is assumed to in a
publicly accessible bucket.
"""
"""Downloads a file using gsutil."""
download_via_gsutil = f'{self._GSUTIL} -m cp {source} {destination}'
all_commands = list(self._GET_GSUTIL)
all_commands.append(download_via_gsutil)
Expand Down
14 changes: 14 additions & 0 deletions prototype/sky/clouds/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,10 +7,24 @@
from sky.clouds.gcp import GCP

__all__ = [
'ALL_CLOUDS',
'AWS',
'Azure',
'Cloud',
'GCP',
'Region',
'Zone',
'from_str',
]

__CLOUD_DICT__ = {
'AWS': AWS(),
'Azure': Azure(),
'GCP': GCP(),
}

ALL_CLOUDS = list(__CLOUD_DICT__.values())


def from_str(name: str) -> 'Cloud':
return __CLOUD_DICT__[name]
49 changes: 49 additions & 0 deletions prototype/sky/clouds/aws.py
Original file line number Diff line number Diff line change
@@ -1,12 +1,23 @@
"""Amazon Web Services."""
import copy
import json
import os
import subprocess
from typing import Dict, Iterator, List, Optional, Tuple

from sky import clouds
from sky.clouds.service_catalog import aws_catalog


def _run_output(cmd):
franklsf95 marked this conversation as resolved.
Show resolved Hide resolved
proc = subprocess.run(cmd,
shell=True,
check=True,
stderr=subprocess.PIPE,
stdout=subprocess.PIPE)
return proc.stdout.decode('ascii')


class AWS(clouds.Cloud):
"""Amazon Web Services."""

Expand Down Expand Up @@ -176,3 +187,41 @@ def _make(instance_type):
if instance_type is None:
return []
return _make(instance_type)

def check_credentials(self) -> Tuple[bool, Optional[str]]:
"""Checks if the user has access credentials to this cloud."""
# This file is required because it will be synced to remote VMs for
# `aws` to access private storage buckets.
# `aws configure list` does not guarantee this file exists.
if not os.path.isfile(os.path.expanduser('~/.aws/credentials')):
return (False,
'~/.aws/credentials does not exist. Run `aws configure`.')
try:
output = _run_output('aws configure list')
franklsf95 marked this conversation as resolved.
Show resolved Hide resolved
except subprocess.CalledProcessError:
return False, 'AWS CLI not installed properly.'
# Configured correctly, the AWS output should look like this:
# ...
# access_key ******************** shared-credentials-file
# secret_key ******************** shared-credentials-file
# ...
# Otherwise, one or both keys will show as '<not set>'.
lines = output.split('\n')
if len(lines) < 2:
return False, 'AWS CLI output invalid.'
access_key_ok = False
secret_key_ok = False
for line in lines[2:]:
line = line.lstrip()
if line.startswith('access_key'):
if '<not set>' not in line:
access_key_ok = True
elif line.startswith('secret_key'):
if '<not set>' not in line:
secret_key_ok = True
if access_key_ok and secret_key_ok:
return True, None
return False, 'AWS credentials not set. Run `aws configure`.'

def get_credential_file_mounts(self) -> Dict[str, str]:
return {'~/.aws': '~/.aws'}
35 changes: 35 additions & 0 deletions prototype/sky/clouds/azure.py
Original file line number Diff line number Diff line change
@@ -1,12 +1,23 @@
"""Azure."""
import copy
import json
import os
import subprocess
from typing import Dict, Iterator, List, Optional, Tuple

from sky import clouds
from sky.clouds.service_catalog import azure_catalog


def _run_output(cmd):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, move to backend_util.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any clues on how to fix this circular import? @franklsf95

How did Ray solve this type of issue?

proc = subprocess.run(cmd,
shell=True,
check=True,
stderr=subprocess.PIPE,
stdout=subprocess.PIPE)
return proc.stdout.decode('ascii')


class Azure(clouds.Cloud):
"""Azure."""

Expand Down Expand Up @@ -141,3 +152,27 @@ def _make(instance_type):
if instance_type is None:
return []
return _make(instance_type)

def check_credentials(self) -> Tuple[bool, Optional[str]]:
"""Checks if the user has access credentials to this cloud."""
# This file is required because it will be synced to remote VMs for
# `az` to access private storage buckets.
# `az account show` does not guarantee this file exists.
if not os.path.isfile(os.path.expanduser('~/.azure/accessTokens.json')):
return (
False,
'~/.azure/accessTokens.json does not exist. Run `az login`.')
try:
output = _run_output('az account show --output=json')
except subprocess.CalledProcessError:
return False, 'Azure CLI returned error.'
# If Azure is properly logged in, this will return something like:
# {"id": ..., "user": ...}
# and if not, it will return:
# Please run 'az login' to setup account.
if output.startswith('{'):
return True, None
return False, 'Azure credentials not set. Run `az login`.'
franklsf95 marked this conversation as resolved.
Show resolved Hide resolved

def get_credential_file_mounts(self) -> Dict[str, str]:
return {'~/.azure': '~/.azure'}
14 changes: 14 additions & 0 deletions prototype/sky/clouds/cloud.py
Original file line number Diff line number Diff line change
Expand Up @@ -125,3 +125,17 @@ def get_feasible_launchable_resources(self, resources):
Launchable resources require a cloud and an instance type be assigned.
"""
raise NotImplementedError

def check_credentials(self) -> Tuple[bool, Optional[str]]:
"""Checks if the user has access credentials to this cloud.

Returns a boolean of whether the user can access this cloud, and a
string describing the reason if the user cannot access.
"""
raise NotImplementedError

def get_credential_file_mounts(self) -> Dict[str, str]:
"""Returns the files necessary to access this cloud.

Returns a dictionary that will be added to a task's file mounts."""
raise NotImplementedError
32 changes: 32 additions & 0 deletions prototype/sky/clouds/gcp.py
Original file line number Diff line number Diff line change
@@ -1,8 +1,11 @@
"""Google Cloud Platform."""
import copy
import json
import os
from typing import Dict, Iterator, List, Optional, Tuple

from google import auth

from sky import clouds


Expand Down Expand Up @@ -285,3 +288,32 @@ def get_accelerators_from_instance_type(
instance_type: str,
) -> Optional[Dict[str, int]]:
return None

def check_credentials(self) -> Tuple[bool, Optional[str]]:
"""Checks if the user has access credentials to this cloud."""
try:
# These files are required because they will be synced to remote
# VMs for `gsutil` to access private storage buckets.
# `auth.default()` does not guarantee these files exist.
for file in [
'~/.config/gcloud/access_tokens.db',
'~/.config/gcloud/credentials.db'
]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check the $GOOGLE_APPLICATION_CREDENTIALS and $GCLOUD_PROJECT in the environment, instead? It seems to me the google credential can be placed anywhere?

assert os.path.isfile(os.path.expanduser(file))
# Calling `auth.default()` ensures the GCP client library works,
# which is used by Ray Autoscaler to launch VMs.
auth.default()
except (AssertionError, auth.exceptions.DefaultCredentialsError):
return False, (
'GCP credentials not set. Run the following commands:\n '
# This authenticates the CLI to make `gsutil` work:
'$ gcloud auth login\n '
'$ gcloud config set project <proj>\n '
# These two commands setup the client library to make
# Ray Autoscaler work:
'$ gcloud auth application-default login\n '
'$ gcloud auth application-default set-quota-project <proj>')
return True, None

def get_credential_file_mounts(self) -> Dict[str, str]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure it will sync to the remote's home directory? Had a small bug last time where Ray autoscaler made a /~/ directory instead which was very confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About circular import: the solution would be to create another file e.g. run_utils.py, and have both backend_utils, aws, azure import run_output from it. Since this is only one small function, I fear that might be a bit unnecessary.

About syncing ~: In my tests they did sync to the user home directory on the VMs. I haven't run into the issue you described. using_file_mounts.yaml seems to also use this format. Is there an alternative way of writing this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok sgtm, lets just leave the circular import for now and fix in a future PR.

Maybe try $HOME instead of ~ for aws, gcp, azure configuration file mounting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me try $HOME.

return {'~/.config/gcloud': '~/.config/gcloud'}
3 changes: 3 additions & 0 deletions prototype/sky/execution.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@

import sky
from sky import backends
from sky import init
from sky import logging
from sky import optimizer

Expand Down Expand Up @@ -86,6 +87,8 @@ def execute(dag: sky.Dag,
backend = backend if backend is not None else backends.CloudVmRayBackend()
backend.register_info(dag=dag, optimize_target=optimize_target)

task.update_file_mounts(init.get_cloud_credential_file_mounts())
michaelzhiluo marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the filemount enough? Or, should we add them to the environment as well?


if task.storage_mounts is not None:
# Optimizer should eventually choose where to store bucket
task.add_storage_mounts()
Expand Down
Loading