Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sky init and cloud credentials for storage #102

Merged
merged 52 commits into from
Feb 11, 2022
Merged
Show file tree
Hide file tree
Changes from 18 commits
Commits
Show all changes
52 commits
Select commit Hold shift + click to select a range
3627ba1
sky init
franklsf95 Dec 9, 2021
6bf331d
fixes and lint
franklsf95 Dec 9, 2021
5722950
Create test_enabled_clouds.py
franklsf95 Dec 9, 2021
0055be5
Optimizer is now aware of enabled_clouds
franklsf95 Dec 10, 2021
9c0d9de
Fix pytest
franklsf95 Dec 10, 2021
716893f
Update registry.py
franklsf95 Dec 10, 2021
d32c8e8
Merge
michaelzhiluo Dec 10, 2021
cbb1ac7
Merge
michaelzhiluo Dec 10, 2021
0feca9c
Merge branch 'master' of github.com:concretevitamin/sky-experiments i…
franklsf95 Dec 24, 2021
b0c98e1
Support GCS buckets
franklsf95 Dec 24, 2021
f2c28e3
Make GCS on GCP work
franklsf95 Dec 24, 2021
20bd57f
yapf behaves differently across versions...
franklsf95 Dec 24, 2021
12e7069
yapf pls
franklsf95 Dec 24, 2021
0d8c5f5
Fix Azure
franklsf95 Dec 29, 2021
d40de79
Merge branch 'master' of github.com:concretevitamin/sky-experiments i…
franklsf95 Dec 29, 2021
e8035a3
tweak messages
franklsf95 Dec 29, 2021
8fe4f82
tweak
franklsf95 Dec 29, 2021
7e839b7
Apply hotfix from #127
franklsf95 Dec 29, 2021
6d25abd
Simple fixes
franklsf95 Dec 29, 2021
b43dfd5
Use monkeypatch
franklsf95 Dec 29, 2021
d190562
Address comments
franklsf95 Dec 29, 2021
7386753
get rid of Task.enabled_clouds
franklsf95 Dec 30, 2021
f13fc34
fix test
franklsf95 Dec 30, 2021
a4630b1
Always install aws and gcloud utils
franklsf95 Dec 30, 2021
122625b
Address comments
franklsf95 Dec 30, 2021
6c85f87
oops
franklsf95 Dec 30, 2021
81c9c4b
Revert "Always install aws and gcloud utils"
franklsf95 Dec 30, 2021
d468056
Refactor and trigger `sky init` automatically
franklsf95 Dec 30, 2021
90744db
Merge branch 'master' of github.com:concretevitamin/sky-experiments i…
franklsf95 Dec 30, 2021
0f14428
Reverted to `is_directory`
franklsf95 Dec 30, 2021
b10dcbe
nits
franklsf95 Dec 30, 2021
ccbad66
better check
franklsf95 Dec 30, 2021
f124d3e
usability
franklsf95 Dec 30, 2021
096d0c8
fix tests
franklsf95 Dec 31, 2021
80546a9
nits
franklsf95 Dec 31, 2021
a0e78f6
Address latest comments
franklsf95 Dec 31, 2021
b7afe19
Update init.py
franklsf95 Dec 31, 2021
737c6f4
Merge branch 'master' of github.com:concretevitamin/sky-experiments i…
franklsf95 Jan 6, 2022
7f8de10
Fix CLI messages
franklsf95 Jan 6, 2022
5828176
Sync credentials regardless of storage mounts
franklsf95 Jan 7, 2022
6df4934
Fix cpunode and more docs
franklsf95 Jan 7, 2022
c690de1
Merge branch 'master' of github.com:concretevitamin/sky-experiments i…
franklsf95 Jan 7, 2022
15dcc9b
Merge branch 'master' of github.com:concretevitamin/sky-experiments i…
franklsf95 Jan 9, 2022
9bbfd30
Apply changes to requirements
franklsf95 Jan 9, 2022
9319dd2
Merge branch 'master' of github.com:concretevitamin/sky-experiments i…
franklsf95 Jan 10, 2022
a687e51
Update setup.py
franklsf95 Jan 10, 2022
5013f5b
Merge branch 'master' of github.com:concretevitamin/sky-experiments i…
franklsf95 Jan 12, 2022
285e59b
Fix tests
franklsf95 Jan 12, 2022
e46027f
Merge branch 'master' into sky-init
concretevitamin Feb 11, 2022
88f92df
Fixes
concretevitamin Feb 11, 2022
33fe806
Merge branch 'master' into sky-init
concretevitamin Feb 11, 2022
9a1d8ea
Add links, fix test
concretevitamin Feb 11, 2022
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion prototype/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -53,7 +53,7 @@ ray attach config/gcp.yml
ray down config/gcp.yml
```

**Azure**. Install the Azure CLI (`pip install azure-cli`) then login using `az login`. Set the subscription to use from the command line (`az account set -s <subscription_id>`) or by modifying the provider section of the Azure template (`config/azure.yml.j2`). Ray Autoscaler does not work with the latest version of `azure-cli`. Hotfix: `pip install azure-cli-core==2.22.0` (this will make Ray work but at the cost of making the `az` CLI tool unusable).
**Azure**. Install the Azure CLI (`pip install azure-cli==2.22.0`) then login using `az login`. Set the subscription to use from the command line (`az account set -s <subscription_id>`). Ray Autoscaler does not work with the latest version of `azure-cli` as of 1.9.1, hence the fixed Azure version.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ray Autoscaler does not work with the latest version of azure-cli as of 1.9.1, hence the fixed Azure version.

What is the meaning of this sentence?


## Open issues

Expand Down
2 changes: 1 addition & 1 deletion prototype/requirements.txt
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,6 @@ tabulate
docker

awscli==1.22.17
azure-cli
azure-cli==2.22.0
google-api-python-client
google-cloud-storage
12 changes: 6 additions & 6 deletions prototype/sky/backends/cloud_vm_ray_backend.py
Original file line number Diff line number Diff line change
Expand Up @@ -981,17 +981,17 @@ def sync_file_mounts(
else:
wrapped_dst = backend_utils.FileMountHelper.wrap_file_mount(dst)
storage = cloud_stores.get_storage_from_path(src)
if storage.is_directory(src):
sync = storage.make_sync_dir_command(source=src,
destination=wrapped_dst)
# It is a directory so make sure it exists.
mkdir_for_wrapped_dst = f'mkdir -p {wrapped_dst}'
else:
if storage.is_file(src):
sync = storage.make_sync_file_command(source=src,
destination=wrapped_dst)
# It is a file so make sure *its parent dir* exists.
mkdir_for_wrapped_dst = \
f'mkdir -p {os.path.dirname(wrapped_dst)}'
else:
sync = storage.make_sync_dir_command(source=src,
destination=wrapped_dst)
# It is a directory so make sure it exists.
mkdir_for_wrapped_dst = f'mkdir -p {wrapped_dst}'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you test a src that doesn't exist, which will hit this branch:

<dst>: gs://nonexist-bucket/nonexist

There probably will be some error; is it reasonable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before, this will hit an error while doing gsutil ls; now this will hit an error while doing aws sync / gsutil rsync. I think this is okay.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@franklsf95 Did you actually run the requested test? I requested it because it does not look trivial.

It is not okay / equivalent to before, because with this change, we would have created zombie dirs from L994 mkdir_for_wrapped_dst = f'mkdir -p {wrapped_dst}' if the src does not exist.

download_target_commands = [
# Ensure sync can write to wrapped_dst (e.g., '/data/').
mkdir_for_wrapped_dst,
Expand Down
32 changes: 32 additions & 0 deletions prototype/sky/cli.py
Original file line number Diff line number Diff line change
Expand Up @@ -640,6 +640,38 @@ def tpunode(cluster: str, port_forward: Optional[List[int]], screen):
)


@cli.command()
def init():
"""Determines a set of clouds that Sky will use.

It checks access credentials for AWS, Azure and GCP. Sky jobs will only
franklsf95 marked this conversation as resolved.
Show resolved Hide resolved
run in clouds that you have access to. After configuring access for a
cloud, rerun `sky init` to reflect the changes.
"""
click.echo('Sky will use the following clouds to run jobs. '
'To change this, configure cloud access credentials,'
' and rerun ' + click.style('sky init', bold=True) + '.\n')

enabled_clouds = []
for cloud in [sky.AWS(), sky.Azure(), sky.GCP()]:
franklsf95 marked this conversation as resolved.
Show resolved Hide resolved
click.echo(f' Checking {cloud}...', nl=False)
ok, reason = cloud.check_credentials()
click.echo('\r', nl=False)
status_msg = 'enabled' if ok else 'disabled'
status_color = 'green' if ok else 'red'
click.echo(
' ' +
click.style(f'{cloud}: {status_msg}', fg=status_color, bold=True) +
' ' * 10)
if ok:
enabled_clouds.append(str(cloud))
else:
click.echo(f' Reason: {reason}')
click.echo()

global_user_state.set_enabled_clouds(enabled_clouds)


def main():
return cli()

Expand Down
96 changes: 35 additions & 61 deletions prototype/sky/cloud_stores.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,13 +6,11 @@
TODO:
* Better interface.
* Better implementation (e.g., fsspec, smart_open, using each cloud's SDK).
The full-blown impl should handle authentication so each user's private
datasets can be accessed.
"""
import subprocess
import urllib.parse

import boto3
import botocore

from sky.backends import backend_utils
from sky.data import data_utils
Expand All @@ -21,11 +19,11 @@
class CloudStorage(object):
"""Interface for a cloud object store."""

def is_directory(self, url: str) -> bool:
"""Returns whether 'url' is a directory.
def is_file(self, url: str) -> bool:
"""Returns whether 'url' is a regular file.
franklsf95 marked this conversation as resolved.
Show resolved Hide resolved

In cloud object stores, a "directory" refers to a regular object whose
name is a prefix of other objects.
Returning false means <url> is a directory, or does not exist. Useful
for deciding whether to use cp or sync/rsync to download.
"""
raise NotImplementedError

Expand All @@ -46,27 +44,17 @@ class S3CloudStorage(CloudStorage):
'pip install awscli',
]

def is_directory(self, url: str) -> bool:
"""Returns whether S3 'url' is a directory.

franklsf95 marked this conversation as resolved.
Show resolved Hide resolved
In cloud object stores, a "directory" refers to a regular object whose
name is a prefix of other objects.
"""
s3 = boto3.resource('s3')
def is_file(self, url: str) -> bool:
"""Returns whether 'url' is a regular file."""
bucket_name, path = data_utils.split_s3_path(url)
bucket = s3.Bucket(bucket_name)

num_objects = 0
for obj in bucket.objects.filter(Prefix=path):
num_objects += 1
if obj.key == path:
return False
# If there are more than 1 object in filter, then it is a directory
if num_objects == 3:
return True

# A directory with few or no items
return True
if len(path) == 0:
return False
franklsf95 marked this conversation as resolved.
Show resolved Hide resolved
s3 = boto3.client('s3')
try:
s3.head_object(Bucket=bucket_name, Key=path)
return True
except botocore.errorfactory.ClientError:
return False

def make_sync_dir_command(self, source: str, destination: str) -> str:
"""Downloads using AWS CLI."""
Expand Down Expand Up @@ -97,55 +85,41 @@ class GcsCloudStorage(CloudStorage):
# multi-threaded download is nice, which frees us from implementing
# parellel workers on our end.
_GET_GSUTIL = [
# Skip if gsutil already exists.
'pushd /tmp &>/dev/null',
# Skip if /tmp/gsutil already exists.
'(test -f /tmp/gsutil/gsutil || (wget --quiet '
'https://storage.googleapis.com/pub/gsutil.tar.gz && '
'tar xzf gsutil.tar.gz))',
'(test -f ~/google-cloud-sdk/bin/gsutil || (wget --quiet '
'https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/'
franklsf95 marked this conversation as resolved.
Show resolved Hide resolved
'google-cloud-sdk-367.0.0-linux-x86_64.tar.gz && '
'tar xzf google-cloud-sdk-367.0.0-linux-x86_64.tar.gz && '
'mv google-cloud-sdk ~/ && '
'~/google-cloud-sdk/install.sh -q ))',
'popd &>/dev/null',
]

_GSUTIL = '/tmp/gsutil/gsutil'
_GSUTIL = '~/google-cloud-sdk/bin/gsutil'

def is_directory(self, url: str) -> bool:
"""Returns whether 'url' is a directory.
def is_file(self, url: str) -> bool:
"""Returns whether 'url' is a regular file."""
command = ' && '.join(self._GET_GSUTIL)
backend_utils.run(command)

In cloud object stores, a "directory" refers to a regular object whose
name is a prefix of other objects.
"""
commands = list(self._GET_GSUTIL)
commands.append(f'{self._GSUTIL} ls -d {url}')
command = ' && '.join(commands)
p = backend_utils.run(command, stdout=subprocess.PIPE)
out = p.stdout.decode().strip()
# gsutil ls -d url
# --> url.rstrip('/') if url is not a directory
# --> url with an ending '/' if url is a directory
if not out.endswith('/'):
assert out == url.rstrip('/'), (out, url)
return False
url = url if url.endswith('/') else (url + '/')
assert out == url, (out, url)
return True
# https://cloud.google.com/storage/docs/gsutil/commands/stat
gsutil_cmd = f'{self._GSUTIL} -q stat {url}'
p = backend_utils.run(gsutil_cmd, check=False)
rc = p.returncode
assert rc in [0, 1], command
return rc == 0
franklsf95 marked this conversation as resolved.
Show resolved Hide resolved

def make_sync_dir_command(self, source: str, destination: str) -> str:
"""Downloads a directory using gsutil.

Limitation: no authentication support; 'source' is assumed to in a
publicly accessible bucket.
"""
"""Downloads a directory using gsutil."""
download_via_gsutil = (
f'{self._GSUTIL} -m rsync -d -r {source} {destination}')
all_commands = list(self._GET_GSUTIL)
all_commands.append(download_via_gsutil)
return ' && '.join(all_commands)

def make_sync_file_command(self, source: str, destination: str) -> str:
"""Downloads a file using gsutil.

Limitation: no authentication support; 'source' is assumed to in a
publicly accessible bucket.
"""
"""Downloads a file using gsutil."""
download_via_gsutil = f'{self._GSUTIL} -m cp {source} {destination}'
all_commands = list(self._GET_GSUTIL)
all_commands.append(download_via_gsutil)
Expand Down
10 changes: 10 additions & 0 deletions prototype/sky/clouds/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,3 +14,13 @@
'Region',
'Zone',
]

__CLOUD_DICT__ = {
'AWS': AWS,
'Azure': Azure,
'GCP': GCP,
}


def cloud_factory(name):
return __CLOUD_DICT__[name]
31 changes: 31 additions & 0 deletions prototype/sky/clouds/aws.py
Original file line number Diff line number Diff line change
@@ -1,12 +1,19 @@
"""Amazon Web Services."""
import copy
import json
import os
import subprocess
from typing import Dict, Iterator, List, Optional, Tuple

from sky import clouds
from sky.clouds.service_catalog import aws_catalog


def _run_output(cmd):
franklsf95 marked this conversation as resolved.
Show resolved Hide resolved
proc = subprocess.run(cmd, shell=True, check=True, stdout=subprocess.PIPE)
return proc.stdout.decode('ascii')


class AWS(clouds.Cloud):
"""Amazon Web Services."""

Expand Down Expand Up @@ -160,3 +167,27 @@ def _make(instance_type):
if instance_type is None:
return []
return _make(instance_type)

def check_credentials(self) -> Tuple[bool, Optional[str]]:
"""Checks if the user has access credentials to this cloud."""
try:
assert os.path.isfile(os.path.expanduser('~/.aws/credentials'))
franklsf95 marked this conversation as resolved.
Show resolved Hide resolved
output = _run_output('aws configure list')
franklsf95 marked this conversation as resolved.
Show resolved Hide resolved
except (AssertionError, subprocess.CalledProcessError):
return False, 'AWS CLI not installed properly.'
lines = output.split('\n')
if len(lines) < 2:
return False, 'AWS CLI output invalid.'
access_key_ok = False
secret_key_ok = False
for line in lines[2:]:
line = line.lstrip()
if line.startswith('access_key'):
if '<not set>' not in line:
access_key_ok = True
elif line.startswith('secret_key'):
if '<not set>' not in line:
secret_key_ok = True
if access_key_ok and secret_key_ok:
return True, None
return False, 'AWS credentials not set. Run `aws configure`.'
16 changes: 16 additions & 0 deletions prototype/sky/clouds/azure.py
Original file line number Diff line number Diff line change
@@ -1,12 +1,18 @@
"""Azure."""
import copy
import json
import subprocess
from typing import Dict, Iterator, List, Optional, Tuple

from sky import clouds
from sky.clouds.service_catalog import azure_catalog


def _run_output(cmd):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, move to backend_util.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any clues on how to fix this circular import? @franklsf95

How did Ray solve this type of issue?

proc = subprocess.run(cmd, shell=True, check=True, stdout=subprocess.PIPE)
return proc.stdout.decode('ascii')


class Azure(clouds.Cloud):
"""Azure."""

Expand Down Expand Up @@ -128,3 +134,13 @@ def _make(instance_type):
if instance_type is None:
return []
return _make(instance_type)

def check_credentials(self) -> Tuple[bool, Optional[str]]:
"""Checks if the user has access credentials to this cloud."""
try:
output = _run_output('az account show --output=json')
except subprocess.CalledProcessError:
return False, 'Azure CLI returned error.'
if output.startswith('{'):
return True, None
return False, 'Azure credentials not set. Run `az login`.'
franklsf95 marked this conversation as resolved.
Show resolved Hide resolved
4 changes: 4 additions & 0 deletions prototype/sky/clouds/cloud.py
Original file line number Diff line number Diff line change
Expand Up @@ -110,3 +110,7 @@ def get_feasible_launchable_resources(self, resources):
Launchable resources require a cloud and an instance type be assigned.
"""
raise NotImplementedError

def check_credentials(self) -> Tuple[bool, Optional[str]]:
"""Checks if the user has access credentials to this cloud."""
raise NotImplementedError
17 changes: 17 additions & 0 deletions prototype/sky/clouds/gcp.py
Original file line number Diff line number Diff line change
@@ -1,8 +1,11 @@
"""Google Cloud Platform."""
import copy
import json
import os
from typing import Dict, Iterator, List, Optional, Tuple

from google import auth

from sky import clouds


Expand Down Expand Up @@ -276,3 +279,17 @@ def get_accelerators_from_instance_type(
instance_type: str,
) -> Optional[Dict[str, int]]:
return None

def check_credentials(self) -> Tuple[bool, Optional[str]]:
"""Checks if the user has access credentials to this cloud."""
try:
for file in [
'~/.config/gcloud/access_tokens.db',
'~/.config/gcloud/credentials.db'
]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check the $GOOGLE_APPLICATION_CREDENTIALS and $GCLOUD_PROJECT in the environment, instead? It seems to me the google credential can be placed anywhere?

assert os.path.isfile(os.path.expanduser(file))
auth.default()
except (AssertionError, auth.exceptions.DefaultCredentialsError):
return False, ('GCP credentials not set.'
' Run `gcloud auth application-default login`.')
return True, None
5 changes: 5 additions & 0 deletions prototype/sky/data/storage.py
Original file line number Diff line number Diff line change
Expand Up @@ -174,6 +174,11 @@ def __init__(self,
# from existing ones
self.stores = {} if stores is None else stores

if 's3://' in self.source:
self.get_or_copy_to_s3()
elif 'gs://' in self.source:
self.get_or_copy_to_gcs()
franklsf95 marked this conversation as resolved.
Show resolved Hide resolved

def get_or_copy_to_s3(self):
"""Adds AWS S3 Store to Storage
"""
Expand Down
Loading