sky storage error with gcs #263

Closed
Michaelvll opened this issue Jan 29, 2022 · 2 comments · Fixed by #306
Labels
bug Something isn't working P0

Comments

@Michaelvll
Collaborator

Michaelvll commented Jan 29, 2022

On my local machine, sky storage does not work with GCS. The following code raises a KeyError for project_id. I loaded the GCP credentials manually, and the resulting dict does not seem to contain a project_id key.

import sky
storage = sky.Storage('test', source='/tmp/test')
storage.add_store(sky.StorageType.GCS)
Traceback (most recent call last):                                                               
  File "./prototype/sky/data/data_transfer.py", line 52, in s3_to_gcs                                             
    project_id = gcp_credentials['project_id']                                                   
KeyError: 'project_id'
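
A minimal sketch of one way the lookup could avoid the bare KeyError, assuming the credentials are loaded as a plain dict. The helper name resolve_project_id and its env parameter are hypothetical and not part of sky; the fallback to the GOOGLE_CLOUD_PROJECT environment variable is an assumption about what a reasonable default would be, not what the fix in this thread does.

```python
import os


def resolve_project_id(credentials, env=None):
    """Return a GCP project ID, or raise a descriptive error (hypothetical helper)."""
    env = os.environ if env is None else env
    # Assumption: the credentials dict may or may not carry 'project_id',
    # so use .get() instead of indexing to avoid a bare KeyError.
    project_id = credentials.get('project_id')
    if project_id:
        return project_id
    # Assumption: fall back to the GOOGLE_CLOUD_PROJECT environment variable.
    project_id = env.get('GOOGLE_CLOUD_PROJECT')
    if project_id:
        return project_id
    raise ValueError(
        'No GCP project ID found: the credentials have no project_id and '
        'GOOGLE_CLOUD_PROJECT is not set.')
```

With this shape, a missing project ID surfaces as an actionable message instead of KeyError: 'project_id'.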

The installed package versions are:

google-api-core                         2.2.2
google-api-python-client                2.32.0
google-auth                             2.3.3
google-auth-httplib2                    0.1.0
google-auth-oauthlib                    0.4.6
google-cloud-core                       2.2.1
google-cloud-storage                    1.43.0
google-crc32c                           1.3.0
google-resumable-media                  2.1.0
googleapis-common-protos                1.53.0
Michaelvll added the P0 and bug labels Jan 29, 2022
@gmittal
Collaborator

gmittal commented Feb 1, 2022

Probably related to #102? cc @michaelzhiluo @franklsf95

@concretevitamin
Member

On latest master (4463bf):

>>> import sky
>>> storage = sky.Storage('test', source='/tmp/test')
>>> storage.add_store(sky.StorageType.GCS)
Traceback (most recent call last):
...
google.api_core.exceptions.Forbidden: 403 GET https://storage.googleapis.com/storage/v1/b/test?projection=noAcl&prettyPrint=false: zongheng@berkeley.edu does not have storage.buckets.get access to the Google Cloud Storage bucket.

This is good: the project_id KeyError is gone, and the 403 signals that gs://test cannot be written to, probably because the bucket already exists and is owned by another account.

However, using a new Storage name:

>>> import sky
>>> storage = sky.Storage('test21313147sahdh214h', source='/tmp/test')
>>> storage.add_store(sky.StorageType.GCS)
Traceback (most recent call last):
...
I 02-12 09:51:34 storage.py:375] 404 GET https://storage.googleapis.com/storage/v1/b/test21313147sahdh214h?projection=noAcl&prettyPrint=false: The specified bucket does not exist.
I 02-12 09:51:35 storage.py:409] Created GCS bucket test21313147sahdh214h in US-CENTRAL1             with storage class STANDARD
I 02-12 09:51:35 storage.py:335] Syncing Local to GCS

WARNING: gsutil rsync uses hashes when modification time is not available at
both the source and destination. Your crcmod installation isn't using the
module's C extension, so checksumming will run very slowly. If this is your
first rsync since updating gsutil, this rsync can take significantly longer than
usual. For help installing the extension, please see "gsutil help crcmod".

CommandException: arg (/tmp/test) does not name a directory, bucket, or bucket subdir.
If there is an object with the same path, please add a trailing
slash to specify the directory.
<sky.data.storage.GcsStore object at 0x7ffd63a6c490>

This has multiple problems:

  • We should precheck that a local source path exists, and error out if it doesn't.
  • I now have an empty bucket on GCS (verified in the console); it should not be created if the precheck fails.
  • The snippet above didn't crash the program: subsequent statements continue to run, which is not good.
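
A rough sketch of the ordering the three points above ask for: validate the local source first, create the bucket only after that, and raise if anything fails. The function upload_local_source and its create_bucket, sync, and delete_bucket callables are hypothetical stand-ins, not sky's API. Requiring a directory (rather than any existing path) and deleting the bucket when the sync fails are both assumptions, the former based on the gsutil rsync error above.

```python
import os


def upload_local_source(source, create_bucket, sync, delete_bucket):
    """Precheck the source, then create the bucket and sync (hypothetical)."""
    expanded = os.path.abspath(os.path.expanduser(source))
    # Precheck before touching the cloud, so no empty bucket is left behind.
    if not os.path.isdir(expanded):
        raise FileNotFoundError(
            f'Local source {source!r} is not an existing directory.')
    bucket = create_bucket()
    try:
        sync(expanded, bucket)
    except Exception:
        # Assumption: remove the bucket we just created, then re-raise so
        # the caller's program stops instead of silently continuing.
        delete_bucket(bucket)
        raise
    return bucket
```

Because the failures raise instead of only being logged, a later statement such as the assert in the complete snippet would no longer be reached when the source is missing.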

A complete snippet to reproduce:

import sky
import time
storage = sky.Storage(f'test-{int(time.time())}', source='/tmp/test')
storage.add_store(sky.StorageType.GCS)
assert False, 'SHOULD NOT REACH'

@michaelzhiluo
