[Core] Admin policy enforcement plugin #3966

Merged
merged 67 commits into from
Sep 24, 2024
Changes from 11 commits
Commits
cb28b8d
support policy hook
Michaelvll Sep 19, 2024
b64efa0
test task labels
Michaelvll Sep 19, 2024
cf89929
Add test for policy that sets labels
Michaelvll Sep 20, 2024
54c93ea
Fix comment
Michaelvll Sep 20, 2024
1d1c500
format
Michaelvll Sep 20, 2024
a0bdb2c
use -e to make test related files visible
Michaelvll Sep 20, 2024
543e66a
Add config.rst
Michaelvll Sep 20, 2024
520a2a1
Fix test
Michaelvll Sep 20, 2024
b533351
fix config rst
Michaelvll Sep 20, 2024
466f7fe
Apply policy to service
Michaelvll Sep 20, 2024
050dc7a
add policy for serving
Michaelvll Sep 20, 2024
31e0174
Add docs
Michaelvll Sep 20, 2024
0c74f2a
fix
Michaelvll Sep 20, 2024
48a6cc9
format
Michaelvll Sep 20, 2024
1ca5a8a
Update interface
Michaelvll Sep 20, 2024
14b2346
fix
Michaelvll Sep 21, 2024
cb39c73
Fix
Michaelvll Sep 21, 2024
1e3ddef
fix
Michaelvll Sep 21, 2024
aa87df7
Fix test config
Michaelvll Sep 21, 2024
28487a4
Fix mutated config
Michaelvll Sep 21, 2024
d1f0480
fix
Michaelvll Sep 21, 2024
f42ace5
Add policy doc
Michaelvll Sep 21, 2024
c04f3dc
rename
Michaelvll Sep 21, 2024
58f413c
minor
Michaelvll Sep 21, 2024
52053bd
Add additional arguments for autostop
Michaelvll Sep 21, 2024
4a4f682
fix mypy
Michaelvll Sep 21, 2024
a8d1c44
format
Michaelvll Sep 22, 2024
6c73d81
rejected message
Michaelvll Sep 22, 2024
247c0b8
format
Michaelvll Sep 22, 2024
f8a5a64
Update sky/utils/policy_utils.py
Michaelvll Sep 22, 2024
73a4581
Update sky/utils/policy_utils.py
Michaelvll Sep 22, 2024
d78a822
Fix
Michaelvll Sep 22, 2024
8cc963c
Merge branch 'policy-hook' of github.com:skypilot-org/skypilot into p…
Michaelvll Sep 22, 2024
68275f6
Update examples/admin_policy/example_policy/example_policy/__init__.py
Michaelvll Sep 22, 2024
9644622
Update docs/source/reference/config.rst
Michaelvll Sep 22, 2024
17f8fa1
Address comments
Michaelvll Sep 22, 2024
07c4748
format
Michaelvll Sep 22, 2024
15f1062
Merge branch 'policy-hook' of github.com:skypilot-org/skypilot into p…
Michaelvll Sep 22, 2024
994272b
changes in examples
Michaelvll Sep 22, 2024
3597dae
Fix enforce autostop
Michaelvll Sep 22, 2024
43a6088
Fix autostop enforcement
Michaelvll Sep 22, 2024
8770d0b
fix test
Michaelvll Sep 22, 2024
7984beb
Update docs/source/cloud-setup/policy.rst
Michaelvll Sep 23, 2024
d155d60
Update sky/admin_policy.py
Michaelvll Sep 23, 2024
6ffa5ae
Update sky/admin_policy.py
Michaelvll Sep 23, 2024
a6dd900
wip
Michaelvll Sep 23, 2024
4274287
Update docs/source/cloud-setup/policy.rst
Michaelvll Sep 23, 2024
0609482
Update docs/source/cloud-setup/policy.rst
Michaelvll Sep 23, 2024
67552d7
Update docs/source/cloud-setup/policy.rst
Michaelvll Sep 23, 2024
7de757e
fix
Michaelvll Sep 23, 2024
8443ddc
Merge branch 'policy-hook' of github.com:skypilot-org/skypilot into p…
Michaelvll Sep 23, 2024
7fbc30d
fix
Michaelvll Sep 23, 2024
92b68fc
fix
Michaelvll Sep 23, 2024
7d8af9a
Use sky.status for autostop
Michaelvll Sep 23, 2024
5b37f47
update policy
Michaelvll Sep 23, 2024
c7af310
Update docs/source/cloud-setup/policy.rst
Michaelvll Sep 23, 2024
cb232a8
fix policy.rst
Michaelvll Sep 23, 2024
5e9f544
Merge branch 'policy-hook' of github.com:skypilot-org/skypilot into p…
Michaelvll Sep 23, 2024
deb4c92
Add comment
Michaelvll Sep 23, 2024
cbff59d
Fix logging
Michaelvll Sep 23, 2024
1fe350a
fix CI
Michaelvll Sep 23, 2024
2e8e41c
Update docs/source/cloud-setup/policy.rst
Michaelvll Sep 23, 2024
aae42ce
Use sphnix inline code
Michaelvll Sep 23, 2024
73c8fb7
Merge branch 'policy-hook' of github.com:skypilot-org/skypilot into p…
Michaelvll Sep 23, 2024
11bbd5e
Add comment
Michaelvll Sep 23, 2024
3630535
fix skypilot config file mounts for jobs and serve
Michaelvll Sep 23, 2024
e020dea
Merge branch 'master' of github.com:skypilot-org/skypilot into policy…
Michaelvll Sep 23, 2024
27 changes: 18 additions & 9 deletions docs/source/cloud-setup/policy.rst
@@ -4,13 +4,22 @@ Admin Policy Enforcement
========================


SkyPilot allows admins to enforce policies on users' SkyPilot usage by applying
custom validation and mutation logic on user's task and SkyPilot config.
SkyPilot provides an **admin policy** mechanism that admins can use to enforce certain policies on users' SkyPilot usage. An admin policy applies
custom validation and mutation logic to a user's tasks and SkyPilot config.

Example usage:

- Adds custom labels to all tasks [Link to below, fix case]
- Always Disable Public IP for AWS Tasks [Link to below]
- Enforce Autostop for all Tasks [Link to below]


To implement and use an admin policy:

- Admins write a simple Python package with a policy class that implements SkyPilot's ``sky.AdminPolicy`` interface;
- Admins distribute this package to users;
- Users simply set the ``admin_policy`` field in the SkyPilot config file ``~/.sky/config.yaml`` for the policy to go into effect.

In short, admins offers a Python package with a customized inheritance of SkyPilot's
``AdminPolicy`` interface, and a user just needs to set the ``admin_policy`` field in
the SkyPilot config ``~/.sky/config.yaml`` to enforce the policy to all their
tasks.

Overview
--------
@@ -32,7 +41,7 @@ For example:
.. hint::

SkyPilot loads the policy from the given package in the same Python environment.
You can test the existance of the policy by running:
You can test the existence of the policy by running:

.. code-block:: bash

@@ -42,7 +51,7 @@ For example:
Admin-Side
~~~~~~~~~~

An admin can distribute the Python package to users with pre-defined policy. The
An admin can distribute the Python package to users with a pre-defined policy. The
policy should follow the following interface:

.. code-block:: python
@@ -52,7 +61,7 @@ policy should follow the following interface:
class MyPolicy(sky.AdminPolicy):
@classmethod
def validate_and_mutate(cls, user_request: sky.UserRequest) -> sky.MutatedUserRequest:
# Logics for validate and modify user requests.
# Logic for validating and modifying user requests.
...
return sky.MutatedUserRequest(user_request.task,
user_request.skypilot_config)
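
As an illustration of the interface shape documented above, here is a minimal sketch of a policy that attaches a label to every task. It deliberately does not import `sky`: `Task`, `UserRequest`, and `MutatedUserRequest` below are simplified stand-ins defined only for this sketch, and the `labels` attribute and the `team` label are assumptions, not SkyPilot's actual fields.

```python
import copy
import dataclasses
from typing import Any, Dict


# Simplified stand-ins for illustration only; these are NOT SkyPilot's classes.
@dataclasses.dataclass
class Task:
    name: str
    labels: Dict[str, str] = dataclasses.field(default_factory=dict)


@dataclasses.dataclass
class UserRequest:
    task: Task
    skypilot_config: Dict[str, Any]


@dataclasses.dataclass
class MutatedUserRequest:
    task: Task
    skypilot_config: Dict[str, Any]


class AddLabelsPolicy:
    """Hypothetical policy: attach a fixed label to every task."""

    @classmethod
    def validate_and_mutate(cls,
                            user_request: UserRequest) -> MutatedUserRequest:
        # Work on a copy so the caller's task object is left untouched.
        task = copy.deepcopy(user_request.task)
        task.labels['team'] = 'ml-infra'  # hypothetical label
        return MutatedUserRequest(task, user_request.skypilot_config)


original = Task(name='train')
mutated = AddLabelsPolicy.validate_and_mutate(UserRequest(original, {}))
print(mutated.task.labels)
```

Returning a fresh request object rather than editing the input in place keeps the original request available for comparison, which is one way to tell whether a policy changed anything.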
3 changes: 2 additions & 1 deletion docs/source/reference/config.rst
@@ -87,10 +87,11 @@ Available fields and semantics:
# Default: false.
disable_ecc: false

# Custom policy to be applied to all tasks. (optional).
# Admin policy to be applied to all tasks. (optional).
#
# The policy class to be applied to all tasks, which can be used to validate
# and mutate user requests.
#
# This is useful for enforcing certain policies on all tasks, e.g.,
# add custom labels; enforce certain resource limits; etc.
#
@@ -64,13 +64,25 @@ def validate_and_mutate(
return sky.MutatedUserRequest(
task=user_request.task,
skypilot_config=user_request.skypilot_config)
cluster_record = sky.status(request_options.cluster_name, refresh=True)
need_autostop = False
if not cluster_record:
# Cluster does not exist
need_autostop = True
elif cluster_record[0]['status'] == sky.ClusterStatus.STOPPED:
# Cluster is stopped
need_autostop = True
elif cluster_record[0]['autostop'] < 0:
# Cluster is running but autostop is not set
need_autostop = True

is_setting_autostop = False
idle_minutes_to_autostop = request_options.idle_minutes_to_autostop
# Enforce autostop/down to be set for all tasks for new clusters.
if not request_options.cluster_running and (
idle_minutes_to_autostop is None or
idle_minutes_to_autostop < 0):
raise RuntimeError('Autostop/down must be set for all newly '
'launched clusters.')
is_setting_autostop = (idle_minutes_to_autostop is not None and
idle_minutes_to_autostop >= 0)
if need_autostop and not is_setting_autostop:
raise RuntimeError('Autostop/down must be set for all clusters.')

return sky.MutatedUserRequest(
task=user_request.task,
skypilot_config=user_request.skypilot_config)
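
The autostop decision in the hunk above can be isolated into pure functions, which makes the three "needs autostop" cases easy to unit test. This is a sketch only: `ClusterStatus` here is a local stand-in for `sky.ClusterStatus`, the record layout (`status` and `autostop` keys) is taken from the diff, and the helper names are hypothetical.

```python
import enum
from typing import Any, Dict, List, Optional


class ClusterStatus(enum.Enum):
    """Local stand-in for sky.ClusterStatus (illustration only)."""
    INIT = 'INIT'
    UP = 'UP'
    STOPPED = 'STOPPED'


def needs_autostop(cluster_records: List[Dict[str, Any]]) -> bool:
    """Returns True if the request must set autostop/down."""
    if not cluster_records:
        # Cluster does not exist yet.
        return True
    record = cluster_records[0]
    if record['status'] == ClusterStatus.STOPPED:
        # Cluster is stopped and will be restarted.
        return True
    # Cluster is running; a negative value means autostop is not set.
    return record['autostop'] < 0


def check_autostop(cluster_records: List[Dict[str, Any]],
                   idle_minutes_to_autostop: Optional[int]) -> None:
    """Raises RuntimeError if autostop is required but not requested."""
    is_setting_autostop = (idle_minutes_to_autostop is not None and
                           idle_minutes_to_autostop >= 0)
    if needs_autostop(cluster_records) and not is_setting_autostop:
        raise RuntimeError('Autostop/down must be set for all clusters.')


# A running cluster that already has autostop configured passes the check.
check_autostop([{'status': ClusterStatus.UP, 'autostop': 10}], None)
```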
6 changes: 3 additions & 3 deletions sky/admin_policy.py
@@ -17,8 +17,8 @@ class RequestOptions:
cluster_running: Whether the cluster is running.
idle_minutes_to_autostop: If provided, the cluster will be set to
autostop after this many minutes of idleness.
down: Whether to down the cluster.
dryrun: Whether to dryrun the request.
down: If true, use autodown rather than autostop.
dryrun: Is the request a dryrun?
"""
cluster_name: Optional[str]
cluster_running: bool
@@ -68,7 +68,7 @@ def validate_and_mutate(user_request: UserRequest) -> MutatedUserRequest:
...
return MutatedUserRequest(task=..., skypilot_config=...)

The policy can mutate both task and skypilot_config.
The policy can mutate both task and skypilot_config. Admins then distribute a simple module that contains this implementation, installable in a way that it can be imported by users from the same Python environment where SkyPilot is running.

Users can register a subclass of AdminPolicy in the SkyPilot config file
under the key 'admin_policy', e.g.
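
Since the policy is named by a string in the config and must be importable from the same Python environment, loading it comes down to resolving a dotted path to a class. The sketch below shows that general technique with `importlib`; it is an assumption about the approach, not a copy of SkyPilot's `_get_policy_cls`, and `load_class` is a hypothetical helper name. A standard-library class is used so the example runs anywhere.

```python
import importlib
from typing import Type


def load_class(dotted_path: str) -> Type:
    """Resolves 'package.module.ClassName' to the class object."""
    module_path, _, class_name = dotted_path.rpartition('.')
    if not module_path:
        raise ValueError(
            f'Expected a path like "package.module.Class", got {dotted_path!r}')
    # Raises ImportError if the package is not installed in this environment.
    module = importlib.import_module(module_path)
    try:
        return getattr(module, class_name)
    except AttributeError as e:
        raise ImportError(
            f'Module {module_path!r} has no attribute {class_name!r}') from e


# Standard-library class used as a placeholder for a real policy class.
policy_cls = load_class('collections.OrderedDict')
print(policy_cls.__name__)
```

Failing with a clear error at load time is useful here, because a misspelled `admin_policy` value would otherwise surface only when the first request is processed.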
2 changes: 1 addition & 1 deletion sky/execution.py
@@ -179,7 +179,7 @@ def _execute(
dag = dag_utils.convert_entrypoint_to_dag(entrypoint)
dag, _ = admin_policy_utils.apply(
dag,
operation_args=admin_policy.RequestOptions(
request_options=admin_policy.RequestOptions(
cluster_name=cluster_name,
cluster_running=cluster_running,
idle_minutes_to_autostop=idle_minutes_to_autostop,
2 changes: 1 addition & 1 deletion sky/jobs/core.py
@@ -56,7 +56,7 @@ def launch(

dag = dag_utils.convert_entrypoint_to_dag(entrypoint)
dag, mutated_user_config = admin_policy_utils.apply(
dag, update_skypilot_config_for_current_request=False)
dag, use_mutated_config_in_current_request=False)
if not dag.is_chain():
with ux_utils.print_exception_no_traceback():
raise ValueError('Only single-task or chain DAG is '
2 changes: 1 addition & 1 deletion sky/serve/core.py
@@ -126,7 +126,7 @@ def up(
_validate_service_task(task)

dag, mutated_user_config = admin_policy_utils.apply(
task, update_skypilot_config_for_current_request=False)
task, use_mutated_config_in_current_request=False)
task = dag.tasks[0]

controller_utils.maybe_translate_local_file_mounts_and_sync_up(task,
18 changes: 9 additions & 9 deletions sky/utils/admin_policy_utils.py
@@ -53,20 +53,20 @@ def _get_policy_cls(

def apply(
entrypoint: Union['dag_lib.Dag', 'task_lib.Task'],
update_skypilot_config_for_current_request: bool = True,
operation_args: Optional[admin_policy.RequestOptions] = None,
use_mutated_config_in_current_request: bool = True,
request_options: Optional[admin_policy.RequestOptions] = None,
) -> Tuple['dag_lib.Dag', skypilot_config.Config]:
"""Applies an admin policy (if registered) to a DAG or a task.

It mutates a Dag by applying any registered admin policy and also
potentially updates (controlled by `apply_skypilot_config`) the global
SkyPilot config if there is any changes made by the policy.
potentially updates (controlled by `use_mutated_config_in_current_request`)
the global SkyPilot config if there are any changes made by the policy.

Args:
dag: The dag to be mutated by the policy.
apply_skypilot_config: Whether to apply the skypilot config changes to
the global skypilot config.
operation_args: Additional arguments user passed in SkyPilot operations.
use_mutated_config_in_current_request: Whether to use the mutated
config in the current request.
request_options: Additional options user passed for the current request.

Returns:
- The new copy of dag after applying the policy
@@ -91,7 +91,7 @@ def apply(

mutated_config = None
for task in dag.tasks:
user_request = admin_policy.UserRequest(task, config, operation_args)
user_request = admin_policy.UserRequest(task, config, request_options)
try:
mutated_user_request = policy_cls.validate_and_mutate(user_request)
except Exception as e: # pylint: disable=broad-except
@@ -120,7 +120,7 @@ def apply(
mutated_dag.graph.add_edge(mutated_dag.tasks[u_idx],
mutated_dag.tasks[v_idx])

if (update_skypilot_config_for_current_request and
if (use_mutated_config_in_current_request and
original_config != mutated_config):
with tempfile.NamedTemporaryFile(
delete=False,
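
The control flow of `apply` shown in the hunks above (run the policy per task, wrap failures, and only touch the global config when the flag is set) can be sketched in a simplified, self-contained form. Everything here is a stand-in: tasks and configs are plain dicts, a policy is a plain callable, `GLOBAL_CONFIG` and `RequestRejected` are hypothetical names, and the config keys are illustrative. The real function works on DAGs and reloads the config through a temporary file.

```python
import copy
from typing import Any, Callable, Dict, List, Tuple

Config = Dict[str, Any]
TaskDict = Dict[str, Any]
# A policy here is any callable mapping (task, config) -> (task, config).
Policy = Callable[[TaskDict, Config], Tuple[TaskDict, Config]]

# Stand-in for process-wide configuration state.
GLOBAL_CONFIG: Config = {}


class RequestRejected(Exception):
    """Hypothetical error raised when the policy rejects a request."""


def apply_policy(
    tasks: List[TaskDict],
    policy: Policy,
    use_mutated_config_in_current_request: bool = True,
) -> Tuple[List[TaskDict], Config]:
    original_config = copy.deepcopy(GLOBAL_CONFIG)
    mutated_tasks = []
    mutated_config = original_config
    for task in tasks:
        try:
            # Pass copies so the policy cannot mutate the caller's objects.
            new_task, new_config = policy(copy.deepcopy(task),
                                          copy.deepcopy(original_config))
        except Exception as e:  # any policy failure rejects the request
            raise RequestRejected(
                f'Policy rejected task {task.get("name")!r}: {e}') from e
        mutated_tasks.append(new_task)
        mutated_config = new_config
    if (use_mutated_config_in_current_request and
            mutated_config != original_config):
        GLOBAL_CONFIG.clear()
        GLOBAL_CONFIG.update(mutated_config)
    return mutated_tasks, mutated_config


def disable_public_ip(task: TaskDict, config: Config):
    config.setdefault('aws', {})['use_internal_ips'] = True
    return task, config


# With the flag off, the mutated config is returned but not applied globally.
tasks, cfg = apply_policy([{'name': 'train'}], disable_public_ip,
                          use_mutated_config_in_current_request=False)
```

Passing `False` mirrors the `sky/jobs/core.py` and `sky/serve/core.py` call sites in this diff, which take the mutated config back as a return value rather than applying it to the current request.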
2 changes: 1 addition & 1 deletion tests/unit_tests/test_admin_policy.py
@@ -32,7 +32,7 @@ def _load_task_and_apply_policy(
task = sky.Task.from_yaml(os.path.join(POLICY_PATH, 'task.yaml'))
return admin_policy_utils.apply(
task,
operation_args=sky.admin_policy.RequestOptions(
request_options=sky.admin_policy.RequestOptions(
cluster_name='test',
cluster_running=False,
idle_minutes_to_autostop=idle_minutes_to_autostop,
10 changes: 3 additions & 7 deletions tests/unit_tests/test_backend_utils.py
@@ -1,19 +1,14 @@
import os
import pathlib
from typing import Dict
from unittest.mock import Mock
from unittest.mock import patch

import pytest

from sky import clouds
from sky import skypilot_config
from sky.backends import backend_utils
from sky.resources import Resources
from sky.resources import resources_utils


@patch.object(skypilot_config, 'CONFIG_PATH',
'./tests/test_yamls/test_aws_config.yaml')
# Set env var to test config file.
@patch.object(skypilot_config, '_dict', None)
@patch.object(skypilot_config, '_loaded_config_path', None)
@patch('sky.clouds.service_catalog.instance_type_exists', return_value=True)
@@ -29,6 +24,7 @@
@patch('sky.utils.common_utils.fill_template')
def test_write_cluster_config_w_remote_identity(mock_fill_template,
*mocks) -> None:
os.environ['SKYPILOT_CONFIG'] = './tests/test_yamls/test_aws_config.yaml'
skypilot_config._try_load_config()

cloud = clouds.AWS()
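
One possible refinement of the test change above: assigning to `os.environ` directly leaves the variable set for every test that runs afterwards in the same process. A sketch of scoping it with `unittest.mock.patch.dict`, which restores the environment on exit, follows. The variable name and path are taken from the diff; `config_path_from_env` is a hypothetical stand-in for the code under test.

```python
import os
from typing import Optional
from unittest.mock import patch

CONFIG_ENV_VAR = 'SKYPILOT_CONFIG'
TEST_CONFIG = './tests/test_yamls/test_aws_config.yaml'


def config_path_from_env() -> Optional[str]:
    """Stand-in for code under test that reads the config path from the env."""
    return os.environ.get(CONFIG_ENV_VAR)


def run_with_test_config() -> Optional[str]:
    # patch.dict restores os.environ to its previous contents on exit,
    # even if the body raises.
    with patch.dict(os.environ, {CONFIG_ENV_VAR: TEST_CONFIG}):
        return config_path_from_env()


before = os.environ.get(CONFIG_ENV_VAR)
seen_inside = run_with_test_config()
after = os.environ.get(CONFIG_ENV_VAR)
```

`patch.dict` can also be used as a decorator, which would fit the stacked `@patch` style this test already uses.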