[Core] Admin policy enforcement plugin (#3966)
* support policy hook

* test task labels

* Add test for policy that sets labels

* Fix comment

* format

* use -e to make test related files visible

* Add config.rst

* Fix test

* fix config rst

* Apply policy to service

* add policy for serving

* Add docs

* fix

* format

* Update interface

* fix

* Fix

* fix

* Fix test config

* Fix mutated config

* fix

* Add policy doc

* rename

* minor

* Add additional arguments for autostop

* fix mypy

* format

* rejected message

* format

* Update sky/utils/policy_utils.py

Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>

* Update sky/utils/policy_utils.py

Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>

* Fix

* Update examples/admin_policy/example_policy/example_policy/__init__.py

Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>

* Update docs/source/reference/config.rst

Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>

* Address comments

* format

* changes in examples

* Fix enforce autostop

* Fix autostop enforcement

* fix test

* Update docs/source/cloud-setup/policy.rst

Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>

* Update sky/admin_policy.py

Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>

* Update sky/admin_policy.py

Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>

* wip

* Update docs/source/cloud-setup/policy.rst

Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>

* Update docs/source/cloud-setup/policy.rst

Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>

* Update docs/source/cloud-setup/policy.rst

Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>

* fix

* fix

* fix

* Use sky.status for autostop

* update policy

* Update docs/source/cloud-setup/policy.rst

Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>

* fix policy.rst

* Add comment

* Fix logging

* fix CI

* Update docs/source/cloud-setup/policy.rst

Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>

* Use sphnix inline code

* Add comment

* fix skypilot config file mounts for jobs and serve

---------

Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
Michaelvll and concretevitamin committed Sep 24, 2024
1 parent 31c0a5c commit 800f7d6
Showing 34 changed files with 1,024 additions and 139 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/pytest.yml
@@ -53,7 +53,7 @@ jobs:
       - name: Install dependencies
         run: |
           python -m pip install --upgrade pip
-          pip install ".[all]"
+          pip install -e ".[all]"
           pip install pytest pytest-xdist pytest-env>=0.6 memory-profiler==0.61.0
       - name: Run tests with pytest
195 changes: 195 additions & 0 deletions docs/source/cloud-setup/policy.rst
@@ -0,0 +1,195 @@
.. _advanced-policy-config:

Admin Policy Enforcement
========================


SkyPilot provides an **admin policy** mechanism that admins can use to enforce certain policies on users' SkyPilot usage. An admin policy applies
custom validation and mutation logic to a user's tasks and SkyPilot config.

Example usage:

- :ref:`kubernetes-labels-policy`
- :ref:`disable-public-ip-policy`
- :ref:`use-spot-for-gpu-policy`
- :ref:`enforce-autostop-policy`


To implement and use an admin policy:

- Admins write a simple Python package with a policy class that implements SkyPilot's ``sky.AdminPolicy`` interface;
- Admins distribute this package to users;
- Users set the ``admin_policy`` field in the SkyPilot config file ``~/.sky/config.yaml`` for the policy to go into effect.


Overview
--------



User-Side
~~~~~~~~~~

To apply the policy, a user needs to set the ``admin_policy`` field in the SkyPilot config
``~/.sky/config.yaml`` to the path of the Python package that implements the policy.
For example:

.. code-block:: yaml

    admin_policy: mypackage.subpackage.MyPolicy

.. hint::

    SkyPilot loads the policy from the given package in the same Python environment.
    You can test the existence of the policy by running:

    .. code-block:: bash

        python -c "from mypackage.subpackage import MyPolicy"

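
As a rough illustration of what a dotted path such as ``mypackage.subpackage.MyPolicy`` implies, the sketch below resolves a path into a class with ``importlib``. The helper name ``load_policy_class`` is hypothetical and this is not necessarily how SkyPilot loads policies internally; a standard-library class stands in for the policy so the snippet runs anywhere.

```python
import importlib


def load_policy_class(dotted_path: str) -> type:
    """Resolve 'package.module.ClassName' to the class object it names.

    Hypothetical helper for illustration only.
    """
    module_path, _, class_name = dotted_path.rpartition('.')
    if not module_path:
        raise ValueError(f'Expected "module.ClassName", got {dotted_path!r}')
    # Raises ImportError if the package is not installed in the current
    # Python environment.
    module = importlib.import_module(module_path)
    try:
        return getattr(module, class_name)
    except AttributeError as e:
        raise ImportError(
            f'{class_name!r} not found in module {module_path!r}') from e


# Stand-in path from the standard library, since mypackage is hypothetical.
policy_cls = load_policy_class('collections.OrderedDict')
print(policy_cls.__name__)
```

If the import fails here, the same dotted path would likely fail as an ``admin_policy`` value too, which is why checking it in the target environment first is useful.
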
Admin-Side
~~~~~~~~~~

An admin can distribute the Python package to users with a pre-defined policy. The
policy should implement the ``sky.AdminPolicy`` `interface <https://github.com/skypilot-org/skypilot/blob/master/sky/admin_policy.py>`_:


.. literalinclude:: ../../../sky/admin_policy.py
    :language: python
    :pyobject: AdminPolicy
    :caption: `AdminPolicy Interface <https://github.com/skypilot-org/skypilot/blob/master/sky/admin_policy.py>`_


Your custom admin policy should look like this:

.. code-block:: python

    import sky

    class MyPolicy(sky.AdminPolicy):
        @classmethod
        def validate_and_mutate(cls, user_request: sky.UserRequest) -> sky.MutatedUserRequest:
            # Logic for validating and mutating user requests.
            ...
            return sky.MutatedUserRequest(user_request.task,
                                          user_request.skypilot_config)

``UserRequest`` and ``MutatedUserRequest`` are defined as follows (see `source code <https://github.com/skypilot-org/skypilot/blob/master/sky/admin_policy.py>`_ for more details):


.. literalinclude:: ../../../sky/admin_policy.py
    :language: python
    :pyobject: UserRequest
    :caption: `UserRequest Class <https://github.com/skypilot-org/skypilot/blob/master/sky/admin_policy.py>`_

.. literalinclude:: ../../../sky/admin_policy.py
    :language: python
    :pyobject: MutatedUserRequest
    :caption: `MutatedUserRequest Class <https://github.com/skypilot-org/skypilot/blob/master/sky/admin_policy.py>`_


In other words, an ``AdminPolicy`` can mutate any field of a user request, including
the :ref:`task <yaml-spec>` and the :ref:`global skypilot config <config-yaml>`,
giving admins a lot of flexibility to control users' SkyPilot usage.

An ``AdminPolicy`` can be used to both validate and mutate user requests. If
a request should be rejected, the policy should raise an exception.
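
To make the validate-or-mutate contract concrete, here is a minimal sketch that runs without SkyPilot installed. The ``UserRequest`` and ``MutatedUserRequest`` classes below are simplified stand-ins (plain dicts replace the real task and config objects), and ``RequireNamePolicy`` and the ``team`` label are invented for illustration; a real policy would subclass ``sky.AdminPolicy``.

```python
import dataclasses
from typing import Any, Dict


# Hypothetical stand-ins for sky.UserRequest / sky.MutatedUserRequest so the
# sketch runs without SkyPilot. Real tasks and configs are SkyPilot objects,
# not plain dicts.
@dataclasses.dataclass
class UserRequest:
    task: Dict[str, Any]
    skypilot_config: Dict[str, Any]


@dataclasses.dataclass
class MutatedUserRequest:
    task: Dict[str, Any]
    skypilot_config: Dict[str, Any]


class RequireNamePolicy:
    """Rejects unnamed tasks and tags every accepted task with a label."""

    @classmethod
    def validate_and_mutate(cls,
                            user_request: UserRequest) -> MutatedUserRequest:
        task = user_request.task
        # Validation: raising an exception rejects the request.
        if not task.get('name'):
            raise RuntimeError('Every task must have a name.')
        # Mutation: return a modified copy of the task.
        labels = dict(task.get('labels', {}), team='ml-infra')
        return MutatedUserRequest(dict(task, labels=labels),
                                  user_request.skypilot_config)


accepted = RequireNamePolicy.validate_and_mutate(
    UserRequest(task={'name': 'train'}, skypilot_config={}))
print(accepted.task)
```

Raising on invalid input and returning a new request on success keeps both behaviors in the single ``validate_and_mutate`` entry point.
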


The ``sky.Config`` and ``sky.RequestOptions`` classes are defined as follows:

.. literalinclude:: ../../../sky/skypilot_config.py
    :language: python
    :pyobject: Config
    :caption: `Config Class <https://github.com/skypilot-org/skypilot/blob/master/sky/skypilot_config.py>`_


.. literalinclude:: ../../../sky/admin_policy.py
    :language: python
    :pyobject: RequestOptions
    :caption: `RequestOptions Class <https://github.com/skypilot-org/skypilot/blob/master/sky/admin_policy.py>`_
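
The example policies read and write the config through ``get_nested`` and ``set_nested`` with a tuple of keys. The sketch below shows one way such accessors could behave on a plain nested dict. ``NestedConfig`` is a hypothetical stand-in written for this illustration, not the actual ``sky.Config`` implementation.

```python
from typing import Any, Tuple


class NestedConfig(dict):
    """Hypothetical stand-in for a config with tuple-of-keys accessors."""

    def get_nested(self, keys: Tuple[str, ...], default: Any) -> Any:
        node: Any = self
        for key in keys:
            # Fall back to the default as soon as the path breaks.
            if not isinstance(node, dict) or key not in node:
                return default
            node = node[key]
        return node

    def set_nested(self, keys: Tuple[str, ...], value: Any) -> None:
        node: Any = self
        # Create intermediate dicts as needed, then assign the leaf.
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value


config = NestedConfig({'aws': {'vpc_name': 'my-vpc'}})
config.set_nested(('aws', 'use_internal_ip'), True)
labels = config.get_nested(('kubernetes', 'custom_metadata', 'labels'), {})
labels['app'] = 'skypilot'
config.set_nested(('kubernetes', 'custom_metadata', 'labels'), labels)
print(config)
```
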


Example Policies
----------------

We have provided a few example policies in `examples/admin_policy/example_policy <https://github.com/skypilot-org/skypilot/tree/master/examples/admin_policy/example_policy>`_. You can test these policies by installing the example policy package in your Python environment.

.. code-block:: bash

    git clone https://github.com/skypilot-org/skypilot.git
    cd skypilot
    pip install examples/admin_policy/example_policy

Reject All
~~~~~~~~~~

.. literalinclude:: ../../../examples/admin_policy/example_policy/example_policy/skypilot_policy.py
    :language: python
    :pyobject: RejectAllPolicy
    :caption: `RejectAllPolicy <https://github.com/skypilot-org/skypilot/blob/master/examples/admin_policy/example_policy/example_policy/skypilot_policy.py>`_

.. literalinclude:: ../../../examples/admin_policy/reject_all.yaml
    :language: yaml
    :caption: `Config YAML for using RejectAllPolicy <https://github.com/skypilot-org/skypilot/blob/master/examples/admin_policy/reject_all.yaml>`_

.. _kubernetes-labels-policy:

Add Labels for all Tasks on Kubernetes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. literalinclude:: ../../../examples/admin_policy/example_policy/example_policy/skypilot_policy.py
    :language: python
    :pyobject: AddLabelsPolicy
    :caption: `AddLabelsPolicy <https://github.com/skypilot-org/skypilot/blob/master/examples/admin_policy/example_policy/example_policy/skypilot_policy.py>`_

.. literalinclude:: ../../../examples/admin_policy/add_labels.yaml
    :language: yaml
    :caption: `Config YAML for using AddLabelsPolicy <https://github.com/skypilot-org/skypilot/blob/master/examples/admin_policy/add_labels.yaml>`_


.. _disable-public-ip-policy:

Always Disable Public IP for AWS Tasks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. literalinclude:: ../../../examples/admin_policy/example_policy/example_policy/skypilot_policy.py
    :language: python
    :pyobject: DisablePublicIpPolicy
    :caption: `DisablePublicIpPolicy <https://github.com/skypilot-org/skypilot/blob/master/examples/admin_policy/example_policy/example_policy/skypilot_policy.py>`_

.. literalinclude:: ../../../examples/admin_policy/disable_public_ip.yaml
    :language: yaml
    :caption: `Config YAML for using DisablePublicIpPolicy <https://github.com/skypilot-org/skypilot/blob/master/examples/admin_policy/disable_public_ip.yaml>`_

.. _use-spot-for-gpu-policy:

Use Spot for all GPU Tasks
~~~~~~~~~~~~~~~~~~~~~~~~~~

..
.. literalinclude:: ../../../examples/admin_policy/example_policy/example_policy/skypilot_policy.py
    :language: python
    :pyobject: UseSpotForGpuPolicy
    :caption: `UseSpotForGpuPolicy <https://github.com/skypilot-org/skypilot/blob/master/examples/admin_policy/example_policy/example_policy/skypilot_policy.py>`_

.. literalinclude:: ../../../examples/admin_policy/use_spot_for_gpu.yaml
    :language: yaml
    :caption: `Config YAML for using UseSpotForGpuPolicy <https://github.com/skypilot-org/skypilot/blob/master/examples/admin_policy/use_spot_for_gpu.yaml>`_
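
The core of ``UseSpotForGpuPolicy`` is a copy-with-override over the task's candidate resources. The following sketch isolates that step using a hypothetical frozen ``Resources`` dataclass in place of SkyPilot's resources object, so it can be run on its own; the field names mirror the ones used in the example policy.

```python
import dataclasses
from typing import Dict, List, Optional


@dataclasses.dataclass(frozen=True)
class Resources:
    """Hypothetical stand-in for one candidate resource requirement."""
    accelerators: Optional[Dict[str, int]] = None
    use_spot: bool = False


def use_spot_for_gpu(resources: List[Resources]) -> List[Resources]:
    """Returns copies with use_spot=True wherever accelerators are requested."""
    return [
        dataclasses.replace(r, use_spot=True) if r.accelerators else r
        for r in resources
    ]


candidates = [Resources(accelerators={'A100': 8}), Resources()]
for r in use_spot_for_gpu(candidates):
    print(r)
```

Copying rather than mutating in place leaves the caller's original objects untouched, which makes the policy easier to test.
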

.. _enforce-autostop-policy:

Enforce Autostop for all Tasks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. literalinclude:: ../../../examples/admin_policy/example_policy/example_policy/skypilot_policy.py
    :language: python
    :pyobject: EnforceAutostopPolicy
    :caption: `EnforceAutostopPolicy <https://github.com/skypilot-org/skypilot/blob/master/examples/admin_policy/example_policy/example_policy/skypilot_policy.py>`_

.. literalinclude:: ../../../examples/admin_policy/enforce_autostop.yaml
    :language: yaml
    :caption: `Config YAML for using EnforceAutostopPolicy <https://github.com/skypilot-org/skypilot/blob/master/examples/admin_policy/enforce_autostop.yaml>`_
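
The decision made by ``EnforceAutostopPolicy`` can be read as a small truth table. The sketch below restates it as two pure functions with no SkyPilot dependency; the function and parameter names are invented for this illustration, and a negative ``current_autostop`` is assumed to mean that autostop is not set, mirroring the check in the example policy.

```python
from typing import Optional


def needs_autostop(cluster_exists: bool, is_stopped: bool,
                   current_autostop: int) -> bool:
    """True if the request must carry an autostop setting."""
    # New or stopped clusters always need an explicit autostop setting.
    if not cluster_exists or is_stopped:
        return True
    # Running cluster: only needed if autostop is not already set.
    return current_autostop < 0


def is_rejected(cluster_exists: bool, is_stopped: bool, current_autostop: int,
                idle_minutes_to_autostop: Optional[int]) -> bool:
    """True if the policy would reject the request."""
    is_setting_autostop = (idle_minutes_to_autostop is not None and
                           idle_minutes_to_autostop >= 0)
    return (needs_autostop(cluster_exists, is_stopped, current_autostop) and
            not is_setting_autostop)


# New cluster, no autostop requested.
print(is_rejected(False, False, -1, None))
# Running cluster that already has a 10-minute autostop.
print(is_rejected(True, False, 10, None))
```
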
3 changes: 2 additions & 1 deletion docs/source/docs/index.rst
@@ -201,7 +201,8 @@ Read the research:
../cloud-setup/cloud-permissions/index
../cloud-setup/cloud-auth
../cloud-setup/quota

../cloud-setup/policy

.. toctree::
:hidden:
:maxdepth: 1
11 changes: 11 additions & 0 deletions docs/source/reference/config.rst
@@ -87,6 +87,17 @@ Available fields and semantics:
# Default: false.
disable_ecc: false
# Admin policy to be applied to all tasks. (optional).
#
# The policy class to be applied to all tasks, which can be used to validate
# and mutate user requests.
#
# This is useful for enforcing certain policies on all tasks, e.g.,
# add custom labels; enforce certain resource limits; etc.
#
# The policy class should implement the sky.AdminPolicy interface.
admin_policy: my_package.SkyPilotPolicyV1
# Advanced AWS configurations (optional).
# Apply to all new instances but not existing ones.
aws:
1 change: 1 addition & 0 deletions examples/admin_policy/add_labels.yaml
@@ -0,0 +1 @@
admin_policy: example_policy.AddLabelsPolicy
1 change: 1 addition & 0 deletions examples/admin_policy/disable_public_ip.yaml
@@ -0,0 +1 @@
admin_policy: example_policy.DisablePublicIpPolicy
1 change: 1 addition & 0 deletions examples/admin_policy/enforce_autostop.yaml
@@ -0,0 +1 @@
admin_policy: example_policy.EnforceAutostopPolicy
6 changes: 6 additions & 0 deletions examples/admin_policy/example_policy/example_policy/__init__.py
@@ -0,0 +1,6 @@
"""Example admin policy module and prebuilt policies."""
from example_policy.skypilot_policy import AddLabelsPolicy
from example_policy.skypilot_policy import DisablePublicIpPolicy
from example_policy.skypilot_policy import EnforceAutostopPolicy
from example_policy.skypilot_policy import RejectAllPolicy
from example_policy.skypilot_policy import UseSpotForGpuPolicy
121 changes: 121 additions & 0 deletions examples/admin_policy/example_policy/example_policy/skypilot_policy.py
@@ -0,0 +1,121 @@
"""Example prebuilt admin policies."""
import sky


class RejectAllPolicy(sky.AdminPolicy):
    """Example policy: rejects all user requests."""

    @classmethod
    def validate_and_mutate(
            cls, user_request: sky.UserRequest) -> sky.MutatedUserRequest:
        """Rejects all user requests."""
        raise RuntimeError('Reject all policy')


class AddLabelsPolicy(sky.AdminPolicy):
    """Example policy: adds a kubernetes label for skypilot_config."""

    @classmethod
    def validate_and_mutate(
            cls, user_request: sky.UserRequest) -> sky.MutatedUserRequest:
        config = user_request.skypilot_config
        labels = config.get_nested(('kubernetes', 'custom_metadata', 'labels'),
                                   {})
        labels['app'] = 'skypilot'
        config.set_nested(('kubernetes', 'custom_metadata', 'labels'), labels)
        return sky.MutatedUserRequest(user_request.task, config)


class DisablePublicIpPolicy(sky.AdminPolicy):
    """Example policy: disables public IP for all AWS tasks."""

    @classmethod
    def validate_and_mutate(
            cls, user_request: sky.UserRequest) -> sky.MutatedUserRequest:
        config = user_request.skypilot_config
        config.set_nested(('aws', 'use_internal_ip'), True)
        if config.get_nested(('aws', 'vpc_name'), None) is None:
            # If no VPC name is specified, it is likely a mistake. We should
            # reject the request
            raise RuntimeError('VPC name should be set. Check organization '
                               'wiki for more information.')
        return sky.MutatedUserRequest(user_request.task, config)


class UseSpotForGpuPolicy(sky.AdminPolicy):
    """Example policy: use spot instances for all GPU tasks."""

    @classmethod
    def validate_and_mutate(
            cls, user_request: sky.UserRequest) -> sky.MutatedUserRequest:
        """Sets use_spot to True for all GPU tasks."""
        task = user_request.task
        new_resources = []
        for r in task.resources:
            if r.accelerators:
                new_resources.append(r.copy(use_spot=True))
            else:
                new_resources.append(r)

        task.set_resources(type(task.resources)(new_resources))

        return sky.MutatedUserRequest(
            task=task, skypilot_config=user_request.skypilot_config)


class EnforceAutostopPolicy(sky.AdminPolicy):
    """Example policy: enforce autostop for all tasks."""

    @classmethod
    def validate_and_mutate(
            cls, user_request: sky.UserRequest) -> sky.MutatedUserRequest:
        """Enforces autostop for all tasks.

        Note that with this policy enforced, users can still change the autostop
        setting for an existing cluster by using `sky autostop`.

        Since we refresh the cluster status with `sky.status` whenever this
        policy is applied, we should expect a few seconds latency when a user
        runs a request.
        """
        request_options = user_request.request_options

        # Request options is None when a task is executed with `jobs launch` or
        # `sky serve up`.
        if request_options is None:
            return sky.MutatedUserRequest(
                task=user_request.task,
                skypilot_config=user_request.skypilot_config)

        # Get the cluster record to operate on.
        cluster_name = request_options.cluster_name
        cluster_records = []
        if cluster_name is not None:
            cluster_records = sky.status(cluster_name, refresh=True)

        # Check if the user request should specify autostop settings.
        need_autostop = False
        if not cluster_records:
            # Cluster does not exist
            need_autostop = True
        elif cluster_records[0]['status'] == sky.ClusterStatus.STOPPED:
            # Cluster is stopped
            need_autostop = True
        elif cluster_records[0]['autostop'] < 0:
            # Cluster is running but autostop is not set
            need_autostop = True

        # Check if the user request is setting autostop settings.
        is_setting_autostop = False
        idle_minutes_to_autostop = request_options.idle_minutes_to_autostop
        is_setting_autostop = (idle_minutes_to_autostop is not None and
                               idle_minutes_to_autostop >= 0)

        # If the cluster requires autostop but the user request is not setting
        # autostop settings, raise an error.
        if need_autostop and not is_setting_autostop:
            raise RuntimeError('Autostop/down must be set for all clusters.')

        return sky.MutatedUserRequest(
            task=user_request.task,
            skypilot_config=user_request.skypilot_config)
7 changes: 7 additions & 0 deletions examples/admin_policy/example_policy/pyproject.toml
@@ -0,0 +1,7 @@
[build-system]
requires = ["setuptools>=61.0", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "example_policy"
version = "0.0.1"
1 change: 1 addition & 0 deletions examples/admin_policy/reject_all.yaml
@@ -0,0 +1 @@
admin_policy: example_policy.RejectAllPolicy
12 changes: 12 additions & 0 deletions examples/admin_policy/task.yaml
@@ -0,0 +1,12 @@
resources:
  cloud: aws
  cpus: 2
  labels:
    other_labels: test


setup: |
  echo "setup"
run: |
  echo "run"
1 change: 1 addition & 0 deletions examples/admin_policy/use_spot_for_gpu.yaml
@@ -0,0 +1 @@
admin_policy: example_policy.UseSpotForGpuPolicy