Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core] Admin policy enforcement plugin #3966

Merged
merged 67 commits into from
Sep 24, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
67 commits
Select commit Hold shift + click to select a range
cb28b8d
support policy hook
Michaelvll Sep 19, 2024
b64efa0
test task labels
Michaelvll Sep 19, 2024
cf89929
Add test for policy that sets labels
Michaelvll Sep 20, 2024
54c93ea
Fix comment
Michaelvll Sep 20, 2024
1d1c500
format
Michaelvll Sep 20, 2024
a0bdb2c
use -e to make test related files visible
Michaelvll Sep 20, 2024
543e66a
Add config.rst
Michaelvll Sep 20, 2024
520a2a1
Fix test
Michaelvll Sep 20, 2024
b533351
fix config rst
Michaelvll Sep 20, 2024
466f7fe
Apply policy to service
Michaelvll Sep 20, 2024
050dc7a
add policy for serving
Michaelvll Sep 20, 2024
31e0174
Add docs
Michaelvll Sep 20, 2024
0c74f2a
fix
Michaelvll Sep 20, 2024
48a6cc9
format
Michaelvll Sep 20, 2024
1ca5a8a
Update interface
Michaelvll Sep 20, 2024
14b2346
fix
Michaelvll Sep 21, 2024
cb39c73
Fix
Michaelvll Sep 21, 2024
1e3ddef
fix
Michaelvll Sep 21, 2024
aa87df7
Fix test config
Michaelvll Sep 21, 2024
28487a4
Fix mutated config
Michaelvll Sep 21, 2024
d1f0480
fix
Michaelvll Sep 21, 2024
f42ace5
Add policy doc
Michaelvll Sep 21, 2024
c04f3dc
rename
Michaelvll Sep 21, 2024
58f413c
minor
Michaelvll Sep 21, 2024
52053bd
Add additional arguments for autostop
Michaelvll Sep 21, 2024
4a4f682
fix mypy
Michaelvll Sep 21, 2024
a8d1c44
format
Michaelvll Sep 22, 2024
6c73d81
rejected message
Michaelvll Sep 22, 2024
247c0b8
format
Michaelvll Sep 22, 2024
f8a5a64
Update sky/utils/policy_utils.py
Michaelvll Sep 22, 2024
73a4581
Update sky/utils/policy_utils.py
Michaelvll Sep 22, 2024
d78a822
Fix
Michaelvll Sep 22, 2024
8cc963c
Merge branch 'policy-hook' of github.com:skypilot-org/skypilot into p…
Michaelvll Sep 22, 2024
68275f6
Update examples/admin_policy/example_policy/example_policy/__init__.py
Michaelvll Sep 22, 2024
9644622
Update docs/source/reference/config.rst
Michaelvll Sep 22, 2024
17f8fa1
Address comments
Michaelvll Sep 22, 2024
07c4748
format
Michaelvll Sep 22, 2024
15f1062
Merge branch 'policy-hook' of github.com:skypilot-org/skypilot into p…
Michaelvll Sep 22, 2024
994272b
changes in examples
Michaelvll Sep 22, 2024
3597dae
Fix enforce autostop
Michaelvll Sep 22, 2024
43a6088
Fix autostop enforcement
Michaelvll Sep 22, 2024
8770d0b
fix test
Michaelvll Sep 22, 2024
7984beb
Update docs/source/cloud-setup/policy.rst
Michaelvll Sep 23, 2024
d155d60
Update sky/admin_policy.py
Michaelvll Sep 23, 2024
6ffa5ae
Update sky/admin_policy.py
Michaelvll Sep 23, 2024
a6dd900
wip
Michaelvll Sep 23, 2024
4274287
Update docs/source/cloud-setup/policy.rst
Michaelvll Sep 23, 2024
0609482
Update docs/source/cloud-setup/policy.rst
Michaelvll Sep 23, 2024
67552d7
Update docs/source/cloud-setup/policy.rst
Michaelvll Sep 23, 2024
7de757e
fix
Michaelvll Sep 23, 2024
8443ddc
Merge branch 'policy-hook' of github.com:skypilot-org/skypilot into p…
Michaelvll Sep 23, 2024
7fbc30d
fix
Michaelvll Sep 23, 2024
92b68fc
fix
Michaelvll Sep 23, 2024
7d8af9a
Use sky.status for autostop
Michaelvll Sep 23, 2024
5b37f47
update policy
Michaelvll Sep 23, 2024
c7af310
Update docs/source/cloud-setup/policy.rst
Michaelvll Sep 23, 2024
cb232a8
fix policy.rst
Michaelvll Sep 23, 2024
5e9f544
Merge branch 'policy-hook' of github.com:skypilot-org/skypilot into p…
Michaelvll Sep 23, 2024
deb4c92
Add comment
Michaelvll Sep 23, 2024
cbff59d
Fix logging
Michaelvll Sep 23, 2024
1fe350a
fix CI
Michaelvll Sep 23, 2024
2e8e41c
Update docs/source/cloud-setup/policy.rst
Michaelvll Sep 23, 2024
aae42ce
Use sphnix inline code
Michaelvll Sep 23, 2024
73c8fb7
Merge branch 'policy-hook' of github.com:skypilot-org/skypilot into p…
Michaelvll Sep 23, 2024
11bbd5e
Add comment
Michaelvll Sep 23, 2024
3630535
fix skypilot config file mounts for jobs and serve
Michaelvll Sep 23, 2024
e020dea
Merge branch 'master' of github.com:skypilot-org/skypilot into policy…
Michaelvll Sep 23, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .github/workflows/pytest.yml
Original file line number Diff line number Diff line change
Expand Up @@ -53,7 +53,7 @@ jobs:
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install ".[all]"
pip install -e ".[all]"
pip install pytest pytest-xdist pytest-env>=0.6 memory-profiler==0.61.0

- name: Run tests with pytest
Expand Down
195 changes: 195 additions & 0 deletions docs/source/cloud-setup/policy.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,195 @@
.. _advanced-policy-config:

Admin Policy Enforcement
concretevitamin marked this conversation as resolved.
Show resolved Hide resolved
========================


SkyPilot provides an **admin policy** mechanism that admins can use to enforce certain policies on users' SkyPilot usage. An admin policy applies
custom validation and mutation logic to a user's tasks and SkyPilot config.

Example usage:

- :ref:`kubernetes-labels-policy`
- :ref:`disable-public-ip-policy`
- :ref:`use-spot-for-gpu-policy`
- :ref:`enforce-autostop-policy`


To implement and use an admin policy:

- Admins writes a simple Python package with a policy class that implements SkyPilot's ``sky.AdminPolicy`` interface;
- Admins distributes this package to users;
- Users simply set the ``admin_policy`` field in the SkyPilot config file ``~/.sky/config.yaml`` for the policy to go into effect.


Overview
--------



User-Side
~~~~~~~~~~

To apply the policy, a user needs to set the ``admin_policy`` field in the SkyPilot config
``~/.sky/config.yaml`` to the path of the Python package that implements the policy.
For example:

.. code-block:: yaml

admin_policy: mypackage.subpackage.MyPolicy


.. hint::

SkyPilot loads the policy from the given package in the same Python environment.
You can test the existence of the policy by running:

.. code-block:: bash

python -c "from mypackage.subpackage import MyPolicy"


Admin-Side
~~~~~~~~~~

An admin can distribute the Python package to users with a pre-defined policy. The
policy should implement the ``sky.AdminPolicy`` `interface <https://github.com/skypilot-org/skypilot/blob/master/sky/admin_policy.py>`_:


.. literalinclude:: ../../../sky/admin_policy.py
:language: python
:pyobject: AdminPolicy
:caption: `AdminPolicy Interface <https://github.com/skypilot-org/skypilot/blob/master/sky/admin_policy.py>`_


Your custom admin policy should look like this:

.. code-block:: python

import sky

class MyPolicy(sky.AdminPolicy):
@classmethod
def validate_and_mutate(cls, user_request: sky.UserRequest) -> sky.MutatedUserRequest:
# Logic for validate and modify user requests.
...
return sky.MutatedUserRequest(user_request.task,
user_request.skypilot_config)


``UserRequest`` and ``MutatedUserRequest`` are defined as follows (see `source code <https://github.com/skypilot-org/skypilot/blob/master/sky/admin_policy.py>`_ for more details):


.. literalinclude:: ../../../sky/admin_policy.py
:language: python
:pyobject: UserRequest
:caption: `UserRequest Class <https://github.com/skypilot-org/skypilot/blob/master/sky/admin_policy.py>`_

.. literalinclude:: ../../../sky/admin_policy.py
:language: python
:pyobject: MutatedUserRequest
:caption: `MutatedUserRequest Class <https://github.com/skypilot-org/skypilot/blob/master/sky/admin_policy.py>`_


In other words, an ``AdminPolicy`` can mutate any fields of a user request, including
the :ref:`task <yaml-spec>` and the :ref:`global skypilot config <config-yaml>`,
giving admins a lot of flexibility to control user's SkyPilot usage.

An ``AdminPolicy`` can be used to both validate and mutate user requests. If
a request should be rejected, the policy should raise an exception.


The ``sky.Config`` and ``sky.RequestOptions`` classes are defined as follows:

.. literalinclude:: ../../../sky/skypilot_config.py
:language: python
:pyobject: Config
:caption: `Config Class <https://github.com/skypilot-org/skypilot/blob/master/sky/skypilot_config.py>`_


.. literalinclude:: ../../../sky/admin_policy.py
:language: python
:pyobject: RequestOptions
:caption: `RequestOptions Class <https://github.com/skypilot-org/skypilot/blob/master/sky/admin_policy.py>`_


Example Policies
----------------

Michaelvll marked this conversation as resolved.
Show resolved Hide resolved
We have provided a few example policies in `examples/admin_policy/example_policy <https://github.com/skypilot-org/skypilot/tree/master/examples/admin_policy/example_policy>`_. You can test these policies by installing the example policy package in your Python environment.

.. code-block:: bash

git clone https://github.com/skypilot-org/skypilot.git
cd skypilot
pip install examples/admin_policy/example_policy

Reject All
~~~~~~~~~~

.. literalinclude:: ../../../examples/admin_policy/example_policy/example_policy/skypilot_policy.py
:language: python
:pyobject: RejectAllPolicy
:caption: `RejectAllPolicy <https://github.com/skypilot-org/skypilot/blob/master/examples/admin_policy/example_policy/example_policy/skypilot_policy.py>`_

.. literalinclude:: ../../../examples/admin_policy/reject_all.yaml
:language: yaml
:caption: `Config YAML for using RejectAllPolicy <https://github.com/skypilot-org/skypilot/blob/master/examples/admin_policy/reject_all.yaml>`_

.. _kubernetes-labels-policy:

Add Labels for all Tasks on Kubernetes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. literalinclude:: ../../../examples/admin_policy/example_policy/example_policy/skypilot_policy.py
:language: python
:pyobject: AddLabelsPolicy
:caption: `AddLabelsPolicy <https://github.com/skypilot-org/skypilot/blob/master/examples/admin_policy/example_policy/example_policy/skypilot_policy.py>`_

.. literalinclude:: ../../../examples/admin_policy/add_labels.yaml
:language: yaml
:caption: `Config YAML for using AddLabelsPolicy <https://github.com/skypilot-org/skypilot/blob/master/examples/admin_policy/add_labels.yaml>`_


.. _disable-public-ip-policy:

Always Disable Public IP for AWS Tasks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. literalinclude:: ../../../examples/admin_policy/example_policy/example_policy/skypilot_policy.py
:language: python
:pyobject: DisablePublicIpPolicy
:caption: `DisablePublicIpPolicy <https://github.com/skypilot-org/skypilot/blob/master/examples/admin_policy/example_policy/example_policy/skypilot_policy.py>`_

.. literalinclude:: ../../../examples/admin_policy/disable_public_ip.yaml
:language: yaml
:caption: `Config YAML for using DisablePublicIpPolicy <https://github.com/skypilot-org/skypilot/blob/master/examples/admin_policy/disable_public_ip.yaml>`_

.. _use-spot-for-gpu-policy:

Use Spot for all GPU Tasks
~~~~~~~~~~~~~~~~~~~~~~~~~~

..
.. literalinclude:: ../../../examples/admin_policy/example_policy/example_policy/skypilot_policy.py
:language: python
:pyobject: UseSpotForGpuPolicy
:caption: `UseSpotForGpuPolicy <https://github.com/skypilot-org/skypilot/blob/master/examples/admin_policy/example_policy/example_policy/skypilot_policy.py>`_

.. literalinclude:: ../../../examples/admin_policy/use_spot_for_gpu.yaml
:language: yaml
:caption: `Config YAML for using UseSpotForGpuPolicy <https://github.com/skypilot-org/skypilot/blob/master/examples/admin_policy/use_spot_for_gpu.yaml>`_

.. _enforce-autostop-policy:

Enforce Autostop for all Tasks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. literalinclude:: ../../../examples/admin_policy/example_policy/example_policy/skypilot_policy.py
:language: python
:pyobject: EnforceAutostopPolicy
:caption: `EnforceAutostopPolicy <https://github.com/skypilot-org/skypilot/blob/master/examples/admin_policy/example_policy/example_policy/skypilot_policy.py>`_

.. literalinclude:: ../../../examples/admin_policy/enforce_autostop.yaml
:language: yaml
:caption: `Config YAML for using EnforceAutostopPolicy <https://github.com/skypilot-org/skypilot/blob/master/examples/admin_policy/enforce_autostop.yaml>`_
3 changes: 2 additions & 1 deletion docs/source/docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -201,7 +201,8 @@ Read the research:
../cloud-setup/cloud-permissions/index
../cloud-setup/cloud-auth
../cloud-setup/quota

../cloud-setup/policy

.. toctree::
:hidden:
:maxdepth: 1
Expand Down
11 changes: 11 additions & 0 deletions docs/source/reference/config.rst
Original file line number Diff line number Diff line change
Expand Up @@ -87,6 +87,17 @@ Available fields and semantics:
# Default: false.
disable_ecc: false

# Admin policy to be applied to all tasks. (optional).
#
# The policy class to be applied to all tasks, which can be used to validate
# and mutate user requests.
Michaelvll marked this conversation as resolved.
Show resolved Hide resolved
#
# This is useful for enforcing certain policies on all tasks, e.g.,
# add custom labels; enforce certain resource limits; etc.
#
# The policy class should implement the sky.AdminPolicy interface.
admin_policy: my_package.SkyPilotPolicyV1

# Advanced AWS configurations (optional).
# Apply to all new instances but not existing ones.
aws:
Expand Down
1 change: 1 addition & 0 deletions examples/admin_policy/add_labels.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
admin_policy: example_policy.AddLabelsPolicy
1 change: 1 addition & 0 deletions examples/admin_policy/disable_public_ip.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
admin_policy: example_policy.DisablePublicIpPolicy
1 change: 1 addition & 0 deletions examples/admin_policy/enforce_autostop.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
admin_policy: example_policy.EnforceAutostopPolicy
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
"""Example admin policy module and prebuilt policies."""
from example_policy.skypilot_policy import AddLabelsPolicy
from example_policy.skypilot_policy import DisablePublicIpPolicy
from example_policy.skypilot_policy import EnforceAutostopPolicy
from example_policy.skypilot_policy import RejectAllPolicy
from example_policy.skypilot_policy import UseSpotForGpuPolicy
121 changes: 121 additions & 0 deletions examples/admin_policy/example_policy/example_policy/skypilot_policy.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,121 @@
"""Example prebuilt admin policies."""
import sky


class RejectAllPolicy(sky.AdminPolicy):
"""Example policy: rejects all user requests."""

@classmethod
def validate_and_mutate(
cls, user_request: sky.UserRequest) -> sky.MutatedUserRequest:
"""Rejects all user requests."""
raise RuntimeError('Reject all policy')


class AddLabelsPolicy(sky.AdminPolicy):
"""Example policy: adds a kubernetes label for skypilot_config."""

@classmethod
def validate_and_mutate(
cls, user_request: sky.UserRequest) -> sky.MutatedUserRequest:
config = user_request.skypilot_config
labels = config.get_nested(('kubernetes', 'custom_metadata', 'labels'),
{})
labels['app'] = 'skypilot'
config.set_nested(('kubernetes', 'custom_metadata', 'labels'), labels)
return sky.MutatedUserRequest(user_request.task, config)


class DisablePublicIpPolicy(sky.AdminPolicy):
"""Example policy: disables public IP for all AWS tasks."""

@classmethod
def validate_and_mutate(
cls, user_request: sky.UserRequest) -> sky.MutatedUserRequest:
config = user_request.skypilot_config
config.set_nested(('aws', 'use_internal_ip'), True)
if config.get_nested(('aws', 'vpc_name'), None) is None:
# If no VPC name is specified, it is likely a mistake. We should
# reject the request
raise RuntimeError('VPC name should be set. Check organization '
'wiki for more information.')
return sky.MutatedUserRequest(user_request.task, config)


class UseSpotForGpuPolicy(sky.AdminPolicy):
"""Example policy: use spot instances for all GPU tasks."""

@classmethod
def validate_and_mutate(
cls, user_request: sky.UserRequest) -> sky.MutatedUserRequest:
"""Sets use_spot to True for all GPU tasks."""
task = user_request.task
new_resources = []
for r in task.resources:
if r.accelerators:
new_resources.append(r.copy(use_spot=True))
else:
new_resources.append(r)

task.set_resources(type(task.resources)(new_resources))

return sky.MutatedUserRequest(
task=task, skypilot_config=user_request.skypilot_config)


class EnforceAutostopPolicy(sky.AdminPolicy):
"""Example policy: enforce autostop for all tasks."""

@classmethod
def validate_and_mutate(
cls, user_request: sky.UserRequest) -> sky.MutatedUserRequest:
"""Enforces autostop for all tasks.

Note that with this policy enforced, users can still change the autostop
setting for an existing cluster by using `sky autostop`.

Since we refresh the cluster status with `sky.status` whenever this
policy is applied, we should expect a few seconds latency when a user
run a request.
"""
request_options = user_request.request_options

# Request options is None when a task is executed with `jobs launch` or
# `sky serve up`.
if request_options is None:
return sky.MutatedUserRequest(
task=user_request.task,
skypilot_config=user_request.skypilot_config)

# Get the cluster record to operate on.
cluster_name = request_options.cluster_name
cluster_records = []
if cluster_name is not None:
cluster_records = sky.status(cluster_name, refresh=True)

# Check if the user request should specify autostop settings.
need_autostop = False
if not cluster_records:
# Cluster does not exist
need_autostop = True
elif cluster_records[0]['status'] == sky.ClusterStatus.STOPPED:
# Cluster is stopped
need_autostop = True
elif cluster_records[0]['autostop'] < 0:
# Cluster is running but autostop is not set
need_autostop = True

# Check if the user request is setting autostop settings.
is_setting_autostop = False
idle_minutes_to_autostop = request_options.idle_minutes_to_autostop
is_setting_autostop = (idle_minutes_to_autostop is not None and
idle_minutes_to_autostop >= 0)

# If the cluster requires autostop but the user request is not setting
# autostop settings, raise an error.
if need_autostop and not is_setting_autostop:
raise RuntimeError('Autostop/down must be set for all clusters.')

return sky.MutatedUserRequest(
task=user_request.task,
skypilot_config=user_request.skypilot_config)
7 changes: 7 additions & 0 deletions examples/admin_policy/example_policy/pyproject.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
[build-system]
requires = ["setuptools>=61.0", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "example_policy"
version = "0.0.1"
1 change: 1 addition & 0 deletions examples/admin_policy/reject_all.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
admin_policy: example_policy.RejectAllPolicy
12 changes: 12 additions & 0 deletions examples/admin_policy/task.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
resources:
cloud: aws
cpus: 2
labels:
other_labels: test


setup: |
echo "setup"

run: |
echo "run"
1 change: 1 addition & 0 deletions examples/admin_policy/use_spot_for_gpu.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
admin_policy: example_policy.UseSpotForGpuPolicy
Loading
Loading