Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add non-admin controller design #18

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

shubham-pampattiwar
Copy link
Member

@shubham-pampattiwar shubham-pampattiwar commented Mar 13, 2024

Copy link
Collaborator

@mpryc mpryc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shubham-pampattiwar I also was writing similar doc, pushed to my branch (without PR) but it's different as I am using mermaid to create diagrams.

Let's discuss on our sync which way to go. I like the clarity of your diagram a bit better, mine is more detailed, but maybe too much?

https://github.com/mpryc/oadp-non-admin/blob/docs_design/docs/design/NonAdminBackup.md

### Backup Operation
- As a non-admin user/namespace owner with administrative priviledges for a particular namespace, the user should be able to:
- Create a Backup/Schedule of the namespace
- Update the Backup/Schedule spec of the namespace
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one I am confused of what will happen. Example: if I update after Velero started (or finished) the backup process, nothing will happen, right?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not even sure what happens to Velero if you update a backup after it starts. Not sure we need to make any promises here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(for backups -- for schedules, those can be updated, but see above as to whether we need schedules for the first iteration)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess if you update backup spec, nac could create deleteBackupRequests, mark NAB as pending, then once old backup is gone, NAB will be processing with new backup of the same name.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, updating backup spec makes sense as it just does exactly what a normal velero (admin) user modifying backup spec would do. If there's any modification that makes sense in velero, then it could be done here.


## Open Questions and Know Limitations
- Velero command and pod logs
- Multiple instance of OADP Operator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add:

  • status regarding the queue of backups pending or running that may be blocking the non-admin backup.
  • e.g. There are currently 5 OADP backups queued.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple instances of OADP operator can exist, but we must make sure that no more than one of them enable non-admin. If a second DPA adds enableNonAdmin, that should trigger a validation error.

docs/design/Non_Admin_Controller_design.md Outdated Show resolved Hide resolved
docs/design/Non_Admin_Controller_design.md Outdated Show resolved Hide resolved
### Backup Operation
- As a non-admin user/namespace owner with administrative priviledges for a particular namespace, the user should be able to:
- Create a Backup/Schedule of the namespace
- Update the Backup/Schedule spec of the namespace
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not even sure what happens to Velero if you update a backup after it starts. Not sure we need to make any promises here.

### Backup Operation
- As a non-admin user/namespace owner with administrative priviledges for a particular namespace, the user should be able to:
- Create a Backup/Schedule of the namespace
- Update the Backup/Schedule spec of the namespace
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(for backups -- for schedules, those can be updated, but see above as to whether we need schedules for the first iteration)


- The Non-Admin Controller (NAC) will be installed via OADP Operator.
- The Data Protection Application (DPA) CR will consist of a root level spec flag called `enableNonAdmin`
- If the `enableNonAdmin` flag is set to `true`, the OADP Operator will install the NAC in OADP Operator's install namespace, the default value for `enableNonAdmin` will be `false`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need some validation to ensure that only one DPA in the cluster sets this to true. Otherwise, if there are 2 OADP instances, both enabling non-admin, we will have race conditions around which velero instance handles each NonAdminBackup.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if there are 2 OADP instances, both enabling non-admin

Both instances in a supported scenario would be in separate oadp namespaces, I don't see a possible overlap here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by default it watches all Namespaces

nvm. agree with sseago.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. scenario is this:

  • OADP installed in openshift-adp
  • OADP installed in openshift-adp2
  • user creates NonAdminBackup in mysql-persistent

If both OADPs have non-admin enabled, it's a race condition as to which one grabs and labels/annotates it first (and creates Velero CR), and possibly in some cases you end up with both attempting to modify it at the same time. If only one of them has non-admin enabled, then that OADP will manage it, and the other OADP will stay out of the way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think non admin use case will prop up only in env where multiple users share a cluster. Therefore I think we need to ensure that multiple non admin controllers per velero is supported, so we need a way to limit the scope of each nac so it doesn't overlap.

docs/design/Non_Admin_Controller_design.md Outdated Show resolved Hide resolved
- Non-Admin Backup (NAB) CRD: This iCRD will encapsulate the whole Velero Backup CRD and some additional spec felds that will be needed for non-admin feature.
- Non-Admin Restore (NAR) CRD

**Note:** Currently, this design considers that the responsibility of the BackupStorageLocation configuration is that of the cluster admin and not non-admin/namespace admin. Hence, no introduction of non-admin BSL controllers and CRDs.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, this means users will not be managing their own BSLs/buckets. They will have to know the name of the configured BSL (given by admin), and the NAC will assume that any user who knows the name of a BSL is permitted to back up to that BSL. Is this correct?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, any NAC user who does not specify/know the BSL name will be considered approved to use the default BSL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we will have a way for admin to scope who can use which bsl, the nab/nar controller will validate accordingly before processing the CRs.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will that require a CR, or will it be managed via annotations/labels/etc on the BSL itself?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

up for discussion.

docs/design/Non_Admin_Controller_design.md Outdated Show resolved Hide resolved
docs/design/Non_Admin_Controller_design.md Outdated Show resolved Hide resolved

## Open Questions and Know Limitations
- Velero command and pod logs
- Multiple instance of OADP Operator
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple instances of OADP operator can exist, but we must make sure that no more than one of them enable non-admin. If a second DPA adds enableNonAdmin, that should trigger a validation error.

Copy link
Member

@kaovilai kaovilai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If admin creates dpa on another cluster, and use the bsl containing NAC backups, will NAC backups.velero.io synced back to cluster regenerate backup.nac.oadp.openshift.io?

@sseago
Copy link

sseago commented Mar 21, 2024

If admin creates dpa on another cluster, and use the bsl containing NAC backups, will NAC backups.velero.io synced back to cluster regenerate backup.nac.oadp.openshift.io?

This is a good question. Might be required for disaster recovery of non-admin backups (i.e. cluster is corrupted, rebuilt from scratch, add BSL to new OADP instance, user backups have returned. This would require a non-admin backup sync controller to watch in-cluster velero Backups, and if one is found with NAC labels/annotations without a corresponding NAC backup, it would regenerate (creating the namespace if necessary).

- Non-Admin Backup (NAB) Controller: The responsibilities of the NAB controller are:
- Validate whether the non-admin user has appropriate administrative namepsace access
- Validate Wehther the non-admin user has appropriate access to create/view/update /delete Non-Admin Backup CR
- Listen to requests pertaining to Non-Admin Backup CRD across all the namespaces
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shubham-pampattiwar please update this such that more than one OADP/NAC combo can be installed on the same cluster. Requests will have to be namespace filtered. @mpryc @mateusoliveira43 I suppose the namespaces for the NAC will be configured in the DPA?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still not discussed, I believe. But one option, yes

…ntroller, Queuing mechanism, Validating Webhooks, NAC watch NS filtering and control over spec exposed to non-admin users
Copy link

openshift-ci bot commented Apr 9, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: shubham-pampattiwar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [shubham-pampattiwar]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Apr 9, 2024
@shubham-pampattiwar
Copy link
Member Author

This latest commit addresses the following:

  • Restore workflow
  • Non-Admin Backup Sync controller
  • NABSL controller and CRD
  • Queuing Status mechanism
  • Watch NS filtering via list or labelselector
  • Update Pre-conditions
  • Make sure that multiple instances do not overlap for watch NS list
  • Control the spec exposed (on-going discussion)
  • Validating Webhooks
  • removes schedules and backup spec updates op
  • minor fixes and nit fixes

@shubham-pampattiwar shubham-pampattiwar marked this pull request as ready for review August 28, 2024 05:31
@shubham-pampattiwar
Copy link
Member Author

Need to add a note regarding tech preview: #47 (comment)

@shubham-pampattiwar shubham-pampattiwar self-assigned this Aug 28, 2024
# Non-Admin Backup/Restore Design

## Background
[OADP (Openshift API for Data Protection)](https://github.com/openshift/oadp-operator) Operator currently requires cluster admin access for performing Backup and Restore operations of applications deployed on the OpenShift platform.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here "has" would be better than "requires"?

suppose someone tries to use OADP in non admin way (without NAC). Installs it in a non admin namespace. the non admin user can only create things in that namespace, but it can tell OADP to backup/restore anything (in any namespace) in the cluster, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think either is fine, currently we advertise OADP functionalities as cluster admin tasks so that way requires makes sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Ready for Review
Development

Successfully merging this pull request may close these issues.

Add OADP non-admin controller design Design: OADP non-admin use cases
7 participants