-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Design documentation for the NonAdminBackup Status and Conditions #23
Changes from 3 commits
3464762
b219849
469b66e
5063e28
5be8939
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,87 @@ | ||
# Developer Workflow: NonAdminBackup Phase and Conditions within NonAdminBackup Status | ||
|
||
## Version | ||
|
||
This is `alternative` of the design document. Simplifier approach is described in the [active design] doc. | ||
|
||
## Overview | ||
|
||
This document outlines the design around updating NonAdminBackup objects Phase and Conditions within Status. | ||
|
||
### Phase | ||
|
||
A NonAdminBackup's `status` field has a `phase` field, which is updated by NAC controller. | ||
|
||
The `phase` is a simple one high-level summary of the lifecycle of an NonAdminBackup. | ||
|
||
It is always a one well defined value, that is intended to be a comprehensive state of a NonAdminBackup. | ||
|
||
Those are are the possible values for phase: | ||
|
||
| **Value** | **Description** | | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Lets add a table where we map the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I saw the diagram :) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. on the gray squares, should not ALL of them have Queued populated ❓ |
||
|-----------|--------------------------------| | ||
| New | *NonAdminBackup resource was accepted by the OpenShift cluster, but it has not yet been processed by the NonAdminController* | | ||
| BackingOff | Velero *Backup* object was not created due to NonAdminBackup error (configuration or similar) | | ||
| Created | Velero *Backup* was created, but it has not yet fully ran or started | | ||
| Running | Velero *Backup* is currently running | | ||
| Succeeded | Velero *Backup* was performed with Success state | | ||
| Failed | Velero *Backup* did not succeed | | ||
|
||
### Conditions | ||
|
||
The `conditions` is also a part of the NonAdminBackup's `status` field. One NAB object may have multiple conditions. It is more granular knowledge of the NonAdminBackup object and represents the array of the conditions through which the NonAdminBackup has or has not passed. Each `NonAdminCondition` has one of the following `type`: | ||
|
||
| **Condition** | **Description** | | ||
|-----------|--------------------------------| | ||
| BackupAccepted | The Backup object was accepted by the reconcile loop, but the Velero Backup may have not yet been created | | ||
| BackupQueued | The Velero Backup was created succesfully and did not return any known errors. It's in the queue for Backup. At this stage errors may still occur during backup procedure. | | ||
| BackupCompleted | The Velero Backup succeeded. | | ||
|
||
The `condition` data is also accomapied with the following: | ||
|
||
| **Field name** | **Description** | | ||
|-----------|--------------------------------| | ||
| type | The `Type` of the condition | | ||
| status | represents the state of individual condition. The resulting `phase` value should report `Success` only when all of the conditions are met and the backup succeded. One of `True`, `False` or `Unknown`. | | ||
| lastProbeTime | Timestamp of when the NonAdminBackup condition was last probed. | | ||
| lastTransitionTime | Timestamp for when the NonAdminBackup last transitioned from one status to another. | | ||
| reason | Machine-readable, UpperCamelCase text indicating the reason for the condition's last transition. | | ||
| message | Human-readable message indicating details about the last status transition. | | ||
|
||
### BackupStatus | ||
|
||
`BackupStatus` which is also part of the `NonAdminBackupStatus` object is a `BackupStatus` that is taken directly from the Velero Backup Status and copied over. | ||
|
||
## Phase Update scenarios | ||
|
||
*** Questions *** | ||
|
||
- BackupQueued stays true at the end, | ||
should we remove that from conditions? | ||
I don't think we should set it to false. | ||
|
||
```mermaid | ||
%%{init: {'theme':'forest'}}%% | ||
graph | ||
AA[Phase: New] -->A | ||
A{BackupAccepted: Unknown\n BackupQueued: False\n BackupCompleted: False} -- NAC processes --> B[Non Admin Backup Accepted] | ||
A -- NAC hits an error --> E[Phase: BackingOff] | ||
B -- Create Velero Backup --> C[Phase: Created] | ||
B -.-> BA{BackupAccepted: True\nBackupQueued: False\nBackupCompleted: False} | ||
C -- Backup runs --> D[Phase: Running] | ||
C -.-> BS{BackupAccepted: True\nBackupQueued: True\nBackupCompleted: False} | ||
D -- Backup succeeds --> F[Phase: Succeeded] | ||
F -.-> FF{BackupAccepted: True\nBackupQueued: True\nBackupCompleted: True} | ||
|
||
D -- Backup fails --> G[Phase: Failed] | ||
E -.-> EE{BackupAccepted: False\nBackupQueued: False\nBackupCompleted: False} | ||
|
||
classDef conditions fill:#ccc,stroke:#ccc,stroke-width:2px; | ||
class A,BA,BS,BC,EE,FF conditions; | ||
classDef phases fill:#777,stroke:#ccc,stroke-width:2px; | ||
class AA,C,D,E,G,F phases; | ||
|
||
``` | ||
|
||
|
||
[active design]: ./nab_status_update.md |
mateusoliveira43 marked this conversation as resolved.
Show resolved
Hide resolved
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,82 @@ | ||
# Developer Workflow: NonAdminBackup Phase and Conditions within NonAdminBackup Status | ||
|
||
## Overview | ||
|
||
This document outlines the design around updating NonAdminBackup objects Phase and Conditions within Status. | ||
|
||
### Phase | ||
|
||
A NonAdminBackup's `status` field has a `phase` field, which is updated by NAC controller. | ||
|
||
The `phase` is a simple one high-level summary of the lifecycle of an NonAdminBackup. | ||
|
||
It is always a one well defined value, that is intended to be a comprehensive state of a NonAdminBackup. | ||
|
||
Those are are the possible values for phase: | ||
|
||
| **Value** | **Description** | | ||
|-----------|--------------------------------| | ||
| New | *NonAdminBackup resource was accepted by the OpenShift cluster, but it has not yet been processed by the NonAdminController* | | ||
| BackingOff | Velero *Backup* object was not created due to NonAdminBackup error (configuration or similar) | | ||
| Created | Velero *Backup* was created. The Phase will not have additional informations about the | | ||
mateusoliveira43 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
### Conditions | ||
|
||
The `conditions` is also a part of the NonAdminBackup's `status` field. One NAB object may have multiple conditions. It is more granular knowledge of the NonAdminBackup object and represents the array of the conditions through which the NonAdminBackup has or has not passed. Each `NonAdminCondition` has one of the following `type`: | ||
|
||
| **Condition** | **Description** | | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @mpryc I think we also need a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. When the The There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Example: status:
conditions:
- lastTransitionTime: '2024-04-17T20:56:57Z'
message: NonAdminBackup CR does not contain valid BackupSpec
reason: invalid_backupspec
status: 'False'
type: BackupAccepted
phase: BackingOff There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. When the user sees an There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We have User has the |
||
|-----------|--------------------------------| | ||
| BackupAccepted | The Backup object was accepted by the reconcile loop, but the Velero Backup may have not yet been created | | ||
| BackupQueued | The Velero Backup was created succesfully. At this stage errors may still occur either from the Velero not accepting backup or during backup procedure. | | ||
|
||
The `condition` data is also accomapied with the following: | ||
|
||
| **Field name** | **Description** | | ||
|-----------|--------------------------------| | ||
| type | The `Type` of the condition | | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. here is just Accepted or Queued, right ❓ I think is can be more clear There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Currently yes, but I don't want to limit that in the design, because we may say later we want to have other conditions. |
||
| status | represents the state of individual condition. The resulting `phase` value should report `Success` only when all of the conditions are met and the backup succeded. One of `True`, `False` or `Unknown`. | | ||
| lastProbeTime | Timestamp of when the NonAdminBackup condition was last probed. | | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. unsure if we need this, seems like this ts would be managed by NAB controller and we will have to embed logic in it ? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok I took it from the condition structure, but looks like I need to adjust to what's there. We need
|
||
| lastTransitionTime | Timestamp for when the NonAdminBackup last transitioned from one status to another. | | ||
| reason | Machine-readable, UpperCamelCase text indicating the reason for the condition's last transition. | | ||
| message | Human-readable message indicating details about the last status transition. | | ||
|
||
### BackupStatus | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you please add a sample of how the NAB status in yaml form would look like ? Also the overall golang struct to give a rough idea ? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok here is entire (failing) one, anything else from it? apiVersion: nac.oadp.openshift.io/v1alpha1
kind: NonAdminBackup
metadata:
creationTimestamp: '2024-04-15T20:27:35Z'
generation: 2
managedFields:
- apiVersion: nac.oadp.openshift.io/v1alpha1
fieldsType: FieldsV1
fieldsV1:
'f:spec':
.: {}
'f:backupSpec': {}
manager: Mozilla
operation: Update
time: '2024-04-15T20:27:35Z'
- apiVersion: nac.oadp.openshift.io/v1alpha1
fieldsType: FieldsV1
fieldsV1:
'f:spec':
'f:backupSpec':
'f:itemOperationTimeout': {}
'f:metadata': {}
'f:volumeSnapshotLocations': {}
'f:snapshotMoveData': {}
'f:defaultVolumesToFsBackup': {}
'f:ttl': {}
'f:csiSnapshotTimeout': {}
'f:storageLocation': {}
'f:hooks': {}
manager: main
operation: Update
time: '2024-04-15T20:27:45Z'
- apiVersion: nac.oadp.openshift.io/v1alpha1
fieldsType: FieldsV1
fieldsV1:
'f:status':
.: {}
'f:backupStatus':
.: {}
'f:expiration': {}
'f:failureReason': {}
'f:formatVersion': {}
'f:phase': {}
'f:startTimestamp': {}
'f:version': {}
'f:conditions': {}
'f:oadpVeleroBackup': {}
manager: main
operation: Update
subresource: status
time: '2024-04-16T08:12:12Z'
name: testnonadminbackup
namespace: nacproject
resourceVersion: '61393755'
uid: 12712cf9-2161-4e58-931d-21528afdff89
spec:
backupSpec:
volumeSnapshotLocations:
- velero-sample-1
defaultVolumesToFsBackup: false
csiSnapshotTimeout: 10m0s
ttl: 720h0m0s
itemOperationTimeout: 4h0m0s
metadata: {}
storageLocation: velero-sample-1
hooks: {}
snapshotMoveData: false
status:
backupStatus:
expiration: '2024-05-16T08:12:11Z'
failureReason: >-
unable to get credentials: unable to get key for secret: Secret
"cloud-credentials" not found
formatVersion: 1.1.0
phase: Failed
startTimestamp: '2024-04-16T08:12:11Z'
version: 1
conditions:
- lastTransitionTime: '2024-04-15T20:27:35Z'
message: backup accepted
reason: backup_accepted
status: 'True'
type: BackupAccepted
- lastTransitionTime: '2024-04-15T20:27:45Z'
message: Created Velero Backup object
reason: backup_scheduled
status: 'True'
type: BackupQueued
oadpVeleroBackup: nab-nacproject-83fc04a2fd253d There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This helps, so I would say we need to restructure a bit here, just remove the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am not in favor of putting the backupStatus directly under NAB.status. It's because from implementation it's easier to compare structures that are same and NAB.status have already additional fields. Secondly it's clear what comes from Velero Status and what is addition to that status such as status:
expiration: '2024-05-16T08:12:11Z'
failureReason: >-
unable to get credentials: unable to get key for secret: Secret
"cloud-credentials" not found
formatVersion: 1.1.0
phase: Failed
startTimestamp: '2024-04-16T08:12:11Z'
version: 1
conditions:
- lastTransitionTime: '2024-04-15T20:27:35Z'
message: backup accepted
reason: backup_accepted
status: 'True'
type: BackupAccepted
- lastTransitionTime: '2024-04-15T20:27:45Z'
message: Created Velero Backup object
reason: backup_scheduled
status: 'True'
type: BackupQueued
veleroBackup: nab-nacproject-83fc04a2fd253d
veleroBackupNamespace: openshift-adp
|
||
`BackupStatus` which is also part of the `NonAdminBackupStatus` object is a `BackupStatus` that is taken directly from the Velero Backup Status and copied over. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No that is not true. The So whatever is under backupStatus is a deepcopy of status:
backupStatus:
expiration: '2024-05-16T08:12:11Z'
failureReason: >-
unable to get credentials: unable to get key for secret: Secret
"cloud-credentials" not found
formatVersion: 1.1.0
phase: Failed
startTimestamp: '2024-04-16T08:12:11Z'
version: 1 |
||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. IMO we might need to add There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I am unsure here what you want to represent. The backupStatus:
expiration: '2024-05-16T08:12:11Z'
failureReason: >-
unable to get credentials: unable to get key for secret: Secret
"cloud-credentials" not found
formatVersion: 1.1.0
phase: Failed
startTimestamp: '2024-04-16T08:12:11Z'
version: 1 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok we can hold off on adding There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Adding |
||
### VeleroBackupName and VeleroBackupNamespace | ||
The `VeleroBackupName` is a component of the `NonAdminBackupStatus` object. It represents the name of the `VeleroBackup` object. The `VeleroBackupNamespace` represents the namespace in which the `VeleroBackup` object was created. | ||
|
||
This `VeleroBackupName` and `VeleroBackupNamespace` serves as a reference to the Backup responsible for executing the backup task. | ||
|
||
The format of those fields allows to interact with that Backup using `oc` or `velero` commands as follows: | ||
|
||
```shell | ||
# Example: | ||
# veleroBackupName: nab-nacproject-c3499c2729730a | ||
# veleroBackupNamespace: openshift-adp | ||
|
||
$ oc describe -n openshift-adp nab-nacproject-c3499c2729730a | ||
|
||
$ velero backup describe -n openshift-adp nab-nacproject-c3499c2729730a | ||
``` | ||
|
||
|
||
## Phase Update scenarios | ||
|
||
shubham-pampattiwar marked this conversation as resolved.
Show resolved
Hide resolved
|
||
```mermaid | ||
%%{init: {'theme':'forest'}}%% | ||
graph | ||
START[Phase: New] -- NAC config OK --> ACCEPTED[Non Admin Backup Accepted] | ||
START -- NAC config NOT OK --> ERROR[Phase: BackingOff] | ||
ACCEPTED -- Create Velero Backup --> CREATED[Phase: Created] | ||
ACCEPTED -.-> COND_ACCEPTED{BackupAccepted: True\n} | ||
CREATED -.-> COND_QUEUED{BackupAccepted: True\nBackupQueued: True\n} | ||
ERROR -.-> COND_ERROR{BackupAccepted: False} | ||
|
||
classDef conditions fill:#ccc,stroke:#ccc,stroke-width:2px; | ||
class COND_ACCEPTED,COND_QUEUED,COND_ERROR conditions; | ||
classDef phases fill:#777,stroke:#ccc,stroke-width:2px; | ||
class START,CREATED,ERROR phases; | ||
|
||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have this in other repos ❓
I think this should be here https://github.com/migtools/oadp-non-admin/pull/46/files#diff-bd080a4db24c63760afb715aa596308ed21ab443ea0ec19575d4c5d3d1c249b4R40
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not in this PR imo.