-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Design documentation for the NonAdminBackup Status and Conditions #23
Design documentation for the NonAdminBackup Status and Conditions #23
Conversation
Two proposals for updating and keeping in sync NonAdminBackup Status and Condition fields. The Status also consists of Velero Backup Name within NAB object. Signed-off-by: Michal Pryc <mpryc@redhat.com>
4d1a76d
to
3464762
Compare
design/nab_status_update_ver_1.md
Outdated
|
||
| **Condition** | **Description** | | ||
|-----------|--------------------------------| | ||
| BackupAccepted | The Backup object was accepted by the reconcile loop, but the Velero Backup may have not yet been created | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here I would accept only after velero accepts it. My fear it that even a backup CR looks good (pass cluster validation) it may not work (because of lack of cluster validation or BSL is invalid, for example)
design/nab_status_update_ver_1.md
Outdated
|
||
- BackupQueued stays true at the end, | ||
should we remove that from conditions? | ||
I don't think we should set it to false. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think is good to set false, may be confusing if operation finished and this is still true. But have a way to say "it is not queued because it completed"
Just needs update on which approach we are going ahead with. |
design/nab_status_update_ver_1.md
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will need this for #35
Do you believe this design can cover both Backup and Restores, or best to have one for each?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will have similar phases for Restores too, so should cover both. Maybe just add a note if there is anything that will be different for restores.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense to have same Phases for Restore:
New
BackingOff
Created
What will be different is conditions, they are coupled with backup e.g.:
BackupAccepted
BackupQueued
At this stage we can make those also generic and convert to Accepted
and Queued
which would mean Restore
also could use same names?
@shubham-pampattiwar ^^ ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I was gonna suggest the same, lets drop the Backup
prefix and we should be good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from what I read, the text seems right and should be easy to follow it to have same logic in restore
question is if we need a design for that later, if yes, I suggest making this one for both restore and backup (having only backup examples, saying it is the same for restores)
Co-authored-by: Tiger Kaovilai <passawit.kaovilai@gmail.com> Signed-off-by: Michal Pryc <mpryc@redhat.com>
Adjusted design to address comments. The non-active design is in the folder archive, as we may come back to it at some point in the future. Signed-off-by: Michal Pryc <mpryc@redhat.com>
design/nab_status_update.md
Outdated
|
||
The `conditions` is also a part of the NonAdminBackup's `status` field. One NAB object may have multiple conditions. It is more granular knowledge of the NonAdminBackup object and represents the array of the conditions through which the NonAdminBackup has or has not passed. Each `NonAdminCondition` has one of the following `type`: | ||
|
||
| **Condition** | **Description** | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mpryc I think we also need a Error
condition here. So that whenever the backup is in BackingOff
phase the Error
condition is applied to the NAB object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the BackingOff
we set the BackupAccepted
condition to False
see the diagram in another comment. Do we need to have Error condition as well? I think it's a bit too many states.
The BackupAccepted
set to False have also the string reason/message where we can say why it was wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Example:
status:
conditions:
- lastTransitionTime: '2024-04-17T20:56:57Z'
message: NonAdminBackup CR does not contain valid BackupSpec
reason: invalid_backupspec
status: 'False'
type: BackupAccepted
phase: BackingOff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the user sees an Error
condition they definitely think that something is not working as intended/expected.
The word Accepted
does not imply the same. I agree that the Accepted
condition can display the error message but having an Error
condition would improve the user experience.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have phase: BackingOff
for that, so this is high level information to tell that something went wrong.
User has the Accepted: False
status, of course we could add Accepted: False
and Error: True
or just Error: True
, but that in my opinion is duplication of what phase is for.
design/nab_status_update.md
Outdated
|-----------|--------------------------------| | ||
| type | The `Type` of the condition | | ||
| status | represents the state of individual condition. The resulting `phase` value should report `Success` only when all of the conditions are met and the backup succeded. One of `True`, `False` or `Unknown`. | | ||
| lastProbeTime | Timestamp of when the NonAdminBackup condition was last probed. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unsure if we need this, seems like this ts would be managed by NAB controller and we will have to embed logic in it ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I took it from the condition structure, but looks like I need to adjust to what's there. We need lastTransitionTime
instead (below is from the NAB object itself):
- lastTransitionTime: '2024-04-15T20:27:45Z'
message: Created Velero Backup object
reason: backup_scheduled
status: 'True'
type: BackupQueued
design/nab_status_update.md
Outdated
| message | Human-readable message indicating details about the last status transition. | | ||
|
||
### BackupStatus | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add a sample of how the NAB status in yaml form would look like ? Also the overall golang struct to give a rough idea ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok here is entire (failing) one, anything else from it?
apiVersion: nac.oadp.openshift.io/v1alpha1
kind: NonAdminBackup
metadata:
creationTimestamp: '2024-04-15T20:27:35Z'
generation: 2
managedFields:
- apiVersion: nac.oadp.openshift.io/v1alpha1
fieldsType: FieldsV1
fieldsV1:
'f:spec':
.: {}
'f:backupSpec': {}
manager: Mozilla
operation: Update
time: '2024-04-15T20:27:35Z'
- apiVersion: nac.oadp.openshift.io/v1alpha1
fieldsType: FieldsV1
fieldsV1:
'f:spec':
'f:backupSpec':
'f:itemOperationTimeout': {}
'f:metadata': {}
'f:volumeSnapshotLocations': {}
'f:snapshotMoveData': {}
'f:defaultVolumesToFsBackup': {}
'f:ttl': {}
'f:csiSnapshotTimeout': {}
'f:storageLocation': {}
'f:hooks': {}
manager: main
operation: Update
time: '2024-04-15T20:27:45Z'
- apiVersion: nac.oadp.openshift.io/v1alpha1
fieldsType: FieldsV1
fieldsV1:
'f:status':
.: {}
'f:backupStatus':
.: {}
'f:expiration': {}
'f:failureReason': {}
'f:formatVersion': {}
'f:phase': {}
'f:startTimestamp': {}
'f:version': {}
'f:conditions': {}
'f:oadpVeleroBackup': {}
manager: main
operation: Update
subresource: status
time: '2024-04-16T08:12:12Z'
name: testnonadminbackup
namespace: nacproject
resourceVersion: '61393755'
uid: 12712cf9-2161-4e58-931d-21528afdff89
spec:
backupSpec:
volumeSnapshotLocations:
- velero-sample-1
defaultVolumesToFsBackup: false
csiSnapshotTimeout: 10m0s
ttl: 720h0m0s
itemOperationTimeout: 4h0m0s
metadata: {}
storageLocation: velero-sample-1
hooks: {}
snapshotMoveData: false
status:
backupStatus:
expiration: '2024-05-16T08:12:11Z'
failureReason: >-
unable to get credentials: unable to get key for secret: Secret
"cloud-credentials" not found
formatVersion: 1.1.0
phase: Failed
startTimestamp: '2024-04-16T08:12:11Z'
version: 1
conditions:
- lastTransitionTime: '2024-04-15T20:27:35Z'
message: backup accepted
reason: backup_accepted
status: 'True'
type: BackupAccepted
- lastTransitionTime: '2024-04-15T20:27:45Z'
message: Created Velero Backup object
reason: backup_scheduled
status: 'True'
type: BackupQueued
oadpVeleroBackup: nab-nacproject-83fc04a2fd253d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This helps, so I would say we need to restructure a bit here, just remove the backupStatus
named struct, all the feilds of backupStatus
can be directly under NAB.status
. This would mean:
Current traversal is: NonAdminBackup.status.BackupStatus.phase
Suggested/Modified traversal will be : NonAdminBackup.status.phase
The backupStatus
under status seems redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not in favor of putting the backupStatus directly under NAB.status. It's because from implementation it's easier to compare structures that are same and NAB.status have already additional fields.
Secondly it's clear what comes from Velero Status and what is addition to that status such as veleroBackup
or veleroBackupNamespace
or even conditions
. After your proposal we would have:
status:
expiration: '2024-05-16T08:12:11Z'
failureReason: >-
unable to get credentials: unable to get key for secret: Secret
"cloud-credentials" not found
formatVersion: 1.1.0
phase: Failed
startTimestamp: '2024-04-16T08:12:11Z'
version: 1
conditions:
- lastTransitionTime: '2024-04-15T20:27:35Z'
message: backup accepted
reason: backup_accepted
status: 'True'
type: BackupAccepted
- lastTransitionTime: '2024-04-15T20:27:45Z'
message: Created Velero Backup object
reason: backup_scheduled
status: 'True'
type: BackupQueued
veleroBackup: nab-nacproject-83fc04a2fd253d
veleroBackupNamespace: openshift-adp
design/nab_status_update.md
Outdated
|
||
### BackupStatus | ||
|
||
`BackupStatus` which is also part of the `NonAdminBackupStatus` object is a `BackupStatus` that is taken directly from the Velero Backup Status and copied over. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The NonAdminBackupStatus
will consist of:
- its own status
- and VeleroBackupStatus
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No that is not true. The BackupStatus
within NonAdminBackup.Status is just VeleroBackup.Status
nothing else.
So whatever is under backupStatus is a deepcopy of VeleroBackup.Status
and I would be really for not mixing Velero Status with other NAB status fields.
status:
backupStatus:
expiration: '2024-05-16T08:12:11Z'
failureReason: >-
unable to get credentials: unable to get key for secret: Secret
"cloud-credentials" not found
formatVersion: 1.1.0
phase: Failed
startTimestamp: '2024-04-16T08:12:11Z'
version: 1
design/nab_status_update.md
Outdated
### BackupStatus | ||
|
||
`BackupStatus` which is also part of the `NonAdminBackupStatus` object is a `BackupStatus` that is taken directly from the Velero Backup Status and copied over. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO we might need to add Creation
and Completion
timestamp as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Creation
of NAB ? That is already part of metadata, or we are talking here about Creation of VeleroBackup?
I am unsure here what you want to represent. The Completion
of backup operation? Isn't that part of the VeleroBackup status? See startTimestamp
and I expect VeleroBackup will also consist of completion part
backupStatus:
expiration: '2024-05-16T08:12:11Z'
failureReason: >-
unable to get credentials: unable to get key for secret: Secret
"cloud-credentials" not found
formatVersion: 1.1.0
phase: Failed
startTimestamp: '2024-04-16T08:12:11Z'
version: 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok we can hold off on adding completionTimestamp
Velero status timestamps should be good enough. We would need seperate completionTimestamp for NAB status if did any crucial processing once Velero backup is complete but in our case we will just be updated the NAB status so we should be good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding completionTS
would just make the NAB status future proof.
|
||
Those are are the possible values for phase: | ||
|
||
| **Value** | **Description** | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets add a table where we map the Phase
and Conditions
as well as VeleroBackupPhase
with each other for easier understanding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw the diagram :)
ok lets at least tabulate velero phases
with NAB phases
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on the gray squares, should not ALL of them have Queued populated ❓
e81a115
to
b228c84
Compare
Updated NAB workflow design that addresses the following: - Removed Backup prefix from the Condition name, so it can be the same for the Restore - Removed lastProbeTime from the Condition field - Added sample yaml files to easy reading how the object looks like - Updated diagram to present what is Condition Signed-off-by: Michal Pryc <mpryc@redhat.com>
b228c84
to
5063e28
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have this in other repos ❓
I think this should be here https://github.com/migtools/oadp-non-admin/pull/46/files#diff-bd080a4db24c63760afb715aa596308ed21ab443ea0ec19575d4c5d3d1c249b4R40
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not in this PR imo.
design/nab_status_update.md
Outdated
|
||
| **Field name** | **Description** | | ||
|-----------|--------------------------------| | ||
| type | The `Type` of the condition | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here is just Accepted or Queued, right ❓ I think is can be more clear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently yes, but I don't want to limit that in the design, because we may say later we want to have other conditions.
- Update reason example to be CammelCase - Move the design docs into docs/ folder - Fix one missing sentence Signed-off-by: Michal Pryc <mpryc@redhat.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Follow up PR for adding deleted phase and NonAdminRestore to design
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mateusoliveira43, mpryc, shubham-pampattiwar The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Two proposals for updating and keeping in sync NonAdminBackup Status and Condition fields.
The Status also consists of Velero Backup Name within NAB object.