Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Design documentation for the NonAdminBackup Status and Conditions #23

Merged
merged 5 commits into from
May 17, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
86 changes: 86 additions & 0 deletions design/nab_status_update_ver_1.md
mateusoliveira43 marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will need this for #35

Do you believe this design can cover both Backup and Restores, or best to have one for each?

FYI @shubham-pampattiwar @weshayutin

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will have similar phases for Restores too, so should cover both. Maybe just add a note if there is anything that will be different for restores.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to have same Phases for Restore:
New
BackingOff
Created

What will be different is conditions, they are coupled with backup e.g.:

BackupAccepted
BackupQueued

At this stage we can make those also generic and convert to Accepted and Queued which would mean Restore also could use same names?

@shubham-pampattiwar ^^ ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I was gonna suggest the same, lets drop the Backup prefix and we should be good.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from what I read, the text seems right and should be easy to follow it to have same logic in restore

question is if we need a design for that later, if yes, I suggest making this one for both restore and backup (having only backup examples, saying it is the same for restores)

Original file line number Diff line number Diff line change
@@ -0,0 +1,86 @@
# Developer Workflow: NonAdminBackup Phase and Conditions within NonAdminBackup Status

## Version

This is `ver_1` of the design document. Simplifier approach is described in the [ver_2].

## Overview

This document outlines the design around updating NonAdminBackup objects Phase and Conditions within Status.

### Phase

A NonAdminBackup's `status` field has a `phase` field, which is updated by NAC controller.

The `phase` is a simple one high-level summary of the lifecycle of an NonAdminBackup.

It is always a one well defined value, that is intended to be a comprehensive state of a NonAdminBackup.

Those are are the possible values for phase:

| **Value** | **Description** |
|-----------|--------------------------------|
| New | *NonAdminBackup resource was accepted by the OpenShift cluster, but it has not yet been processed by the NonAdminController* |
| BackingOff | Velero *Backup* object was not created due to NonAdminBackup error (configuration or similar) |
| Created | Velero *Backup* was created, but it has not yet fully ran or started |
| Running | Velero *Backup* is currently running |
| Succeeded | Velero *Backup* was performed with Success state |
| Failed | Velero *Backup* did not succeed |

### Conditions

The `conditions` is also a part of the NonAdminBackup's `status` field. One NAB object may have multiple conditions. It is more granular knowledge of the NonAdminBackup object and represents the array of the conditions through which the NonAdminBackup has or has not passed. Each `NonAdminCondition` has one of the following `type`:

| **Condition** | **Description** |
|-----------|--------------------------------|
| BackupAccepted | The Backup object was accepted by the reconcile loop, but the Velero Backup may have not yet been created |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here I would accept only after velero accepts it. My fear it that even a backup CR looks good (pass cluster validation) it may not work (because of lack of cluster validation or BSL is invalid, for example)

| BackupQueued | The Velero Backup was created succesfully and did not return any known errors. It's in the queue for Backup. At this stage errors may still occur during backup procedure. |
| BackupCompleted | The Velero Backup succeeded. |

The `condition` data is also accomapied with the following:

| **Field name** | **Description** |
|-----------|--------------------------------|
| type | The `Type` of the condition |
| status | represents the state of individual condition. The resulting `phase` value should report `Success` only when all of the conditions are met and the backup succeded. One of `True`, `False` or `Unknown`. |
| lastProbeTime | Timestamp of when the NonAdminBackup condition was last probed. |
| lastTransitionTime | Timestamp for when the NonAdminBackup last transitioned from one status to another. |
| reason | Machine-readable, UpperCamelCase text indicating the reason for the condition's last transition. |
| message | Human-readable message indicating details about the last status transition. |

### BackupStatus

`BackupStatus` which is also part of the `NonAdminBackupStatus` object is a `BackupStatus` that is taken directly from the Velero Backup Status and copied over.

## Phase Update scenarios

*** Questions ***

- BackupQueued stays true at the end,
should we remove that from conditions?
I don't think we should set it to false.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think is good to set false, may be confusing if operation finished and this is still true. But have a way to say "it is not queued because it completed"


```mermaid
graph
mpryc marked this conversation as resolved.
Show resolved Hide resolved
AA[Phase: New] -->A
A{BackupAccepted: Unknown\n BackupQueued: False\n BackupCompleted: False} -- NAC processes --> B[Non Admin Backup Accepted]
A -- NAC hits an error --> E[Phase: BackingOff]
B -- Create Velero Backup --> C[Phase: Created]
B -.-> BA{BackupAccepted: True\nBackupQueued: False\nBackupCompleted: False}
C -- Backup runs --> D[Phase: Running]
C -.-> BS{BackupAccepted: True\nBackupQueued: True\nBackupCompleted: False}
D -- Backup succeeds --> F[Phase: Succeeded]
F -.-> FF{BackupAccepted: True\nBackupQueued: True\nBackupCompleted: True}

D -- Backup fails --> G[Phase: Failed]
E -.-> EE{BackupAccepted: False\nBackupQueued: False\nBackupCompleted: False}

classDef conditions fill:#ccc,stroke:#ccc,stroke-width:2px;
mpryc marked this conversation as resolved.
Show resolved Hide resolved
class A,BA,BS,BC,EE,FF conditions;
classDef phases fill:#777,stroke:#ccc,stroke-width:2px;
class AA,C,D,E,G,F phases;

```


[ver_2]: ./nab_status_update_ver_2.md
90 changes: 90 additions & 0 deletions design/nab_status_update_ver_2.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,90 @@
# Developer Workflow: NonAdminBackup Phase and Conditions within NonAdminBackup Status

## Version

This is `ver_2` of the design document. More complex approach with more states is described in the [ver_1].

## Overview

This document outlines the design around updating NonAdminBackup objects Phase and Conditions within Status.

### Phase

A NonAdminBackup's `status` field has a `phase` field, which is updated by NAC controller.

The `phase` is a simple one high-level summary of the lifecycle of an NonAdminBackup.

It is always a one well defined value, that is intended to be a comprehensive state of a NonAdminBackup.

Those are are the possible values for phase:

| **Value** | **Description** |
|-----------|--------------------------------|
| New | *NonAdminBackup resource was accepted by the OpenShift cluster, but it has not yet been processed by the NonAdminController* |
| BackingOff | Velero *Backup* object was not created due to NonAdminBackup error (configuration or similar) |
| Created | Velero *Backup* was created. The Phase will not have additional informations about the |

### Conditions

The `conditions` is also a part of the NonAdminBackup's `status` field. One NAB object may have multiple conditions. It is more granular knowledge of the NonAdminBackup object and represents the array of the conditions through which the NonAdminBackup has or has not passed. Each `NonAdminCondition` has one of the following `type`:

| **Condition** | **Description** |
|-----------|--------------------------------|
| BackupAccepted | The Backup object was accepted by the reconcile loop, but the Velero Backup may have not yet been created |
| BackupQueued | The Velero Backup was created succesfully and did not return any known errors. It's in the queue for Backup. At this stage errors may still occur during backup procedure. |

The `condition` data is also accomapied with the following:

| **Field name** | **Description** |
|-----------|--------------------------------|
| type | The `Type` of the condition |
| status | represents the state of individual condition. The resulting `phase` value should report `Success` only when all of the conditions are met and the backup succeded. One of `True`, `False` or `Unknown`. |
| lastProbeTime | Timestamp of when the NonAdminBackup condition was last probed. |
| lastTransitionTime | Timestamp for when the NonAdminBackup last transitioned from one status to another. |
| reason | Machine-readable, UpperCamelCase text indicating the reason for the condition's last transition. |
| message | Human-readable message indicating details about the last status transition. |

### BackupStatus

`BackupStatus` which is also part of the `NonAdminBackupStatus` object is a `BackupStatus` that is taken directly from the Velero Backup Status and copied over.

### OadpVeleroBackupName
The `OadpVeleroBackupName` field is a component of the `NonAdminBackupStatus` object. It represents the name of the `VeleroBackup` object, which encompasses the namespace. This `OadpVeleroBackupName` serves as a reference to the Backup responsible for executing the backup task.

The format of the `OadpVeleroBackupName` allows to interact with that Backup using `oc` or `velero` commands as follows:

```shell
# <backup-name>.<namespace>

$ oc describe <backup-name>.<namespace>
mpryc marked this conversation as resolved.
Show resolved Hide resolved

$ velero backup describe <backup-name>.<namespace>
```


## Phase Update scenarios

*** Questions ***

- BackupQueued stays true at the end,
should we remove that from conditions?
I don't think we should set it to false.

```mermaid
graph
AA[Phase: New] -->A
A{BackupAccepted: Unknown\n BackupQueued: False\n} -- NAC processes --> B[Non Admin Backup Accepted]
A -- NAC hits an error --> E[Phase: BackingOff]
B -- Create Velero Backup --> C[Phase: Created]
B -.-> BA{BackupAccepted: True\nBackupQueued: False\n}
C -.-> BS{BackupAccepted: True\nBackupQueued: True\n}
E -.-> EE{BackupAccepted: False\nBackupQueued: False\n}

classDef conditions fill:#ccc,stroke:#ccc,stroke-width:2px;
class A,BA,BS,EE conditions;
classDef phases fill:#777,stroke:#ccc,stroke-width:2px;
class AA,C,E phases;

```

[ver_1]: ./nab_status_update_ver_1.md
Loading