Enhancement: Check cloud-init status #955

Open
lentzi90 opened this issue Feb 24, 2022 · 14 comments
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. triage/accepted Indicates an issue is ready to be actively worked on.

Comments

@lentzi90
Member

We recently had a problem in the CI where an update to the OS we use for the Nodes made one of our preKubeadmCommands fail. This in turn caused some network issues, which were what we detected first. It took quite some time to figure out the root cause because no one suspected an error in the cloud-init commands. The Machines were provisioned, the cluster worked as expected in many ways, and all Nodes were healthy.

My suggestion is that we add a step to the integration tests (or even to the controller, if possible) to detect errors in cloud-init and report them in a more obvious way. In the CI we should be able to simply run `cloud-init status` and error out if it reports `status: error`.
To be clear, this check should be done on the workload cluster Nodes.
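
A minimal sketch of what such a CI check could look like, assuming the test harness can run commands on the workload cluster Nodes. The names `parse_cloud_init_status`, `ssh_runner`, and `assert_cloud_init_ok` are hypothetical and not part of any existing test code; the only thing taken from cloud-init itself is that `cloud-init status` prints a `status: ...` line.

```python
import re
import subprocess
from typing import Callable, Sequence

# Matches the "status: <value>" line printed by `cloud-init status`.
STATUS_RE = re.compile(r"^status:[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def parse_cloud_init_status(output: str) -> str:
    """Return the value of the first 'status:' line in the command output."""
    match = STATUS_RE.search(output)
    if match is None:
        raise ValueError(f"no 'status:' line found in output: {output!r}")
    return match.group(1)


def ssh_runner(host: str) -> Callable[[Sequence[str]], str]:
    """Hypothetical helper: run a command on a Node over SSH, return stdout."""
    def run(cmd: Sequence[str]) -> str:
        # cloud-init may exit non-zero when it reports an error,
        # so the return code is deliberately not checked here.
        result = subprocess.run(
            ["ssh", host, *cmd], capture_output=True, text=True
        )
        return result.stdout
    return run


def assert_cloud_init_ok(run: Callable[[Sequence[str]], str]) -> None:
    """Raise if cloud-init on the Node reports 'status: error'."""
    output = run(["cloud-init", "status", "--long"])
    if parse_cloud_init_status(output) == "error":
        raise RuntimeError(f"cloud-init failed on Node:\n{output}")
```

In a test this could be called once per Node, for example `assert_cloud_init_ok(ssh_runner("user@node-ip"))`, where the address and user are placeholders.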

@metal3-io-bot metal3-io-bot added the needs-triage Indicates an issue lacks a `triage/foo` label and requires one. label Feb 24, 2022
@Rozzii
Member

Rozzii commented Mar 2, 2022

/triage accepted
/kind feature
/help

@metal3-io-bot metal3-io-bot added triage/accepted Indicates an issue is ready to be actively worked on. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature. and removed needs-triage Indicates an issue lacks a `triage/foo` label and requires one. labels Mar 2, 2022
@Arvinderpal
Contributor

CAPI uses a sentinel file to check if bootstrapping succeeded.
kubernetes-sigs/cluster-api#3716

Some providers, like capz, do check this file as an indication of successful bootstrapping.
For baremetal, the tricky part I think is accessing this file from the management cluster.
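
As a rough illustration of the sentinel-file approach, the check below is meant to run on the Node itself (for example from a CI step with shell access); it does not solve the problem of reaching the file from the management cluster. The default path is an assumption about where CAPI bootstrap writes its sentinel file and should be verified against the CAPI version in use; `bootstrap_succeeded` is a hypothetical name.

```python
from pathlib import Path

# Assumed location of the CAPI bootstrap sentinel file; verify for your setup.
DEFAULT_SENTINEL = Path("/run/cluster-api/bootstrap-success.complete")


def bootstrap_succeeded(sentinel: Path = DEFAULT_SENTINEL) -> bool:
    """Return True if the bootstrap sentinel file exists on this Node."""
    return sentinel.is_file()
```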

@Rozzii
Member

Rozzii commented Mar 16, 2022

NOTE to everybody: Feel free to work on this, and ping @lentzi90 or ask here if you have any questions!

@metal3-io-bot
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues will close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@metal3-io-bot metal3-io-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 14, 2022
@metal3-io-bot
Collaborator

Stale issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle stale.

/close

@metal3-io-bot
Collaborator

@metal3-io-bot: Closing this issue.

In response to this:

Stale issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle stale.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@furkatgofurov7
Member

/reopen
/remove-lifecycle stale

@metal3-io-bot
Collaborator

@furkatgofurov7: Reopened this issue.

In response to this:

/reopen
/remove-lifecycle stale


@metal3-io-bot metal3-io-bot reopened this Jul 14, 2022
@metal3-io-bot metal3-io-bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 14, 2022
@furkatgofurov7
Member

/remove-help

@metal3-io-bot metal3-io-bot removed the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Oct 5, 2022
@furkatgofurov7
Member

/help

@metal3-io-bot
Collaborator

@furkatgofurov7:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/help


@metal3-io-bot metal3-io-bot added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Oct 5, 2022
@metal3-io-bot
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues will close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@metal3-io-bot metal3-io-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 3, 2023
@Rozzii
Member

Rozzii commented Feb 1, 2023

/lifecycle frozen
Would be a good improvement in the future.

@metal3-io-bot metal3-io-bot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 1, 2023
@Rozzii
Member

Rozzii commented Mar 29, 2023

This would be a nice feature in the kubeadm-bootstrap-operator.

Projects
Status: Backlog
Development

No branches or pull requests

5 participants