Failure to create OSD (obtaining unique OSD id) blocks further provisioning and possibly does not reach desired multiple OSDs per device #14238
I have seen logs with text similar to "OSD ID exists and does not match my key" when there is an old OSD present on a device that hasn't been fully wiped after an old Rook/Ceph deployment. It's likely that you need to fully clean the old Ceph data off the device before provisioning. I suspect this may also resolve the unused space issues you're seeing.
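For reference, a minimal wipe sketch along the lines of Rook's disk cleanup guidance (`/dev/sdX` is a placeholder for the device that held the old OSD):

```bash
DISK="/dev/sdX"  # placeholder: the device with leftover OSD data
sgdisk --zap-all "$DISK"                                   # clear GPT/MBR partition tables
dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct    # wipe bluestore metadata at the head of the disk
# remove leftover ceph-volume LVM mappings, if any
ls /dev/mapper/ceph-* 2>/dev/null | xargs -r -I{} dmsetup remove {}
```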
I'll check, but I'm not sure that is the same issue here. These are all new hosts that have never had OSDs on these disks; they did have LVMs that were cleared before doing discovery. We did remove a number of old OSDs from other nodes recently, as we are trying to migrate storage to new nodes.
I may have more issues going on; I found a number of OSDs in CrashLoopBackOff state reporting:
What I discovered is that the key in the keyring file for these OSDs does not match the key registered with the cluster. Not sure exactly what process is creating these keyring files or why they have the incorrect key in them.
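One way to confirm the mismatch (a sketch; `osd.NN` and the keyring path are placeholders, and the data dir differs under Rook, so check inside the OSD pod):

```bash
ceph auth get-key osd.NN; echo                              # key the mons have registered
grep -A1 '^\[osd\.NN\]' /var/lib/ceph/osd/ceph-NN/keyring   # key the daemon reads from disk
```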
We found 31 duplicate OSD IDs across active storage nodes; a sample:
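A hypothetical way to enumerate those duplicates is to collect the OSD ids that ceph-volume reports on each node and print any id claimed more than once (node names are placeholders; assumes SSH access to the nodes and `jq`):

```bash
for node in node1 node2 node3; do                 # placeholder node names
  ssh "$node" ceph-volume lvm list --format json | jq -r 'keys[]'
done | sort -n | uniq -d                          # ids claimed by more than one LV/node
```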
FWIW, the error that occurs when creating new OSDs in prepare is the `Error EEXIST: entity osd.NN exists but key does not match` error quoted in the report below.
Can someone provide any insight as to where to view/repair the source of this conflict? Rook has a number of rook-ceph-osd-NN deployments that do not seem to match active OSDs in the cluster anymore, and this is preventing new OSDs from being prepared. I tried, for example, to use the purge job to remove the conflicting osd.2 above, but it is not in the osd tree/dump...
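One sketch for spotting those stale deployments, assuming Rook's default namespace, labels, and toolbox (`ceph-osd-id` is the label Rook puts on OSD deployments):

```bash
# OSD ids that have a rook-ceph-osd deployment but are absent from the OSD map
comm -23 \
  <(kubectl -n rook-ceph get deploy -l app=rook-ceph-osd \
      -o jsonpath='{range .items[*]}{.metadata.labels.ceph-osd-id}{"\n"}{end}' | sort) \
  <(kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd ls | sort)
```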
I'm the assignee of this issue. I'll read it carefully tomorrow (I'm currently on PTO).
What I think I discovered is two things:
However, I still believe there is a race condition in selecting and creating auth for new OSDs. I have multiple new nodes/drives being discovered, and the prepare jobs are crashing out with the same error.
FYI, I monitored the provisioning process, and any time I saw an auth conflict, I deleted the auth entry and it was able to retry and proceed... Below is an example where 4 separate jobs all tried to assign osd.433. While provisioning some 300 OSDs, there were stretches where it hit 44 conflicts and I had to go wipe the disks and let it retry; other times, as below, I was able to stay on top of the auth conflicts and help it along.
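The manual step boils down to something like this (osd.433 as in the example; only delete the auth entry after confirming the id is not a live OSD):

```bash
ceph osd ls | grep -qx 433 || ceph auth del osd.433   # delete auth only if osd 433 is not in the map
```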
Check for auth entries that exist without a corresponding OSD (I know this could be faster and would be more robust using JSON; just a quick script..), as sketched below:
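A rough stand-in for that kind of check, run wherever the ceph CLI is available (e.g. the toolbox); it prints `osd.N` auth entities that have no matching id in the OSD map:

```bash
comm -23 \
  <(ceph auth ls 2>/dev/null | grep -o '^osd\.[0-9][0-9]*' | sort) \
  <(ceph osd ls | sed 's/^/osd./' | sort)
```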
@bdowling Many people have encountered a similar or the same problem. OSD ID allocation is done in ceph-volume. As a workaround, you can reduce the parallelism of OSD creation. It doesn't resolve this problem completely, but I believe it reduces the OSD ID conflicts.
@guits How exactly does ceph-volume allocate the OSD ID? This issue, with a large number of OSDs being created in parallel, makes it clear that the allocation is not atomic, which causes quite a problem for large clusters.
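For what it's worth, my rough understanding (an assumption, not a reading of the ceph-volume source) is that it obtains an id from the mons via `ceph osd new`, roughly:

```bash
# `ceph osd new <uuid>` registers an OSD with that fsid and returns the
# lowest available id; any non-atomicity between this step and the later
# `ceph auth` creation would let concurrent prepare jobs collide on an id.
OSD_UUID="$(uuidgen)"
OSD_ID="$(ceph osd new "$OSD_UUID")"
echo "allocated osd.${OSD_ID} for fsid ${OSD_UUID}"
```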
I'm going to mark this issue closed for now. In testing, I was unable to reproduce duplicate OSD IDs. I went looking for where that code actually does this work in Ceph, but got lost in the indirection trying to find it for, e.g., simple testing...
Is this a bug report or feature request?
Deviation from expected behavior:
While discovery finds new devices and attempts to create OSDs on them, there appears to be a race condition:

```
exec: stderr: Error EEXIST: entity osd.2 exists but key does not match
```

It felt like, given the number of OSDs being created at the same time, different nodes may have tried to get the same OSD id. When this happened, the prepare job failed and crashed out. When I came back to look at the results, I found that a number of nodes did not have the full 4 OSDs per device that I have configured, so I am now left with unused space. Further discovery runs do not appear to identify this unused space and create the remaining OSDs.
As shown below, this condition also seems to block any new provisioning: a `ceph auth` entry is created, but the OSD is not actually created in the tree/map, so future attempts to create an OSD keep trying the same (lowest available) OSD id.

Expected behavior:
How to reproduce it (minimal and precise):
Add multiple nodes, with multiple devices each and see if the race condition occurs.
File(s) to submit:
- `cluster.yaml`, if necessary
- Operator's logs, if necessary
- Crashing pod(s) logs, if necessary

To get logs, use `kubectl -n <namespace> logs <pod name>`. When pasting logs, always surround them with backticks or use the insert code button from the GitHub UI. Read GitHub documentation if you need help.
Cluster Status to submit:
Environment:
- Rook version (use `rook version` inside of a Rook Pod): rook/ceph:v1.13.4 (helm rook-ceph-v1.13.7)
- Storage backend version (e.g. `ceph -v`): 18.2.2 (helm rook-ceph-cluster-v1.13.7)
- Kubernetes version (use `kubectl version`): v1.28.4
- Storage backend status (e.g. `ceph health` in the Rook Ceph toolbox):