cold merge: prune stale bitmaps in base volume #354
Force-pushed from ec39541 to 41418ec.
/ost
This is not very clear; maybe explain that measuring only top does not consider the size
This cannot happen in the current system. I think the reason is leftover bitmaps that
Or when copying bitmaps from top to base after the commit finishes. The expected cases are:
The case we see in the related bug is a bitmap that exists in base but not in top.
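The problematic case, a bitmap that exists in base but not in top, can be detected as a set difference over the bitmap names reported by `qemu-img info --output=json`. This is a minimal sketch, assuming the qcow2 layout where persistent bitmaps are listed under `format-specific.data.bitmaps`; the helper names are hypothetical and this is not vdsm's `bitmaps._query_bitmaps()`.

```python
import json


def bitmap_names(info):
    """Return the set of bitmap names in a parsed `qemu-img info --output=json` dict.

    Assumes the qcow2 layout, where persistent bitmaps are listed under
    info["format-specific"]["data"]["bitmaps"]; images without bitmaps
    simply lack the "bitmaps" key.
    """
    data = info.get("format-specific", {}).get("data", {})
    return {bitmap["name"] for bitmap in data.get("bitmaps", [])}


def stale_bitmaps(base_info, top_info):
    """Return the sorted names of bitmaps that exist in base but not in top."""
    return sorted(bitmap_names(base_info) - bitmap_names(top_info))


if __name__ == "__main__":
    # Hypothetical info output, trimmed to the fields used above.
    base = json.loads("""
    {"format-specific": {"type": "qcow2", "data": {"bitmaps": [
        {"name": "b1", "granularity": 65536, "flags": ["auto"]},
        {"name": "stale", "granularity": 65536, "flags": ["auto"]}]}}}
    """)
    top = json.loads("""
    {"format-specific": {"type": "qcow2", "data": {"bitmaps": [
        {"name": "b1", "granularity": 65536, "flags": ["auto"]}]}}}
    """)
    print(stale_bitmaps(base, top))  # ['stale']
```

Bitmaps present in both volumes, or only in top, are not reported; only the base-only case from the bug is.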
Understood. I will update the comments and try to explain it better.
Ouch, yes, that could lead to this issue.
More so in the cold merge, as the live merge adds an entire extra chunk to avoid pausing if the guest writes data, which is enough to "hide" the extra size of the untracked bitmaps. It is not ideal that we may unnecessarily over-extend the volume (in the common case we potentially double the space required for bitmaps), but it is better than having many reports due to this buggy scenario.
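The over-extension trade-off can be made concrete with the JSON printed by `qemu-img measure --output=json`. This is a minimal sketch, assuming a QEMU version whose measure output includes the optional `bitmaps` key; the function name is hypothetical, and it models the estimate under discussion (adding the base bitmap size on top of what is measured for top), which the following comment argues is still not safe.

```python
def conservative_required_size(top_measure, base_measure):
    """Hypothetical estimate (not vdsm's code) of the space base needs for a cold merge.

    top_measure and base_measure are parsed `qemu-img measure --output=json`
    results. The "bitmaps" key is optional, so a missing key counts as 0.
    """
    top_bitmaps = top_measure.get("bitmaps", 0)
    base_bitmaps = base_measure.get("bitmaps", 0)
    # When base and top carry the same bitmaps (the common case), the bitmap
    # allowance is counted twice: this is the over-extension mentioned above.
    return top_measure["required"] + top_bitmaps + base_bitmaps


if __name__ == "__main__":
    # Hypothetical measure results: 1 GiB of data, 1 MiB of bitmaps per volume.
    top = {"required": 1073741824, "fully-allocated": 2147483648, "bitmaps": 1048576}
    base = {"required": 536870912, "fully-allocated": 2147483648, "bitmaps": 1048576}
    print(conservative_required_size(top, base))  # 1075838976
```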
Testing the actual merge code is very complicated and requires too much mocking and faking, so I wrote a simple reproducer using qemu-img commands. It reproduces the issue, but it also proves that computing the required size from the top and base bitmaps is not safe, so this is not the right fix.
This shows that we need more space than the size of the top and base bitmaps plus the size required by top. Since the issue is stale bitmaps that we cannot use for anything, we can fix the issue
See how we find bitmaps in bitmaps._query_bitmaps(). We can add
that deletes invalid bitmaps from base, and will be used before we measure base in cold merge.
Regardless of the vdsm fix, this reproducer reveals issues in
Reproducer
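The reproducer itself is not included in this thread. As a hypothetical illustration of a qemu-img sequence of that kind, the sketch below follows the scenario described in the merge test commit (a bitmap added to base after top is created). It only builds and prints the commands; the paths, size, and bitmap name are assumptions.

```python
import os
import shlex


def reproducer_commands(workdir, size="1G", bitmap="stale"):
    """Build a hypothetical qemu-img sequence that leaves a bitmap only in base.

    The bitmap is added to base *after* top is created, so top never gets it.
    """
    base = os.path.join(workdir, "base.qcow2")
    top = os.path.join(workdir, "top.qcow2")
    return [
        # Base volume, and a top volume backed by it.
        ["qemu-img", "create", "-f", "qcow2", base, size],
        ["qemu-img", "create", "-f", "qcow2", "-b", base, "-F", "qcow2", top],
        # Persistent bitmap added to base only: stale from top's point of view.
        ["qemu-img", "bitmap", "--add", "-f", "qcow2", base, bitmap],
        # Measure both volumes to compare what each one reports.
        ["qemu-img", "measure", "--output=json", "-O", "qcow2", top],
        ["qemu-img", "measure", "--output=json", "-O", "qcow2", base],
    ]


if __name__ == "__main__":
    # Only print the commands; run them by hand where qemu-img is installed.
    for cmd in reproducer_commands("/var/tmp/repro"):
        print(shlex.join(cmd))
```

Running the printed commands on a host with qemu-img allows comparing the two `measure` outputs for the case where a bitmap exists only in base.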
POC of a fix, using:
After the merge, base should have only the good bitmaps. If we add a merge test
Force-pushed from 41418ec to 82b429e.
Add a merge test in block volumes, to have a realistic scenario with volumes and bitmaps. When bitmaps are added to the base volume after the top volume is created, measuring the top volume does not consider the size of the base bitmaps, even with backing enabled. As a result, no extend is performed, and the merge process fails with `Failed to write bitmap 'xxx' to file: No space left on device`. This test has been used to reproduce the error in https://bugzilla.redhat.com/2141371
Signed-off-by: Albert Esteve <aesteve@redhat.com>
Force-pushed from 865519c to b61c9d1.
Force-pushed from 5cdb4c9 to 86e7d51.
Force-pushed from 86e7d51 to 87d281e.
/ost
Looks good, suggested minor tweaks.
Add a prune_bitmaps function to remove all stale bitmaps (those missing from the top volume) from a base volume.
Signed-off-by: Albert Esteve <aesteve@redhat.com>
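A minimal sketch of such a pruning step, assuming the bitmap names of both volumes have already been collected and that stale bitmaps are removed with `qemu-img bitmap --remove` (on QEMU versions that provide the `bitmap` subcommand). The function name and signature are hypothetical, not the `prune_bitmaps` added by this commit; `run` is injectable so the logic can be exercised without qemu-img.

```python
import subprocess


def prune_stale_bitmaps(base_path, base_bitmaps, top_bitmaps, run=subprocess.run):
    """Hypothetical sketch: remove bitmaps present in base but missing from top.

    base_bitmaps and top_bitmaps are iterables of bitmap names. Returns the
    sorted list of removed names.
    """
    stale = sorted(set(base_bitmaps) - set(top_bitmaps))
    for name in stale:
        # Delete the persistent bitmap from the qcow2 base image.
        run(
            ["qemu-img", "bitmap", "--remove", "-f", "qcow2", base_path, name],
            check=True,
        )
    return stale
```

Passing a recording function as `run` (for example `lambda cmd, check: calls.append(cmd)`) shows which commands would be executed without touching any image.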
Prune the stale base volume bitmaps during the prepare step of a merge operation. These stale bitmaps can cause the merge operation to fail with 'No space left on device'. In this case, qemu does not exit with an error, so the failure goes unnoticed. As there is no reliable way to measure the size of these stale bitmaps, and they are invalid and can never be used for incremental backup, it is better to prune them to avoid the error.
Related: oVirt#352
Signed-off-by: Albert Esteve <aesteve@redhat.com>
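The ordering that matters in the prepare step (prune first, then measure, then decide whether to extend) can be sketched with placeholder callables. Every parameter here is a hypothetical stand-in for the real volume operations, not a vdsm API.

```python
def prepare_cold_merge(base, top, prune, measure, current_size, extend):
    """Hypothetical prepare-step ordering for a cold merge.

    Placeholder callables (assumptions, not vdsm APIs):
      prune(base, top) -> list of removed bitmap names
      measure(base, top) -> required size of base in bytes
      current_size(base) -> current size of base in bytes
      extend(base, size) -> None
    """
    # 1. Drop stale bitmaps first: their size cannot be measured reliably.
    removed = prune(base, top)
    # 2. Measure after pruning, so only valid bitmaps remain to be accounted for.
    required = measure(base, top)
    # 3. Extend only when the measured size does not fit in the current volume.
    if required > current_size(base):
        extend(base, required)
    return removed, required
```

Measuring before pruning would let the stale bitmaps reintroduce the unmeasured space that causes the silent 'No space left on device' failure.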
Force-pushed from 87d281e to 806754e.
/ost