Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cold merge: prune stale bitmaps in base volume #354

Merged
merged 3 commits into from
Nov 28, 2022

Conversation

aesteve-rh
Copy link
Member

@aesteve-rh aesteve-rh commented Nov 17, 2022

Prune the stale base volume bitmaps during
the prepare step on a merge operation.

These stale bitmaps can cause the merge
operation to fail due to 'No space left on device'.
In this case, qemu does not end with error, so
the failure goes unnoticed.

As there is not a reliable way to measure
the size of these stale bitmaps, and they are
invalid and can never be used for incremental
backup, it is better to prune them to avoid
the error.

Related: #352
Signed-off-by: Albert Esteve aesteve@redhat.com

@aesteve-rh aesteve-rh self-assigned this Nov 17, 2022
@aesteve-rh aesteve-rh added bug Issue is a bug or fix for a bug storage labels Nov 17, 2022
@aesteve-rh aesteve-rh force-pushed the aesteve/livemerge-size branch 4 times, most recently from ec39541 to 41418ec Compare November 18, 2022 08:33
@michalskrivanek
Copy link
Member

/ost

1 similar comment
@aesteve-rh
Copy link
Member Author

/ost

@nirs
Copy link
Member

nirs commented Nov 18, 2022

Measure also the base volume when prepare for a merge. We need to consider base volume bitmaps size, since measuring only top may result in untracked bitmaps even with backing enabled.

This is not very clear, maybe explain that measuring only top does not consider the size
required for the bitmaps in the base image, which may result in failing to finish the commit
or to copy bitmaps from the top volume.

This can happen if the bitmaps were added to base after the top volume was created.

This cannot happen in the current system. I think the reason is leftover bitmaps that
were not deleted when a checkpoint was deleted - known issue in engine.

Considering both base and top volume bitmaps may result in allocating more than needed, but will avoid errors when commiting the image due to 'No space left on device'.

Or when copying bitmaps from top to base after the commit finish.

The expected cases are:

  1. Bitmap is in both top and base - measuring only top is enough
  2. Bitmap is only in top - measuring only top is good enough

The case we see in the related bug is bitmap that existing in base but not in top.
These bitmaps are broken and can never be used for backup, but they still take space
in the image and can cause live or cold merge to fail.

@aesteve-rh
Copy link
Member Author

aesteve-rh commented Nov 18, 2022

Understood. I will update the comments and try to explain it better.

I think the reason is leftover bitmaps that
were not deleted when a checkpoint was deleted - known issue in engine.

ouch, yes, that could lead to this issue.

These bitmaps are broken and can never be used for backup, but they still take space
in the image and can cause live or cold merge to fail.

More so in the cold merge, as the live merge adds an entire extra chunk to avoid pausing if the guest writes data, which will be enough to "hide" the extra size of the untracked bitmaps.

It is not ideal that we potentially over extend unnecessarily the volume (as in the common case we potentially double the space required for bitmaps), but is better than having many reports due to this buggy scenario occurring.

tests/storage/blocksd_test.py Show resolved Hide resolved
tests/storage/blocksd_test.py Outdated Show resolved Hide resolved
tests/storage/blocksd_test.py Show resolved Hide resolved
tests/storage/blocksd_test.py Outdated Show resolved Hide resolved
tests/storage/blocksd_test.py Outdated Show resolved Hide resolved
lib/vdsm/storage/merge.py Outdated Show resolved Hide resolved
tests/storage/merge_test.py Outdated Show resolved Hide resolved
tests/storage/merge_test.py Outdated Show resolved Hide resolved
lib/vdsm/virt/livemerge.py Outdated Show resolved Hide resolved
tests/virt/livemerge_test.py Outdated Show resolved Hide resolved
@nirs
Copy link
Member

nirs commented Nov 19, 2022

Testing actual merge code is very complicated and requires too much mocking and faking and
is very hard to work with. I tried to reproduce the failure with the tests but could not
reproduce it.

I wrote a simple reproducer using qemu-img commands that reproduces the issue, but it proves that the computation of require size using top and base bitmaps is not safe, so this is not the right fix.

  • Running the reproducer with 5 stale bitmaps the required size is less than the lv size, the merge succeeds, all bitmaps are valid, merging the bitmaps from top succeeds, and the good bitmap is valid.

  • Running the reproducer with 6 stale bitmaps the required size is less than the lv size, the merge succeeds, merging the bitmaps from top succeeds, but the good bitmap is invalid.

Commit top into base
    (100.00/100%)
Image committed.
Add good bitmap to base
Merge good bitmap from top to base
qcow2_free_clusters failed: No space left on device
qemu-img: Lost persistent bitmaps during inactivation of node '#block159': Failed to write bitmap 'good-bitmap' to file: No space left on device
qemu-img: Failed to flush the refcount block cache: No space left on device
Info base after merge
{
  "virtual-size": 1073741824,
  "filename": "/dev/mapper/storage-base.qcow2",
  "cluster-size": 65536,
  "format": "qcow2",
  "actual-size": 0,
  "format-specific": {
    "type": "qcow2",
    "data": {
      "compat": "1.1",
      "compression-type": "zlib",
      "lazy-refcounts": false,
      "bitmaps": [
        ...
        {
          "flags": [
            "in-use",
            "auto"
          ],
          "name": "good-bitmap",
          "granularity": 65536
        }
      ],
      "refcount-bits": 16,
      "corrupt": false,
      "extended-l2": false
    }
  },
  "dirty-flag": false
}
  • Running the reproducer with 11 stale bitmaps, the required size is exactly the lv size and the merge succeeds, merging the good bitmap success, and the good bitmap from top is valid.
Commit top into base
    (100.00/100%)
qcow2_free_clusters failed: No space left on device
qemu-img: Lost persistent bitmaps during inactivation of node '#block308': Failed to write bitmap 'stale-bitmap-002' to file: No space left on device
qemu-img: Failed to flush the refcount block cache: No space left on device
Image committed.
Add good bitmap to base
Merge good bitmap from top to base
Info base after merge
{
  "virtual-size": 1073741824,
  "filename": "/dev/mapper/storage-base.qcow2",
  "cluster-size": 65536,
  "format": "qcow2",
  "actual-size": 0,
  "format-specific": {
    "type": "qcow2",
    "data": {
      "compat": "1.1",
      "compression-type": "zlib",
      "lazy-refcounts": false,
      "bitmaps": [
        ...
        {
          "flags": [
            "auto"
          ],
          "name": "good-bitmap",
          "granularity": 65536
        }
      ],
  • Running the reproducer with 13 bitmaps, the required size is more than the lv size, but the merge succeeds. The good bitmap is invalid.
Commit top into base
    (100.00/100%)
qcow2_free_clusters failed: No space left on device
qemu-img: Lost persistent bitmaps during inactivation of node '#block326': Failed to write bitmap 'stale-bitmap-000' to file: No space left on device
qemu-img: Failed to flush the refcount block cache: No space left on device
Image committed.
Add good bitmap to base
Merge good bitmap from top to base
Info base after merge
{
  "virtual-size": 1073741824,
  "filename": "/dev/mapper/storage-base.qcow2",
  "cluster-size": 65536,
  "format": "qcow2",
  "actual-size": 0,
  "format-specific": {
    "type": "qcow2",
    "data": {
      "compat": "1.1",
      "compression-type": "zlib",
      "lazy-refcounts": false,
      "bitmaps": [
        ...
        {
          "flags": [
            "auto"
          ],
          "name": "good-bitmap",
          "granularity": 65536
        }
      ],
  • Running the reproducer with 14 bitmaps the merge fails with ENOSPC.

This shows that we need more size than the size than top and base bitmaps and top required,
but we don't have a good way to compute this value.

Since the issue is stale bitmaps that we cannot use for anything, we can can fix the issue
by deleting all bitmaps in base that do not appear in top before the merge before the merge.

See how we find bitmaps in bitmaps._query_bitmaps(). We can add

bitmaps.prune_bitmaps(top, base)

that delete invalid bitmaps from base, and will be used before we measure base in cold
and live merge, and maybe later in a new repair_bitmaps vdsm command.

Regardless of the vdsm fix this reproducer reveals issues in qemu-img commit and
qemu-img bitmap --merge. The commands must fail if they cannot write bitmaps
to storage and they leave them with the "in-use" bit. This is a major failure for
qemu-img bitamap --merge since its purpose is to manipulate bitmaps. You can use
the reproducer to report qemu-img bug.

Reproducer

# Reproduce ENOSPC error when merging into base image with lot of bitmaps.

import json
import subprocess

IMG_SIZE = 1 << 30
LV_SIZE = 128 << 20

# Testing shows that we can merge successfully with 13 bitmaps, and require
# size calculation is more strict, allowing up to 11 bitamps.
STALE_BITMAPS = 11

def run(*cmd):
    subprocess.run(cmd, check=True)

def output(cmd):
    cp = subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
    return json.loads(cp.stdout)

def info(img):
    return output(["qemu-img", "info", "--output", "json", img])

def measure(img):
    return output(["qemu-img", "measure", "-f", "qcow2", "-O", "qcow2", "--output", "json", img])

def check(img):
    cmd = ["qemu-img", "check", "-f", "qcow2", "--output", "json", img]
    cp = subprocess.run(cmd, stdout=subprocess.PIPE)
    if cp.returncode not in (0, 3):
        raise RuntimeError(f"Check failed")

    return json.loads(cp.stdout)

def indent(info):
    return json.dumps(info, indent=2)

# Configuration - you need to create these lvs and chown them to the user
# running this script.
#   sudo lvcreate --name base.qcow2 --size 128m storage
#   sudo chown $USER:$USER /dev/mappe/stoarge-base.qcow2
#   sudo lvcreate --name top.qcow2 --size 128m storage
#   sudo chown $USER:$USER /dev/mappe/stoarge-top.qcow2
base = "/dev/mapper/storage-base.qcow2"
top = "/dev/mapper/storage-top.qcow2"

# Start with clean lvs by discarding current data.
run("blkdiscard", base)
run("blkdiscard", top)

print("Creating base")
run("qemu-img", "create", "-f", "qcow2", base, str(IMG_SIZE))

# Simulate stale bitmaps - missing in top.
for i in range(STALE_BITMAPS):
    bitmap = f"stale-bitmap-{i:03d}"
    print(f"Creating stale bitmap {bitmap}")
    run("qemu-img", "bitmap", "--add", base, bitmap)

print("Info base before merge")
base_info = info(base)
print(indent(base_info))

print("Check base before merge")
base_check = check(base)
print(indent(base_check))

print("Measure base before merge")
base_measure = measure(base)
print(indent(base_measure))

print("Creating top")
run("qemu-img", "create", "-f", "qcow2", "-b", base, "-F", "qcow2", top)

print("Adding good bitmap to top")
run("qemu-img", "bitmap", "--add", top, "good-bitmap")

print("Writing data to top")
cmd = f"write -P {ord('B')} 0 126m"
run("qemu-io", "-f", "qcow2", "-c", cmd, top)

print("Info top before merge")
top_info = info(top)
print(indent(top_info))

print("Check top before merge")
top_check = check(top)
print(indent(top_check))

print("Measure top before merge")
top_measure = measure(top)
print(indent(top_measure))

print("Commit top into base")
run("qemu-img", "commit", "-f", "qcow2", "-t", "none", "-b", base, "-d", "-p", top)

print("Add good bitmap to base")
run("qemu-img", "bitmap", "--add", base, "good-bitmap")

print("Merge good bitmap from top to base")
run("qemu-img", "bitmap", "--merge", "good-bitmap", "-F", "qcow2", "-b", top,  base, "good-bitmap")

print("Info base after merge")
print(indent(info(base)))

print("Check base after merge")
print(indent(check(base)))

print("Measure base after merge")
print(indent(measure(base)))

print("Checking that required size for merge is correct")
required_size = top_measure["required"] + top_measure["bitmaps"] + base_measure["bitmaps"]
print(f"required_size: {required_size} lv_size: {LV_SIZE}")
assert required_size <= LV_SIZE

@nirs
Copy link
Member

nirs commented Nov 19, 2022

POC of a fix, using:

  • Good bitmaps that are in both base and top
  • Good bitmap that is only in top (e.g. bitmap created in backup snapshot)
  • Stale bitmaps only in base

After the merge, base should have only the good bitmaps. If we add a merge test
we can simulate this case in the test.

# Reproduce ENOSPC error when merging into base image with lot of bitmaps.

import json
import subprocess

IMG_SIZE = 1 << 30
LV_SIZE = 128 << 20

STALE_BITMAPS = 32

def run(*cmd):
    subprocess.run(cmd, check=True)

def output(cmd):
    cp = subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
    return json.loads(cp.stdout)

def info(img):
    return output(["qemu-img", "info", "--output", "json", img])

def measure(img):
    return output(["qemu-img", "measure", "-f", "qcow2", "-O", "qcow2", "--output", "json", img])

def check(img):
    cmd = ["qemu-img", "check", "-f", "qcow2", "--output", "json", img]
    cp = subprocess.run(cmd, stdout=subprocess.PIPE)
    if cp.returncode not in (0, 3):
        raise RuntimeError(f"Check failed")

    return json.loads(cp.stdout)

def bitmaps(img):
    img_info = info(img)
    return {b["name"] for b in img_info["format-specific"]["data"].get("bitmaps", [])}

def indent(info):
    return json.dumps(info, indent=2)

# Configuration - you need to create these lvs and chown them to the user
# running this script.
#   sudo lvcreate --name base.qcow2 --size 128m storage
#   sudo chown $USER:$USER /dev/mappe/stoarge-base.qcow2
#   sudo lvcreate --name top.qcow2 --size 128m storage
#   sudo chown $USER:$USER /dev/mappe/stoarge-top.qcow2
base = "/dev/mapper/storage-base.qcow2"
top = "/dev/mapper/storage-top.qcow2"

# Start with clean lvs by discarding current data.
run("blkdiscard", base)
run("blkdiscard", top)

print("Creating base")
run("qemu-img", "create", "-f", "qcow2", base, str(IMG_SIZE))

# Simulate stale bitmaps - missing in top.
for i in range(STALE_BITMAPS):
    bitmap = f"stale-bitmap-{i:03d}"
    print(f"Creating stale bitmap {bitmap}")
    run("qemu-img", "bitmap", "--add", base, bitmap)

print("Creating top")
run("qemu-img", "create", "-f", "qcow2", "-b", base, "-F", "qcow2", top)

# Add good bitmaps that will be in both base and top.
for i in range(2):
    bitmap = f"good-bitmap-{i:03d}"
    print(f"Adding good bitmap {bitmap} to base and top")
    run("qemu-img", "bitmap", "--add", base, bitmap)
    run("qemu-img", "bitmap", "--add", top, bitmap)

# Add bitmpas that exists only in top - will be added to based after the merge.
print(f"Adding top only bitmap to top")
run("qemu-img", "bitmap", "--add", top, "top-only")

print("Writing data to top")
cmd = f"write -P {ord('B')} 0 126m"
run("qemu-io", "-f", "qcow2", "-c", cmd, top)

print("Info base before merge")
base_info = info(base)
print(indent(base_info))

print("Info top before merge")
top_info = info(top)
print(indent(top_info))

print("Measure top before merge")
top_measure = measure(top)
print(indent(top_measure))

for b in sorted(bitmaps(base) - bitmaps(top)):
    print(f"Removing stale bitmap {b} from base")
    run("qemu-img", "bitmap", "--remove", base, b)

print("Commit top into base")
run("qemu-img", "commit", "-f", "qcow2", "-t", "none", "-b", base, "-d", "-p", top)

# Merging bitmaps only in top to base.
for b in sorted(bitmaps(top) - bitmaps(base)):
    print("Add top only bitmap to base")
    run("qemu-img", "bitmap", "--add", base, b)
    print("Merge top only bitmap from top to base")
    run("qemu-img", "bitmap", "--merge", b, "-F", "qcow2", "-b", top,  base, b)

print("Info base after merge")
print(indent(info(base)))

print("Check base after merge")
print(indent(check(base)))

print("Measure base after merge")
print(indent(measure(base)))

print("Checking that required size for merge is correct")
required_size = top_measure["required"] + top_measure["bitmaps"]
print(f"required_size: {required_size} lv_size: {LV_SIZE}")
assert required_size <= LV_SIZE

Add merge test in block volumes, to have
a realistic scenario with volumes and bitmaps.

When bitmaps are added to the base volume
after the top volume is create, measuring
the top volume will not consider the base
bitmaps size, even with backing enabled.

This will result in not performing an
extend and thus, failing the merge process
with `Failed to write bitmap 'xxx' to file:
No space left on device`.

This test has been used to reproduce the error
in https://bugzilla.redhat.com/2141371

Signed-off-by: Albert Esteve <aesteve@redhat.com>
@aesteve-rh aesteve-rh force-pushed the aesteve/livemerge-size branch 2 times, most recently from 865519c to b61c9d1 Compare November 23, 2022 15:58
tests/storage/bitmaps_test.py Show resolved Hide resolved
tests/storage/bitmaps_test.py Outdated Show resolved Hide resolved
lib/vdsm/storage/bitmaps.py Outdated Show resolved Hide resolved
lib/vdsm/storage/bitmaps.py Outdated Show resolved Hide resolved
lib/vdsm/storage/bitmaps.py Outdated Show resolved Hide resolved
lib/vdsm/storage/sdm/api/merge.py Outdated Show resolved Hide resolved
lib/vdsm/virt/livemerge.py Outdated Show resolved Hide resolved
lib/vdsm/virt/livemerge.py Outdated Show resolved Hide resolved
lib/vdsm/virt/vm.py Outdated Show resolved Hide resolved
lib/vdsm/virt/vmdevices/storage.py Outdated Show resolved Hide resolved
@aesteve-rh aesteve-rh force-pushed the aesteve/livemerge-size branch 2 times, most recently from 5cdb4c9 to 86e7d51 Compare November 24, 2022 09:34
@aesteve-rh aesteve-rh changed the title merge: measure base volume bitmaps cold merge: prune stale bitmaps in base volume Nov 24, 2022
@aesteve-rh
Copy link
Member Author

/ost

@aesteve-rh aesteve-rh requested a review from nirs November 24, 2022 10:31
Copy link
Member

@nirs nirs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, suggested minor tweaks.

lib/vdsm/storage/bitmaps.py Outdated Show resolved Hide resolved
lib/vdsm/storage/bitmaps.py Outdated Show resolved Hide resolved
tests/storage/bitmaps_test.py Outdated Show resolved Hide resolved
tests/storage/bitmaps_test.py Outdated Show resolved Hide resolved
Add prune_bitmaps function, to remove all stale
bitmaps from a base volume (that are missing from
the top volume).

Signed-off-by: Albert Esteve <aesteve@redhat.com>
Prune the stale base volume bitmaps during
the prepare step on a merge operation.

These stale bitmaps can cause the merge
operation to fail due to 'No space left on device'.
In this case, qemu does not end with error, so
the failure goes unnoticed.

As there is not a reliable way to measure
the size of these stale bitmaps, and they are
invalid and can never be used for incremental
backup, it is better to prune them to avoid
the error.

Related: oVirt#352
Signed-off-by: Albert Esteve <aesteve@redhat.com>
@aesteve-rh
Copy link
Member Author

/ost

4 similar comments
@aesteve-rh
Copy link
Member Author

/ost

@aesteve-rh
Copy link
Member Author

/ost

@aesteve-rh
Copy link
Member Author

/ost

@aesteve-rh
Copy link
Member Author

/ost

@aesteve-rh
Copy link
Member Author

/ost

3 similar comments
@aesteve-rh
Copy link
Member Author

/ost

@aesteve-rh
Copy link
Member Author

/ost

@michalskrivanek
Copy link
Member

/ost

@nirs nirs merged commit 6f67a73 into oVirt:master Nov 28, 2022
@aesteve-rh aesteve-rh deleted the aesteve/livemerge-size branch November 28, 2022 09:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issue is a bug or fix for a bug storage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants