
tests: fix upgrade tests not to skip versions #7310

Closed
jcsp opened this issue Nov 16, 2022 · 7 comments · Fixed by #8253

Comments

jcsp commented Nov 16, 2022

Where an upgrade test has an explicit starting version, it should not try to upgrade straight to HEAD -- we do not support skipping versions, and this risks missing problems that only occur when an intermediate release is used.

Fixing is probably some mixture of:

  • Change tests to upgrade from $lastrelease to $HEAD rather than starting from an older release than that
  • For tests that aim to check long-running system upgrades, ensure that upgrades are done in a series of hops through intermediate versions.
andrwng commented Nov 16, 2022

Some upgrade tests start out pegged to a specific version as a means of testing the onboarding behavior when going from a version that doesn't support feature Foo to one that does. Depending on how the test is written, it may only ever make sense to test from a specific older version and below.

Perhaps our guidance should be to always write two kinds of upgrade test when writing a new feature:

  • One that is pegged at a version without the new feature, to test onboarding new features. Perhaps these can be culled from dev once a release happens.
  • One that continually runs, using the highest version prior to $HEAD as the version to upgrade from.

andrwng commented Nov 30, 2022

We discussed this a bit and concluded that, moving forward, we should categorize our existing upgrade tests:

  • Tests that only test the onboarding of a feature. Such tests are only meant to survive for a single version jump to the version the feature is introduced in. As such, it doesn't make sense for the test to continue on dev, and we should remove it from dev. We should maintain these tests in their respective feature version branches.
  • Tests that we should expect to run between any arbitrary single-version upgrade. For such tests we should convert the installer incantation to use highest_from_prior_feature_version() if they aren't already using it.
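
A minimal sketch of that shape, assuming the installer is reachable as self.redpanda._installer and exposes highest_from_prior_feature_version() and install() roughly as existing rptest upgrade tests use them (the attribute and method names here are assumptions, and exact signatures may differ):

# Hedged sketch: attribute and method names are assumptions about the rptest
# installer API, not a verified incantation.
class SingleHopUpgradeTest(RedpandaTest):
    def test_upgrade_from_prior_feature_version(self):
        installer = self.redpanda._installer
        # Derive the newest release of the feature line before HEAD instead of
        # pinning a version that silently falls behind as releases ship.
        old_version = installer.highest_from_prior_feature_version(
            RedpandaInstaller.HEAD)
        installer.install(self.redpanda.nodes, old_version)
        self.redpanda.start()

        # ... exercise the workload on the old version ...

        installer.install(self.redpanda.nodes, RedpandaInstaller.HEAD)
        self.redpanda.restart_nodes(self.redpanda.nodes)
        # ... verify behavior after the single-version hop ...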

We also agreed that we do still want the ability to test several upgrades in a row, though without the cost of making all of our existing fixed-version feature-onboarding upgrade tests perform sequential upgrades. Instead, we should add a single test that starts at specific older versions (22.1, 22.2, etc) and runs through the sequential upgrades until dev, performing some smoke checks for various new features as we go. Developers of new features should consider adding basic checks that ensure their feature works on each version after the version that introduced the feature. Something along the lines of the following pseudo-code:

def sequential_upgrade_test():
    start with redpanda 22.1
    install redpanda 22.2
    run validation of features that should work in 22.2 (e.g. license check)
    install redpanda 22.3
    run validation of features that should work in 22.3 (e.g. license check)
    run validation of features that should work in 22.2 (e.g. transactions)
    install redpanda 23.1
    run validation of features that should work in 23.1 (e.g. controller snapshot)
    run validation of features that should work in 22.3 (e.g. license check)
    run validation of features that should work in 22.2 (e.g. transactions)
    ... and so on

It'd be great if the above could be made generic and easy to extend. It'll be expected that the test will grow an increasingly high runtime, hence the desire to have very few of these in the tree. We can revisit what versions to start with (or perhaps have variants of the test that start from different feature versions) as more versions fall out of the support window.
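
One hedged sketch of how the above could be made generic: keep a table of smoke checks keyed by the first version in which each feature should work, and after every hop run all checks whose version has been reached. Every name here (the check stubs, CHECKS, UPGRADE_PATH, install) is a placeholder, and the feature-to-version mapping is only illustrative:

# Framework-agnostic sketch of a data-driven sequential upgrade test.
def check_license(): ...              # placeholder smoke check
def check_transactions(): ...         # placeholder smoke check
def check_controller_snapshot(): ...  # placeholder smoke check

# Checks keyed by the first version in which the feature is expected to work.
CHECKS = {
    (22, 2): [check_license],
    (22, 3): [check_transactions],
    (23, 1): [check_controller_snapshot],
}

UPGRADE_PATH = [(22, 1), (22, 2), (22, 3), (23, 1)]  # extend as releases ship

def sequential_upgrade_test(install):
    # `install` stands in for whatever performs a rolling upgrade to a version.
    install(UPGRADE_PATH[0])
    for version in UPGRADE_PATH[1:]:
        install(version)
        # After each hop, re-run every check whose feature should already work,
        # so older features keep getting validated on newer versions.
        for introduced, checks in sorted(CHECKS.items()):
            if introduced <= version:
                for check in checks:
                    check()

Adding coverage for a new feature then amounts to appending one entry to the table, which keeps the single sequential test easy to extend without multiplying test cases.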

jcsp assigned andijcr and unassigned ZeDRoman Dec 1, 2022
jcsp commented Dec 1, 2022

@andijcr please can you work on this when #6843 is done

andijcr commented Dec 1, 2022

> @andijcr please can you work on this when #6843 is done

Sure

jcsp commented Dec 16, 2022

Thinking about delivering this, I think there are probably ~3 PRs:

  1. In the places where we aim to test a particular version hop (e.g. before/after a feature is introduced), amend the test to explicitly give the versions it wants to test, rather than hopping straight to the current revision
  2. Where we want to do more general upgrade testing, ensure that we are stepping through intermediate versions
  3. In that general upgrade case, slot in coverage of more features, creating some sort of system of hooks, where the test case steps through versions, but at each version pauses and calls checks/traffic generation hooks.

#7687 is a good example of the kind of coverage we can gain from the (3) changes: stepping through each version, at each version writing enough data to a compacted topic to fill a segment and cause an index to be written, and then waiting for some compaction to happen, to ensure that the versions can handle each other's data.

This ticket doesn't require testing every possible feature, but it should add the framework for easily hooking in a new aspect to the upgrade test to cover a different feature.
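
A hedged illustration of what one such hook could look like for the compacted-topic case mentioned above; produce_to_fill_segment and wait_for_compaction are hypothetical helpers marking where real traffic-generation and wait logic would go, not existing rptest utilities:

# Hypothetical per-version hook; the helpers below do not exist and only
# indicate where real producer and compaction-waiting code would sit.
def compacted_topic_hook(cluster, version):
    # Write enough keyed data under the currently installed version to roll a
    # segment and force an index to be written.
    produce_to_fill_segment(cluster, topic="compaction-smoke")
    # Wait for compaction so the next version must read (and re-compact)
    # segments and indices written by this one.
    wait_for_compaction(cluster, topic="compaction-smoke")

# The sequential upgrade test would call every registered hook at each step.
UPGRADE_HOOKS = [compacted_topic_hook]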

andijcr commented Dec 19, 2022

> In the places where we aim to test a particular version hop (e.g. before/after a feature is introduced), amend the test to explicitly give the versions it wants to test, rather than hopping straight to the current revision

Tackling this one in #7836.

Instead of hardcoding all the "before" Redpanda versions, I added a function latest_for_line that retrieves the most recent version for a line. The idea is to keep a hardcoded version only where the test needs a specific one.
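
Illustratively (the exact name and return shape come from #7836 and may differ), a pinned test would change roughly like this:

# Before: a hardcoded "before" version that silently ages as releases ship.
old_version = (22, 2, 1)

# After: derive the newest release on that feature line; keep a hardcoded
# tuple only where the test genuinely depends on one specific version.
old_version = installer.latest_for_line((22, 2))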

andijcr commented Dec 21, 2022

> In that general upgrade case, slot in coverage of more features, creating some sort of system of hooks, where the test case steps through versions, but at each version pauses and calls checks/traffic generation hooks.

For this one, I'm thinking of a function releases_sequence(first_release, last_release, list_of_releases_not_to_skip=[]) that accepts a range of releases (or a line of releases) and produces the list of versions to install, e.g.:

releases_sequence((22, 1), (23, 1), [(22, 2, 3), (22, 3, 1)]) ->
    [
        last_of_22.1.x,
        (22, 2, 3),
        last_of_22.2.x,
        (22, 3, 1),
        last_of_22.3.x,
        HEAD
    ]
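
A rough implementation sketch of that helper, assuming (major, minor) tuples for feature lines and (major, minor, patch) tuples for specific releases as in the example above; latest_for_line and next_feature_line are placeholders for the real lookups:

def releases_sequence(first_release, last_release, list_of_releases_not_to_skip=()):
    # Map each explicitly requested specific release onto its feature line.
    pinned = {(r[0], r[1]): r for r in list_of_releases_not_to_skip}
    sequence = []
    line = first_release
    # Walk every feature line before the final one: install the explicitly
    # requested patch release first (if any), then the latest of that line,
    # so no feature line is skipped on the way up.
    while line < last_release:
        if line in pinned:
            sequence.append(pinned[line])
        sequence.append(latest_for_line(line))  # placeholder lookup
        line = next_feature_line(line)          # placeholder, e.g. (22, 3) -> (23, 1)
    # The final step is whatever is under test, i.e. HEAD/dev.
    sequence.append("HEAD")
    return sequence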

Then a base class:

class RunUpgradeTest(Test):
    def __init__(self, release_range, num_cluster):
        ...

    def do_pre_install(self):
        # hook to run before the install sequence begins
        pass

    def do_run_test(self, version):
        # hook called after installation of a version
        pass

    def test_run_upgrade(self):
        steps = releases_sequence(...)
        self.do_pre_install()
        for s in steps:
            success = install(s)
            assert success
            self.do_run_test(s)
This can be subclassed for simple tests that only need to run after the whole cluster is upgraded.

For tests that need partial upgrades, it will probably need an intermediate hook that returns which nodes to upgrade and which to roll back.
