[Proposal] Accelerate End-of-Life of 1.x #4335

Open
seanneumann opened this issue Jun 6, 2023 · 9 comments
@seanneumann
Contributor

Proposal

What? Accelerating the end-of-life (EOL) of OpenSearch and OpenSearch Dashboards 1.x.

Why? The ability to keep these older versions secure is becoming intractable with our maintenance promise and adherence to semantic versioning.

Vulnerability and Maintenance Policies

The OpenSearch Project aspires to adhere to OpenSSF’s best practices for publicly known vulnerabilities, which state “there MUST be no unpatched vulnerabilities of medium or higher severity that have been publicly known for more than 60 days.” For all software in our project, active development, new features, and security fixes take place in our newest major/minor version (e.g. 2.x). Older major versions (e.g. 1.x) are put into maintenance. Our policy states that, by default, versions will remain under maintenance until the next major version enters maintenance, or 1 year passes, whichever is longer. In the case of 1.x, this means it will remain in maintenance until December 31, 2023. Any vulnerabilities addressed in 2.x are backported to 1.x... unless they introduce a breaking change.
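To make the 60-day rule concrete, here is a minimal sketch of how it could be checked against a list of known vulnerabilities. The `Vulnerability` record, its field names, and the `violations` helper are hypothetical and exist only for illustration; this is not existing project tooling.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Severity ordering used only for this illustration.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}
MAX_DAYS_UNPATCHED = 60  # from the OpenSSF wording quoted above


@dataclass
class Vulnerability:
    identifier: str                   # hypothetical tracking ID
    severity: str                     # "low", "medium", "high", or "critical"
    publicly_known_on: date
    patched_on: Optional[date] = None


def violations(vulns: List[Vulnerability], today: date) -> List[Vulnerability]:
    """Return unpatched vulnerabilities of medium or higher severity
    that have been publicly known for more than 60 days."""
    flagged = []
    for vuln in vulns:
        if SEVERITY_RANK[vuln.severity] < SEVERITY_RANK["medium"]:
            continue  # low severity is outside the rule
        if vuln.patched_on is not None:
            continue  # already patched
        if (today - vuln.publicly_known_on).days > MAX_DAYS_UNPATCHED:
            flagged.append(vuln)
    return flagged
```

A fix that cannot be backported to 1.x without a breaking change would stay in the flagged list for that line indefinitely, which is the tension this proposal describes.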

Semantic Versioning (SemVer) and Breaking Changes

The OpenSearch project follows the semantic versioning specification for assigning version numbers to releases, so users can upgrade to the latest minor version of the same major version of the software without encountering incompatible changes (e.g., 1.1.0 → 1.3.8). When we need to introduce incompatible changes out of necessity, we bump the major version. For example, OpenSearch Dashboards upgraded Node.js from 10 to 14 to resolve a long list of security issues. This change broke some functionality in our downstream plugins, which then had to be updated. Because it was a breaking change, it bumped Dashboards to 2.0.
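As a rough illustration of the upgrade guarantee described above, the sketch below treats an upgrade as compatible when the major version is unchanged and the target is not older than the current version. The function names are hypothetical, and the check only handles plain `MAJOR.MINOR.PATCH` strings; it says nothing about which surfaces (REST APIs, plugin interfaces) the guarantee actually covers.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a plain 'MAJOR.MINOR.PATCH' string into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_compatible_upgrade(current: str, target: str) -> bool:
    """True when SemVer promises no incompatible changes: the major
    version is the same and the target is not older than the current one."""
    cur = parse_version(current)
    tgt = parse_version(target)
    return tgt[0] == cur[0] and tgt >= cur


print(is_compatible_upgrade("1.1.0", "1.3.8"))  # same major: covered by the promise
print(is_compatible_upgrade("1.3.8", "2.0.0"))  # major bump: breaking changes allowed
```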

Problem

The ability to keep these older versions secure is becoming intractable with our maintenance promise and adherence to semantic versioning. OpenSearch and OpenSearch Dashboards 1.3.x are in maintenance until December 31, 2023. We will continue to backport fixes to our 1.x line as long as they do not introduce breaking changes. Ensuring backwards compatibility also takes considerable effort, which increases the workload for project maintainers and slows down development in our latest branches.

Example of a problem... OpenSearch Dashboards 1.3.x, which runs on Node.js 10.x, contains issues that can only be resolved by a Node.js upgrade. Node.js 10 support ended Apr 30, 2021. The team upgraded from Node.js 10 to 14 so we could address the issues and not depend on EOL software with no support. The upgrade was a breaking change and bumped us to 2.0. This means 1.3.x will never have the issues in Node.js 10 addressed.

Questions and Proposed Answers

Q: What should our policy be if we want to backport a critical/high/medium CVE fix that causes a breaking change?

A: Do not backport the fix. We should communicate through appropriate channels that the software is in a compromised state and encourage an upgrade to the latest patched version.

Pro: We stay true to SemVer providing predictable upgrades.
Con: This will break our maintenance promise.

Q: What should we do if our ability to keep the maintained version secure is intractable with our maintenance promise?

A: Update the maintenance window policy to include the ability to keep the software secure. For example...

By default, versions will remain under maintenance until the next major version enters maintenance or 1 year passes (whichever is longer), OR the ability to keep the software secure is intractable.
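One possible reading of the amended policy, written out as a sketch: the default end date is the later of "one year in maintenance" and "next major enters maintenance", and the new clause allows an earlier end once keeping the version secure is declared intractable. The function names, parameters, and the dates in the usage lines are hypothetical and for illustration only.

```python
from datetime import date
from typing import Optional


def add_one_year(d: date) -> date:
    """Same calendar day one year later (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


def maintenance_end(
    entered_maintenance: date,
    next_major_enters_maintenance: date,
    intractable_since: Optional[date] = None,
) -> date:
    """End of the maintenance window under the proposed wording."""
    # Existing policy: whichever of the two conditions is longer.
    default_end = max(add_one_year(entered_maintenance),
                      next_major_enters_maintenance)
    # Proposed addition: security intractability can end maintenance earlier.
    if intractable_since is not None:
        return min(default_end, intractable_since)
    return default_end


# Illustrative dates only, not actual project milestones.
print(maintenance_end(date(2022, 6, 1), date(2023, 12, 31)))
print(maintenance_end(date(2022, 6, 1), date(2023, 12, 31), date(2023, 9, 1)))
```

How and by whom "intractable" is declared is left open here, as it is in the proposal.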

Q: What is the proposed EOL date?

A: TBD, but sooner than the current end of 2023.

Q: An early EOL will accelerate upgrades. What are the biggest challenges for users to upgrade from 1.x to 2.x?

Here are some examples we've seen (feel free to add more in the comments)...

  • The type parameter was removed from all OpenSearch API endpoints in version 2.0. For more information, see the breaking changes.
  • Kinesis Data Firehose currently doesn't support domains running version 2.x as a destination. For customers using Firehose, upgrading is simply broken for now.
  • If a domain contains any indexes that were originally created in Elasticsearch 6.8, those indexes are not compatible with OpenSearch 2.x and they must be reindexed.

Q: What is the backport burden for OpenSearch Dashboards 1.x?

Today, nearly all PRs are backported to 2.x. While feature PRs stop there, nearly every other PR (security bumps, bugfixes, doc updates, CI/test changes) needs backports to both 1.x and 1.3. So any maintainer toil of dealing with merge conflicts and failed automation is often duplicated on the 1.x line. But the larger problem is that the CI and automation are built into the core Dashboards codebase and share its dependencies. So we essentially maintain two separate CI systems, one currently on Node 14 (main, 2.x), and the other on Node 10. Paradoxically, the more effort we put into improving our CI and automation (in main), the more the lines diverge and the harder it becomes to replicate those changes on the 1.x line.


Let's discuss the proposal in the comments below!

@seanneumann seanneumann changed the title [RFC] Accelerate End-of-Life of 1.x [Proposal] Accelerate End-of-Life of 1.x Jun 6, 2023
@dblock
Member

dblock commented Jun 6, 2023

In what way was the Node.js 10 upgrade a semver-breaking change?

In general, I think we may be conflating two problems here: 1) the inability to fix a security vulnerability without a breaking change, 2) the amount of work to do so. Let's put the amount of work to the side and figure out what "the proverbial right thing to do" would be. Assuming an infinite number of hands, if we find ourselves in a situation where a vulnerability cannot be fixed without breaking semver, IMO we should fix the security vulnerability with the minimum amount of semver breakage, because of a promise we made.

@seanneumann
Contributor Author

Great question. Technically, OpenSearch Dashboards has not declared our public APIs, so there has been debate as to what is considered a breaking change. At the time of THAT Node upgrade, Dashboards and plugins were broken. There was a lot of refactoring. Anyone who had their own plugin may have been broken after upgrading to 2.0. RE: declaring our public APIs, that work is started via our SDK project (opensearch-project/opensearch-dashboards-sdk-js#41).

The right thing to do is empower OpenSearch admins with the ability to expediently upgrade their software to stay secure. I would argue providing a better 1.x to 2.x upgrade path is better for the customer than providing patches on older versions. Maintaining backwards compatibility is one of the many challenges there. I do agree there is something to be said for meeting users where they are and not forcing a major upgrade. If we want to prioritize security over avoiding breaking changes, that is a perfectly good path, but we'll want to codify that. We'll also need to make a call on what lengths we want to go to. For example, this year Dashboards will remove the Angular dependency for security reasons. Backporting that (11% of the codebase) does not really make sense.

@dblock
Member

dblock commented Jun 7, 2023

I think we can settle the semver question first. Taking a cue from OpenSearch, core Java classes aren't an API, so we've refactored and broken downstream plugins left and right as part of refactoring or dependency upgrades. Semver only applies to public runtime RESTful APIs.

Following semver continues to be a terrible developer experience because minor versions require rebuilding plugins, but it is a significant improvement for users, who can reliably use clients against a range of OpenSearch server versions. The solution to the developer problem is https://github.com/opensearch-project/opensearch-sdk-java/ just like you're proposing for Dashboards.

So now you're just saying that backporting security fixes (such as ripping out Angular) is a lot of work. It makes sense because we said we will. Other than "a lot of work", is there any reason why that doesn't make sense?

@seanneumann
Contributor Author

I see your point. I think it makes sense to adopt the OpenSearch Core process and communicate loudly about what we're breaking. I also think it may make sense in this case to just take our latest 2.x branch and backport it to 1.3.x (plus removing a version check). We've done well in maintaining BWC. I'll chat more with the Dashboards team on this.

I'm still interested in whether there are any non-Dashboards scenarios that this proposal may resonate with.

@ashwin-pc
Member

ashwin-pc commented Jun 7, 2023

I agree with @dblock here. The first criterion for SemVer is that we declare a public API that we will not break. Until we have that, we are technically not following semver anyway and are free to make any breaking change. Now, this is strictly following the definitions of semver. That being said, I like the approach OS Engine has taken of limiting the scope of semver until we have an SDK that declares this API. I'd say since this is related to OpenSearch Dashboards, we should follow a similar approach there, where we only declare the REST APIs and the plugin setup and start interfaces as public APIs. Everything else is subject to change, and it is up to the plugin devs to update and keep in line. If this is too burdensome for plugins, it also incentivizes us to prioritize the SDK work to fix it. But this also takes some burden off OSD and keeps us from shipping code with known vulnerabilities under the pretext of SemVer.

@peternied
Member

peternied commented Jun 20, 2023

@seanneumann I do not see these problems in the ongoing maintenance of OpenSearch; they seem specific to OpenSearch-Dashboards. Should this issue be moved to that repository?

@wbeckler

Let's move it.

@dblock dblock transferred this issue from opensearch-project/.github Jun 20, 2023
@ashwin-pc ashwin-pc pinned this issue Jun 27, 2023
@wbeckler wbeckler self-assigned this Jun 27, 2023
@wbeckler

If the latest version of dashboards were compatible with the 1.x line of OpenSearch core, could we then say we've addressed CVEs in 1.x line once the 2.x line has resolved them? As in, the 2.x line is the next iteration of 1.x.

@wbeckler

wbeckler commented Jul 2, 2023

Here's another way of asking the question of whether work on 2.x is the next iteration of 1.x: #4416

@wbeckler wbeckler unpinned this issue Jul 2, 2023