[Discuss] Bundling snapshot repository plugins with OpenSearch #2194

Closed
setiah opened this issue Jun 14, 2022 · 9 comments
Labels
enhancement New Enhancement

Comments

@setiah
Contributor

setiah commented Jun 14, 2022

Is your feature request related to a problem? Please describe

I'd like to discuss the idea of bundling the snapshot repository plugins - repository-s3, repository-azure, repository-gcs - with the OpenSearch distribution. Today, these plugins are not part of the distribution. Users need to install them explicitly after installing OpenSearch if they plan to use any of these popular cloud blob stores as their snapshot repository. The only out-of-the-box snapshot repository option available to OpenSearch users is the shared filesystem.

Given that Snapshot and restore is a core functionality for OpenSearch, I'd like to discuss

  1. Is the shared filesystem alone a good out-of-the-box option for snapshot/restore? OR
  2. Should OpenSearch provide the popular blob storage plugins - repository-s3, repository-azure, repository-gcs - pre-installed as part of the OpenSearch distribution, for more snapshot/restore options?

The upside of having these plugins pre-installed is that these popular repository options are available out of the box and can be directly integrated with the Snapshot Management UI that is coming soon. The downside is having extra plugins in the distribution that may not be used; for example, if someone uses only s3 for snapshots, they might not want the other snapshot repository plugins installed.

Appreciate your feedback.

Describe the solution you'd like

Seeking feedback on providing the snapshot repository plugins - repository-s3, repository-azure, repository-gcs - pre-installed as part of the OpenSearch distribution for snapshot and restore functionality.

Describe alternatives you've considered

No response

Additional context

The upcoming Snapshot Management feature in OpenSearch enables users to manage their OpenSearch snapshots easily via a consolidated user interface in Dashboards. Having these repository plugins pre-installed would allow the Snapshot Management UI to provide popular snapshot repository options (s3, azure, ...) out of the box.

@setiah added the enhancement and untriaged labels on Jun 14, 2022
@bbarani
Member

bbarani commented Jun 14, 2022

By design, we do not pre-install the native plugins under the /OpenSearch/tree/main/plugins directory in the distribution builds by default, but they can be installed using the plugin install command, as below.

opensearch-plugin install analysis-kuromoji

You can also download the zips directly using the link pattern below:

https://artifacts.opensearch.org/releases/plugins/analysis-kuromoji/1.3.3/analysis-kuromoji-1.3.3.zip
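
As a small illustration of that link pattern, here is a minimal Python sketch that builds the URL from a plugin name and version. The function name `plugin_zip_url` is hypothetical, and the sketch only reproduces the pattern of the example link above; whether a given plugin and version is actually published at that location is an assumption to verify.

```python
# Base path taken from the example link above.
ARTIFACTS_BASE = "https://artifacts.opensearch.org/releases/plugins"


def plugin_zip_url(plugin: str, version: str) -> str:
    """Build a download URL following the <plugin>/<version>/<plugin>-<version>.zip pattern."""
    return f"{ARTIFACTS_BASE}/{plugin}/{version}/{plugin}-{version}.zip"


if __name__ == "__main__":
    # Reproduces the example link for analysis-kuromoji 1.3.3.
    print(plugin_zip_url("analysis-kuromoji", "1.3.3"))
```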

There is a plan to execute integration tests against all the native plugins as part of the gradle check, and it's tracked here.

@setiah Pre-installing these plugins might require additional config changes and would increase the distribution size for everyone. Is there a specific use case this proposal is trying to solve that cannot be solved by installing the required plugins after downloading the distribution?

@bbarani removed the untriaged label on Jun 14, 2022
@setiah
Contributor Author

setiah commented Jun 15, 2022

First off, this is not a proposal but a discussion :)

During the Snapshot Management feature review, it was found that the only out-of-the-box option for snapshots is the shared filesystem. Apparently, the assumption was that repository-s3 (and the other snapshot plugins) are pre-installed in the OpenSearch distribution (since they are part of the same codebase) and can be supported directly via the new Snapshot Management UI. However, this is not the case. This led to an open-ended discussion about whether these plugins should be part of the default distribution, so I opened this issue to have a more meaningful conversation on the pros and cons in public.

My view is that they should not be part of the distribution. The rationale is that not every repository plugin (s3, gcp, azure, etc.) is used by everyone; most likely a user would use only one of them for snapshots. Packaging them all together with OpenSearch would just bloat the distribution size, which we should avoid. Any of these can be installed as needed, and they are fine as they are, as optional plugins.

@CarlMeadows

Whether or not they are bundled with the distribution - I assume the snapshot management UI can render the correct options based on which plugins are present, correct? For example, if the S3 plugin is installed - provide the options to create a repo in S3 and do the same things for Azure, GCP, Oracle etc. As long as that is the case it should be fine to not include them in the distribution. We should make the snapshot management feature easy to extend as more storage plugins become available (even if those plugins are not part of the OpenSearch project org). For example, if Pure Storage created a plugin, it should be simple for that plugin to be integrated into the snapshot management feature.

@setiah
Contributor Author

setiah commented Jun 16, 2022

Thanks @CarlMeadows. Good to hear your thoughts.

> Whether or not they are bundled with the distribution - I assume the snapshot management UI can render the correct options based on which plugins are present, correct? For example, if the S3 plugin is installed - provide the options to create a repo in S3 and do the same things for Azure, GCP, Oracle etc.

Yes, it can, but since none of these storage plugins (s3, azure, gcp etc.) are part of the distribution, it will show the shared filesystem as the only snapshot repo option available out of the box. The gap in experience I see here is "discovery" for new users. It would probably be better for the UI to show common storage options like S3, Azure, GCP in a dropdown (even if they are not installed on the cluster) and guide users through how to install the respective plugin and set up the respective repository through the UI.

> We should make the snapshot management feature easy to extend as more storage plugins become available (even if those plugins are not part of the OpenSearch project org). For example, if Pure Storage created a plugin, it should be simple for that plugin to be integrated into the snapshot management feature.

100%! The Snapshot Management UI should be designed so that it is simple to integrate any storage plugin. One way to do that: when creating the snapshot repository, the UI offers a dropdown of storage types and, based on the user's choice, renders the input form required to set up a repository for that storage. This keeps it easily extensible, since adding a new storage type in the future would mean adding an option to the dropdown and creating an input form in the UI.
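
To make the dropdown-plus-form idea concrete, here is a minimal sketch of a form registry, written in Python for brevity rather than as actual Dashboards code. Everything in it is hypothetical: the `RepositoryForm` type, the `REPOSITORY_FORMS` table and the helper names are made up for illustration, and the field names are only meant to resemble repository settings such as `location`, `bucket` and `base_path`.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class RepositoryForm:
    label: str              # text shown in the dropdown
    plugin: Optional[str]   # plugin that backs this type; None means built in
    fields: Tuple[str, ...]  # inputs the form would render


# Hypothetical starting set; a real UI would define its own.
REPOSITORY_FORMS: Dict[str, RepositoryForm] = {
    "fs": RepositoryForm("Shared filesystem", None, ("location",)),
    "s3": RepositoryForm("Amazon S3", "repository-s3", ("bucket", "base_path")),
}


def register_form(repo_type: str, form: RepositoryForm) -> None:
    """Adding a storage type is one new entry: a dropdown option plus its form."""
    REPOSITORY_FORMS[repo_type] = form


def dropdown_options() -> List[Tuple[str, str]]:
    """(type, label) pairs for the dropdown, in registration order."""
    return [(repo_type, form.label) for repo_type, form in REPOSITORY_FORMS.items()]


def form_fields(repo_type: str) -> Tuple[str, ...]:
    """Inputs to render once the user picks a storage type."""
    return REPOSITORY_FORMS[repo_type].fields


if __name__ == "__main__":
    register_form(
        "azure",
        RepositoryForm("Azure Blob Storage", "repository-azure", ("container", "base_path")),
    )
    print(dropdown_options())
    print(form_fields("azure"))
```

With a table like this, a third-party storage plugin would only need to contribute one entry, which is the kind of extensibility discussed above.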

Overall, it seems like we're inclined towards not bundling these storage plugins with the distribution.

cc: @elfisher

@dbwiddis
Member

> It would probably be better for the UI to show common storage options like S3, Azure, GCP in a dropdown (even if they are not installed on the cluster) and guide users through how to install the respective plugin and set up the respective repository through the UI.

+1 on this. Also make sure that the process for adding other cloud providers (e.g., Alibaba, IBM, Oracle, et al.) is easy for those providers to update.

@elfisher
Contributor

> It would probably be better for the UI to show common storage options like S3, Azure, GCP in a dropdown (even if they are not installed on the cluster) and guide users through how to install the respective plugin and set up the respective repository through the UI.

Is there a way we could make new storage options loadable via a REST API? It could help make it easier for a user to enable a new storage provider without needing to restart their cluster.

@setiah
Contributor Author

setiah commented Jun 16, 2022

> > It would probably be better for the UI to show common storage options like S3, Azure, GCP in a dropdown (even if they are not installed on the cluster) and guide users through how to install the respective plugin and set up the respective repository through the UI.

> Is there a way we could make new storage options loadable via a REST API? It could help make it easier for a user to enable a new storage provider without needing to restart their cluster.

Unfortunately not at this point because of the plugin architecture. If you install a plugin, you need to do a rolling restart to let the cluster know because the plugin is loaded at the time of bootstrapping the opensearch process. Allowing hot install of plugins is a separate project in itself. Extensions should solve this problem in the long run.

For the time being, I think the UI could describe the process so users can do it themselves. Something like -

  1. User chooses s3 for Snapshot repository
  2. UI identifies via the _cat/plugins API that the plugin is not installed and responds with "steps to install"
  3. Steps to install look like -
    i. Run bin/opensearch-plugin install repository-s3 on all nodes to install the S3 repository
    ii. Restart the opensearch process one at a time on all nodes.
    iii. Reload this UI next and register a repository in s3 for snapshots.
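
A rough sketch of steps 2 and 3, in Python for brevity rather than as the UI code itself. It assumes the UI has already fetched `_cat/plugins?format=json` (rows with `name` for the node and `component` for the plugin) and has a list of node names from elsewhere, e.g. `_cat/nodes`, since a node with no plugins would not show up in the plugin listing. The function names and sample data are hypothetical.

```python
from typing import Dict, Iterable, List


def nodes_missing_plugin(
    cat_plugins_rows: Iterable[Dict[str, str]],
    node_names: Iterable[str],
    plugin: str,
) -> List[str]:
    """Return the nodes (sorted by name) that do not report `plugin`."""
    installed_on = {
        row["name"] for row in cat_plugins_rows if row.get("component") == plugin
    }
    return sorted(set(node_names) - installed_on)


def install_steps(plugin: str, missing_nodes: List[str]) -> List[str]:
    """Text the UI could show; an empty list means the plugin is on every node."""
    if not missing_nodes:
        return []
    nodes = ", ".join(missing_nodes)
    return [
        f"Run bin/opensearch-plugin install {plugin} on: {nodes}",
        "Restart the opensearch process on those nodes, one node at a time.",
        f"Reload this UI and register a repository backed by {plugin}.",
    ]


if __name__ == "__main__":
    # Hypothetical sample of parsed _cat/plugins?format=json output.
    rows = [
        {"name": "node-1", "component": "repository-s3", "version": "1.3.3"},
        {"name": "node-2", "component": "analysis-kuromoji", "version": "1.3.3"},
    ]
    missing = nodes_missing_plugin(rows, ["node-1", "node-2", "node-3"], "repository-s3")
    for step in install_steps("repository-s3", missing):
        print(step)
```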

@elfisher
Contributor

> If you install a plugin, you need to do a rolling restart to let the cluster know because the plugin is loaded at the time of bootstrapping the opensearch process.

I'm not sure I really understand the downside of pre-installing some of them. Does it have a performance impact? Without any of them installed, would the UI just use local storage? With that said, I agree: if the distribution doesn't include any of them pre-installed, the UI should definitely walk through the process and link to detailed docs.

@setiah
Contributor Author

setiah commented Jun 16, 2022

It doesn't have a performance impact, but there are a few downsides I can think of. First, it slightly bloats the distribution size. Bundling these storage options (repository-s3, repository-azure, repository-gcs) together increases the tar distribution size by roughly 6-8%. Second, not all storage options are applicable to everyone, e.g. users using s3 may not want the azure and gcs storage plugins installed. Third, it opens another question - which storage options to bundle and which ones not to, as there could be plenty available or upcoming - Alibaba, IBM, Oracle, etc. Fourth, it could create additional friction in the release process, as a new version release would require each of these plugins to be ready for bundling.

@setiah setiah closed this as completed Jul 19, 2022