Improve pre-validation for Flint index refresh options #297

Merged: 12 commits merged on Apr 17, 2024

@dai-chen dai-chen commented Mar 29, 2024

Description

This PR introduces a pre-validation mechanism for Flint index operations. The primary goals are to:

  1. Prevent dangling indexes from being created because of missing pre-validation; and
  2. Ensure users receive clear messages about any issues encountered.

Detailed Design

The following design decisions were made to achieve these objectives:

  1. Scope of Pre-validation: Pre-validation is selectively applied to operations that modify Flint index metadata, such as the creation or update of an index. This approach is taken to avoid unnecessary overhead and potential backward compatibility issues.

  2. Location of Pre-validation Logic: After careful consideration, I decided to integrate the pre-validation process within the FlintSparkIndexBuilder class. This class serves as the entry point for both index creation and update workflows, making it an ideal place for implementing the validation logic.

  3. Implementation of Pre-validation: The pre-validation consists of two major aspects:

    • Flint index refresh specific validation: ensures that all refresh options are valid for the chosen refresh mode, via a new validate() API method in FlintSparkIndexRefresh.
    • Flint index specific validation [Future PR]: focuses on the integrity of the Flint index itself, ensuring that all given attributes are valid, via a new internal validateIndex() method in FlintSparkIndexBuilder.
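
As a rough illustration of the design above, the sketch below shows a builder that calls a refresh-mode validate() before it creates anything. It is not the code in this PR: IndexRefresh, AutoRefresh, ManualRefresh and IndexBuilder are hypothetical, simplified stand-ins for FlintSparkIndexRefresh and FlintSparkIndexBuilder, and the rule checked in AutoRefresh is invented purely for the example.

```scala
// Hypothetical, simplified stand-ins for FlintSparkIndexRefresh and
// FlintSparkIndexBuilder; the real classes have different signatures.
trait IndexRefresh {
  /** Throws IllegalArgumentException if the options do not fit this refresh mode. */
  def validate(): Unit
}

final class AutoRefresh(options: Map[String, String]) extends IndexRefresh {
  override def validate(): Unit =
    // Illustrative rule only (an assumption, not taken from the PR).
    require(options.contains("checkpoint_location"),
      "Auto refresh requires a checkpoint location")
}

final class ManualRefresh extends IndexRefresh {
  override def validate(): Unit = () // nothing to check in this sketch
}

final class IndexBuilder(options: Map[String, String]) {
  private var created = false
  def isCreated: Boolean = created

  // Pick the refresh mode from the options, as the builder would.
  private def refreshMode: IndexRefresh =
    if (options.get("auto_refresh").contains("true")) new AutoRefresh(options)
    else new ManualRefresh

  def create(): Unit = {
    refreshMode.validate() // fail fast, before any metadata is written
    created = true         // placeholder for the real index creation
  }
}
```

Because validation runs first, a failed check leaves nothing behind, which is how pre-validation avoids dangling indexes.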


TODO

This PR is focused on improving pre-validation mechanisms specifically for Flint index refresh. A separate PR will be raised for introducing additional Flint index specific validations.

Examples

Example 1: Hive table validation

spark-sql> CREATE SKIPPING INDEX ON myglue.mydatabase.noaa_ghcn_pds
......             (id VALUE_SET)
......            WITH (
......                auto_refresh = true
......            );

24/04/05 20:40:42 ERROR SparkSQLDriver: Failed in [CREATE SKIPPING INDEX ON
myglue.mydatabase.noaa_ghcn_pds (id VALUE_SET) WITH (auto_refresh = true)]
java.lang.IllegalArgumentException: requirement failed: Index auto refresh doesn't support Hive table
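
The "requirement failed:" prefix in the error above is what Scala's built-in require produces when its condition is false. A minimal sketch, assuming a hypothetical checkAutoRefresh helper and isHiveTable flag (neither name comes from this PR):

```scala
import scala.util.{Failure, Try}

object RequireDemo {
  // Hypothetical helper; the real check lives in Flint's refresh validation.
  def checkAutoRefresh(isHiveTable: Boolean): Unit =
    require(!isHiveTable, "Index auto refresh doesn't support Hive table")

  // require throws IllegalArgumentException and prepends "requirement failed: ".
  val message: String = Try(checkAutoRefresh(isHiveTable = true)) match {
    case Failure(e: IllegalArgumentException) => e.getMessage
    case _                                    => "no error"
  }
}
```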

Example 2: Checkpoint location validation

spark-sql> CREATE SKIPPING INDEX ON ds_tables.http_logs 
......             (clientip VALUE_SET)
......            WITH (
......                auto_refresh = true,
......                checkpoint_location = 's3a://test/123'
......            );

24/04/05 20:35:54 ERROR SparkSQLDriver: Failed in [CREATE SKIPPING INDEX ON ds_tables.http_logs
(clientip VALUE_SET) WITH (auto_refresh = true, checkpoint_location = 's3a://test/123')]
java.lang.IllegalArgumentException: requirement failed: Checkpoint location s3a://test/123 doesn't exist
or no permission to access
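
A check like the one in Example 2 might look roughly like the sketch below. It is an assumption-laden illustration: CheckpointCheck is a hypothetical name, and the location is treated as a local path through java.nio.file, whereas a real implementation would need a file system API that understands s3a:// URIs.

```scala
import java.nio.file.{Files, Paths}
import scala.util.Try

object CheckpointCheck {
  // Stand-in for a distributed file system lookup: treats the location as a
  // local path and reports false on any error (bad path, missing, no access).
  def isAccessible(location: String): Boolean =
    Try {
      val path = Paths.get(location)
      Files.isDirectory(path) && Files.isReadable(path) && Files.isWritable(path)
    }.getOrElse(false)

  // Fails with IllegalArgumentException before any index metadata is written.
  def validate(location: String): Unit =
    require(isAccessible(location),
      s"Checkpoint location $location doesn't exist or no permission to access")
}
```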

Issues Resolved

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen dai-chen added the enhancement (New feature or request) and 0.4 labels Mar 29, 2024
@dai-chen dai-chen self-assigned this Mar 29, 2024
@dai-chen dai-chen marked this pull request as ready for review April 5, 2024 21:51
@dai-chen dai-chen changed the title from "Improve Flint index validation" to "Improve pre-validation for Flint index refresh options" Apr 5, 2024
@dai-chen dai-chen merged commit 9424a79 into opensearch-project:main Apr 17, 2024
4 checks passed
@dai-chen dai-chen deleted the improve-index-validation branch April 17, 2024 17:37