Allow non-existent checkpoint location path in index validation #313

@dai-chen dai-chen commented Apr 18, 2024

Description

The pre-validation of checkpoint locations, introduced in PR #297, requires the specified checkpoint path to exist. However, Spark automatically creates any missing sub-folders when a streaming job begins. This PR relaxes the validation so that a checkpoint path that does not exist yet is accepted, which is more convenient for users and keeps backward compatibility.
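The relaxed rule described above can be pictured with a small sketch. This is not the code in this PR: it is a hypothetical Python illustration on a local filesystem, and the function names `nearest_existing_ancestor` and `is_checkpoint_location_acceptable` are invented for the example. It assumes the check amounts to "the path itself may be missing, but the closest existing parent must be accessible".

```python
import os
from pathlib import Path


def nearest_existing_ancestor(path: Path) -> Path:
    """Walk up from `path` until an entry that exists is found."""
    current = path
    while not current.exists():
        if current.parent == current:  # reached the filesystem root
            break
        current = current.parent
    return current


def is_checkpoint_location_acceptable(location: str) -> bool:
    """Hypothetical relaxed check (an assumption, not the PR's implementation).

    The location does not have to exist, but its closest existing ancestor
    must be a directory we can write into, so that the missing sub-folders
    can be created later.
    """
    ancestor = nearest_existing_ancestor(Path(location))
    return ancestor.is_dir() and os.access(ancestor, os.W_OK | os.X_OK)
```

Under this assumption, a location such as `<existing-dir>/validation-test/subtest1` is accepted even when neither sub-folder exists, while a location whose closest existing ancestor is not an accessible directory is still rejected.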

Testing

For a checkpoint location without access permission, the behavior is the same as before:

spark-sql> CREATE SKIPPING INDEX ON ds_tables.http_logs 
...                (clientip VALUE_SET) 
...                WITH (
...                  auto_refresh = true,
...                  checkpoint_location = 's3://test/test'
...                );

java.lang.IllegalArgumentException: requirement failed: 
  No permission to access the checkpoint location s3://test/test

For a checkpoint location with non-existent sub-folders, validation now passes and the Spark streaming job creates them when it starts:

# validation-test folder doesn't exist
spark-sql> CREATE SKIPPING INDEX ON ds_tables.http_logs 
...                (clientip VALUE_SET) 
...                WITH (
...                  auto_refresh = true,
...                  checkpoint_location = 's3://daichen/validation-test'
...                );
Time taken: 11.97 seconds

# neither the validation-test nor the validation-test/subtest1 folder exists
spark-sql> CREATE SKIPPING INDEX ON ds_tables.http_logs 
...                (clientip VALUE_SET) 
...                WITH (
...                  auto_refresh = true,
...                  checkpoint_location = 's3://daichen/validation-test/subtest1'
...                );
Time taken: 12.978 seconds

Issues Resolved

#65

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen dai-chen added bug Something isn't working 0.4 labels Apr 18, 2024
@dai-chen dai-chen self-assigned this Apr 18, 2024
@dai-chen dai-chen marked this pull request as ready for review April 18, 2024 22:30
@seankao-az seankao-az left a comment

Thanks for the change!

@dai-chen dai-chen merged commit b5ab7bd into opensearch-project:main Apr 19, 2024
6 checks passed
@dai-chen dai-chen deleted the fix-non-existing-checkpoint-location-support branch April 19, 2024 16:54