
Support refresh interval and checkpoint location option #39

Merged

Conversation

dai-chen
Collaborator

@dai-chen dai-chen commented Sep 20, 2023

Description

  1. Added refresh_interval and checkpoint_location options and store them in Flint metadata
  2. Fixed the STRING token bug and the deleteIndex() stop-job-first bug
  3. Extracted Spark parsing into SparkSqlAstBuilder and mixed it into the Flint parser
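
The mix-in approach in item 3 can be illustrated roughly as follows. This is a Python sketch with hypothetical names, not the project's Scala code (which uses Scala traits); it only shows the shape of the pattern: shared Spark parsing logic lives in one builder and is mixed into the Flint-specific parser.

```python
# Hypothetical sketch of the mix-in approach described above; the real
# project mixes a Scala SparkSqlAstBuilder trait into the Flint parser.

class SparkSqlAstBuilderMixin:
    """Holds Spark SQL parsing logic shared across parsers."""

    def visit_property_value(self, raw):
        # Strip surrounding quotes from a STRING token, e.g. "'10 seconds'".
        if len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in ("'", '"'):
            return raw[1:-1]
        return raw


class FlintSparkSqlAstBuilder(SparkSqlAstBuilderMixin):
    """Flint-specific parser that reuses the shared Spark parsing logic."""

    def parse_options(self, pairs):
        return {k: self.visit_property_value(v) for k, v in pairs.items()}


builder = FlintSparkSqlAstBuilder()
print(builder.parse_options({"refresh_interval": "'10 seconds'"}))
# → {'refresh_interval': '10 seconds'}
```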

TODO

  1. Didn't find an easy way to verify index options in the streaming job. Will add a UT for the index settings option in the next PR.
  2. Refactoring Flint metadata and build index API #24: the build API refactor will come first, in the next PR for MV.
  3. Will update the user manual with the new index options in the next PR.

Example

spark-sql> CREATE INDEX orderkey_and_quantity
         > ON stream.lineitem_tiny (l_orderkey, l_quantity)
         > WITH (
         >   auto_refresh = true,
         >   refresh_interval = '10 seconds',
         >   checkpoint_location = 's3a://test/'
         > );
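
The three WITH options above are stored as strings in Flint metadata and then drive the Spark Structured Streaming job (refresh_interval corresponds to a processing-time trigger, checkpoint_location to the streaming checkpoint directory). As a rough pure-Python sketch of that translation (the helper name and returned keys are illustrative, not the project's actual Scala API):

```python
# Hypothetical translation of Flint index options (all stored as strings
# in the index metadata) into streaming-job settings.

def to_streaming_settings(options):
    settings = {}
    # auto_refresh is stored as the string "true"/"false" in metadata.
    settings["auto_refresh"] = options.get("auto_refresh", "false").lower() == "true"
    if "refresh_interval" in options:
        # Corresponds to a processing-time trigger, e.g. every '10 seconds'.
        settings["trigger.processingTime"] = options["refresh_interval"]
    if "checkpoint_location" in options:
        # Corresponds to the streaming checkpoint directory (e.g. on S3).
        settings["checkpointLocation"] = options["checkpoint_location"]
    return settings

print(to_streaming_settings({
    "auto_refresh": "true",
    "refresh_interval": "10 seconds",
    "checkpoint_location": "s3a://test/",
}))
```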

GET flint_stream_lineitem_tiny_orderkey_and_quantity_index/_mapping
{
  "flint_stream_lineitem_tiny_orderkey_and_quantity_index": {
    "mappings": {
      "_meta": {
        "name": "orderkey_and_quantity",
        "options": {
          "auto_refresh": "true",
          "refresh_interval": "10 seconds",
          "checkpoint_location": "s3a://test/"
        },
        "source": "stream.lineitem_tiny",
        "kind": "covering",
        ...
      }
    }
  }
}

spark-sql> 23/09/21 16:06:30 WARN ProcessingTimeExecutor: Current batch is falling behind. 
The trigger interval is 10000 milliseconds, but spent 72423 milliseconds

$ aws s3 ls test
     commits/
     offsets/
     sources/

Issues Resolved

#26

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen dai-chen added the enhancement New feature or request label Sep 20, 2023
@dai-chen dai-chen self-assigned this Sep 20, 2023
@dai-chen dai-chen changed the title Support more options in create index statement Support refresh interval and checkpoint location option in create statement Sep 20, 2023
@dai-chen dai-chen changed the title Support refresh interval and checkpoint location option in create statement Support refresh interval and checkpoint location option Sep 20, 2023
}
}

override def visitPropertyValue(value: PropertyValueContext): String = {
@vamsi-amazon (Member) commented Sep 26, 2023


Seems like every option value is treated as a string.
A few QQs:

  • If the user passes some random string in auto_refresh, when will the failure occur?
  • [Not related to this PR] Any thoughts on communicating syntax and other errors to the user?

@dai-chen (Collaborator, Author) replied Sep 26, 2023


  1. Yes, it's a map of strings passed to FlintSparkIndexOptions, which interprets it.
  2. For the error case, an exception is currently thrown when we build the DataFrame for the streaming job in FlintSpark. Let me think about whether the Spark error is clear enough or whether we can validate beforehand in Add index settings option in create statement #44. Thanks for the good point!
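
The validate-beforehand idea mentioned here could in principle look like the sketch below. This is a hypothetical Python illustration (function name, error messages, and the accepted interval pattern are all assumptions, not the project's actual code): a bad option value would fail at CREATE INDEX time instead of when the streaming job starts.

```python
import re

# Hypothetical up-front validation of the string option map.

VALID_BOOLEANS = {"true", "false"}
# Accept intervals like "10 seconds" or "5 minutes" (illustrative pattern only).
INTERVAL_PATTERN = re.compile(r"^\d+\s+(second|minute|hour)s?$")

def validate_options(options):
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []
    auto_refresh = options.get("auto_refresh")
    if auto_refresh is not None and auto_refresh.lower() not in VALID_BOOLEANS:
        errors.append(f"auto_refresh must be true/false, got: {auto_refresh!r}")
    interval = options.get("refresh_interval")
    if interval is not None and not INTERVAL_PATTERN.match(interval):
        errors.append(f"refresh_interval is not a recognized interval: {interval!r}")
    return errors

print(validate_options({"auto_refresh": "yes"}))  # one error reported
print(validate_options({"auto_refresh": "true", "refresh_interval": "10 seconds"}))  # []
```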

@vamsi-amazon (Member)

LGTM

@dai-chen dai-chen merged commit 896fa9f into opensearch-project:main Sep 26, 2023
4 checks passed