This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

add AD task cache #337

Merged

ylwu-amzn merged 5 commits into opendistro-for-elasticsearch:master from ylwu-amzn:master

Dec 23, 2020

Contributor

ylwu-amzn commented Dec 20, 2020

Issue #, if available:

Description of changes:

Add an AD task cache. The cache holds the RCF and threshold models, the shingle data, and the threshold model training data.

./gradlew build
./gradlew integTest -PnumNodes=3
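
A minimal sketch of the kind of cache the description refers to, assuming one cache entry per task ID. The class names `BatchTaskCacheSketch` and `TaskCacheManagerSketch`, their fields, and the `Object` stand-ins for the models are hypothetical and are not taken from the PR's `ADBatchTaskCache` or `ADTaskCacheManager`.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-task cache entry; field names and types are assumptions.
class BatchTaskCacheSketch {
    private final Object rcfModel = new Object();        // stand-in for the RCF model
    private final Object thresholdModel = new Object();  // stand-in for the threshold model
    private final Deque<double[]> shingle = new ArrayDeque<>();
    private final double[] thresholdTrainingData;
    private final int shingleSize;

    // Assumes shingleSize > 0.
    BatchTaskCacheSketch(int shingleSize, int trainingDataSize) {
        this.shingleSize = shingleSize;
        this.thresholdTrainingData = new double[trainingDataSize];
    }

    // Keeps only the most recent shingleSize data points.
    void addShinglePoint(double[] point) {
        if (shingle.size() == shingleSize) {
            shingle.removeFirst();
        }
        shingle.addLast(point);
    }

    int shingleLength() {
        return shingle.size();
    }

    double[] thresholdTrainingData() {
        return thresholdTrainingData;
    }
}

// Hypothetical manager that maps task IDs to cache entries.
class TaskCacheManagerSketch {
    private final Map<String, BatchTaskCacheSketch> caches = new ConcurrentHashMap<>();

    void put(String taskId, int shingleSize, int trainingDataSize) {
        BatchTaskCacheSketch fresh = new BatchTaskCacheSketch(shingleSize, trainingDataSize);
        // putIfAbsent returns the previous value, so non-null means a duplicate task.
        if (caches.putIfAbsent(taskId, fresh) != null) {
            throw new IllegalArgumentException("AD task is already cached: " + taskId);
        }
    }

    BatchTaskCacheSketch get(String taskId) {
        BatchTaskCacheSketch cache = caches.get(taskId);
        if (cache == null) {
            throw new IllegalArgumentException("AD task not in cache: " + taskId);
        }
        return cache;
    }

    void remove(String taskId) {
        caches.remove(taskId);
    }

    int size() {
        return caches.size();
    }
}
```

Keying the map by task ID lets a finished or cancelled task release everything it cached with a single `remove` call.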

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.


          add AD task cache

1bf147c

ylwu-amzn requested review from kaituo, yizheliu-amazon and weicongs-amazon

December 20, 2020 05:27

codecov bot commented Dec 20, 2020

Codecov Report

Merging #337 (9088012) into master (2ae77ed) will increase coverage by 0.27%.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##             master     #337      +/-   ##
============================================
+ Coverage     75.50%   75.77%   +0.27%     
- Complexity     2160     2207      +47     
============================================
  Files           207      209       +2     
  Lines         10030    10144     +114     
  Branches        898      902       +4     
============================================
+ Hits           7573     7687     +114     
  Misses         2035     2035              
  Partials        422      422

| Flag | Coverage Δ | Complexity Δ |
|---|---|---|
| cli | `79.27% <ø> (ø)` | `0.00 <ø> (ø)` |
| plugin | `75.51% <100.00%> (+0.29%)` | `0.00 <46.00> (ø)` |

| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| ...stroforelasticsearch/ad/AnomalyDetectorPlugin.java | `95.04% <ø> (ø)` | `11.00 <0.00> (ø)` |
| ...arch/ad/transport/ADStatsNodesTransportAction.java | `100.00% <ø> (+5.55%)` | `9.00 <0.00> (+1.00)` |
| ...n/opendistroforelasticsearch/ad/MemoryTracker.java | `77.33% <100.00%> (+0.30%)` | `21.00 <0.00> (ø)` |
| ...ch/ad/common/exception/LimitExceededException.java | `100.00% <100.00%> (ø)` | `3.00 <1.00> (+1.00)` |
| ...ticsearch/ad/settings/AnomalyDetectorSettings.java | `100.00% <100.00%> (ø)` | `1.00 <0.00> (ø)` |
| ...stroforelasticsearch/ad/task/ADBatchTaskCache.java | `100.00% <100.00%> (ø)` | `15.00 <15.00> (?)` |
| ...roforelasticsearch/ad/task/ADTaskCacheManager.java | `100.00% <100.00%> (ø)` | `30.00 <30.00> (?)` |
| ...asticsearch/ad/cluster/ADClusterEventListener.java | `88.00% <0.00%> (-4.00%)` | `13.00% <0.00%> (-1.00%)` |
... and 2 more

weicongs-amazon reviewed

Contributor

weicongs-amazon left a comment

not sure how this task will be used.

weicongs-amazon approved these changes

Contributor

weicongs-amazon left a comment

LGTM

kaituo reviewed

ylwu-amzn added 2 commits

December 21, 2020 14:06


          add java doc for exception

aaced0b


          change to reserved memory

de29b8a

ylwu-amzn force-pushed the master branch from ce0654c to de29b8a Compare

December 21, 2020 22:23

kaituo reviewed


          fix shingle memory calculation;store threshold model training data in double array

d48182d

ylwu-amzn force-pushed the master branch from 43f77d8 to d48182d Compare

December 22, 2020 20:48

kaituo reviewed

src/main/java/com/amazon/opendistroforelasticsearch/ad/task/ADTaskCacheManager.java

+                   * @param taskId task id
+                   * @param trained threshold model trained or not
+                   */
+                  protected void setThresholdModelTrained(String taskId, boolean trained) {

Member

kaituo Dec 23, 2020

When will you call this method? The threshold model can emit results even if it has seen only one RCF score. We use RCF's total updates to measure whether the models are ready. Simply put, the threshold model is always trained, so I wonder why we need a flag to mark it as trained or not.

Contributor Author

ylwu-amzn Dec 23, 2020

This method will be called when the historical detector's threshold model finishes training/cold start.

For a historical detector, we have all of the data, so we should use as much of it as possible to train and predict better. For a realtime detector, we read some (512) sampled historical data points to train the model. For a historical detector, we use more (1000) data points to train the model. For both realtime and historical detectors, the RCF and threshold models continue to be trained on the data that follows cold start. The threshold model is trained first and only then starts to output reliable results, so we need to know whether the threshold model is trained or not.

The ML team suggested that we use enough data to train the RCF model, and that 1000 data points should be good enough. We may tune this value after performance testing and more data quality checks.

Member

kaituo Dec 23, 2020

Understood. 1000 is a trade-off between accuracy and usability. In real time, 128 is the threshold to emit results.
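
The flag discussed in this thread can be illustrated with a minimal sketch, assuming RCF scores are buffered in a double array until the training size is reached. `ThresholdTrainingSketch` and its methods are hypothetical names. The value 1000 is the historical-detector figure quoted above and, as the author notes, may be tuned.

```java
// Hypothetical helper; not the PR's implementation.
class ThresholdTrainingSketch {
    // Historical-detector training size quoted in the thread; subject to tuning.
    static final int TRAINING_DATA_SIZE = 1000;

    private final double[] trainingData = new double[TRAINING_DATA_SIZE];
    private int count = 0;
    private boolean thresholdModelTrained = false;

    /** Buffers one RCF score and returns true once the model counts as trained. */
    boolean addScore(double rcfScore) {
        if (thresholdModelTrained) {
            return true;
        }
        trainingData[count++] = rcfScore;
        if (count == TRAINING_DATA_SIZE) {
            // A real implementation would train the threshold model on trainingData here.
            thresholdModelTrained = true;
        }
        return thresholdModelTrained;
    }

    boolean isThresholdModelTrained() {
        return thresholdModelTrained;
    }

    int bufferedScores() {
        return count;
    }
}
```

Until `addScore` returns true, a caller would keep collecting scores rather than emitting anomaly results.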

src/main/java/com/amazon/opendistroforelasticsearch/ad/task/ADTaskCacheManager.java Outdated

+                      }
+                      checkRunningTaskLimit();
+                      long neededCacheSize = calculateADTaskCacheSize(adTask);
+                      if (!memoryTracker.canAllocate(neededCacheSize)) {

Member

kaituo Dec 23, 2020

you missed my comment

src/main/java/com/amazon/opendistroforelasticsearch/ad/task/ADTaskCacheManager.java Outdated

+                      if (!memoryTracker.canAllocate(neededCacheSize)) {
+                          throw new LimitExceededException("No enough memory to run detector");
+                      }
+                      memoryTracker.consumeMemory(neededCacheSize, false, HISTORICAL_SINGLE_ENTITY_DETECTOR);

Member

kaituo Dec 23, 2020

you missed my comment
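
The check-then-reserve pattern in the snippet under review can be sketched as follows. `MemoryBudgetSketch` is a hypothetical, simplified stand-in and not the plugin's `MemoryTracker`; `IllegalStateException` stands in for `LimitExceededException`.

```java
// Hypothetical simplified memory budget; assumes a fixed byte limit.
class MemoryBudgetSketch {
    private final long limitBytes;
    private long reservedBytes = 0;

    MemoryBudgetSketch(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    synchronized boolean canAllocate(long bytes) {
        return reservedBytes + bytes <= limitBytes;
    }

    // Checks and reserves under one lock so two tasks cannot both pass the check.
    synchronized void reserve(long bytes) {
        if (!canAllocate(bytes)) {
            throw new IllegalStateException("Not enough memory to run detector");
        }
        reservedBytes += bytes;
    }

    // Returns memory to the budget when a task finishes.
    synchronized void release(long bytes) {
        reservedBytes = Math.max(0, reservedBytes - bytes);
    }

    synchronized long reserved() {
        return reservedBytes;
    }
}
```

Doing the check and the reservation in a single synchronized method is a design choice of this sketch: with two separate calls, concurrent tasks could each see enough free memory and together exceed the limit.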


          address comments

kaituo reviewed

src/main/java/com/amazon/opendistroforelasticsearch/ad/task/ADTaskCacheManager.java

+                   * {@link java.lang.IllegalArgumentException}
+                   * We throw exception rather than return {@code Optional.empty} or null
+                   * here, so don't need to check task existence by writing duplicate null
+                   * checking code. All AD task exceptions will be handled in AD task manager.

Member

kaituo Dec 23, 2020

I still suggest we remove the exceptions and use Optional or null. In Java, the convention is to check conditions whenever possible and not to use exceptions for flow control. A conditional check is a jump in the bytecode, while exception handling is much more complex. When an exception occurs inside a Java method, the method creates an Exception object and passes it to the JVM (in Java terms, the method "throws" an Exception). The Exception object contains the type of the exception and the state of the program when the exception occurred. The JVM is responsible for finding an exception handler to process the Exception object. It searches backward through the call stack until it finds a handler that matches that class of Exception object (in Java terms, this is "catching" the Exception). If the JVM cannot find a matching exception handler in any of the methods on the call stack, it terminates the program.

If you don't want to do it, I am fine with that as well. Just a note that this is not good practice.

Ref: https://stackoverflow.com/questions/8161042/why-use-an-exception-instead-of-if-else

Contributor Author

ylwu-amzn Dec 24, 2020

Totally agree that exceptions should not be used for flow control. Here the exception is not used to control the flow but to terminate the task run, just as any AD result action terminates a detector job run by throwing an exception. In the next PR I may change the method name to reduce the confusion.

Member

kaituo Dec 24, 2020

Got it. Please change the method name or add a comment to make it clearer.
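
The two lookup styles debated in this thread can be put side by side in a minimal sketch. `TaskLookupSketch`, its method names, and the use of a plain string as the cached value are hypothetical and only illustrate the trade-off.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical lookup helper; the cached value is a plain string for brevity.
class TaskLookupSketch {
    private final Map<String, String> taskStates = new ConcurrentHashMap<>();

    void put(String taskId, String state) {
        taskStates.put(taskId, state);
    }

    // Reviewer's suggestion: a missing task is an ordinary result the caller checks.
    Optional<String> findState(String taskId) {
        return Optional.ofNullable(taskStates.get(taskId));
    }

    // Author's approach: a missing task terminates the task run with an exception.
    String getStateOrThrow(String taskId) {
        String state = taskStates.get(taskId);
        if (state == null) {
            throw new IllegalArgumentException("AD task not in cache: " + taskId);
        }
        return state;
    }
}
```

A name such as `getStateOrThrow` makes the failure behavior visible at the call site, which is one way to address the naming concern raised above.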

kaituo approved these changes

ylwu-amzn merged commit d5683f6 into opendistro-for-elasticsearch:master

ohltyler added the enhancement label

ylwu-amzn added feature and removed enhancement labels

Labels

feature