Skip to content
This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Fix for stats API #287

Merged
merged 4 commits into from
Oct 22, 2020
Merged

Conversation

kaituo
Copy link
Member

@kaituo kaituo commented Oct 22, 2020

Issue #, if available:

Description of changes:

This PR fixes two issues for the stats API.

First, we didn't propagate multi-entity detectors' models execution exceptions for the remote invocation.  This problem may impact stats' API ability to report the total failures count and thus hide an issue we should have reported during monitoring.  This PR fixes the issue by collecting model host nodes' exceptions from coordinating nodes.

Second, we didn't show active multi-entity detectors' models information on stats API.  This PR places this information into stats API output.

This PR also adds unit tests for ModelManager.

Testing done:

  1. added unit tests
  2. manually verified the two issues are resolved.

Stats API output after my changes:

{
	"anomaly_detectors_index_status": "green",
	"anomaly_detection_state_status": "green",
	"detector_count": 2,
	"anomaly_detection_job_index_status": "green",
	"models_checkpoint_index_status": "green",
	"anomaly_results_index_status": "green",
	"nodes": {
		"-7UJxzZjRbSJ3cKrpd-zoA": {
			"ad_execute_request_count": 0,
			"ad_execute_failure_count": 0,
			"models": [{
					"detector_id": "zQ2xTXUBNp5PN4GLXMXm",
					"model_type": "rcf",
					"model_id": "zQ2xTXUBNp5PN4GLXMXm_model_rcf_2"
				},
				{
					"detector_id": "kw2nTXUBNp5PN4GLucVB",
					"model_type": "entity",
					"model_id": "kw2nTXUBNp5PN4GLucVB_entity_app_0"
				},
				{
					"detector_id": "kw2nTXUBNp5PN4GLucVB",
					"model_type": "entity",
					"model_id": "kw2nTXUBNp5PN4GLucVB_entity_app_5"
				}
			]
		},
		"9TrCJH8IT5u79LHWJ4QrBQ": {
			"ad_execute_request_count": 0,
			"ad_execute_failure_count": 0,
			"models": [{
					"detector_id": "zQ2xTXUBNp5PN4GLXMXm",
					"model_type": "rcf",
					"model_id": "zQ2xTXUBNp5PN4GLXMXm_model_rcf_1"
				},
				{
					"detector_id": "zQ2xTXUBNp5PN4GLXMXm",
					"model_type": "threshold",
					"model_id": "zQ2xTXUBNp5PN4GLXMXm_model_threshold"
				},
				{
					"detector_id": "kw2nTXUBNp5PN4GLucVB",
					"model_type": "entity",
					"model_id": "kw2nTXUBNp5PN4GLucVB_entity_app_1"
				},
				{
					"detector_id": "kw2nTXUBNp5PN4GLucVB",
					"model_type": "entity",
					"model_id": "kw2nTXUBNp5PN4GLucVB_entity_app_3"
				},
				{
					"detector_id": "kw2nTXUBNp5PN4GLucVB",
					"model_type": "entity",
					"model_id": "kw2nTXUBNp5PN4GLucVB_entity_app_2"
				}
			]
		},
		"6y6coqbUSQWyDca3e1mGkw": {
			"ad_execute_request_count": 60,
			"models": [{
					"detector_id": "zQ2xTXUBNp5PN4GLXMXm",
					"model_type": "rcf",
					"model_id": "zQ2xTXUBNp5PN4GLXMXm_model_rcf_0"
				},
				{
					"detector_id": "kw2nTXUBNp5PN4GLucVB",
					"model_type": "entity",
					"model_id": "kw2nTXUBNp5PN4GLucVB_entity_app_4"
				},
				{
					"detector_id": "kw2nTXUBNp5PN4GLucVB",
					"model_type": "entity",
					"model_id": "kw2nTXUBNp5PN4GLucVB_entity_app_6"
				}
			],
			"ad_execute_failure_count": 1
		}
	}
}

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

This PR fixes two issues for the stats API.

First, we didn't propagate multi-entity detectors' models execution exceptions for the remote invocation.  This problem may impact stats' API ability to report the total failures count and thus hide an issue we should have reported during monitoring.  This PR fixes the issue by collecting model host nodes' exceptions from coordinating nodes.

Second, we didn't show active multi-entity detectors' models information on stats API.  This PR places this information into stats API output.

This PR also adds unit tests for ModelManager.

Testing done:
1. added unit tests
2. manually verified the two issues are resolved.
@kaituo kaituo added the bug Something isn't working label Oct 22, 2020
@codecov
Copy link

codecov bot commented Oct 22, 2020

Codecov Report

Merging #287 into master will increase coverage by 1.31%.
The diff coverage is 85.45%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #287      +/-   ##
============================================
+ Coverage     70.00%   71.32%   +1.31%     
- Complexity     1820     1852      +32     
============================================
  Files           190      190              
  Lines          8879     8903      +24     
  Branches        757      760       +3     
============================================
+ Hits           6216     6350     +134     
+ Misses         2307     2191     -116     
- Partials        356      362       +6     
Flag Coverage Δ Complexity Δ
#plugin 70.62% <85.45%> (+1.43%) 1852.00 <8.00> (+32.00)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ Complexity Δ
...opendistroforelasticsearch/ad/ml/ModelManager.java 90.60% <0.00%> (+11.56%) 110.00 <0.00> (+12.00)
...rch/ad/transport/AnomalyResultTransportAction.java 80.33% <85.71%> (+13.41%) 68.00 <4.00> (+12.00)
...stroforelasticsearch/ad/AnomalyDetectorPlugin.java 93.38% <100.00%> (ø) 13.00 <0.00> (ø)
...distroforelasticsearch/ad/caching/CacheBuffer.java 82.26% <100.00%> (+0.12%) 38.00 <1.00> (+1.00)
...stroforelasticsearch/ad/caching/PriorityCache.java 89.49% <100.00%> (+0.14%) 59.00 <2.00> (+2.00)
...oforelasticsearch/ad/feature/SearchFeatureDao.java 80.87% <100.00%> (ø) 83.00 <0.00> (ø)
...earch/ad/stats/suppliers/ModelsOnNodeSupplier.java 100.00% <100.00%> (ø) 5.00 <1.00> (ø)
...sticsearch/ad/indices/AnomalyDetectionIndices.java 51.00% <0.00%> (+0.67%) 41.00% <0.00%> (+1.00%)
...asticsearch/ad/cluster/ADClusterEventListener.java 92.00% <0.00%> (+2.00%) 14.00% <0.00%> (ø%)
...n/opendistroforelasticsearch/ad/ml/ModelState.java 97.56% <0.00%> (+4.87%) 15.00% <0.00%> (+1.00%)
... and 4 more

Copy link
Member

@jmazanec15 jmazanec15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few minor comments. But, overall, it looks good. One question: does Readme need to be updated at all on KNN Stats API usage?

@jmazanec15
Copy link
Member

I have a few minor comments. But, overall, it looks good. One question: does Readme need to be updated at all on KNN Stats API usage?

Ah, I see API usage is not specified in the README. It looks like this documentation does not need to be updated.

Copy link
Member

@jmazanec15 jmazanec15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@kaituo
Copy link
Member Author

kaituo commented Oct 22, 2020

I have a few minor comments. But, overall, it looks good. One question: does Readme need to be updated at all on KNN Stats API usage?

Ah, I see API usage is not specified in the README. It looks like this documentation does not need to be updated.

Yes, I will make sure our docs are up dated. We have one new API and few other changes.

@kaituo kaituo merged commit 2f18231 into opendistro-for-elasticsearch:master Oct 22, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants