Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes calculation of average thread blocked time and average thread waited time #118

Merged
merged 3 commits into from
Apr 8, 2022

Conversation

eirsep
Copy link
Member

@eirsep eirsep commented Jan 24, 2022

  • Enables thread contention monitoring.
  • Fixes calculation of average thread blocked time and average thread waited time when thread is blocked/waiting due to same event across thread dump intervals.

Signed-off-by: Surya Sashank Nistala snistala@amazon.com

Issues resolved

#92

Testing

for i in {1..3};do curl "http://localhost:9600/_plugins/_performanceanalyzer/metrics?metrics=Thread_Blocked_Time,Thread_Blocked_Event&agg=max,max&dim=Operation" ;echo;sleep 5;done
{"k1yr46alRW2mC5h7nCSSYQ": {"timestamp": 1643018630000, "data": {"fields":[{"name":"Operation","type":"VARCHAR"},{"name":"Thread_Blocked_Time","type":"DOUBLE"},{"name":"Thread_Blocked_Event","type":"DOUBLE"}],"records":[["GC",0.0,0.0],["flush",0.0,0.0],["generic",0.0,149.0],["management",0.0,38.0],["other",10.015115325794273,27.0],["refresh",0.0,0.0],["search",10.015115302620272,24.0],["transportWorker",0.0,87.0]]}}}

{"k1yr46alRW2mC5h7nCSSYQ": {"timestamp": 1643018635000, "data": {"fields":[{"name":"Operation","type":"VARCHAR"},{"name":"Thread_Blocked_Time","type":"DOUBLE"},{"name":"Thread_Blocked_Event","type":"DOUBLE"}],"records":[["GC",0.0,0.0],["flush",0.0,0.0],["generic",0.0,149.0],["management",0.0,38.0],["other",10.013015012003,27.0],["refresh",0.0,0.0],["search",10.013015012002999,24.0],["shardquery",3.112,1.0],["transportWorker",0.0,87.0]]}}}

{"k1yr46alRW2mC5h7nCSSYQ": {"timestamp": 1643018640000, "data": {"fields":[{"name":"Operation","type":"VARCHAR"},{"name":"Thread_Blocked_Time","type":"DOUBLE"},{"name":"Thread_Blocked_Event","type":"DOUBLE"}],"records":[["GC",0.0,0.0],["flush",0.0,0.0],["generic",0.0,149.0],["management",0.0,38.0],["other",10.013238956117647,27.0],["refresh",0.0,0.0],["search",10.01323896387822,24.0],["transportWorker",0.0,87.0]]}}}

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@eirsep eirsep requested a review from a team January 24, 2022 18:38
@sruti1312
Copy link
Contributor

Thanks for making this change.

  1. Integration tests are failing with AccessControlException error.
    From logs,
Caused by: java.security.AccessControlException: access denied ("java.lang.management.ManagementPermission" "control")

From java documentation, setThreadContentionMonitoringEnabled method can throw SecurityException - if a security manager exists and the caller does not have ManagementPermission("control").

  1. One other concern is the performance impact of enabling thread contention monitoring. Can we run a small performance test to get more insights about this?

@eirsep eirsep force-pushed the enable-thread-contention branch 2 times, most recently from 8f914a4 to be87f54 Compare March 23, 2022 10:03
@eirsep eirsep force-pushed the enable-thread-contention branch 2 times, most recently from 7883f30 to e4656c5 Compare April 4, 2022 13:20
@codecov-commenter
Copy link

codecov-commenter commented Apr 4, 2022

Codecov Report

Merging #118 (68b2579) into main (8b79f18) will decrease coverage by 0.06%.
The diff coverage is 36.92%.

❗ Current head 68b2579 differs from pull request most recent head 8f491ed. Consider uploading reports for the commit 8f491ed to get more accurate results

@@             Coverage Diff              @@
##               main     #118      +/-   ##
============================================
- Coverage     72.03%   71.97%   -0.07%     
- Complexity     2945     2955      +10     
============================================
  Files           376      376              
  Lines         18723    18853     +130     
  Branches       1432     1468      +36     
============================================
+ Hits          13488    13570      +82     
- Misses         4648     4697      +49     
+ Partials        587      586       -1     
Impacted Files Coverage Δ
...org/opensearch/performanceanalyzer/AppContext.java 93.75% <ø> (ø)
...ensearch/performanceanalyzer/CertificateUtils.java 84.74% <ø> (ø)
.../opensearch/performanceanalyzer/ClientServers.java 100.00% <ø> (ø)
...g/opensearch/performanceanalyzer/ConfigStatus.java 83.33% <ø> (ø)
...va/org/opensearch/performanceanalyzer/DBUtils.java 60.52% <ø> (ø)
...performanceanalyzer/OSMetricsGeneratorFactory.java 42.85% <ø> (ø)
...ch/performanceanalyzer/PerformanceAnalyzerApp.java 43.13% <0.00%> (-3.68%) ⬇️
...erformanceanalyzer/PerformanceAnalyzerThreads.java 100.00% <ø> (ø)
...formanceanalyzer/PerformanceAnalyzerWebServer.java 68.65% <ø> (-5.98%) ⬇️
...rformanceanalyzer/collections/TimeExpiringSet.java 80.00% <ø> (ø)
... and 373 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8b79f18...8f491ed. Read the comment docs.

@@ -0,0 +1 @@
true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove these null/*/*.conf files? They are generated during test execution. Maybe adding null/* to .gitignore will help avoid these issues in the future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

@@ -0,0 +1 @@
true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

Comment on lines +145 to +126
if (threadBean.isThreadContentionMonitoringSupported()) {
threadBean.setThreadContentionMonitoringEnabled(threadContentionMonitoringEnabled);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check and set happens during every OSMetricsCollector#collectMetrics call which is very frequent. Do we want to set it once during initialization and when there is state change in threadContentionMonitoringEnabled?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is not very costly
If the flag is already enabled then this short-circuits out of the method without doing anything.

if (contentionMonitoringEnabled != enable) {
               if (enable) {
                   // if reeabled, reset contention time statistics
                   // for all threads
                   resetContentionTimes0(0);
               }

               // update the VM of the state change
               setThreadContentionMonitoringEnabled0(enable);

               contentionMonitoringEnabled = enable;
           }

Also, the OSMetricsCollector is initialized in the PA Plugin. We can't do a set once and update on state change sort of thing from there as ThreadList.getNativeTidMap is a static call and will need to know thread contention monitoring is enabled or not before every run.

@sruti1312
Copy link
Contributor

sruti1312 commented Apr 5, 2022

Flaky EventLogFileHandlerTests test has been fixed. Currently, Integration tests (spin up a docker cluster and run a bunch of tests) are failing due to missing metrics (related to OSMetricsCollector). @eirsep Can you check if the failure is related to your changes. Thanks!

Looks like Integration tests are failing in main as well.
https://github.com/opensearch-project/performance-analyzer-rca/runs/5842874823?check_suite_focus=true

@eirsep eirsep requested a review from sruti1312 April 7, 2022 17:08
@sruti1312
Copy link
Contributor

sruti1312
sruti1312 previously approved these changes Apr 7, 2022
@sruti1312 sruti1312 requested a review from sgup432 April 7, 2022 18:31
sgup432
sgup432 previously approved these changes Apr 8, 2022
…read blocked time and average thread waited time

Signed-off-by: Surya Sashank Nistala <snistala@amazon.com>
Signed-off-by: Surya Sashank Nistala <snistala@amazon.com>
Signed-off-by: Surya Sashank Nistala <snistala@amazon.com>
@eirsep eirsep dismissed stale reviews from sgup432 and sruti1312 via 40cc6b0 April 8, 2022 19:59
@sgup432
Copy link
Contributor

sgup432 commented Apr 8, 2022

Approving. Integ test failing due to known failure which was fixed recently and seems like this PR didn't pull that latest commit.

@sgup432 sgup432 self-requested a review April 8, 2022 20:34
@sruti1312 sruti1312 merged commit d329ddb into opensearch-project:main Apr 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants