
[Bug]: Understand and mitigate the source of Index latency increase in 2.0-RC #2054

Closed
CEHENKLE opened this issue Apr 29, 2022 · 8 comments
Labels: bug (Something isn't working), v2.0.0


@CEHENKLE (Member)

Describe the bug

While running performance tests for 2.0-RC1, we observed what appeared to be a ~13% increase in indexing latency. This issue is to investigate the source of the increase so we can propose ways to mitigate it.

#1624 (comment)

To reproduce

#1624 (comment)


@CEHENKLE added the bug (Something isn't working) and untriaged (Issues that have not yet been triaged) labels on Apr 29, 2022
@dblock added the v2.0.0 label on Apr 29, 2022
@CEHENKLE removed the untriaged label on Apr 29, 2022
@bbarani (Member) commented May 6, 2022

@CEHENKLE @dblock @kotwanikunal What's the current status of this issue? Did we find the root cause of the performance issue?

@kotwanikunal (Member)

CC'ing @andrross as he was actively looking into it.

@andrross (Member) commented May 6, 2022

I am actively working on it. The latency and throughput difference between 1.3 and 2.0 has disappeared in runs against recent builds. However, it appears that this is because 1.3 has gotten slower.

[Screenshots: indexing latency and throughput comparison charts for 1.3 vs 2.0]

I'm still working on this, but my theory is that a change in Java versions and/or the default garbage collector is introducing the differences here.
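(One quick way to verify which JDK and collector each node is actually running is the nodes info API; the sketch below uses the standard `_nodes/jvm` endpoint, though the exact response fields can vary by version.)

```sh
# List each node's JVM version and active garbage collectors
# (gc_collectors distinguishes G1 from the CMS collectors).
curl -s 'http://localhost:9200/_nodes/jvm?filter_path=nodes.*.name,nodes.*.jvm.version,nodes.*.jvm.gc_collectors&pretty'
```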

@dblock (Member) commented May 6, 2022

@andrross The G1GC fix was backported into 1.3 and should be part of 1.3.2, opensearch-project/OpenSearch#2971
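(For context, the collector is selected by JVM flags in `config/jvm.options`; the fragment below is only illustrative of the kind of switch involved, not necessarily the exact change in that PR.)

```
## Old default: concurrent mark sweep
# -XX:+UseConcMarkSweepGC
# -XX:CMSInitiatingOccupancyFraction=75
# -XX:+UseCMSInitiatingOccupancyOnly

## New default: G1GC
-XX:+UseG1GC
```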

@andrross (Member) commented May 6, 2022

@dblock Yes, we see the impact of G1GC clearly in the "old" GC metrics:

[Screenshot: old-generation GC metrics chart]

However, the indexing latency and throughput metrics also get roughly 20% worse at the same time (for this nyc_taxis workload). What I've found so far is that 1.3 with the concurrent mark sweep collector was about 10% better in indexing latency and throughput than 2.0 for the nyc_taxis workload, which motivated the creation of this issue. However, when both 1.3 and 2.0 are using the G1GC collector, then 2.0 performs better across the board.

Obviously the real-world performance of 1.3 with concurrent mark sweep was quite bad, so our performance tests do not seem to be measuring the right thing. The data right now shows that 2.0 is as good as or better than 1.3.2, but we still don't have an answer for why we didn't catch the performance issue in 1.3 that our users saw, and this issue adds the further nuance that the nyc_taxis workload actually performs slightly better with the CMS garbage collector.

@andrross (Member) commented May 6, 2022

I have run the stack overflow dataset workload on my test host (c6i.8xlarge). The stack overflow dataset is the recommended workload for testing indexing performance.
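(The comparison tables below follow the benchmark tool's compare output format. A hypothetical invocation is sketched below; the host, workload name, and execution IDs are placeholders, and the exact flags may differ across opensearch-benchmark versions.)

```sh
# Run the indexing workload against an already-running cluster (placeholders shown).
opensearch-benchmark execute-test \
  --workload=so \
  --pipeline=benchmark-only \
  --target-hosts=localhost:9200

# Compare two recorded test executions to produce a Baseline/Contender/Diff table.
opensearch-benchmark compare --baseline=<baseline-id> --contender=<contender-id>
```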

Baseline = 1.3.2 (CMS), Contender = 2.0.0

| Metric | Task | Baseline | Contender | Diff | Unit |
|---|---|---|---|---|---|
| Cumulative indexing time of primary shards | | 162.104 | 106.691 | -55.4129 | min |
| Min cumulative indexing time across primary shard | | 32.2656 | 21.0059 | -11.2597 | min |
| Median cumulative indexing time across primary shard | | 32.3427 | 21.3245 | -11.0182 | min |
| Max cumulative indexing time across primary shard | | 32.6654 | 21.7234 | -10.942 | min |
| Cumulative indexing throttle time of primary shards | | 0 | 0 | 0 | min |
| Min cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min |
| Median cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min |
| Max cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min |
| Cumulative merge time of primary shards | | 88.8526 | 63.596 | -25.2566 | min |
| Cumulative merge count of primary shards | | 174 | 138 | -36 | |
| Min cumulative merge time across primary shard | | 15.7588 | 9.96753 | -5.79125 | min |
| Median cumulative merge time across primary shard | | 17.2805 | 12.0434 | -5.23703 | min |
| Max cumulative merge time across primary shard | | 19.5352 | 16.854 | -2.6812 | min |
| Cumulative merge throttle time of primary shards | | 26.424 | 24.4542 | -1.96987 | min |
| Min cumulative merge throttle time across primary shard | | 3.93945 | 3.47808 | -0.46137 | min |
| Median cumulative merge throttle time across primary shard | | 5.42332 | 4.9479 | -0.47542 | min |
| Max cumulative merge throttle time across primary shard | | 7.1664 | 6.34802 | -0.81838 | min |
| Cumulative refresh time of primary shards | | 4.2662 | 7.75975 | 3.49355 | min |
| Cumulative refresh count of primary shards | | 181 | 159 | -22 | |
| Min cumulative refresh time across primary shard | | 0.809783 | 1.36788 | 0.5581 | min |
| Median cumulative refresh time across primary shard | | 0.851817 | 1.58917 | 0.73735 | min |
| Max cumulative refresh time across primary shard | | 0.91815 | 1.64808 | 0.72993 | min |
| Cumulative flush time of primary shards | | 12.7964 | 14.9029 | 2.10652 | min |
| Cumulative flush count of primary shards | | 75 | 76 | 1 | |
| Min cumulative flush time across primary shard | | 2.35905 | 2.61338 | 0.25433 | min |
| Median cumulative flush time across primary shard | | 2.52937 | 2.89508 | 0.36572 | min |
| Max cumulative flush time across primary shard | | 2.76248 | 3.43557 | 0.67308 | min |
| Total Young Gen GC time | | 10.036 | 5.58 | -4.456 | s |
| Total Young Gen GC count | | 915 | 474 | -441 | |
| Total Old Gen GC time | | 0.393 | 0 | -0.393 | s |
| Total Old Gen GC count | | 16 | 0 | -16 | |
| Store size | | 51.2623 | 39.7393 | -11.523 | GB |
| Translog size | | 2.56114e-07 | 2.56114e-07 | 0 | GB |
| Heap used for segments | | 0.89381 | 0 | -0.89381 | MB |
| Heap used for doc values | | 0.0696449 | 0 | -0.06964 | MB |
| Heap used for terms | | 0.608337 | 0 | -0.60834 | MB |
| Heap used for norms | | 0.0374146 | 0 | -0.03741 | MB |
| Heap used for points | | 0 | 0 | 0 | MB |
| Heap used for stored fields | | 0.178413 | 0 | -0.17841 | MB |
| Segment count | | 307 | 265 | -42 | |
| Min Throughput | index-append | 53566.3 | 62798.3 | 9232.02 | docs/s |
| Mean Throughput | index-append | 60965.4 | 73708.9 | 12743.5 | docs/s |
| Median Throughput | index-append | 59709.4 | 72577.6 | 12868.2 | docs/s |
| Max Throughput | index-append | 70211 | 86479.1 | 16268 | docs/s |
| 50th percentile latency | index-append | 337.145 | 224.911 | -112.234 | ms |
| 90th percentile latency | index-append | 1267.01 | 892.285 | -374.725 | ms |
| 99th percentile latency | index-append | 2758.81 | 4656.96 | 1898.15 | ms |
| 99.9th percentile latency | index-append | 4146.45 | 8025.11 | 3878.66 | ms |
| 100th percentile latency | index-append | 5103.43 | 12595.4 | 7491.94 | ms |
| 50th percentile service time | index-append | 337.145 | 224.911 | -112.234 | ms |
| 90th percentile service time | index-append | 1267.01 | 892.285 | -374.725 | ms |
| 99th percentile service time | index-append | 2758.81 | 4656.96 | 1898.15 | ms |
| 99.9th percentile service time | index-append | 4146.45 | 8025.11 | 3878.66 | ms |
| 100th percentile service time | index-append | 5103.43 | 12595.4 | 7491.94 | ms |
| error rate | index-append | 0 | 0 | 0 | % |

Baseline = 1.3.2 (G1GC), Contender = 2.0.0

| Metric | Task | Baseline | Contender | Diff | Unit |
|---|---|---|---|---|---|
| Cumulative indexing time of primary shards | | 128.098 | 106.691 | -21.407 | min |
| Min cumulative indexing time across primary shard | | 25.3173 | 21.0059 | -4.31143 | min |
| Median cumulative indexing time across primary shard | | 25.6093 | 21.3245 | -4.28485 | min |
| Max cumulative indexing time across primary shard | | 25.9232 | 21.7234 | -4.19985 | min |
| Cumulative indexing throttle time of primary shards | | 0 | 0 | 0 | min |
| Min cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min |
| Median cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min |
| Max cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min |
| Cumulative merge time of primary shards | | 97.9316 | 63.596 | -34.3357 | min |
| Cumulative merge count of primary shards | | 157 | 138 | -19 | |
| Min cumulative merge time across primary shard | | 15.6794 | 9.96753 | -5.71183 | min |
| Median cumulative merge time across primary shard | | 19.4528 | 12.0434 | -7.40937 | min |
| Max cumulative merge time across primary shard | | 23.2481 | 16.854 | -6.39408 | min |
| Cumulative merge throttle time of primary shards | | 30.7512 | 24.4542 | -6.29702 | min |
| Min cumulative merge throttle time across primary shard | | 3.62893 | 3.47808 | -0.15085 | min |
| Median cumulative merge throttle time across primary shard | | 5.56523 | 4.9479 | -0.61733 | min |
| Max cumulative merge throttle time across primary shard | | 9.05382 | 6.34802 | -2.7058 | min |
| Cumulative refresh time of primary shards | | 5.02953 | 7.75975 | 2.73022 | min |
| Cumulative refresh count of primary shards | | 181 | 159 | -22 | |
| Min cumulative refresh time across primary shard | | 0.914183 | 1.36788 | 0.4537 | min |
| Median cumulative refresh time across primary shard | | 1.0008 | 1.58917 | 0.58837 | min |
| Max cumulative refresh time across primary shard | | 1.11332 | 1.64808 | 0.53477 | min |
| Cumulative flush time of primary shards | | 12.353 | 14.9029 | 2.54995 | min |
| Cumulative flush count of primary shards | | 80 | 76 | -4 | |
| Min cumulative flush time across primary shard | | 2.2085 | 2.61338 | 0.40488 | min |
| Median cumulative flush time across primary shard | | 2.30552 | 2.89508 | 0.58957 | min |
| Max cumulative flush time across primary shard | | 2.8126 | 3.43557 | 0.62297 | min |
| Total Young Gen GC time | | 6.137 | 5.58 | -0.557 | s |
| Total Young Gen GC count | | 518 | 474 | -44 | |
| Total Old Gen GC time | | 0 | 0 | 0 | s |
| Total Old Gen GC count | | 0 | 0 | 0 | |
| Store size | | 36.0994 | 39.7393 | 3.63989 | GB |
| Translog size | | 2.56114e-07 | 2.56114e-07 | 0 | GB |
| Heap used for segments | | 0.448589 | 0 | -0.44859 | MB |
| Heap used for doc values | | 0.0371895 | 0 | -0.03719 | MB |
| Heap used for terms | | 0.287628 | 0 | -0.28763 | MB |
| Heap used for norms | | 0.0177002 | 0 | -0.0177 | MB |
| Heap used for points | | 0 | 0 | 0 | MB |
| Heap used for stored fields | | 0.106071 | 0 | -0.10607 | MB |
| Segment count | | 145 | 265 | 120 | |
| Min Throughput | index-append | 53857.6 | 62798.3 | 8940.71 | docs/s |
| Mean Throughput | index-append | 64144.4 | 73708.9 | 9564.46 | docs/s |
| Median Throughput | index-append | 64871.7 | 72577.6 | 7705.95 | docs/s |
| Max Throughput | index-append | 75949.1 | 86479.1 | 10529.9 | docs/s |
| 50th percentile latency | index-append | 390.425 | 224.911 | -165.513 | ms |
| 90th percentile latency | index-append | 1202.99 | 892.285 | -310.707 | ms |
| 99th percentile latency | index-append | 3176.48 | 4656.96 | 1480.49 | ms |
| 99.9th percentile latency | index-append | 4540.35 | 8025.11 | 3484.76 | ms |
| 100th percentile latency | index-append | 5444 | 12595.4 | 7151.37 | ms |
| 50th percentile service time | index-append | 390.425 | 224.911 | -165.513 | ms |
| 90th percentile service time | index-append | 1202.99 | 892.285 | -310.707 | ms |
| 99th percentile service time | index-append | 3176.48 | 4656.96 | 1480.49 | ms |
| 99.9th percentile service time | index-append | 4540.35 | 8025.11 | 3484.76 | ms |
| 100th percentile service time | index-append | 5444 | 12595.4 | 7151.37 | ms |
| error rate | index-append | 0 | 0 | 0 | % |

Summary Metrics

| Metric | 1.3.2 (CMS) | 1.3.2 (G1GC) | 2.0.0 | Unit |
|---|---|---|---|---|
| Mean Throughput | 60965.4 | 64144.4 | 73708.9 | docs/s |
| p50 Index Latency | 337.14 | 390.425 | 224.911 | ms |
| p99 Index Latency | 2758.81 | 3176.48 | 4656.96 | ms |
| p99.9 Index Latency | 4146.45 | 4540.35 | 8025.11 | ms |

The upshot here is that 2.0.0 offers the best indexing throughput and p50 latency. At higher percentiles, 1.3.2 does better, particularly with the old CMS garbage collector.

@bbarani (Member) commented Jun 6, 2022

@CEHENKLE I assume we can close this issue since we found the root cause of the latency increase. Can you please confirm?

@bbarani (Member) commented Jul 12, 2022

Closing this issue as we have identified the root cause of the latency increase.

@bbarani bbarani closed this as completed Jul 12, 2022