
[Bug]: Understand and mitigate the source of Index latency increase in 2.0-RC #2054

Closed
CEHENKLE opened this issue Apr 29, 2022 · 8 comments
Labels: bug (Something isn't working), v2.0.0


@CEHENKLE (Member)

Describe the bug

While running performance tests for 2.0-RC1, we observed what appeared to be a ~13% increase in indexing latency. This issue is to investigate the source of the increase so we can propose ways to mitigate it.

#1624 (comment)

To reproduce

#1624 (comment)


@CEHENKLE added the bug (Something isn't working) and untriaged (Issues that have not yet been triaged) labels on Apr 29, 2022
@dblock added the v2.0.0 label on Apr 29, 2022
@CEHENKLE removed the untriaged label on Apr 29, 2022
@bbarani (Member) commented May 6, 2022

@CEHENKLE @dblock @kotwanikunal What's the current status of this issue? Did we find the root cause of the performance issue?

@kotwanikunal (Member)

CC'ing @andrross as he was actively looking into it.

@andrross (Member) commented May 6, 2022

I am actively working on it. The latency and throughput difference between 1.3 and 2.0 has disappeared in runs against recent builds. However, it appears that this is because 1.3 has gotten slower.

[Screenshots: indexing latency and throughput comparison charts for 1.3 vs 2.0]

I'm still working on this, but my theory is that a change in Java versions and/or the default garbage collector is introducing the differences here.
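(One quick way to verify which JDK and collector each node is actually running is the nodes info API; the sketch below uses the standard `_nodes/jvm` endpoint, though the exact response fields can vary by version.)

```sh
# List each node's JVM version and active garbage collectors
# (gc_collectors distinguishes G1 from the CMS collectors).
curl -s 'http://localhost:9200/_nodes/jvm?filter_path=nodes.*.name,nodes.*.jvm.version,nodes.*.jvm.gc_collectors&pretty'
```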

@dblock (Member) commented May 6, 2022

@andrross The G1GC fix was backported into 1.3 and should be part of 1.3.2, opensearch-project/OpenSearch#2971
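(For context, the collector is selected by JVM flags in `config/jvm.options`; the fragment below is only illustrative of the kind of switch involved, not necessarily the exact change in that PR.)

```
## Old default: concurrent mark sweep
# -XX:+UseConcMarkSweepGC
# -XX:CMSInitiatingOccupancyFraction=75
# -XX:+UseCMSInitiatingOccupancyOnly

## New default: G1GC
-XX:+UseG1GC
```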

@andrross (Member) commented May 6, 2022

@dblock Yes, we see the impact of G1GC clearly in the "old" GC metrics:

[Screenshot: old-generation GC metrics chart]

However, the indexing latency and throughput metrics also get roughly 20% worse at the same time (for this nyc_taxis workload). What I've found so far is that 1.3 with the concurrent mark sweep collector was about 10% better in indexing latency and throughput than 2.0 for the nyc_taxis workload, which motivated the creation of this issue. However, when both 1.3 and 2.0 are using the G1GC collector, then 2.0 performs better across the board.

Obviously the real-world performance of 1.3 with concurrent mark sweep was quite bad, so our performance tests do not seem to be measuring the right thing. The data right now shows that 2.0 is as good as or better than 1.3.2, but we still don't have an answer for why we didn't catch the performance issue in 1.3 that our users saw, and this issue adds the further nuance that the nyc_taxis workload actually performs slightly better with the CMS garbage collector.

@andrross (Member) commented May 6, 2022

I have run the stack overflow dataset workload on my test host (c6i.8xlarge). The stack overflow dataset is the recommended workload for testing indexing performance.
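(The comparison tables below follow the benchmark tool's compare output format. A hypothetical invocation is sketched below; the host, workload name, and execution IDs are placeholders, and the exact flags may differ across opensearch-benchmark versions.)

```sh
# Run the indexing workload against an already-running cluster (placeholders shown).
opensearch-benchmark execute-test \
  --workload=so \
  --pipeline=benchmark-only \
  --target-hosts=localhost:9200

# Compare two recorded test executions to produce a Baseline/Contender/Diff table.
opensearch-benchmark compare --baseline=<baseline-id> --contender=<contender-id>
```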

Baseline = 1.3.2 (CMS), Contender = 2.0.0

| Metric | Task | Baseline | Contender | Diff | Unit |
|---|---|---|---|---|---|
| Cumulative indexing time of primary shards | | 162.104 | 106.691 | -55.4129 | min |
| Min cumulative indexing time across primary shard | | 32.2656 | 21.0059 | -11.2597 | min |
| Median cumulative indexing time across primary shard | | 32.3427 | 21.3245 | -11.0182 | min |
| Max cumulative indexing time across primary shard | | 32.6654 | 21.7234 | -10.942 | min |
| Cumulative indexing throttle time of primary shards | | 0 | 0 | 0 | min |
| Min cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min |
| Median cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min |
| Max cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min |
| Cumulative merge time of primary shards | | 88.8526 | 63.596 | -25.2566 | min |
| Cumulative merge count of primary shards | | 174 | 138 | -36 | |
| Min cumulative merge time across primary shard | | 15.7588 | 9.96753 | -5.79125 | min |
| Median cumulative merge time across primary shard | | 17.2805 | 12.0434 | -5.23703 | min |
| Max cumulative merge time across primary shard | | 19.5352 | 16.854 | -2.6812 | min |
| Cumulative merge throttle time of primary shards | | 26.424 | 24.4542 | -1.96987 | min |
| Min cumulative merge throttle time across primary shard | | 3.93945 | 3.47808 | -0.46137 | min |
| Median cumulative merge throttle time across primary shard | | 5.42332 | 4.9479 | -0.47542 | min |
| Max cumulative merge throttle time across primary shard | | 7.1664 | 6.34802 | -0.81838 | min |
| Cumulative refresh time of primary shards | | 4.2662 | 7.75975 | 3.49355 | min |
| Cumulative refresh count of primary shards | | 181 | 159 | -22 | |
| Min cumulative refresh time across primary shard | | 0.809783 | 1.36788 | 0.5581 | min |
| Median cumulative refresh time across primary shard | | 0.851817 | 1.58917 | 0.73735 | min |
| Max cumulative refresh time across primary shard | | 0.91815 | 1.64808 | 0.72993 | min |
| Cumulative flush time of primary shards | | 12.7964 | 14.9029 | 2.10652 | min |
| Cumulative flush count of primary shards | | 75 | 76 | 1 | |
| Min cumulative flush time across primary shard | | 2.35905 | 2.61338 | 0.25433 | min |
| Median cumulative flush time across primary shard | | 2.52937 | 2.89508 | 0.36572 | min |
| Max cumulative flush time across primary shard | | 2.76248 | 3.43557 | 0.67308 | min |
| Total Young Gen GC time | | 10.036 | 5.58 | -4.456 | s |
| Total Young Gen GC count | | 915 | 474 | -441 | |
| Total Old Gen GC time | | 0.393 | 0 | -0.393 | s |
| Total Old Gen GC count | | 16 | 0 | -16 | |
| Store size | | 51.2623 | 39.7393 | -11.523 | GB |
| Translog size | | 2.56114e-07 | 2.56114e-07 | 0 | GB |
| Heap used for segments | | 0.89381 | 0 | -0.89381 | MB |
| Heap used for doc values | | 0.0696449 | 0 | -0.06964 | MB |
| Heap used for terms | | 0.608337 | 0 | -0.60834 | MB |
| Heap used for norms | | 0.0374146 | 0 | -0.03741 | MB |
| Heap used for points | | 0 | 0 | 0 | MB |
| Heap used for stored fields | | 0.178413 | 0 | -0.17841 | MB |
| Segment count | | 307 | 265 | -42 | |
| Min Throughput | index-append | 53566.3 | 62798.3 | 9232.02 | docs/s |
| Mean Throughput | index-append | 60965.4 | 73708.9 | 12743.5 | docs/s |
| Median Throughput | index-append | 59709.4 | 72577.6 | 12868.2 | docs/s |
| Max Throughput | index-append | 70211 | 86479.1 | 16268 | docs/s |
| 50th percentile latency | index-append | 337.145 | 224.911 | -112.234 | ms |
| 90th percentile latency | index-append | 1267.01 | 892.285 | -374.725 | ms |
| 99th percentile latency | index-append | 2758.81 | 4656.96 | 1898.15 | ms |
| 99.9th percentile latency | index-append | 4146.45 | 8025.11 | 3878.66 | ms |
| 100th percentile latency | index-append | 5103.43 | 12595.4 | 7491.94 | ms |
| 50th percentile service time | index-append | 337.145 | 224.911 | -112.234 | ms |
| 90th percentile service time | index-append | 1267.01 | 892.285 | -374.725 | ms |
| 99th percentile service time | index-append | 2758.81 | 4656.96 | 1898.15 | ms |
| 99.9th percentile service time | index-append | 4146.45 | 8025.11 | 3878.66 | ms |
| 100th percentile service time | index-append | 5103.43 | 12595.4 | 7491.94 | ms |
| error rate | index-append | 0 | 0 | 0 | % |

Baseline = 1.3.2 (G1GC), Contender = 2.0.0

| Metric | Task | Baseline | Contender | Diff | Unit |
|---|---|---|---|---|---|
| Cumulative indexing time of primary shards | | 128.098 | 106.691 | -21.407 | min |
| Min cumulative indexing time across primary shard | | 25.3173 | 21.0059 | -4.31143 | min |
| Median cumulative indexing time across primary shard | | 25.6093 | 21.3245 | -4.28485 | min |
| Max cumulative indexing time across primary shard | | 25.9232 | 21.7234 | -4.19985 | min |
| Cumulative indexing throttle time of primary shards | | 0 | 0 | 0 | min |
| Min cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min |
| Median cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min |
| Max cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min |
| Cumulative merge time of primary shards | | 97.9316 | 63.596 | -34.3357 | min |
| Cumulative merge count of primary shards | | 157 | 138 | -19 | |
| Min cumulative merge time across primary shard | | 15.6794 | 9.96753 | -5.71183 | min |
| Median cumulative merge time across primary shard | | 19.4528 | 12.0434 | -7.40937 | min |
| Max cumulative merge time across primary shard | | 23.2481 | 16.854 | -6.39408 | min |
| Cumulative merge throttle time of primary shards | | 30.7512 | 24.4542 | -6.29702 | min |
| Min cumulative merge throttle time across primary shard | | 3.62893 | 3.47808 | -0.15085 | min |
| Median cumulative merge throttle time across primary shard | | 5.56523 | 4.9479 | -0.61733 | min |
| Max cumulative merge throttle time across primary shard | | 9.05382 | 6.34802 | -2.7058 | min |
| Cumulative refresh time of primary shards | | 5.02953 | 7.75975 | 2.73022 | min |
| Cumulative refresh count of primary shards | | 181 | 159 | -22 | |
| Min cumulative refresh time across primary shard | | 0.914183 | 1.36788 | 0.4537 | min |
| Median cumulative refresh time across primary shard | | 1.0008 | 1.58917 | 0.58837 | min |
| Max cumulative refresh time across primary shard | | 1.11332 | 1.64808 | 0.53477 | min |
| Cumulative flush time of primary shards | | 12.353 | 14.9029 | 2.54995 | min |
| Cumulative flush count of primary shards | | 80 | 76 | -4 | |
| Min cumulative flush time across primary shard | | 2.2085 | 2.61338 | 0.40488 | min |
| Median cumulative flush time across primary shard | | 2.30552 | 2.89508 | 0.58957 | min |
| Max cumulative flush time across primary shard | | 2.8126 | 3.43557 | 0.62297 | min |
| Total Young Gen GC time | | 6.137 | 5.58 | -0.557 | s |
| Total Young Gen GC count | | 518 | 474 | -44 | |
| Total Old Gen GC time | | 0 | 0 | 0 | s |
| Total Old Gen GC count | | 0 | 0 | 0 | |
| Store size | | 36.0994 | 39.7393 | 3.63989 | GB |
| Translog size | | 2.56114e-07 | 2.56114e-07 | 0 | GB |
| Heap used for segments | | 0.448589 | 0 | -0.44859 | MB |
| Heap used for doc values | | 0.0371895 | 0 | -0.03719 | MB |
| Heap used for terms | | 0.287628 | 0 | -0.28763 | MB |
| Heap used for norms | | 0.0177002 | 0 | -0.0177 | MB |
| Heap used for points | | 0 | 0 | 0 | MB |
| Heap used for stored fields | | 0.106071 | 0 | -0.10607 | MB |
| Segment count | | 145 | 265 | 120 | |
| Min Throughput | index-append | 53857.6 | 62798.3 | 8940.71 | docs/s |
| Mean Throughput | index-append | 64144.4 | 73708.9 | 9564.46 | docs/s |
| Median Throughput | index-append | 64871.7 | 72577.6 | 7705.95 | docs/s |
| Max Throughput | index-append | 75949.1 | 86479.1 | 10529.9 | docs/s |
| 50th percentile latency | index-append | 390.425 | 224.911 | -165.513 | ms |
| 90th percentile latency | index-append | 1202.99 | 892.285 | -310.707 | ms |
| 99th percentile latency | index-append | 3176.48 | 4656.96 | 1480.49 | ms |
| 99.9th percentile latency | index-append | 4540.35 | 8025.11 | 3484.76 | ms |
| 100th percentile latency | index-append | 5444 | 12595.4 | 7151.37 | ms |
| 50th percentile service time | index-append | 390.425 | 224.911 | -165.513 | ms |
| 90th percentile service time | index-append | 1202.99 | 892.285 | -310.707 | ms |
| 99th percentile service time | index-append | 3176.48 | 4656.96 | 1480.49 | ms |
| 99.9th percentile service time | index-append | 4540.35 | 8025.11 | 3484.76 | ms |
| 100th percentile service time | index-append | 5444 | 12595.4 | 7151.37 | ms |
| error rate | index-append | 0 | 0 | 0 | % |

Summary Metrics

| Metric | 1.3.2 (CMS) | 1.3.2 (G1GC) | 2.0.0 | Unit |
|---|---|---|---|---|
| Mean Throughput | 60965.4 | 64144.4 | 73708.9 | docs/s |
| p50 Index Latency | 337.14 | 390.425 | 224.911 | ms |
| p99 Index Latency | 2758.81 | 3176.48 | 4656.96 | ms |
| p99.9 Index Latency | 4146.45 | 4540.35 | 8025.11 | ms |

The upshot here is that 2.0.0 offers the best indexing throughput and p50 latency. At higher percentiles, 1.3.2 does better, particularly with the old CMS garbage collector.

@bbarani (Member) commented Jun 6, 2022

@CEHENKLE I assume we can close this issue since we found the root cause of the latency increase. Can you please confirm?

@bbarani (Member) commented Jul 12, 2022

Closing this issue as we have identified the root cause of the latency increase.

@bbarani bbarani closed this as completed Jul 12, 2022