Improve buffer management throughout the load/fetch and parse lifecycle #2186

Merged · 10 commits · Aug 10, 2024

Conversation

jhy (Owner) commented Aug 2, 2024

The goal of BUFFMAN is to reduce jsoup's memory consumption and GC pressure through improved buffer management.

It picks up the thread from #1800 (by @chibenwa), which I got stalled on when reimplementing buffering in CharacterReader.

Specifically, the implementation re-uses char[] and byte[] arrays wherever possible, rather than creating new ones on each read (the default Java pattern). Those buffers can still be reaped by the GC, because the SoftPool implementation holds them in SoftReferences. That improves on the current implementation of this pattern in StringUtil.borrowBuilder(), which retains the buffers for the lifetime of the thread, which may be longer than the parser is required. The buffer sizes will also be lowered from the current default of 32K.
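The retention difference can be sketched in a few lines (hypothetical class and buffer size, not the jsoup code): a plain ThreadLocal pins its buffer for the thread's whole lifetime, while a SoftReference-wrapped buffer can be reclaimed by the GC under memory pressure and lazily reallocated:

```java
import java.lang.ref.SoftReference;

class BufferHolder {
    // Old pattern: the buffer lives as long as the thread does.
    static final ThreadLocal<char[]> pinned =
        ThreadLocal.withInitial(() -> new char[8192]);

    // New pattern: the buffer may be reaped by GC when memory is tight.
    static final ThreadLocal<SoftReference<char[]>> soft =
        ThreadLocal.withInitial(() -> new SoftReference<>(new char[8192]));

    static char[] getSoftBuffer() {
        char[] buf = soft.get().get();
        if (buf == null) {               // was collected; reallocate lazily
            buf = new char[8192];
            soft.set(new SoftReference<>(buf));
        }
        return buf;
    }
}
```

The trade-off: a softly referenced buffer may occasionally need reallocating, but an idle thread no longer holds its buffers hostage.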

This initial draft includes recycling in CharacterReader and for StringBuilders. I intend to recycle the byte[] buffers used in DataUtil next (used when parsing from a connection or file). I was planning on just extending BufferedInputStream to replace the new byte[] with a recycled one, but then remembered #2054, which effectively means we can't extend BIS after Java 21. So I think I'll just make a small non-synchronized implementation of a BIS, similar to my implementation in CharacterReader.

jhy added 3 commits August 1, 2024 12:07
Removed the use of a backing BufferedInputReader, which was redundant and created large char array buffers.

Setting up so that we can recycle the charBuf.
A SoftPool is a ThreadLocal SoftReference stack of <T>, with borrow and release methods. It allows simple recycling of objects between uses, and lets those objects be reaped by the GC when inactive.
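As a rough sketch of that design (illustrative only; the class shape, the idle cap, and other details here are assumptions, not the actual jsoup SoftPool): each thread keeps a softly referenced stack of reusable objects, borrowing from the top and pushing back on release:

```java
import java.lang.ref.SoftReference;
import java.util.ArrayDeque;
import java.util.function.Supplier;

// Hypothetical sketch of a ThreadLocal SoftReference stack pool.
class SoftPool<T> {
    private static final int MaxIdle = 12; // cap idle objects kept per thread
    private final ThreadLocal<SoftReference<ArrayDeque<T>>> threadStack =
        ThreadLocal.withInitial(() -> new SoftReference<>(new ArrayDeque<>()));
    private final Supplier<T> factory;

    SoftPool(Supplier<T> factory) {
        this.factory = factory;
    }

    private ArrayDeque<T> stack() {
        ArrayDeque<T> stack = threadStack.get().get();
        if (stack == null) {             // GC reaped the idle stack; start fresh
            stack = new ArrayDeque<>();
            threadStack.set(new SoftReference<>(stack));
        }
        return stack;
    }

    T borrow() {
        ArrayDeque<T> stack = stack();
        return stack.isEmpty() ? factory.get() : stack.pop();
    }

    void release(T obj) {
        ArrayDeque<T> stack = stack();
        if (stack.size() < MaxIdle)      // don't hoard beyond the cap
            stack.push(obj);
    }
}
```

Because the stack is ThreadLocal, borrow and release need no synchronization; because it sits behind a SoftReference, an idle thread's pooled objects can be collected under memory pressure.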
Replaces uses of BufferedInputStreams

Advantages:
- can recycle the byte[] buffer; so significant reduction in GC load
- if consumer is reading into an array and there is no mark, no need to allocate a buffer
- doesn't aim to support multi-threaded reads, so no syncs or locking

Also, reduced the DefaultBufferSize to 8K from 32K
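A minimal unsynchronized buffered stream along those lines might look like the sketch below (hypothetical; the class name, the pool hooks noted in comments, and the omitted mark/reset support are assumptions, not the real ControllableInputStream):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Single-threaded buffered stream sketch: no synchronization, and large
// reads can bypass the internal buffer entirely when there's no mark.
class SimpleBufferedInput extends InputStream {
    static final int BufferSize = 8 * 1024;
    private final InputStream in;
    private byte[] buf;        // lazily allocated; a SoftPool borrow in the real impl
    private int pos, count;

    SimpleBufferedInput(InputStream in) {
        this.in = in;
    }

    @Override public int read() throws IOException {
        if (pos >= count && !fill()) return -1;
        return buf[pos++] & 0xff;
    }

    @Override public int read(byte[] dest, int off, int len) throws IOException {
        if (pos < count) {               // drain our buffer first
            int n = Math.min(len, count - pos);
            System.arraycopy(buf, pos, dest, off, n);
            pos += n;
            return n;
        }
        // Buffer empty, and no mark to preserve: read straight into the
        // caller's array, skipping an intermediate copy and allocation.
        return in.read(dest, off, len);
    }

    private boolean fill() throws IOException {
        if (buf == null) buf = new byte[BufferSize]; // pool.borrow() in the real impl
        count = in.read(buf, 0, buf.length);
        pos = 0;
        return count > 0;
    }
}
```

With no synchronized blocks and the pass-through path for array reads, the per-read cost is just the arraycopy, or nothing at all.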
jhy added 4 commits August 7, 2024 12:07
Also, eliminates a redundant buffer by not creating a ByteArrayOutputStream.

Because the ByteBuffer returned may have a backing array with capacity beyond its limit, consumers of that ByteBuffer now use the .limit() size, rather than assuming the array was exactly the data size.
While Buffer and ByteBuffer (including .flip()) have been available since Android 1, the Android API checker was complaining about ControllableInputStream's use of .flip(). Best I can tell, because .flip() returned Buffer in previous JDKs and now returns ByteBuffer, the checker misread the API and flagged it.

We don't use the return value of flip(), so it's not relevant either way.
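Both points can be illustrated with a small sketch (BufferSlice is a hypothetical helper, not jsoup code). Reading only up to limit() avoids copying the unused tail of the backing array, and casting to Buffer before calling flip() is a known workaround that keeps the compiled call site on the pre-JDK 9 Buffer.flip() signature:

```java
import java.nio.ByteBuffer;

class BufferSlice {
    // Copy only the valid bytes, using limit() rather than assuming the
    // backing array is exactly the data size.
    static byte[] validBytes(ByteBuffer buffer) {
        byte[] data = new byte[buffer.limit() - buffer.position()];
        buffer.get(data);
        return data;
    }

    static ByteBuffer demo() {
        ByteBuffer buffer = ByteBuffer.allocate(32); // capacity beyond the data
        buffer.put(new byte[] {1, 2, 3});            // write 3 bytes
        // Cast to Buffer so the bytecode targets the old Buffer.flip()
        // signature, sidestepping the JDK 9+ covariant ByteBuffer return:
        ((java.nio.Buffer) buffer).flip();           // limit = 3, position = 0
        return buffer;
    }
}
```

After the flip, buffer.array().length is still 32, but buffer.limit() is 3: the backing array is bigger than the data it holds.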
When reading ahead to detect the charset, instead of buffering into a new ByteBuffer, reuse the ControllableInputStream.
Also close underlying reader if passthrough read is done.
(Failing in CI on Windows JDK 21 only)
jhy commented Aug 9, 2024

Now, let's review the results. Using JMH, I benchmarked jsoup 1.18.1 against this 1.18.2 snapshot. (I'll publish the repository for this so we can use it as a base for further benchmarking.)

All benchmark runs were executed with the following settings:

java -jar target/jsoup-bench.jar -r 10 -wi 2 -w 5 -t max -f 2 -i 2
# Detecting actual CPU count: 8 detected
# JMH version: 1.37
# VM version: JDK 17.0.11, OpenJDK 64-Bit Server VM, 17.0.11+0
# VM invoker: /opt/homebrew/Cellar/openjdk@17/17.0.11/libexec/openjdk.jdk/Contents/Home/bin/java
# VM options: <none>
# Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 2 iterations, 5 s each
# Measurement: 2 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 8 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: org.jsoup.bench.ParseFromInputStream.parseLargeInput
| Benchmark | Size, bytes | Ops/sec, old | Ops/sec, new | Delta | Alloc b/op, old | Alloc b/op, new | Delta |
|---|---:|---:|---:|---:|---:|---:|---:|
| parseLargeInput | 508,177 | 1,639 | 1,618 | -1.28% | 2,383,758 | 2,243,602 | -5.88% |
| parseMediumInput | 8,818 | 43,601 | 48,739 | 11.78% | 259,880 | 74,152 | -71.47% |
| parseSmallInput | 3,396 | 77,981 | 93,539 | 19.95% | 218,304 | 36,064 | -83.48% |
| parseTinyInput | 274 | 123,667 | 146,964 | 18.84% | 199,672 | 22,096 | -88.93% |
| parseLargeString | 508,177 | 1,817 | 1,771 | -2.53% | 2,247,786 | 2,169,273 | -3.49% |
| parseMediumString | 8,818 | 78,849 | 82,480 | 4.61% | 125,512 | 52,632 | -58.07% |
| parseSmallString | 3,396 | 228,934 | 270,684 | 18.24% | 91,176 | 20,824 | -77.16% |
| parseTinyString | 274 | 558,929 | 1,355,892 | 142.59% | 78,744 | 10,024 | -87.27% |

Analysis:

The "Input" benchmarks involve reading from an InputStream (specifically, a JAR resource stream) and include the full file read within a try/catch block per operation. These results can also serve as a proxy for parsing from a file or an HTTP resource.

The "String" benchmarks simply parse from an existing string. As expected, they are significantly faster because they don't include the overhead of reading from a file or input stream. However, this doesn’t necessarily mean that reading an input stream into a string and then passing that to jsoup is faster in practice compared to directly supplying the input stream.

Note that the file sizes selected are intended to be representative of common HTML processing chunks, but they are arbitrarily chosen and the sizes are not linear.

Summary:

  • Throughput improvements: all but the largest inputs show significant throughput gains, for both the String and InputStream benchmarks.

  • Tiny strings: the smallest inputs show the largest improvement, because recycling their CharacterReader buffers has the greatest relative impact.

  • Memory allocations (measured in bytes per operation) are significantly improved across the board. Lower heap requirements and reduced garbage-collection time should enable higher concurrency and lower overhead, particularly in resource-constrained environments such as Android or high-volume servers.

[Screenshot 2024-08-09: benchmark result charts]

(Note that scales are different in each chart)


Overall, this is a strong result from the changes in this branch, and I'll be merging it in. Of course, there are opportunities for further improvement. I'm keen to see what others can find as well!

jhy commented Aug 9, 2024

Here's a view with the operations and allocations relative to the input file size:

[Chart: operations and allocations relative to input size]

We can see that overall, the larger the input size, the more efficient the parse, and there's no elbow as sizes get larger.

jhy marked this pull request as ready for review August 9, 2024 23:40

jhy commented Aug 9, 2024

The profiling logs are here: https://gist.github.com/jhy/f6e9c8ca8fd506c6c36f7d1a3154631c

jhy merged commit ec2d0c4 into master Aug 10, 2024
22 checks passed
jhy deleted the buffman branch August 10, 2024 00:00