Improve buffer management throughout the load/fetch and parse lifecycle #2186

Merged · 10 commits · Aug 10, 2024

Conversation

jhy (Owner) commented Aug 2, 2024

The goal of BUFFMAN is to reduce jsoup's memory consumption and GC pressure through improved buffer management.

It picks up the thread from #1800 (by @chibenwa), which I got stalled on when reimplementing buffering in CharacterReader.

Specifically, the implementation re-uses char[] and byte[] arrays wherever possible, rather than creating new ones on each read (the default Java pattern). Those buffers can still be reaped by the GC, because the SoftPool implementation holds them in SoftReferences. That improves on the current implementation of this pattern in StringUtil.borrowBuilder(), which retains the buffers for the lifetime of the thread, which may be longer than the parser is required. The buffer sizes will also be lowered from the current default of 32K.
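The retention difference can be sketched in a few lines (hypothetical class and buffer size, not the jsoup code): a plain ThreadLocal pins its buffer for the thread's whole lifetime, while a SoftReference-wrapped buffer can be reclaimed by the GC under memory pressure and lazily reallocated:

```java
import java.lang.ref.SoftReference;

class BufferHolder {
    // Old pattern: the buffer lives as long as the thread does.
    static final ThreadLocal<char[]> pinned =
        ThreadLocal.withInitial(() -> new char[8192]);

    // New pattern: the buffer may be reaped by GC when memory is tight.
    static final ThreadLocal<SoftReference<char[]>> soft =
        ThreadLocal.withInitial(() -> new SoftReference<>(new char[8192]));

    static char[] getSoftBuffer() {
        char[] buf = soft.get().get();
        if (buf == null) {               // was collected; reallocate lazily
            buf = new char[8192];
            soft.set(new SoftReference<>(buf));
        }
        return buf;
    }
}
```

The trade-off: a softly referenced buffer may occasionally need reallocating, but an idle thread no longer holds its buffers hostage.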

This initial draft includes recycling in CharacterReader and for StringBuilders. I intend to recycle the byte[] buffers used in DataUtil next (used when parsing from a connection or file). I was planning on just extending BufferedInputStream to replace the new byte[] with a recycled one, but then remembered #2054, which effectively means we can't extend BIS after Java 21. So I think I'll just make a small non-synchronized implementation of a BIS, similar to my implementation in CharacterReader.

jhy added 3 commits August 1, 2024 12:07
Removed the use of a backing BufferedInputReader, which was redundant and created large char array buffers.

Setting up so that we can recycle the charBuf.
A SoftPool is a ThreadLocal SoftReference stack of <T>, with borrow and release methods. It allows simple recycling of objects between uses, and lets those objects be reaped by the GC when inactive.
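As a rough sketch of that design (illustrative only; the class shape, the idle cap, and other details here are assumptions, not the actual jsoup SoftPool): each thread keeps a softly referenced stack of reusable objects, borrowing from the top and pushing back on release:

```java
import java.lang.ref.SoftReference;
import java.util.ArrayDeque;
import java.util.function.Supplier;

// Hypothetical sketch of a ThreadLocal SoftReference stack pool.
class SoftPool<T> {
    private static final int MaxIdle = 12; // cap idle objects kept per thread
    private final ThreadLocal<SoftReference<ArrayDeque<T>>> threadStack =
        ThreadLocal.withInitial(() -> new SoftReference<>(new ArrayDeque<>()));
    private final Supplier<T> factory;

    SoftPool(Supplier<T> factory) {
        this.factory = factory;
    }

    private ArrayDeque<T> stack() {
        ArrayDeque<T> stack = threadStack.get().get();
        if (stack == null) {             // GC reaped the idle stack; start fresh
            stack = new ArrayDeque<>();
            threadStack.set(new SoftReference<>(stack));
        }
        return stack;
    }

    T borrow() {
        ArrayDeque<T> stack = stack();
        return stack.isEmpty() ? factory.get() : stack.pop();
    }

    void release(T obj) {
        ArrayDeque<T> stack = stack();
        if (stack.size() < MaxIdle)      // don't hoard beyond the cap
            stack.push(obj);
    }
}
```

Because the stack is ThreadLocal, borrow and release need no synchronization; because it sits behind a SoftReference, an idle thread's pooled objects can be collected under memory pressure.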
Replaces uses of BufferedInputStreams

Advantages:
- can recycle the byte[] buffer; so significant reduction in GC load
- if consumer is reading into an array and there is no mark, no need to allocate a buffer
- doesn't aim to support multi-threaded reads, so no syncs or locking

Also, reduced the DefaultBufferSize to 8K from 32K
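A minimal unsynchronized buffered stream along those lines might look like the sketch below (hypothetical; the class name, the pool hooks noted in comments, and the omitted mark/reset support are assumptions, not the real ControllableInputStream):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Single-threaded buffered stream sketch: no synchronization, and large
// reads can bypass the internal buffer entirely when there's no mark.
class SimpleBufferedInput extends InputStream {
    static final int BufferSize = 8 * 1024;
    private final InputStream in;
    private byte[] buf;        // lazily allocated; a SoftPool borrow in the real impl
    private int pos, count;

    SimpleBufferedInput(InputStream in) {
        this.in = in;
    }

    @Override public int read() throws IOException {
        if (pos >= count && !fill()) return -1;
        return buf[pos++] & 0xff;
    }

    @Override public int read(byte[] dest, int off, int len) throws IOException {
        if (pos < count) {               // drain our buffer first
            int n = Math.min(len, count - pos);
            System.arraycopy(buf, pos, dest, off, n);
            pos += n;
            return n;
        }
        // Buffer empty, and no mark to preserve: read straight into the
        // caller's array, skipping an intermediate copy and allocation.
        return in.read(dest, off, len);
    }

    private boolean fill() throws IOException {
        if (buf == null) buf = new byte[BufferSize]; // pool.borrow() in the real impl
        count = in.read(buf, 0, buf.length);
        pos = 0;
        return count > 0;
    }
}
```

With no synchronized blocks and the pass-through path for array reads, the per-read cost is just the arraycopy, or nothing at all.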
jhy added 4 commits August 7, 2024 12:07
Also, eliminates a redundant buffer by not creating a ByteArrayOutputStream.

Because the ByteBuffer returned may have a backing array with capacity beyond its limit, consumers of that ByteBuffer now use the .limit() size, rather than assuming the array was exactly the data size.
While Buffer and ByteBuffer (including .flip()) have been available since Android 1, the Android API checker was complaining about ControllableInputStream's use of .flip(). Best I can tell, because .flip() returned Buffer in previous JDKs and now returns ByteBuffer, the checker misread the API and flagged it.

We don't use the return value of flip(), so it's not relevant either way.
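Both points can be illustrated with a small sketch (BufferSlice is a hypothetical helper, not jsoup code). Reading only up to limit() avoids copying the unused tail of the backing array, and casting to Buffer before calling flip() is a known workaround that keeps the compiled call site on the pre-JDK 9 Buffer.flip() signature:

```java
import java.nio.ByteBuffer;

class BufferSlice {
    // Copy only the valid bytes, using limit() rather than assuming the
    // backing array is exactly the data size.
    static byte[] validBytes(ByteBuffer buffer) {
        byte[] data = new byte[buffer.limit() - buffer.position()];
        buffer.get(data);
        return data;
    }

    static ByteBuffer demo() {
        ByteBuffer buffer = ByteBuffer.allocate(32); // capacity beyond the data
        buffer.put(new byte[] {1, 2, 3});            // write 3 bytes
        // Cast to Buffer so the bytecode targets the old Buffer.flip()
        // signature, sidestepping the JDK 9+ covariant ByteBuffer return:
        ((java.nio.Buffer) buffer).flip();           // limit = 3, position = 0
        return buffer;
    }
}
```

After the flip, buffer.array().length is still 32, but buffer.limit() is 3: the backing array is bigger than the data it holds.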
When reading ahead to detect the charset, instead of buffering into a new ByteBuffer, reuse the ControllableInputStream.
Also close underlying reader if passthrough read is done.
(Failing in CI on Windows JDK 21 only)
jhy commented Aug 9, 2024

Now, let's review the results. Using JMH, I benchmarked jsoup 1.18.1 against this 1.18.2 snapshot. (I'll publish the repository for this so we can use it as a base for further benchmarking.)

All benchmark runs were executed with the following settings:

java -jar target/jsoup-bench.jar -r 10 -wi 2 -w 5 -t max -f 2 -i 2
# Detecting actual CPU count: 8 detected
# JMH version: 1.37
# VM version: JDK 17.0.11, OpenJDK 64-Bit Server VM, 17.0.11+0
# VM invoker: /opt/homebrew/Cellar/openjdk@17/17.0.11/libexec/openjdk.jdk/Contents/Home/bin/java
# VM options: <none>
# Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 2 iterations, 5 s each
# Measurement: 2 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 8 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: org.jsoup.bench.ParseFromInputStream.parseLargeInput
| Benchmark | Size, bytes | Ops/sec, old | Ops/sec, new | Delta | Alloc b/op, old | Alloc b/op, new | Delta |
|---|---:|---:|---:|---:|---:|---:|---:|
| parseLargeInput | 508,177 | 1,639 | 1,618 | -1.28% | 2,383,758 | 2,243,602 | -5.88% |
| parseMediumInput | 8,818 | 43,601 | 48,739 | 11.78% | 259,880 | 74,152 | -71.47% |
| parseSmallInput | 3,396 | 77,981 | 93,539 | 19.95% | 218,304 | 36,064 | -83.48% |
| parseTinyInput | 274 | 123,667 | 146,964 | 18.84% | 199,672 | 22,096 | -88.93% |
| parseLargeString | 508,177 | 1,817 | 1,771 | -2.53% | 2,247,786 | 2,169,273 | -3.49% |
| parseMediumString | 8,818 | 78,849 | 82,480 | 4.61% | 125,512 | 52,632 | -58.07% |
| parseSmallString | 3,396 | 228,934 | 270,684 | 18.24% | 91,176 | 20,824 | -77.16% |
| parseTinyString | 274 | 558,929 | 1,355,892 | 142.59% | 78,744 | 10,024 | -87.27% |

Analysis:

The "Input" benchmarks involve reading from an InputStream (specifically, a JAR resource stream) and include the full file read within a try/catch block per operation. These results can also serve as a proxy for parsing from a file or an HTTP resource.

The "String" benchmarks simply parse from an existing string. As expected, they are significantly faster because they don't include the overhead of reading from a file or input stream. However, this doesn’t necessarily mean that reading an input stream into a string and then passing that to jsoup is faster in practice compared to directly supplying the input stream.

Note that the file sizes selected are intended to be representative of common HTML processing chunks, but they are arbitrarily chosen and the sizes are not linear.

Summary:

  • Throughput improvements: all but the largest inputs show significant throughput gains, for both the String and InputStream benchmarks.

  • Tiny strings: the smallest inputs show the largest improvement, because recycling their CharacterReader buffers has the greatest relative impact.

  • Memory allocations (measured in bytes per operation) are significantly improved across the board. Lower heap requirements and reduced garbage-collection time should enable higher concurrency and lower overhead, particularly in resource-constrained environments such as Android or high-volume servers.

[Screenshot 2024-08-09: benchmark result charts]

(Note that scales are different in each chart)


Overall, this is a strong result from the changes in this branch, and I'll be merging it in. Of course, there are opportunities for further improvement. I'm keen to see what others can find as well!

jhy commented Aug 9, 2024

Here's a view with the operations and allocations relative to the input file size:

[Chart: operations and allocations relative to input size]

We can see that overall, the larger the input size, the more efficient the parse, and there's no elbow as sizes get larger.

jhy marked this pull request as ready for review August 9, 2024 23:40

jhy commented Aug 9, 2024

The profiling logs are here: https://gist.github.com/jhy/f6e9c8ca8fd506c6c36f7d1a3154631c

jhy merged commit ec2d0c4 into master Aug 10, 2024
22 checks passed
jhy deleted the buffman branch August 10, 2024 00:00