Commit

Merge branch 'master' into Element-stream
Isira-Seneviratne committed Aug 19, 2024
2 parents cb74941 + 4690661 commit 666c27a
Showing 75 changed files with 2,950 additions and 852 deletions.
7 changes: 6 additions & 1 deletion .github/dependabot.yml
@@ -6,8 +6,13 @@ updates:
schedule:
interval: weekly
ignore:
# Jetty 9.x needed for JDK8 compatibility; it still receives security updates
# Jetty 9.x needed for JDK8 compatibility; it still receives security updates. Only used in tests.
- dependency-name: "org.eclipse.jetty:jetty-server"
update-types: ["version-update:semver-major"]
- dependency-name: "org.eclipse.jetty:jetty-servlet"
update-types: ["version-update:semver-major"]

- package-ecosystem: github-actions
directory: /
schedule:
interval: weekly
6 changes: 2 additions & 4 deletions .github/workflows/build.yml
@@ -1,8 +1,6 @@
name: Build
on:
push:
branches:
- master
pull_request:

jobs:
@@ -20,10 +18,10 @@ jobs:
uses: actions/checkout@v4

- name: Set up JDK ${{ matrix.java }}
uses: actions/setup-java@v3
uses: actions/setup-java@v4
with:
java-version: ${{ matrix.java }}
distribution: 'temurin'
distribution: 'zulu'
cache: 'maven'

- name: Maven Compile
2 changes: 1 addition & 1 deletion .github/workflows/cifuzz.yml
@@ -19,7 +19,7 @@ jobs:
dry-run: false
language: jvm
- name: Upload Crash
uses: actions/upload-artifact@v3
uses: actions/upload-artifact@v4
if: failure() && steps.build.outcome == 'success'
with:
name: artifacts
10 changes: 4 additions & 6 deletions .github/workflows/codeql.yml
@@ -5,8 +5,6 @@ on:
branches:
- master
pull_request:
schedule:
- cron: '0 5 * * 3'

jobs:
codeql:
@@ -16,17 +14,17 @@ jobs:
- name: Checkout
uses: actions/checkout@v4
- name: Set up JDK
uses: actions/setup-java@v3
uses: actions/setup-java@v4
with:
java-version: 17
distribution: 'temurin'
cache: 'maven'
- name: CodeQL Initialization
uses: github/codeql-action/init@v2
uses: github/codeql-action/init@v3
with:
languages: java
queries: +security-and-quality
- name: Autobuild
uses: github/codeql-action/autobuild@v2
uses: github/codeql-action/autobuild@v3
- name: CodeQL Analysis
uses: github/codeql-action/analyze@v2
uses: github/codeql-action/analyze@v3
67 changes: 66 additions & 1 deletion CHANGES.md
@@ -1,15 +1,80 @@
# jsoup Changelog

## 1.18.1 (Pending)
## 1.18.2 (Pending)

### Improvements

* Optimized the throughput and memory use throughout the input read and parse flows, with heap allocations and GC
down between -6% and -89%, and throughput improved up to +143% for small inputs. Most input sizes will see
throughput increases of ~ 20%. These performance improvements come through recycling the backing byte[] and char[]
arrays used to read and parse the input. [2186](https://github.com/jhy/jsoup/pull/2186)
* Speed optimized `html()` and `Entities.escape()` when the input contains UTF characters in a supplementary plane, by
around 49%. [2183](https://github.com/jhy/jsoup/pull/2183)
* The form associated elements returned by `FormElement.elements()` now reflect changes made to the DOM,
subsequent to the original parse. [2140](https://github.com/jhy/jsoup/issues/2140)
* In the `TreeBuilder`, the `onNodeInserted()` and `onNodeClosed()` events are now also fired for the outermost /
root `Document` node. This enables source position tracking on the Document node (which was previously unset). And
it also enables the node traversor to see the outer Document node. [2182](https://github.com/jhy/jsoup/pull/2182)

### Bug Fixes

* `Element.cssSelector()` would fail if the element's class contained a `*`
character. [2169](https://github.com/jhy/jsoup/issues/2169)
* When tracking source ranges, a text node following an invalid self-closing element may be left
untracked. [2175](https://github.com/jhy/jsoup/issues/2175)

## 1.18.1 (2024-Jul-10)

### Improvements

* **Stream Parser**: A `StreamParser` provides a progressive parse of its input. As each `Element` is completed, it is
emitted via a `Stream` or `Iterator` interface. Elements returned will be complete with all their children, and an
(empty) next sibling, if applicable. Elements (or their children) may be removed from the DOM during the parse,
e.g. to conserve memory, providing a mechanism to parse an input document that would otherwise be too large to fit
into memory, yet still providing a DOM interface to the document and its elements. Additionally, the parser provides
a `selectFirst(String query)` / `selectNext(String query)`, which will run the parser until a hit is found, at which
point the parse is suspended. It can be resumed via another `select()` call, or via the `stream()` or `iterator()`
methods. [2096](https://github.com/jhy/jsoup/pull/2096)
* **Download Progress**: added a Response Progress event interface, which reports progress as URLs are downloaded (and
parsed). Supported on both a session and a single connection
level. [2164](https://github.com/jhy/jsoup/pull/2164), [656](https://github.com/jhy/jsoup/issues/656)
* Added `Path` accepting parse methods: `Jsoup.parse(Path)`, `Jsoup.parse(path, charsetName, baseUri, parser)`,
etc. [2055](https://github.com/jhy/jsoup/pull/2055)
* Updated the `button` tag configuration to include a space between multiple button elements in the `Element.text()`
method. [2105](https://github.com/jhy/jsoup/issues/2105)
* Added support for the `ns|*` all elements in namespace Selector. [1811](https://github.com/jhy/jsoup/issues/1811)
* When normalising attribute names during serialization, invalid characters are now replaced with `_`, vs being
stripped. This should make the process clearer, and generally prevent an invalid attribute name being coerced
unexpectedly. [2143](https://github.com/jhy/jsoup/issues/2143)

### Changes

* Removed previously deprecated internal classes and methods. [2094](https://github.com/jhy/jsoup/pull/2094)
* Build change: the built jar's OSGi manifest no longer imports itself. [2158](https://github.com/jhy/jsoup/issues/2158)

### Bug Fixes

* When tracking source positions, if the first node was a TextNode, its position was incorrectly set
to `-1`. [2106](https://github.com/jhy/jsoup/issues/2106)
* When connecting (or redirecting) to URLs with characters such as `{`, `}` in the path, a Malformed URL exception would
be thrown (if in development), or the URL might otherwise not be escaped correctly (if in
production). The URL encoding process has been improved to handle these characters
correctly. [2142](https://github.com/jhy/jsoup/issues/2142)
* When using `W3CDom` with a custom output Document, a Null Pointer Exception would be
thrown. [2114](https://github.com/jhy/jsoup/pull/2114)
* The `:has()` selector did not match correctly when using sibling combinators
(e.g. `h1:has(+h2)`). [2137](https://github.com/jhy/jsoup/issues/2137)
* The `:empty` selector incorrectly matched elements that started with a blank text node and were followed by
non-empty nodes, due to an incorrect short-circuit. [2130](https://github.com/jhy/jsoup/issues/2130)
* `Element.cssSelector()` would fail with "Did not find balanced marker" when building a selector for elements that had
a `(` or `[` in their class names. And selectors with those characters escaped would not match as
expected. [2146](https://github.com/jhy/jsoup/issues/2146)
* Updated `Entities.escape(string)` to make the escaped text suitable for both text nodes and attributes (previously was
only for text nodes). This does not impact the output of `Element.html()` which correctly applies a minimal escape
depending on if the use will be for text data or in a quoted
attribute. [1278](https://github.com/jhy/jsoup/issues/1278)
* Fuzz: a Stack Overflow exception could occur when resolving a crafted `<base href>` URL, in the normalizing regex.
[2165](https://github.com/jhy/jsoup/issues/2165)
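
To illustrate the `{` / `}` URL fix listed above using only the JDK, the sketch below shows that `java.net.URI` rejects a raw path containing braces, while its multi-argument constructor percent-encodes them. This is not jsoup's own URL handling; `UrlBraceEncodingSketch`, `parsesAsIs()` and `encodePath()` are hypothetical names, and `example.com` is a placeholder host.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UrlBraceEncodingSketch {
    /** True if the string is already a syntactically valid URI. */
    static boolean parsesAsIs(String url) {
        try {
            new URI(url);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /** Builds a URI from its parts; this constructor quotes illegal path characters. */
    static String encodePath(String scheme, String host, String path) {
        try {
            return new URI(scheme, host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Braces are not legal in a URI path, so the single-argument parse fails.
        System.out.println(parsesAsIs("https://example.com/items/{id}/view")); // false
        // Building from components percent-encodes them instead.
        System.out.println(encodePath("https", "example.com", "/items/{id}/view"));
        // https://example.com/items/%7Bid%7D/view
    }
}
```

Note that the multi-argument constructor also encodes a literal `%`, so a path that is already percent-encoded would be encoded twice. A real implementation has to handle that case separately.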

---

2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
The MIT License

Copyright (c) 2009-2023 Jonathan Hedley <https://jsoup.org/>
Copyright (c) 2009-2024 Jonathan Hedley <https://jsoup.org/>

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
51 changes: 20 additions & 31 deletions pom.xml
@@ -5,7 +5,7 @@

<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>1.18.1-SNAPSHOT</version><!-- remember to update previous version below for japicmp -->
<version>1.18.2-SNAPSHOT</version><!-- remember to update previous version below for japicmp -->
<url>https://jsoup.org/</url>
<description>jsoup is a Java library that simplifies working with real-world HTML and XML. It offers an easy-to-use API for URL fetching, data parsing, extraction, and manipulation using DOM API methods, CSS, and xpath selectors. jsoup implements the WHATWG HTML5 specification, and parses HTML to the same DOM as modern browsers.</description>
<inceptionYear>2009</inceptionYear>
@@ -33,7 +33,7 @@

<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<jetty.version>9.4.53.v20231009</jetty.version>
<jetty.version>9.4.55.v20240627</jetty.version>
</properties>

<build>
@@ -42,7 +42,7 @@
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.12.1</version>
<version>3.13.0</version>
<configuration>
<encoding>UTF-8</encoding>
<compilerArgs>
@@ -68,7 +68,7 @@
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>animal-sniffer-maven-plugin</artifactId>
<version>1.23</version>
<version>1.24</version>
<executions>
<execution>
<id>animal-sniffer</id>
@@ -105,6 +105,7 @@
<ignore>java.util.Set</ignore> <!-- Set#stream() -->
<ignore>java.util.Spliterator</ignore>
<ignore>java.util.Spliterators</ignore>
<ignore>java.nio.ByteBuffer</ignore> <!-- .flip(); added in API1; possibly due to .flip previously returning Buffer, later ByteBuffer; return unused -->

<ignore>java.net.HttpURLConnection</ignore><!-- .setAuthenticator(java.net.Authenticator) in Java 9; only used in multirelease 9+ version -->
</ignores>
@@ -117,7 +118,7 @@
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-javadoc-plugin</artifactId>
<version>3.6.3</version>
<version>3.8.0</version>
<configuration>
<doclint>none</doclint>
<source>8</source>
@@ -135,7 +136,7 @@
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-source-plugin</artifactId>
<version>3.3.0</version>
<version>3.3.1</version>
<configuration>
<excludes>
<exclude>org/jsoup/examples/**</exclude>
@@ -153,7 +154,7 @@
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<version>3.3.0</version>
<version>3.4.2</version>
<configuration>
<archive>
<manifest>
@@ -186,7 +187,7 @@
<instructions>
<Bundle-DocURL>https://jsoup.org/</Bundle-DocURL>
<Export-Package>org.jsoup.*</Export-Package>
<Import-Package>org.jspecify.annotations;version=!;resolution:=optional,*</Import-Package>
<Import-Package>!org.jsoup.*,org.jspecify.annotations;version=!;resolution:=optional,*</Import-Package>
</instructions>
</configuration>
</plugin>
@@ -197,20 +198,20 @@
</plugin>
<plugin>
<artifactId>maven-release-plugin</artifactId>
<version>3.0.1</version>
<version>3.1.1</version>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>3.2.3</version>
<version>3.3.1</version>
<configuration>
<!-- smaller stack to find stack overflows -->
<argLine>-Xss256k</argLine>
<!-- smaller stack to find stack overflows. Was 256, but Zulu on MacOS ARM needs >= 640 -->
<argLine>-Xss640k</argLine>
</configuration>
</plugin>
<plugin>
<artifactId>maven-failsafe-plugin</artifactId>
<version>3.2.3</version>
<version>3.3.1</version>
<executions>
<execution>
<goals>
@@ -228,14 +229,14 @@
<!-- API version compat check - https://siom79.github.io/japicmp/ -->
<groupId>com.github.siom79.japicmp</groupId>
<artifactId>japicmp-maven-plugin</artifactId>
<version>0.18.3</version>
<version>0.22.0</version>
<configuration>
<!-- hard code previous version; can't detect when running stateless on build server -->
<oldVersion>
<dependency>
<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>1.17.1</version>
<version>1.18.1</version>
<type>jar</type>
</dependency>
</oldVersion>
@@ -260,18 +261,6 @@
<binaryCompatible>true</binaryCompatible>
<sourceCompatible>true</sourceCompatible>
</overrideCompatibilityChangeParameter>

<!--
One off, getting a spurious ping on adding [<T extends Node> Stream<T> nodeStream(Class<T> class)] to Node.
Manually verified binary & source compatibility
todo: remove after 1.17.1 release
-->
<overrideCompatibilityChangeParameter>
<compatibilityChange>CLASS_GENERIC_TEMPLATE_CHANGED</compatibilityChange>
<binaryCompatible>true</binaryCompatible>
<sourceCompatible>true</sourceCompatible>
</overrideCompatibilityChangeParameter>

</overrideCompatibilityChangeParameters>
</parameter>
</configuration>
@@ -383,7 +372,7 @@
<plugins>
<plugin>
<artifactId>maven-failsafe-plugin</artifactId>
<version>3.2.3</version>
<version>3.3.1</version>
<executions>
<execution>
<goals>
@@ -404,15 +393,15 @@
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter</artifactId>
<version>5.10.1</version>
<version>5.10.3</version>
<scope>test</scope>
</dependency>

<dependency>
<!-- gson, to fetch entities from w3.org -->
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.10.1</version>
<version>2.11.0</version>
<scope>test</scope>
</dependency>

@@ -444,7 +433,7 @@
<!-- org.jspecify.annotations.nonnull, with Apache 2 license. Build time only. -->
<groupId>org.jspecify</groupId>
<artifactId>jspecify</artifactId>
<version>0.3.0</version>
<version>1.0.0</version>
<scope>provided</scope>
</dependency>
</dependencies>