Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Update Apache HttpClient to version 5.x #372

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jowerner
Copy link
Contributor

@jowerner jowerner commented Jul 29, 2021

Sneak preview of the changes made so far. Note that this PR is not complete, but work-in-progress. See below for some comments from my side.

HttpWebConnection

This is of course the place where most of the changes were made. There are still a lot of TODOs:

  • Have each and every setting configured correctly. I'm still busy finding all the now-needed builders.
  • Find out how to build a multi-part request body the "new way".
  • Find out how to correctly invalidate entries in the authentication cache if the credentials have changed.
  • For now, the code uses a SimpleHttpResponse which buffers the response in memory. Once everything works okay, we can rework this to use a streaming response. Need to find out how this works, though.
  • Previously, HU handled compressed responses itself. Not sure yet if we can keep this feature. Looks like HC 5.x wants to do that.
  • When the dust has settled, get the code back in shape (refactoring, renaming, formatting, comments).
  • The reconfiguration of the HTTP client on changes of the WebClient options is not addressed yet. Needs to be redone when everything else works.
  • Get all existing unit tests green again.
  • HTTP/2 is not covered yet in the unit tests. Hopefully, browsers do not behave that much differently when using HTTP/2 compared with HTTP/1.1.
  • Maybe add another option to WebClientOptions to enforce either HTTP/1.1 or HTTP/2 instead of using the auto-negotiate default?

Other Changes

pom.xml
Still uses HC 4.x for the tests. I tried to rework the LocalTestServer to use 5.x classes, but some of them are no longer available on Maven Central. Maybe we can replace this later. I also had to add httplient as test dependeny since the tests-jar needs it.
The client is no longer required

DefaultCredentialsProvider
Now implements CredentialsStore. The ANY_* constants are no longer provided so I recreated them here for now. AuthScope takes protocol and auth scheme as additional parameters now. Passwords are char[] now.

HtmlUnitBrowserCompatCookieHeaderValueFormatter
We can no longer overwrite certain methods of BasicHeaderValueFormatter so I had to reimplement this class completely, i.e. I copied the relevant code.

HtmlUnitBrowserCompatCookieSpec
There are a couple of changes here and some TODOs. In HC 5.x, cookies don't provide version and comment any longer. Since the current code depends on the version attribute, we have to carefully review/rework this class.

NetscapeDraftHeaderParser
This class is no longer available in HC 5.x. I readded it as HtmlUnitBrowserCompatCookieSpec relies heavily on it at the moment. We need to check if all copyright requirements are satisfied.

HtmlUnitVersionAttributeHandler
Since the version cookie attribute is no longer supported, we can probably delete this class.
Removed

HtmlUnitSSLConnectionSocketFactory / SocksConnectionSocketFactory
Cannot set plain/SSL socket factories any longer. Most functionality has been moved to HttpWebConnection. Once we are confident that all is covered elsewhere, we can remove these classes.

CookieManager3Test
The DefaultCookieSpec used in the test previously is no longer available. Don't know if we still need the httpClientParsesCookiesQuotedValuesCorrectly test case.

HtmlUnitCookieStoreTest
Had to implement the store-contains-the-cookie check in a different way as we can no longer inherit from the now final BasicClientCookie class.

HttpWebConnection3Test
HC 5.x now uses "Connection: keep-alive" (lower-case) so I had to adapt the expectations.

The remaining modifications are mostly trivial.

@rbri
Copy link
Member

rbri commented Aug 4, 2021

Have opened a separate kanban board for this https://github.com/HtmlUnit/htmlunit/projects/2

@rbri
Copy link
Member

rbri commented Dec 5, 2021

Many thanks for this huge contribution. Finally i found some time to work on this.
Have done some preparation in the existing code to at least centralize the httpclient usage a bit. Maybe we can do more here in the future to be able to use different clients (e.g. OkHttp).
Your changes are now merged into a separate branch and we have a jenkins build for this branch also.Hope we can improve the branch in the next weeks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants