Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PR #7829/2a3eaa11 backport][3.10] Index resources in the UrlDispatcher to avoid linear search for most cases #7917

Conversation

patchback[bot]
Copy link
Contributor

@patchback patchback bot commented Nov 27, 2023

This is a backport of PR #7829 as merged into master (2a3eaa1).

What do these changes do?

The time complexity of the UrlDispatcher on a hit averaged O(n)

The UrlDispatcher now keeps an index of all resources and will attempt to avoid doing a linear search to resolve. On a hit the performance should be nearly O(url_parts) on a miss the time complexity will be O(domains) + O(url_parts)

This works for prefixed subapps as well.

sub apps added by add_domain (aka MatchedSubAppResource) will fallback to linear searching as they are not indexable since they support wildcard/regexs

Are there changes in behavior for the user?

A fixed/static path will always be preferred over a dynamic path.

Related issue number

fixes #7828

benchmark

import asyncio
import timeit
from unittest.mock import MagicMock, AsyncMock

from yarl import URL

from aiohttp import web
from aiohttp.web_urldispatcher import (
    UrlDispatcher,
)

URL_COUNT = 5000


def make_mock_request(url: str) -> web.Request:
    return web.Request(
        MagicMock(url=URL(url), method="GET"),
        MagicMock(),
        protocol=MagicMock(),
        host="example.com",
        task=MagicMock(),
        loop=asyncio.get_running_loop(),
        payload_writer=MagicMock(),
    )


async def url_dispatcher_performance():
    """Filter out large cookies from the cookie jar."""
    dispatcher = UrlDispatcher()
    for i in range(URL_COUNT):
        dispatcher.add_get(f"/static/sub/path{i}", AsyncMock())

    first_url = "/static/sub/path0"
    last_url = f"/static/sub/path{URL_COUNT-1}"
    long_url = "/static/lots/of/sub/path/segments/0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15"
    mid_size_url = "/static/midsize/of/sub/path/segments"
    miss_url = "/is/a/miss"
    long_miss_url = "/is/a/miss/with/lots/of/sub/path/segments/0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15"

    dispatcher.add_get(long_url, AsyncMock())
    dispatcher.add_get(mid_size_url, AsyncMock())

    request_first_url = make_mock_request(first_url)
    match_info = await dispatcher.resolve(request_first_url)
    assert match_info.route.resource.canonical == first_url

    request_last_url = make_mock_request(last_url)
    match_info = await dispatcher.resolve(request_last_url)
    assert match_info.route.resource.canonical == last_url

    request_long_url = make_mock_request(long_url)
    match_info = await dispatcher.resolve(request_long_url)
    assert match_info.route.resource.canonical == long_url

    request_mid_size_url = make_mock_request(mid_size_url)
    match_info = await dispatcher.resolve(request_mid_size_url)
    assert match_info.route.resource.canonical == mid_size_url

    request_miss_url = make_mock_request(miss_url)
    match_info = await dispatcher.resolve(request_miss_url)
    assert match_info.route.status == 404

    request_long_miss_url = make_mock_request(long_miss_url)
    match_info = await dispatcher.resolve(request_long_miss_url)
    assert match_info.route.status == 404

    async def resolve_times(request: web.Request, times: int) -> float:
        start = timeit.default_timer()
        for _ in range(times):
            await dispatcher.resolve(request)
        end = timeit.default_timer()
        return end - start

    run_count = 2000
    resolve_time_first_url = await resolve_times(request_first_url, run_count)
    resolve_time_last_url = await resolve_times(request_last_url, run_count)
    resolve_time_mid_size_url = await resolve_times(request_mid_size_url, run_count)
    resolve_time_long_url = await resolve_times(request_long_url, run_count)
    resolve_time_miss = await resolve_times(request_miss_url, run_count)
    resolve_time_long_miss = await resolve_times(request_long_miss_url, run_count)
    print(f"resolve_time_first_url: {resolve_time_first_url}")
    print(f"resolve_time_last_url: {resolve_time_last_url}")
    print(f"resolve_time_mid_size_url: {resolve_time_mid_size_url}")
    print(f"resolve_time_long_url: {resolve_time_long_url}")
    print(f"resolve_time_miss: {resolve_time_miss}")
    print(f"resolve_time_long_miss: {resolve_time_long_miss}")


asyncio.run(url_dispatcher_performance())

before (5002 urls)

resolve_time_first_url: 0.0016546659753657877
resolve_time_last_url: 2.253484041953925
resolve_time_mid_size_url: 2.1529546250239946
resolve_time_long_url: 2.136738500033971
resolve_time_miss: 2.152159374963958
resolve_time_long_miss: 2.173428333015181

before (7 urls)

resolve_time_first_url: 0.0019221249967813492
resolve_time_last_url: 0.0038797500310465693
resolve_time_mid_size_url: 0.004664750013034791
resolve_time_long_url: 0.004066875029820949
resolve_time_miss: 0.00846966594690457
resolve_time_long_miss: 0.008330333046615124

after (5002 urls)

resolve_time_first_url: 0.0019938750192523003
resolve_time_last_url: 0.0020484999986365438
resolve_time_mid_size_url: 0.002027958049438894
resolve_time_long_url: 0.0019290419877506793
resolve_time_miss: 0.0035507080028764904
resolve_time_long_miss: 0.007968166959472

after (7 urls)

resolve_time_first_url: 0.0019937920151278377
resolve_time_last_url: 0.002007792005315423
resolve_time_mid_size_url: 0.0019652920309454203
resolve_time_long_url: 0.0019985410035587847
resolve_time_miss: 0.003473666962236166
resolve_time_long_miss: 0.007722083013504744

Checklist

  • I think the code is well written
  • Unit tests for the changes exist
  • Documentation reflects the changes
  • If you provide code modification, please add yourself to CONTRIBUTORS.txt
    • The format is <Name> <Surname>.
    • Please keep alphabetical order, the file is sorted by names.
  • Add a new news fragment into the CHANGES folder
    • name it <issue_id>.<type> for example (588.bugfix)
    • if you don't have an issue_id change it to the pr id after creating the pr
    • ensure type is one of the following:
      • .feature: Signifying a new feature.
      • .bugfix: Signifying a bug fix.
      • .doc: Signifying a documentation improvement.
      • .removal: Signifying a deprecation or removal of public API.
      • .misc: A ticket has been closed, but it is not of interest to users.
    • Make sure to use full sentences with correct case and punctuation, for example: "Fix issue with non-ascii contents in doctest text files."

@Dreamsorcerer
Copy link
Member

@bdraco Looks like something went bad in the backport, tests are failing and taking forever...

@bdraco
Copy link
Member

bdraco commented Nov 27, 2023

I'll take a look as soon as I can

@bdraco
Copy link
Member

bdraco commented Nov 27, 2023

diff --git a/aiohttp/web_urldispatcher.py b/aiohttp/web_urldispatcher.py
index d13ccfbf..a2cc6563 100644
--- a/aiohttp/web_urldispatcher.py
+++ b/aiohttp/web_urldispatcher.py
@@ -1048,7 +1048,7 @@ class UrlDispatcher(AbstractRouter, Mapping[str, AbstractResource]):
         if allowed_methods:
             return MatchInfoError(HTTPMethodNotAllowed(request.method, allowed_methods))
 
-        return MatchInfoError(self.HTTP_NOT_FOUND)
+        return MatchInfoError(HTTPNotFound())
 
     def __iter__(self) -> Iterator[str]:
         return iter(self._named_resources)

Reverting that seems to fix it.. digging into why its different

@bdraco
Copy link
Member

bdraco commented Nov 27, 2023

In 3.10 HTTPNotFound is a subclass of a Response so it can't be reused

@bdraco
Copy link
Member

bdraco commented Nov 27, 2023

That was changed in c11e891

@bdraco
Copy link
Member

bdraco commented Nov 27, 2023

Another option is to do a very limited backport of c11e891

Co-authored-by: J. Nick Koston <nick@koston.org>
@Dreamsorcerer
Copy link
Member

In 3.10 HTTPNotFound is a subclass of a Response so it can't be reused

Right, they used to be responses, but now are exceptions. 3.x needs to maintain backwards-compatibility.

@bdraco
Copy link
Member

bdraco commented Nov 27, 2023

In 3.10 HTTPNotFound is a subclass of a Response so it can't be reused

Right, they used to be responses, but now are exceptions. 3.x needs to maintain backwards-compatibility.

Limited backport POC #7918 (still testing)

Copy link

codecov bot commented Nov 27, 2023

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (aa7b121) 97.35% compared to head (6ce871e) 97.35%.
Report is 2 commits behind head on 3.10.

Files Patch % Lines
aiohttp/web_urldispatcher.py 97.05% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             3.10    #7917   +/-   ##
=======================================
  Coverage   97.35%   97.35%           
=======================================
  Files         108      108           
  Lines       32459    32542   +83     
  Branches     3849     3852    +3     
=======================================
+ Hits        31600    31682   +82     
  Misses        655      655           
- Partials      204      205    +1     
Flag Coverage Δ
CI-GHA 97.26% <98.93%> (+<0.01%) ⬆️
OS-Linux 96.96% <98.93%> (+<0.01%) ⬆️
OS-Windows 94.46% <98.93%> (+<0.01%) ⬆️
OS-macOS 96.76% <98.93%> (+0.19%) ⬆️
Py-3.10.11 94.36% <98.93%> (+0.01%) ⬆️
Py-3.10.13 96.69% <98.93%> (-0.04%) ⬇️
Py-3.11.6 96.46% <98.93%> (+<0.01%) ⬆️
Py-3.12.0 96.39% <98.93%> (?)
Py-3.8.10 94.34% <98.93%> (+<0.01%) ⬆️
Py-3.8.18 96.66% <98.93%> (+<0.01%) ⬆️
Py-3.9.13 94.35% <98.93%> (+0.01%) ⬆️
Py-3.9.18 96.71% <98.93%> (+<0.01%) ⬆️
Py-pypy7.3.13 96.17% <98.93%> (+<0.01%) ⬆️
VM-macos 96.76% <98.93%> (+0.19%) ⬆️
VM-ubuntu 96.96% <98.93%> (+<0.01%) ⬆️
VM-windows 94.46% <98.93%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Dreamsorcerer Dreamsorcerer merged commit 0a75ef0 into 3.10 Nov 27, 2023
27 of 31 checks passed
@Dreamsorcerer Dreamsorcerer deleted the patchback/backports/3.10/2a3eaa11dc2f8e6150eede9337182f273e14c20a/pr-7829 branch November 27, 2023 19:34
@bdraco bdraco mentioned this pull request Jul 22, 2024
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants