
Armeria server performance analysis #4

Open

ikhoon opened this issue Aug 5, 2020 · 9 comments
Comments

@ikhoon
Collaborator

ikhoon commented Aug 5, 2020

A benchmark and performance analysis would be good data to advocate Armeria server to http4s' users.

@rossabaker
Member

@hamnis set up a server-example project, which we've used to run some crude benchmarks between the various backends. It doesn't show off much yet, but it gives us a baseline for pings.

@ikhoon
Collaborator Author

ikhoon commented Aug 30, 2020

1st Benchmark:

  • Environment: MacBook Pro, 2.2 GHz 6-Core Intel Core i7
  • Benchmark code: hamnis/http4s-server-example@0cfeee3
  • Result
    • http4s-armeria
      $ wrk -t12 -c400 -d30s http://127.0.0.1:8080/hello
      
      Running 30s test @ http://127.0.0.1:8080/hello
        12 threads and 400 connections
        Thread Stats   Avg      Stdev     Max   +/- Stdev
          Latency    11.76ms    2.97ms  86.66ms   89.53%
            Req/Sec     1.71k     0.94k    3.64k    52.50%
        611170 requests in 30.02s, 107.62MB read
        Socket errors: connect 155, read 170, write 0, timeout 0
      Requests/sec:  20358.15
      Transfer/sec:      3.58MB
      
    • Blaze
      $ wrk -t12 -c400 -d30s http://127.0.0.1:8080/hello
      Running 30s test @ http://127.0.0.1:8080/hello
        12 threads and 400 connections
        Thread Stats   Avg      Stdev     Max   +/- Stdev
          Latency     2.94ms    2.19ms 107.06ms   96.86%
          Req/Sec     6.99k     3.06k   14.33k    53.22%
        2506706 requests in 30.03s, 351.81MB read
        Socket errors: connect 155, read 277, write 0, timeout 0
      Requests/sec:  83481.35
      Transfer/sec:     11.72MB
      
    • Plain Armeria
      $ wrk -t12 -c400 -d30s http://127.0.0.1:8080/hello
      
      Running 30s test @ http://127.0.0.1:8080/hello
        12 threads and 400 connections
        Thread Stats   Avg      Stdev     Max   +/- Stdev
          Latency     3.16ms  450.83us  28.96ms   87.82%
          Req/Sec     6.37k     3.29k   13.52k    47.94%
        2280593 requests in 30.01s, 401.54MB read
        Socket errors: connect 155, read 142, write 0, timeout 0
      Requests/sec:  75992.98
      Transfer/sec:     13.38MB
      
  • Bottleneck
    • http4s-armeria consumes all available CPU cycles; utilization hits 100%
    • http4s-blaze stays at 50~60%
  • Analysis
    • The throughput (Requests/sec) of http4s-armeria increased 3~4 times after removing the toUnicastPublisher operation, which converts Stream[F, HttpObject] to Publisher[HttpObject]
  • Conclusion
    • We cannot release the initial version of http4s-armeria until the high-CPU-utilization performance problem is solved.
    • We might need an optimized converter for Reactive Streams?
    • ???
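For context, the `toUnicastPublisher` operation identified above comes from fs2's Reactive Streams interop module. As a rough sketch, assuming the fs2 2.x API that was current at the time (the `toPublisher` wrapper name is illustrative, not from the http4s-armeria code), the pre-optimization response path looked something like:

```scala
import cats.effect.ConcurrentEffect
import fs2.Stream
import fs2.interop.reactivestreams._ // provides toUnicastPublisher
import org.reactivestreams.Publisher

// Illustrative sketch: bridging the response body to a Reactive Streams
// Publisher so Armeria can consume it. In fs2 2.x, toUnicastPublisher
// sets up a queue-backed subscription per stream, and that per-request
// bridging is what the analysis above identifies as the hot spot.
def toPublisher[F[_]: ConcurrentEffect, A](body: Stream[F, A]): Publisher[A] =
  body.toUnicastPublisher
```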

ikhoon added a commit to ikhoon/http4s-armeria that referenced this issue Aug 31, 2020
…Publisher`

Motivation:

The conversion between `fs2.Stream` and Reactive Streams `Publisher`
is one of the bottlenecks in the benchmark of http4s#4

Modifications:

- Write on demand instead of `Publisher`

Result:

- Before
  ```
  Running 30s test @ http://127.0.0.1:8080/hello
    12 threads and 400 connections
    Thread Stats   Avg      Stdev     Max   +/- Stdev
      Latency    11.76ms    2.97ms  86.66ms   89.53%
        Req/Sec     1.71k     0.94k    3.64k    52.50%
    611170 requests in 30.02s, 107.62MB read
    Socket errors: connect 155, read 170, write 0, timeout 0
  Requests/sec:  20358.15
  Transfer/sec:      3.58MB
  ```
- After
  ```
  Running 30s test @ http://127.0.0.1:8080/http4s/thread
    12 threads and 400 connections
    Thread Stats   Avg      Stdev     Max   +/- Stdev
      Latency     3.77ms    2.41ms 145.85ms   96.60%
      Req/Sec     5.39k     2.88k   59.63k    60.04%
    1932651 requests in 30.10s, 312.64MB read
    Socket errors: connect 155, read 168, write 0, timeout 0
  Requests/sec:  64207.03
  Transfer/sec:     10.39MB

  Running 30s test @ http://127.0.0.1:8080/http4s/thread
    12 threads and 400 connections
    Thread Stats   Avg      Stdev     Max   +/- Stdev
      Latency     3.92ms    1.30ms  63.05ms   95.09%
      Req/Sec     5.11k     2.48k   10.36k    61.56%
    1831547 requests in 30.03s, 296.28MB read
    Socket errors: connect 155, read 160, write 0, timeout 0
  Requests/sec:  60989.03
  Transfer/sec:      9.87MB
  ```
ikhoon added a commit that referenced this issue Sep 2, 2020
…Publisher` (#8)

Motivation:

The conversion between `fs2.Stream` and Reactive Streams `Publisher`
is one of the bottlenecks in that benchmark of #4

Modifications:

- Write on demand instead of `Publisher`

Result:

- Before
  ```
  Running 30s test @ http://127.0.0.1:8080/hello
    12 threads and 400 connections
    Thread Stats   Avg      Stdev     Max   +/- Stdev
      Latency    11.76ms    2.97ms  86.66ms   89.53%
        Req/Sec     1.71k     0.94k    3.64k    52.50%
    611170 requests in 30.02s, 107.62MB read
    Socket errors: connect 155, read 170, write 0, timeout 0
  Requests/sec:  20358.15
  Transfer/sec:      3.58MB
  ```
- After
  ```
  Running 30s test @ http://127.0.0.1:8080/http4s/thread
    12 threads and 400 connections
    Thread Stats   Avg      Stdev     Max   +/- Stdev
      Latency     3.77ms    2.41ms 145.85ms   96.60%
      Req/Sec     5.39k     2.88k   59.63k    60.04%
    1932651 requests in 30.10s, 312.64MB read
    Socket errors: connect 155, read 168, write 0, timeout 0
  Requests/sec:  64207.03
  Transfer/sec:     10.39MB

  Running 30s test @ http://127.0.0.1:8080/http4s/thread
    12 threads and 400 connections
    Thread Stats   Avg      Stdev     Max   +/- Stdev
      Latency     3.92ms    1.30ms  63.05ms   95.09%
      Req/Sec     5.11k     2.48k   10.36k    61.56%
    1831547 requests in 30.03s, 296.28MB read
    Socket errors: connect 155, read 160, write 0, timeout 0
  Requests/sec:  60989.03
  Transfer/sec:      9.87MB
  ```
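The "write on demand" modification from the commit above can be sketched as follows. This is an illustrative reconstruction, not the actual code from #8: it assumes cats-effect 2 / fs2 2.x, and uses Armeria's `HttpResponseWriter` API (`tryWrite`, `whenConsumed`, `close`) to propagate demand by waiting for the previous chunk to be consumed before pulling the next one, with no intermediate `Publisher` bridge:

```scala
import java.util.concurrent.CompletableFuture

import cats.effect.Async
import cats.syntax.all._
import com.linecorp.armeria.common.{HttpData, HttpResponseWriter}
import fs2.Stream

// Suspend a CompletableFuture[Void] into F (cats-effect 2 style).
def await[F[_]](cf: CompletableFuture[Void])(implicit F: Async[F]): F[Unit] =
  F.async { cb =>
    cf.whenComplete((_, err) => cb(if (err == null) Right(()) else Left(err)))
    ()
  }

// Write each chunk directly to the response writer, then wait until
// Armeria has consumed it before pulling the next chunk. Demand flows
// end to end without a Reactive Streams bridge in between.
def writeOnDemand[F[_]](writer: HttpResponseWriter, body: Stream[F, Byte])(
    implicit F: Async[F]): F[Unit] =
  body.chunks
    .evalMap { chunk =>
      F.delay(writer.tryWrite(HttpData.wrap(chunk.toArray))) >>
        await(writer.whenConsumed())
    }
    .compile
    .drain
    .flatMap(_ => F.delay(writer.close()))
```

The key design point, matching the commit's motivation, is that backpressure comes straight from Armeria's `whenConsumed()` future rather than from a `Subscription.request(n)` cycle through a queue-backed `Publisher`.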
@ngbinh

ngbinh commented Feb 2, 2021

Is it resolved? Congrats on the new release!

@ikhoon
Collaborator Author

ikhoon commented Feb 2, 2021

Is it resolved?

I think there is room to optimize http4s-armeria's performance. 💪

The benchmark load was a simple "hello world" text message over HTTP/1.1.
Because the load was so simple, the bottleneck that consumed most of the CPU was the conversion from Reactive Streams to fs2 Stream and vice versa. fs2-reactive-streams is used for the conversion.

I only got rid of the conversion from fs2 streams to Reactive Streams for HttpResponse.
The remaining conversion is HttpRequest (Reactive Streams) to fs2 streams using fs2-reactive-streams.
I am going to remove the fs2-reactive-streams dependency and implement an optimized version for fs2.
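For reference, the remaining request-side conversion is essentially a one-liner through fs2-reactive-streams, since Armeria's `HttpRequest` is itself a Reactive Streams `Publisher[HttpObject]`. A sketch assuming the fs2 2.x interop API (the `requestStream` name is illustrative):

```scala
import cats.effect.ConcurrentEffect
import com.linecorp.armeria.common.{HttpObject, HttpRequest}
import fs2.Stream
import fs2.interop.reactivestreams.fromPublisher

// Armeria's HttpRequest implements Publisher[HttpObject], so the body
// can be surfaced as an fs2 Stream by subscribing through the interop
// layer. This subscription is the remaining per-request bridge that the
// comment above proposes replacing with an fs2-specific implementation.
def requestStream[F[_]: ConcurrentEffect](req: HttpRequest): Stream[F, HttpObject] =
  fromPublisher[F, HttpObject](req)
```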

@ngbinh

ngbinh commented Feb 2, 2021

Thanks for the explanation. Looking forward to it!

@andyczerwonka

Is it resolved?

I think there is room to optimize http4s-armeria's performance. 💪

The benchmark load was a simple "hello world" text message over HTTP/1.1. Because the load was so simple, the bottleneck that consumed most of the CPU was the conversion from Reactive Streams to fs2 Stream and vice versa. fs2-reactive-streams is used for the conversion.

I only got rid of the conversion from fs2 streams to Reactive Streams for HttpResponse. The remaining conversion is HttpRequest (Reactive Streams) to fs2 streams using fs2-reactive-streams. I am going to remove the fs2-reactive-streams dependency and implement an optimized version for fs2.

@ikhoon was the optimization to fs2 done?

@ikhoon
Collaborator Author

ikhoon commented Sep 16, 2022

It was partially done and is still in progress. But there were many performance optimizations on the Armeria side.
I strongly believe that http4s-armeria can be used in production.

@danicheg
Member

@ikhoon It's been two years since you ran the benchmark. It'd be nice if you could run it again with the new http4s-armeria/blaze. I believe some of the numbers in the benchmark results must have changed.

@ikhoon
Collaborator Author

ikhoon commented Sep 19, 2022

Agreed. The benchmark results may well have changed. I will run the benchmarks soon.
