[v23.1.x] Limit memory used while fetching many partitions #11858

Merged on Jul 11, 2023 (11 commits)

Commits on Jun 8, 2023

  1. config: support nonintegral bound properties

    `numeric_bounds<T>` implicitly required its argument to be an integral
    type, because it applies the `%` operator to it. This change renames
    `numeric_bounds` to `numeric_integral_bounds` to make that explicit, and
    introduces a new `numeric_bounds` that does not support alignment or
    odd/even checks and therefore also works with floating-point types.
    
    `bounded_property` can now accept an arbitrary bounds struct that
    conforms to the `detail::bounds<>` concept. For compatibility, it
    defaults to `numeric_integral_bounds`, so no code changes are necessary.
    
    (cherry picked from commit 1db2edc)
    dlex committed Jun 8, 2023
    520759f
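A minimal sketch of the bounds split in this commit, assuming bounds types expose `validate()` and `clamp()` members (an assumption; the real member names may differ). The commit describes `detail::bounds<>` as a concept; to keep the example compilable under C++17, it uses a hypothetical detection trait, `detail::is_bounds`, as a stand-in:

```cpp
#include <cassert>
#include <optional>
#include <type_traits>
#include <utility>

// Hypothetical sketch of the bounds split; member names are assumptions.

// Works for any arithmetic type, including floating point: range checks only.
template<typename T>
struct numeric_bounds {
    std::optional<T> min;
    std::optional<T> max;

    bool validate(T v) const {
        return !(min && v < *min) && !(max && v > *max);
    }
    // Clamp v into [min, max].
    T clamp(T v) const {
        if (min && v < *min) return *min;
        if (max && v > *max) return *max;
        return v;
    }
};

// Integral types only: the alignment check needs operator%.
template<typename T>
struct numeric_integral_bounds {
    static_assert(std::is_integral_v<T>, "alignment requires an integral type");
    std::optional<T> min;
    std::optional<T> max;
    std::optional<T> align;

    bool validate(T v) const {
        if ((min && v < *min) || (max && v > *max)) return false;
        return !(align && *align != 0 && v % *align != 0);
    }
    T clamp(T v) const {
        if (min && v < *min) return *min;
        if (max && v > *max) return *max;
        return v;
    }
};

namespace detail {
// C++17 stand-in for a `bounds<>` concept: B must offer validate(T) and clamp(T).
template<typename B, typename T, typename = void>
struct is_bounds : std::false_type {};

template<typename B, typename T>
struct is_bounds<B, T, std::void_t<
  decltype(std::declval<const B&>().validate(std::declval<T>())),
  decltype(std::declval<const B&>().clamp(std::declval<T>()))>>
  : std::true_type {};
} // namespace detail

// Defaults to the integral bounds, so existing integral uses need no change.
template<typename T, typename B = numeric_integral_bounds<T>>
class bounded_property {
    static_assert(detail::is_bounds<B, T>::value, "B must model bounds<T>");

public:
    // The initial value is clamped into the bounds.
    bounded_property(T initial, B bounds)
      : _bounds(bounds), _value(_bounds.clamp(initial)) {}

    // Rejects out-of-bounds values and keeps the previous value.
    bool set(T v) {
        if (!_bounds.validate(v)) return false;
        _value = v;
        return true;
    }
    T value() const { return _value; }

private:
    B _bounds;
    T _value;
};
```

With this split, a floating-point property such as a memory share can use `numeric_bounds<double>`, while integral properties keep their alignment checks through the default.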
  2. config: a property to control memory allocation for parallel fetch

    (cherry picked from commit ee5e069)
    dlex committed Jun 8, 2023
    14ccc78
  3. k/fetch: respect the max_bytes from the fetch request

    While limiting the number of partitions in the fetch response by
    `kafka_max_bytes_per_fetch`, also consider the fetch plan's `bytes_left`,
    which is based on the fetch request's max_bytes and on the
    `fetch_max_bytes` property.
    
    (cherry picked from commit eb8a915)
    dlex committed Jun 8, 2023
    b30b35b
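A sketch of the partition-count limit in this commit, assuming each partition's expected response size is known up front. The function `partitions_in_response` is hypothetical: it caps the response by the tighter of `kafka_max_bytes_per_fetch` and a `bytes_left` derived from the request's max_bytes and `fetch_max_bytes`, and always admits the first partition so the fetch makes progress (mirroring the single-partition requirement stated in a later commit of this PR):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: how many partitions (in plan order) go into one fetch
// response. Parameter names follow the properties named in the commit; the
// accounting itself is an assumption, not Redpanda's implementation.
std::size_t partitions_in_response(
  const std::vector<std::size_t>& partition_bytes, // expected bytes per ntp
  std::size_t request_max_bytes,                   // max_bytes from the request
  std::size_t fetch_max_bytes,                     // broker-wide cap
  std::size_t kafka_max_bytes_per_fetch) {
    // The plan's bytes_left derives from both the request and the broker cap.
    const std::size_t bytes_left = std::min(request_max_bytes, fetch_max_bytes);
    // Respect whichever of the two limits is tighter.
    const std::size_t budget = std::min(bytes_left, kafka_max_bytes_per_fetch);

    std::size_t used = 0;
    std::size_t count = 0;
    for (std::size_t bytes : partition_bytes) {
        // The first partition is always included so the fetch can progress.
        if (count > 0 && used + bytes > budget) {
            break;
        }
        used += bytes;
        ++count;
    }
    return count;
}
```

Before this change only `kafka_max_bytes_per_fetch` bounded the loop; taking the minimum with `bytes_left` means a small client-side max_bytes also shortens the response.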
  4. k/server: kafka::server made a peering_sharded_service

    Functions further down the fetch code path need access to members of
    the local kafka::server instance, such as the memory semaphores.
    
    (cherry picked from commit d891efe)
    dlex committed Jun 8, 2023
    3edec15
  5. tests/fixture: initialize server configuration

    Fix the uninitialized `max_service_memory_per_core`; also disable metrics.
    
    (cherry picked from commit 58a9de0)
    dlex committed Jun 8, 2023
    33b5f9e
  6. k/server: kafka memory fetch semaphore

    The Kafka server now keeps a per-shard memory semaphore that limits
    the memory used by the fetch request handler. The semaphore count is
    derived from the `kafka_memory_share_for_fetch` property and the Kafka
    RPC service memory size.
    
    The metric `vectorized_kafka_rpc_fetch_avail_mem_bytes` is added to
    monitor the semaphore level.
    
    A sharded `server` accessor in `request_context` provides access to the
    local shard's instance of the new semaphore, as well as to the local
    instance of the `net::memory` semaphore.
    
    (cherry picked from commit 7b38601)
    dlex committed Jun 8, 2023
    80fc596
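The semaphore sizing in this commit might look like the following, assuming the fetch semaphore is simply `kafka_memory_share_for_fetch` times the per-shard RPC service memory (the exact formula is an assumption). `fetch_memory_units` and `memory_semaphore` are hypothetical; the class is a single-threaded counter and does not model asynchronous waiting:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical: per-shard fetch semaphore size, assuming it is the configured
// share of the Kafka RPC service memory on that shard.
std::size_t fetch_memory_units(double kafka_memory_share_for_fetch,
                               std::size_t max_service_memory_per_core) {
    // Keep the share in [0, 1] defensively.
    const double share = std::clamp(kafka_memory_share_for_fetch, 0.0, 1.0);
    return static_cast<std::size_t>(
      share * static_cast<double>(max_service_memory_per_core));
}

// Minimal single-threaded stand-in for a per-shard memory semaphore
// (illustrative only; no waiting, no cross-shard semantics).
class memory_semaphore {
public:
    explicit memory_semaphore(std::size_t units) : _available(units) {}

    // Non-blocking: takes n units only if all of them are available.
    bool try_acquire(std::size_t n) {
        if (n > _available) {
            return false;
        }
        _available -= n;
        return true;
    }
    void release(std::size_t n) { _available += n; }

    // The level a gauge like vectorized_kafka_rpc_fetch_avail_mem_bytes reports.
    std::size_t available() const { return _available; }

private:
    std::size_t _available;
};
```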
  7. config: a property for memory batch size estimation

    (cherry picked from commit c1d77cd)
    dlex committed Jun 8, 2023
    580a2dc
  8. k/fetch: limit fetch ntp parallelism

    Consult the memory semaphores on whether enough memory is available
    to perform the fetch while fetching from ntps concurrently. Both the
    general Kafka memory semaphore and the new Kafka fetch memory semaphore
    are used. For the former, the amount already consumed from it by the
    request memory estimator is taken into account.
    
    Since the batch size is not known in advance, it is estimated at 1 MiB.
    The first partition in the list is fetched regardless of the semaphore
    values, to satisfy the requirement that at least a single partition
    from the fetch request must advance.
    
    The number of units held is adjusted to the actual size used as soon
    as that size is known.
    
    The acquired semaphore units are held by `read_result` until it is
    destroyed at the end of fetch request processing. When `read_result`
    is destroyed on the connection shard, the units are returned on the
    shard where they were acquired.
    
    If the request's max_size exceeds what either semaphore holds,
    max_size is reduced to the memory actually available, taking the
    minimum batch size into account.
    
    (cherry picked from commit 2474ef6)
    dlex committed Jun 8, 2023
    3a87ba3
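A sketch of the per-ntp admission check described in this commit, under stated assumptions: units already taken by the request memory estimator count as available, the usable amount is the smaller of the two semaphores, and the 1 MiB estimate is passed in as `estimated_batch_size`. `admit_ntp`, `fetch_admission` and `units_to_return` are hypothetical names, and the cross-shard return of units is not modelled:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

constexpr std::size_t MiB = 1024 * 1024;

// Hypothetical result of asking the memory semaphores about one ntp.
struct fetch_admission {
    bool proceed;         // whether this ntp may be fetched now
    std::size_t max_size; // possibly reduced max_size for the read
};

// Sketch of the decision for one ntp; the arithmetic is an assumption based
// on the commit message, not the real implementation.
fetch_admission admit_ntp(
  std::size_t requested_max_size,
  std::size_t kafka_mem_available,  // general Kafka memory semaphore
  std::size_t estimator_units,      // already taken by the request estimator
  std::size_t fetch_mem_available,  // Kafka fetch memory semaphore
  std::size_t estimated_batch_size, // unknown up front, e.g. 1 MiB
  bool is_first_partition) {
    // Units consumed by the request memory estimator count as usable here.
    const std::size_t general = kafka_mem_available + estimator_units;
    const std::size_t available = std::min(general, fetch_mem_available);

    if (is_first_partition) {
        // Always proceed so at least one partition advances; allow at least
        // one estimated batch even if the semaphores are exhausted.
        return {true, std::min(requested_max_size,
                               std::max(available, estimated_batch_size))};
    }
    if (available < estimated_batch_size) {
        return {false, 0}; // not enough memory for even one batch
    }
    // Shrink max_size to the memory actually available.
    return {true, std::min(requested_max_size, available)};
}

// Once the real read size is known, excess held units go back to the
// semaphore. A read larger than what was held is not modelled here.
std::size_t units_to_return(std::size_t held, std::size_t actual_bytes) {
    return held > actual_bytes ? held - actual_bytes : 0;
}
```

Taking the minimum of both semaphores means a non-first ntp is fetched only when neither the general Kafka memory pool nor the fetch-specific share would be overdrawn.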
  9. net: fix uninitialized member of server_configuration

    In kafka_server_rpfixture, an extra `kafka::server` is created using
    a barely initialized `server_configuration` instance. Garbage in
    `max_service_memory_per_core` now causes issues because of the new
    arithmetic done with it in the kafka::server ctor.
    
    (cherry picked from commit 623e613)
    dlex committed Jun 8, 2023
    b93d79f
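The failure mode fixed in this commit can be illustrated with two hypothetical configuration structs (not Redpanda types): a member left without an initializer holds an indeterminate value after default initialization, while a default member initializer guarantees zero. `fetch_budget` is likewise a hypothetical consumer doing constructor-style arithmetic:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins, not Redpanda types. After `config_without_default c;`
// the member is indeterminate, and reading it (as arithmetic in a constructor
// would) is undefined behaviour.
struct config_without_default {
    std::size_t max_service_memory_per_core; // no initializer
};

// A default member initializer gives a known value even for a
// barely initialized instance.
struct config_with_default {
    std::size_t max_service_memory_per_core{0};
};

// Illustrative consumer doing arithmetic with the field.
std::size_t fetch_budget(const config_with_default& cfg, double share) {
    return static_cast<std::size_t>(
      share * static_cast<double>(cfg.max_service_memory_per_core));
}
```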

Commits on Jul 4, 2023

  1. k/tests: UT for the memory limiting algo

    Test the algorithm that decides whether a fetch request can proceed
    in an ntp based on the available resources.
    
    Move the existing testing-only symbols into the `testing` namespace.
    
    (cherry picked from commit 950abc7)
    dlex authored and BenPope committed Jul 4, 2023
    1f1ed9c
  2. tests: enable test_fetch_with_many_partitions

    RAM is increased to 512M because Redpanda was failing with 256M for
    unrelated reasons.
    
    Test with different values of `kafka_memory_share_for_fetch`.
    
    (cherry picked from commit 06a38b9)
    dlex authored and BenPope committed Jul 4, 2023
    8897903