A proposal for a network Resource Manager #1260

Closed
vyzo opened this issue Dec 10, 2021 · 16 comments
Labels
kind/discussion Topical discussion; usually not changes to codebase


@vyzo
Contributor

vyzo commented Dec 10, 2021

Network Resource Management

The Problem

Our networking stack is currently a black hole in terms of resource
management; connections, streams, buffers and so on can consume
arbitrary resources with very little visibility.

In addition, an often requested feature is the ability to set hard
limits in terms of resources (esp. connections) that can be consumed
by the network stack, which we are not currently equipped to do.

Here we propose a network Resource Manager interface that can be
plugged at the bowels of the stack (transports and multiplexers) and
provide for the necessary resource accounting and limits.

Network Resources and Limits

The main entry point for resource consumption in the network stack is
the Connection. Each connection is associated with a peer, which may
have many active connections. Associated with a connection may be
buffers and open streams for some protocol(s). Further down the
rabbit hole, streams have associated buffers.

This suggests a natural hierarchy for resource management:

  • Peer
    • Connections
      • Buffers
      • Streams
        • Buffers

This hierarchy creates an entry point for setting limits at the peer
level: We can limit per peer the number of connections it can have,
and further down the number of streams on each connection and in
aggregate for the peer. Similarly we can set a limit on how much
buffer space a peer can consume, with higher resolution on limiting
the buffer space consumed by each connection and each stream.

On top of per peer limits, the system may impose global hard limits
that restrain the aggregate resource usage. We may also want to set
per protocol limits globally.
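
To make the two levels of limits concrete, here is a minimal sketch of an admission check for a new connection. The names `Limit`, `Usage` and `admitConn` are hypothetical and not part of the proposal; the point is only that a request must fit under the per-peer limit and the global limit at the same time.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrLimitExceeded is a hypothetical sentinel error for this sketch.
var ErrLimitExceeded = errors.New("resource limit exceeded")

// Limit is a hypothetical set of caps; the proposal does not fix these fields.
type Limit struct {
	Conns   int // maximum number of connections
	Streams int // maximum number of streams
	Memory  int // maximum buffer space in bytes
}

// Usage mirrors Limit and records what is currently consumed.
type Usage struct {
	Conns, Streams, Memory int
}

// admitConn reports whether one more connection fits under both the
// per-peer limit and the global (system) limit.
func admitConn(peer, system Usage, peerLimit, systemLimit Limit) error {
	if peer.Conns+1 > peerLimit.Conns {
		return fmt.Errorf("peer: %w", ErrLimitExceeded)
	}
	if system.Conns+1 > systemLimit.Conns {
		return fmt.Errorf("system: %w", ErrLimitExceeded)
	}
	return nil
}

func main() {
	peerLimit := Limit{Conns: 2, Streams: 64, Memory: 1 << 20}
	systemLimit := Limit{Conns: 100, Streams: 4096, Memory: 64 << 20}

	// A peer with one connection may open a second one.
	fmt.Println(admitConn(Usage{Conns: 1}, Usage{Conns: 10}, peerLimit, systemLimit))
	// A peer already at its limit is refused, even though the system has room.
	fmt.Println(admitConn(Usage{Conns: 2}, Usage{Conns: 10}, peerLimit, systemLimit))
}
```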

Transactional Scopes for Resource Management

In order to make resource management ergonomic, we propose the concept
of transactional scopes for aggregate resource management. Starting
with a peer scope, we create a transactional scope for each connection
and similarly a transactional scope for each stream. Within each scope we
account for and limit resource usage, and once the top-level scope is closed,
all resources down the tree are released as well.

A potential subtle point is the association of a connection scope with
a peer; prior to the handshake in an incoming connection, we do not
yet know the peer. That implies that a connection scope can be created
without initially being associated with a peer, with the association
established later.

Using transactional scopes, we can sketch an interface for the resource manager as follows:

type ResourceManager interface {
   // open or get an existing peer scope
   GetPeerScope(peer.ID) (PeerScope, error)
   // directly request a scope for an incoming connection
   GetConnectionScope(Direction) (ConnectionScope, error)
}

type TransactionalScope interface {
  // End the transaction and close the scope; resources are reclaimed.
  Done()

  // Allocate and track a buffer in this scope
  GetBuffer(size int) ([]byte, error)
  // Grow a previously allocated buffer
  GrowBuffer([]byte, newsize int) ([]byte, error)
  // early release; Done automatically reclaims allocated buffers.
  ReleaseBuffer([]byte)
}

type PeerScope interface {
  TransactionalScope

  Peer() peer.ID
  Stat() PeerStat

  // open a new connection scope
  GetConnectionScope(Direction) (ConnectionScope, error)
  // attach an incoming connection to a scope
  AddConnectionScope(ConnectionScope) error
}

type ConnectionScope interface {
  TransactionalScope

  GetStreamScope(Direction, proto.ID) (StreamScope, error)

  PeerScope() PeerScope
  Stat() ConnectionStat
}

type StreamScope interface {
  TransactionalScope

  ConnectionScope() ConnectionScope
  Stat() StreamStat
}

var ErrResourceLimitExceeded = ...
var ErrScopeClosed = ...
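
As an illustration of the transactional behaviour, the following is a minimal, single-goroutine sketch that tracks only memory. The `scope` type, `newScope` and the concrete error values are assumptions made for this example; the interfaces above leave the implementation open. Reservations are checked against every ancestor, and closing a scope hands everything reserved below it back to the ancestors.

```go
package main

import (
	"errors"
	"fmt"
)

// Concrete values chosen for this sketch; the proposal leaves them unspecified.
var (
	ErrResourceLimitExceeded = errors.New("resource limit exceeded")
	ErrScopeClosed           = errors.New("scope closed")
)

// scope is a hypothetical transactional scope that accounts for memory only.
type scope struct {
	name     string
	parent   *scope
	children map[*scope]struct{}
	limit    int // maximum bytes for this scope and everything below it
	used     int // bytes reserved by this scope and everything below it
	closed   bool
}

// newScope creates a scope under parent; parent must still be open (or nil).
func newScope(name string, limit int, parent *scope) *scope {
	s := &scope{name: name, limit: limit, parent: parent, children: map[*scope]struct{}{}}
	if parent != nil {
		parent.children[s] = struct{}{}
	}
	return s
}

// Reserve accounts for size bytes in this scope and every ancestor. It fails
// without side effects if any of them would exceed its limit.
func (s *scope) Reserve(size int) error {
	if s.closed {
		return ErrScopeClosed
	}
	for p := s; p != nil; p = p.parent {
		if p.used+size > p.limit {
			return fmt.Errorf("%s: %w", p.name, ErrResourceLimitExceeded)
		}
	}
	for p := s; p != nil; p = p.parent {
		p.used += size
	}
	return nil
}

// Done ends the transaction: usage is returned to the ancestors and the
// scope, together with all of its descendants, is closed.
func (s *scope) Done() {
	if s.closed {
		return
	}
	for p := s.parent; p != nil; p = p.parent {
		p.used -= s.used
	}
	if s.parent != nil {
		delete(s.parent.children, s)
	}
	s.closeTree()
}

// closeTree marks the scope and its descendants closed and zeroes their usage.
func (s *scope) closeTree() {
	s.closed = true
	s.used = 0
	for c := range s.children {
		c.closeTree()
	}
	s.children = nil
}

func main() {
	peer := newScope("peer", 1000, nil)
	conn := newScope("conn", 600, peer)
	stream := newScope("stream", 400, conn)

	fmt.Println(stream.Reserve(300)) // fits at every level
	fmt.Println(conn.Reserve(400))   // would push conn past its limit
	conn.Done()                      // releases the stream's reservation too
	fmt.Println(peer.used)
	fmt.Println(stream.Reserve(1)) // the stream scope was closed with its parent
}
```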

Notes

  • We may also want to introduce protocol scopes that contain peer scopes, if we want to control
    application protocol limits in a granular way.
  • The TransactionalScope interface implies that buffers are allocated (from the buffer pool)
    through the scope and not directly. We may want to do this indirectly by recording/vetting the
    buffer size instead of the actual buffer.
@vyzo vyzo added the kind/discussion Topical discussion; usually not changes to codebase label Dec 10, 2021
@marten-seemann
Contributor

Linking this old issue for discoverability / visibility: #635.

@marten-seemann
Contributor

File descriptors are another resource we should probably track, as suggested by @Stebalien here: libp2p/go-libp2p-swarm#165.

@vyzo
Contributor Author

vyzo commented Dec 14, 2021

yeah, good point.

@Stebalien
Member

I think this is a great approach but I'd like to question the hierarchy:

  1. I don't really care how much memory a stream uses, I care about how much memory a peer uses.
  2. I don't really care if a peer has one connection with 1999 streams and another with 1.
  3. Etc...

Really, I have:

  1. An application that shares resources with other programs.
  2. Services within the application that share resources.
  3. Peers sharing resources (usually tied to a service).
App -> Services -> Peers
  \        \         \
   \--------\---------\-------> [ Resource Limits ]

I want to be able to configure constraints like:

  1. My system won't use more than N file descriptors or X memory.
  2. Bitswap won't use more than X memory (or some percent of the system limit, or some function of load).
  3. Peer P can use X/Y/Z memory/conns/streams/fds with respect to my/no/all service.

@Stebalien
Member

In terms of implementation:

We'd need a DMZ service and peer for "unknown" resources. Basically a holding ground where we have low limits and short timeouts.

  1. Resources, peers, etc. would initially be allocated to the DMZ service.
  2. If one or more services tag (or somehow claim) a peer, the peer's connection "resources" would now be shared by these services.
  3. A new inbound stream would initially belong to the DMZ service until negotiated. Ideally registering a listener for a new stream would indicate the service responsible for the resource. Unfortunately, we don't currently have a context here...
  4. New outbound streams would immediately be debited to the service creating them (we can use the context for this).

@Stebalien
Member

Nit: I'd replace GetBuffer(size) []byte and friends with Reserve(size) and Release(size). I would definitely provide helper methods for allocating actual buffers, using a buffer pool, etc., but (a) not everything is a buffer and (b) some services may need to manage their own memory for some reason.
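
A minimal sketch of what this suggestion could look like, assuming a hypothetical `memScope` type: the scope only does size-based accounting through `Reserve` and `Release`, and buffer allocation is a helper layered on top. A real helper might draw from a buffer pool instead of calling `make`.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	ErrResourceLimitExceeded = errors.New("resource limit exceeded")
	ErrInvalidSize           = errors.New("invalid size") // hypothetical
)

// memScope is a hypothetical scope that exposes only size-based accounting.
type memScope struct {
	limit int
	used  int
}

// Reserve accounts for size bytes without allocating anything.
func (s *memScope) Reserve(size int) error {
	if size < 0 {
		return ErrInvalidSize
	}
	if s.used+size > s.limit {
		return ErrResourceLimitExceeded
	}
	s.used += size
	return nil
}

// Release returns size bytes to the scope, never dropping below zero.
func (s *memScope) Release(size int) {
	if size <= 0 {
		return
	}
	if size > s.used {
		size = s.used
	}
	s.used -= size
}

// GetBuffer is a helper built on Reserve for callers that do want a buffer.
func GetBuffer(s *memScope, size int) ([]byte, error) {
	if err := s.Reserve(size); err != nil {
		return nil, err
	}
	return make([]byte, size), nil
}

// ReleaseBuffer undoes the accounting for buf; the slice is left to the GC.
func ReleaseBuffer(s *memScope, buf []byte) {
	s.Release(cap(buf))
}

func main() {
	s := &memScope{limit: 8192}

	// A service that manages its own memory only records the size.
	fmt.Println(s.Reserve(4096))

	// Another caller uses the buffer helper.
	buf, err := GetBuffer(s, 4096)
	fmt.Println(len(buf), err)

	// The scope is now full, so further requests are refused.
	_, err = GetBuffer(s, 1)
	fmt.Println(err)

	ReleaseBuffer(s, buf)
	s.Release(4096)
	fmt.Println(s.used)
}
```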

@vyzo
Contributor Author

vyzo commented Dec 16, 2021 via email

@vyzo
Contributor Author

vyzo commented Dec 16, 2021

Basically what we need to model here is the application/protocol as a trait.
A peer/connection/stream may belong to multiple app/protocol scopes and we want to set limits at that level.
The DMZ is the default scope with some preset limits.

@vyzo
Contributor Author

vyzo commented Dec 16, 2021

I think the right way to model this is the following graph:

Protocol        Peer
   |              |
   |              +------>* Conn
   |                          |
   +--------------------------+----->* Stream

Where the protocol is a set of applicable proto IDs or a default scope.
Then we have limits applied per protocol, per peer, per conn, and per stream.
Turtles all the way down, basically.

@Stebalien
Member

> Then we have limits applied per protocol, per peer, per conn, and per stream.

I still want to make sure we have a strong motivation for this hierarchy:

  1. Do we need limits per conn/stream? What user need is driving this? I guess I could see a use for per-stream limits (specifically, a service could raise the limit on some high-bandwidth streams). But I can't think of a use-case for a per-connection limit that isn't covered by a per-peer limit.
  2. With respect to protocol names, multiple services may end up using the same protocol for outbound streams in some cases. I really would make this about individual services and not protocols if at all possible.

@vyzo
Contributor Author

vyzo commented Dec 16, 2021

Re 2: For services, I think we can add another scope, or replace the protocol scope with it. An argument could be made that we need both (see below).

Re 1: For the limit hierarchy, it is easier to control resources without the user doing anything.

@Stebalien
Member

> For the limit hierarchy, it is easier to control resources without the user doing anything.

Sure, but we don't need per-connection limits in that case. We'd need:

  1. Per-stream (and/or per-stream/per-peer) limits.
  2. Per-peer limits.

I'm just challenging the per-connection limits because I don't think that really buys us anything.

@vyzo
Contributor Author

vyzo commented Dec 16, 2021

It does buy us resource control before the handshake, which seems kind of an important edge case.

@Stebalien
Member

Yeah, but that's not really connection scoped. That is, it's not something using resources from the connection scope, it's the connection using resources from some "DMZ" scope.

@vyzo
Contributor Author

vyzo commented Dec 21, 2021

I have updated the PR with the next iteration, based on our sync discussion.

The most notable changes:

  • Introduced service scopes, which can own streams.
  • Made streams rooted at peers.
  • Removed the Add* methods in favor of appropriate Set* methods, which do the resource checks and propagate internally.
  • Made the system and DMZ scopes explicit.

Limits

When reserving resources, limits are checked at every step going upwards.
At the root, we have the system scope where the global hard limits apply.

Under root, we have services, which are constructs that track resources at the application level; the programmer must tell us about resources used by services and add streams explicitly.

In parallel we have peers and protocol suites (the latter is useful so that the application programmer can get limits without doing any housekeeping themselves).
Peers have limits, and similarly protocols have limits. The two exist in parallel, as scopes that limit resource usage at the appropriate granularity.

The low level resource scopes are streams (belonging to a peer and optionally owned by a service) and connections. Streams are rooted in peers, as we discussed.
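
The upward limit check can be sketched as follows. This is an illustration under assumptions, not the code in the PR: the `scope` and `streamScope` types are hypothetical, only memory is tracked, and the stream keeps a flat list of the scopes that constrain it (peer, protocol, system) so that the system scope is counted once. If any level refuses, the levels that already accepted are rolled back.

```go
package main

import (
	"errors"
	"fmt"
)

var ErrResourceLimitExceeded = errors.New("resource limit exceeded")

// scope is a hypothetical accounting node with a single memory limit.
type scope struct {
	name  string
	limit int
	used  int
}

func (s *scope) reserve(n int) error {
	if s.used+n > s.limit {
		return fmt.Errorf("%s: %w", s.name, ErrResourceLimitExceeded)
	}
	s.used += n
	return nil
}

func (s *scope) release(n int) { s.used -= n }

// streamScope is constrained by its own limit and by every scope listed in
// constraints, for example its peer, its protocol and the system root.
type streamScope struct {
	self        *scope
	constraints []*scope
}

// Reserve checks the limit at every level going upwards and undoes the
// partial reservation if any level refuses.
func (st *streamScope) Reserve(n int) error {
	chain := append([]*scope{st.self}, st.constraints...)
	for i, s := range chain {
		if err := s.reserve(n); err != nil {
			for _, accepted := range chain[:i] {
				accepted.release(n)
			}
			return err
		}
	}
	return nil
}

func main() {
	system := &scope{name: "system", limit: 1000}
	peer := &scope{name: "peer", limit: 500}
	proto := &scope{name: "protocol", limit: 300}
	st := &streamScope{
		self:        &scope{name: "stream", limit: 400},
		constraints: []*scope{peer, proto, system},
	}

	fmt.Println(st.Reserve(200)) // fits everywhere
	fmt.Println(st.Reserve(150)) // refused by the protocol scope
	// Usage is unchanged by the refused request.
	fmt.Println(st.self.used, peer.used, proto.used, system.used)
}
```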

The DMZ

The DMZ scope constrains unassociated connections and unnegotiated streams and related resources; it allows us to limit transient resource usage.

Connection Lifetime

Outbound Connection

Creating a connection requires us to reserve the resource in the appropriate peer scope, with a limit check. Calling Done on the connection scope releases the resource.

Inbound Connection

Inbound connections get a scope, initially constrained by the DMZ; once the connection has been negotiated, the transport calls SetPeer, which checks the peer's resource limit and then creates the appropriate association.
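
A minimal sketch of this hand-off, counting connections only. The `connCounter` and `connScope` types and the `openInbound` function are hypothetical, and the sketch assumes that a successful SetPeer also frees the DMZ slot, which the text above does not state explicitly for connections.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	ErrResourceLimitExceeded = errors.New("resource limit exceeded")
	ErrNotInDMZ              = errors.New("connection is not in the DMZ") // hypothetical
)

// connCounter is a hypothetical scope that counts connections only.
type connCounter struct {
	name  string
	limit int
	conns int
}

func (c *connCounter) add() error {
	if c.conns+1 > c.limit {
		return fmt.Errorf("%s: %w", c.name, ErrResourceLimitExceeded)
	}
	c.conns++
	return nil
}

func (c *connCounter) remove() { c.conns-- }

// connScope tracks which scope currently accounts for the connection.
type connScope struct {
	dmz   *connCounter
	owner *connCounter // the DMZ until SetPeer succeeds, nil after Done
}

// openInbound reserves a slot in the DMZ for a connection whose peer is
// not yet known.
func openInbound(dmz *connCounter) (*connScope, error) {
	if err := dmz.add(); err != nil {
		return nil, err
	}
	return &connScope{dmz: dmz, owner: dmz}, nil
}

// SetPeer checks the peer's limit and, on success, moves the connection out
// of the DMZ. On failure the connection stays in the DMZ and the caller is
// expected to close it.
func (c *connScope) SetPeer(peer *connCounter) error {
	if c.owner != c.dmz {
		return ErrNotInDMZ
	}
	if err := peer.add(); err != nil {
		return err
	}
	c.dmz.remove()
	c.owner = peer
	return nil
}

// Done releases the connection from whichever scope currently owns it.
func (c *connScope) Done() {
	if c.owner == nil {
		return
	}
	c.owner.remove()
	c.owner = nil
}

func main() {
	dmz := &connCounter{name: "dmz", limit: 1}
	peer := &connCounter{name: "peer", limit: 1}

	c1, _ := openInbound(dmz)
	_, err := openInbound(dmz) // the DMZ is full until c1 is identified
	fmt.Println(err)

	fmt.Println(c1.SetPeer(peer)) // handshake done: c1 moves to the peer

	c2, _ := openInbound(dmz)     // the DMZ slot is free again
	fmt.Println(c2.SetPeer(peer)) // the peer is already at its limit
	c2.Done()
	fmt.Println(dmz.conns, peer.conns)
}
```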

Stream Lifetime

Outbound Stream

Outbound streams are created through the peer scope, with the user supplied list of protocols. This checks and reserves resources both at the peer level and at the protocol scope level.

Inbound Stream

Inbound stream scopes are created through the peer scope without a protocol list.
This checks the limit at the peer level. In addition, such streams are also constrained by the DMZ.

Once the protocol has been negotiated, it is explicitly set with SetProtocol. This checks the limit at the protocol scope and releases the resource from the DMZ.

Service Owned Streams

The application may designate a stream as owned by a service; this allows us to apply per service limits.

@marten-seemann
Contributor

This is implemented now.

5 participants