Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Graphsync Simplication Idea: one libp2p stream per request, no message queues (except for retries) #259

Open
hannahhoward opened this issue Oct 26, 2021 · 2 comments
Labels
effort/weeks Estimated to take multiple weeks kind/discussion Topical discussion; usually not changes to codebase

Comments

@hannahhoward
Copy link
Collaborator

What

The message protocol format of go-graphsync combines multiple-requests and responses into a single message on a single stream.

Perhaps it is time to reexamine the protocol and the rationale for combining request and responses, and having blocks live as a seperate element from responses.

Why was it this way

  1. Graphsync was modelled after Bitswap, which combines wantlists, responses, and blocks in the same message
  2. We assumed block deduplication was a chief concern. The thought was, if two requests are executing, and they both encounter the same block, they shouldn't send it twice. They shouldn't send it twice in the same message, and, more nebulously, they shouldn't set it twice if they are executing "at the same time"

Design rationale in practice

In filecoin, data is almost always streamed to different blockstores (CAR files). This means that ultimately, we almost never need to deduplicate across requests. In fact, we have a whole extension to graphsync just to prevent this.

Consequences

Package multiple requests and responses into a single message stream is the source of much go routine complexity — as you execute a response in one thread, you have to send the data over to a per peer message queue thread that is packaging up responses to go over the network. Meanwhile, on the other side, matching up responses and blocks with their associated requests is more complex than it needs to be.

Moreover, cause message sending is one queue per peer, our allocator backpressure system is per peer as well -- when it might make more sense for it to be per request.

The need to handle multiple requests and responses and match them up to to blocks has been an ongoing source of difficulty for/complaints from implementors of graphsync in other languages.

Why leave it as it is

If remains to be seen if multiple request/response per message and associated deduplication is more important for the IPFS use case, where data tends to land in a single blockstore.

@hannahhoward hannahhoward added the need/triage Needs initial labeling and prioritization label Oct 26, 2021
@hannahhoward hannahhoward changed the title Graphsync Simplication Idea 1: one libp2p stream per request, no message queues (except for retries) Graphsync Simplication Idea: one libp2p stream per request, no message queues (except for retries) Oct 26, 2021
@hannahhoward hannahhoward added effort/weeks Estimated to take multiple weeks kind/discussion Topical discussion; usually not changes to codebase and removed need/triage Needs initial labeling and prioritization labels Oct 26, 2021
@Stebalien
Copy link
Member

Honestly, I just assumed this was how it worked. That is, I assumed you'd:

  1. Open a new stream.
  2. Send a (one) graphsync request.
  3. Get a stream of blocks back on that same stream.

@Stebalien
Copy link
Member

Why leave it as it is

Backpressure?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
effort/weeks Estimated to take multiple weeks kind/discussion Topical discussion; usually not changes to codebase
Projects
None yet
Development

No branches or pull requests

2 participants