Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Changes to Store protocol APIs for Autosharding #786

Closed
Tracked by #65
chaitanyaprem opened this issue Sep 29, 2023 · 19 comments
Closed
Tracked by #65

feat: Changes to Store protocol APIs for Autosharding #786

chaitanyaprem opened this issue Sep 29, 2023 · 19 comments
Assignees
Labels
E:1.2: Autosharding for autoscaling See https://github.com/waku-org/pm/issues/65 for details

Comments

@chaitanyaprem
Copy link
Collaborator

Feature

Store API to be updated to take only contentTopics and derive pubSubTopic based on autosharding.

Probably reuse contentFilter which is extracted from Filter protocol for the same.

TBD - More analysis to be done to see what other changes are required apart from StoreClient API.

@SionoiS , Doesn't store API also require change to work with only contentTopics for autosharding? Did not see this being tracked.

@chaitanyaprem chaitanyaprem added the E:1.2: Autosharding for autoscaling See https://github.com/waku-org/pm/issues/65 for details label Sep 29, 2023
@chaitanyaprem chaitanyaprem self-assigned this Sep 29, 2023
@SionoiS
Copy link

SionoiS commented Sep 29, 2023

The protocol doesn't have to change because it already allow content topics without pubsub topics. I did not change anything in nwaku for STORE yet, but the implementation could be improved.

@chaitanyaprem
Copy link
Collaborator Author

chaitanyaprem commented Sep 29, 2023

The protocol doesn't have to change because it already allow content topics without pubsub topics. I did not change anything in nwaku for STORE yet, but the implementation could be improved.

Ok, was not aware of that.
I meant the Store-Client side of the code. Wouldn't there be changes required to do peerSelection based on contentTopic alone i.e using autosharding to derive the pubSubTopic?

And also on the StoreNode side, wouldn't we need to derive the pubSubTopic for the contentTopic using autosharding (if pubSubTopic is not specified) to retrieve messages? Looks like there will be some work related to Store and Autosharding.

@SionoiS
Copy link

SionoiS commented Sep 29, 2023

Wouldn't there be changes required to do peerSelection based on contentTopic alone i.e using autosharding to derive the pubSubTopic?

Yes that part I'm working on right now. waku-org/nwaku#2092 but that just about finding a Store peer using discv5. Otherwise use the peer that come with the request.

And also on the StoreNode side, wouldn't we need to derive the pubSubTopic for the contentTopic using autosharding (if pubSubTopic is not specified) to retrieve messages? Looks like there will be some work related to Store and Autosharding.

If the request don't specify pubsub topic we can ignore sharding and only filter messages by content topic.

The only problem is when the request has both pubsub and content topics. Now that's complicated. All content topic could fall on the same shard, each have their own shard or a mix. May require protocol changes, I haven't dived deep yet.

@chaitanyaprem
Copy link
Collaborator Author

chaitanyaprem commented Oct 12, 2023

The only problem is when the request has both pubsub and content topics. Now that's complicated. All content topic could fall on the same shard, each have their own shard or a mix. May require protocol changes, I haven't dived deep yet.

Hmm, To simplify this...maybe we can take an approach similar to what was done for Filter to handle autosharding.
On Store-client side, split them to multiple requests i.e 1 per pubSubTopic and fire them towards StoreNode.

@chaitanyaprem chaitanyaprem changed the title feat: Changes to Store protocol for Autosharding feat: Changes to Store protocol APIs for Autosharding Nov 6, 2023
@chaitanyaprem
Copy link
Collaborator Author

chaitanyaprem commented Nov 7, 2023

The protocol doesn't have to change because it already allow content topics without pubsub topics. I did not change anything in nwaku for STORE yet, but the implementation could be improved.

@SionoiS , I was going through Store rfc and noticed that pubsubTopic is mandatory in store queries ref:https://rfc.vac.dev/spec/13/#historyquery.
Also looks like store-node indexes data with a digest which is also based on a set of fields including pubsubTopic.
Is there some update missing in the rfc?

If so, this makes it simpler right. Only on store-client side we can translate contentTopics to pubsubTopics and generated Store-Queries towards Store node. No changes would be required on Store node then.

@chaitanyaprem
Copy link
Collaborator Author

chaitanyaprem commented Nov 7, 2023

The protocol doesn't have to change because it already allow content topics without pubsub topics. I did not change anything in nwaku for STORE yet, but the implementation could be improved.

@SionoiS , I was going through Store rfc and noticed that pubsubTopic is mandatory in store queries ref:https://rfc.vac.dev/spec/13/#historyquery. Also looks like store-node indexes data with a digest which is also based on a set of fields including pubsubTopic. Is there some update missing in the rfc?

If so, this makes it simpler right. Only on store-client side we can translate contentTopics to pubsubTopics and generated Store-Queries towards Store node. No changes would be required on Store node then.

On going through Store code in go-waku and the rfc again , it looks like pubsubTopic can be empty.
So, you can ignore the above.

@chaitanyaprem
Copy link
Collaborator Author

chaitanyaprem commented Nov 8, 2023

@SionoiS Have been thinking about the changes to Store futher wrt sharding and have following queries.

  • Peer selection can be done if pubsubTopic is specified, but the Store query doesn't have a pubSubTopic for static sharding. In which case how are we supposed to select the StoreNode? Is it always expected then caller of Store client API provides this?
  • Maybe the Store client API needs to be updated to either take only contentTopics(for autosharding) and pubsub and contentTopic for static-sharding. WDYT?

@SionoiS
Copy link

SionoiS commented Nov 8, 2023

  • Peer selection can be done if pubsubTopic is specified, but the Store query doesn't have a pubSubTopic for static sharding. In which case how are we supposed to select the StoreNode? Is it always expected then caller of Store client API provides this?

If there's no pubsub topic in the store query, the pubsub topic doesn't matter. You can disregard sharding. https://rfc.vac.dev/spec/13/#historyquery

As for the peer selection, either the caller provide the peer, the client node has a service peer set OR discv5 is used to find one.

  • Maybe the Store client API needs to be updated to either take only contentTopics(for autosharding) and pubsub and contentTopic for static-sharding. WDYT?

The store client should not know about sharding at all. It should only take a query and a peer. Higher in the stack sharding is handled.

Information Flow: rest -> sharding -> peer selection -> client -> service

@chaitanyaprem
Copy link
Collaborator Author

discv5 is used to find one.

How can we find if we don't know the shard?
We will need specific store peer that supports a particular shard, which means pubsubTopic becomes mandatory if peer is not provided by app.

@SionoiS
Copy link

SionoiS commented Nov 8, 2023

How can we find if we don't know the shard? We will need specific store peer that supports a particular shard, which means pubsubTopic becomes mandatory if peer is not provided by app.

Discv5 always returns random peers. We then filter and try again if needed. We could just use the first Store peer found regardless of shard.

@chaitanyaprem
Copy link
Collaborator Author

How can we find if we don't know the shard? We will need specific store peer that supports a particular shard, which means pubsubTopic becomes mandatory if peer is not provided by app.

Discv5 always returns random peers. We then filter and try again if needed. We could just use the first Store peer found regardless of shard.

Yeah, thought of that..but in that case does the store peer respond that it doesn't support that shard? Otherwise the client doesn't know if there are no messages or to contact another node.

@SionoiS
Copy link

SionoiS commented Nov 8, 2023

Yeah, thought of that..but in that case does the store peer respond that it doesn't support that shard? Otherwise the client doesn't know if there are no messages or to contact another node.

🤔 We could populate the pubsub topic field of the query before trying to find a store peer on that shard?

@chaitanyaprem
Copy link
Collaborator Author

Yeah, thought of that..but in that case does the store peer respond that it doesn't support that shard? Otherwise the client doesn't know if there are no messages or to contact another node.

🤔 We could populate the pubsub topic field of the query before trying to find a store peer on that shard?

Yup, which brings back my original point as to should we allow queries without pubsubTopic/shardInfo?

I propose following changes to handle such scenarios:

  • Store API(REST or library) should allow either contentTopic only(in case of autosharding) or contentTopic and pubsubTopic/shardInfo (in case of static sharding). This takes care of discovery and all sharding scenarios.
  • Update Store protocol to make pubsubTopic/shardInfo mandatory. Store node can then just reply with error in case it doesn't handle that pubsubTopic.

@chaitanyaprem
Copy link
Collaborator Author

chaitanyaprem commented Nov 9, 2023

The only problem is when the request has both pubsub and content topics. Now that's complicated. All content topic could fall on the same shard, each have their own shard or a mix. May require protocol changes, I haven't dived deep yet.

Hmm, To simplify this...maybe we can take an approach similar to what was done for Filter to handle autosharding. On Store-client side, split them to multiple requests i.e 1 per pubSubTopic and fire them towards StoreNode.

Started implementing this in #885 and noticed that each Query is responded with a cursor, which means splitting requests like this may not be a good idea.
Or we have to return results grouped separately with their own cursors which doesn't look like a good API experience.
@SionoiS @richard-ramos Any thoughts on how to handle this?

@SionoiS
Copy link

SionoiS commented Nov 9, 2023

🤔 We could populate the pubsub topic field of the query before trying to find a store peer on that shard?

Yup, which brings back my original point as to should we allow queries without pubsubTopic/shardInfo?

I totally missed your original point. Reread my answers and IDK what I was saying sorry about that...

I see 4 cases here;

  • no pubsub no content topic -> return ALL the messages
    • user didn't specify params so whatever! Use any Store peer.
  • no pubsub, one or many content topic -> return the messages and autoshard if needed.
    • maybe find a Store peer that support all the shards?
  • pubsub, no content topic -> return the messages from that topic
    • find a Store peer on that topic
  • pubsub AND content topic -> return the messages filtered by content topic
    • return an error if the pubsub topic and autosharding mismatch?

@chaitanyaprem
Copy link
Collaborator Author

chaitanyaprem commented Nov 10, 2023

🤔 We could populate the pubsub topic field of the query before trying to find a store peer on that shard?

Yup, which brings back my original point as to should we allow queries without pubsubTopic/shardInfo?

I totally missed your original point. Reread my answers and IDK what I was saying sorry about that...

I see 4 cases here;

* no pubsub no content topic -> return ALL the messages
  
  * user didn't specify params so whatever! Use any Store peer.

* no pubsub, one or many content topic -> return the messages and autoshard if needed.
  
  * maybe find a Store peer that support all the shards?

* pubsub, no content topic -> return the messages from that topic
  
  * find a Store peer on that topic

* pubsub AND content topic -> return the messages filtered by content topic
  
  * return an error if the pubsub topic and autosharding mismatch?
  • I am wondering if we need to support no pubsub and no contentTopic. Wouldn't that make the query a very large one? If it just for node sync, then maybe we should have a separate protocol for that. Or node-sync should also specify pubsubTopics the node is interested in. Just like Filter protocol, we can only allow a subset to be specified rather than all wildcards.
  • For case-2, pubsubTopics have to be derived from autosharding. Would be good if we can find a Store that supports all shards. If we can't then we can return error for now (think of how to deal with this later). Anyways this won't be an issue for Gen-0 as for the first version of TWN we are mandating all nodes to support all shards.
  • pubsubTopic and no contentTopic is straightforward with no issues
  • pubsub and contentTopics : Instead of trying to derive pubsubTopics, In this case we can assume that if both are provided then it is either static/named sharding, and just send the query.

One more question that comes to me is how do we support default-pubsubTopic then? Maybe we will have to ask users to explicitly specify it.

@chaitanyaprem
Copy link
Collaborator Author

🤔 We could populate the pubsub topic field of the query before trying to find a store peer on that shard?

Yup, which brings back my original point as to should we allow queries without pubsubTopic/shardInfo?

I totally missed your original point. Reread my answers and IDK what I was saying sorry about that...
I see 4 cases here;

* no pubsub no content topic -> return ALL the messages
  
  * user didn't specify params so whatever! Use any Store peer.

* no pubsub, one or many content topic -> return the messages and autoshard if needed.
  
  * maybe find a Store peer that support all the shards?

* pubsub, no content topic -> return the messages from that topic
  
  * find a Store peer on that topic

* pubsub AND content topic -> return the messages filtered by content topic
  
  * return an error if the pubsub topic and autosharding mismatch?
* I am wondering if we need to support no pubsub and no contentTopic. Wouldn't that make the query a very large one? If it just for node sync, then maybe we should have a separate protocol for that. Or node-sync should also specify pubsubTopics the node is interested in. Just like Filter protocol, we can only allow a subset to be specified rather than all wildcards.

* For case-2, pubsubTopics have to be derived from autosharding. Would be good if we can find a Store that supports all shards. If we can't then we can return error for now (think of how to deal with this later).  Anyways this won't be an issue for Gen-0 as for the first version of TWN we are mandating all nodes to support all shards.

* pubsubTopic and no contentTopic is straightforward with no issues

* pubsub and contentTopics : Instead of trying to derive pubsubTopics, In this case we can assume that if both are provided then it is either static/named sharding, and just send the query.

One more question that comes to me is how do we support default-pubsubTopic then? Maybe we will have to ask users to explicitly specify it.

@richard-ramos , How does status use store protocol? Does it always specify store node to be used even with static sharding or is it expected to be discovered?
Because if i am part of different communities, then it is possible that store nodes may be different for each community. If status-go makes a store query per community, then we don't have to worry about solving issue mentinoed in case-2 above. Otherwise we may have to come up with some additinal API to handle this. Ideally, i am guessing this would be part of SDK scope.

@SionoiS
Copy link

SionoiS commented Nov 10, 2023

  • I am wondering if we need to support no pubsub and no contentTopic. Wouldn't that make the query a very large one? If it just for node sync, then maybe we should have a separate protocol for that. Or node-sync should also specify pubsubTopics the node is interested in. Just like Filter protocol, we can only allow a subset to be specified rather than all wildcards.

Feels a bit arbitrary to remove this use case but 🤷

  • pubsub and contentTopics : Instead of trying to derive pubsubTopics, In this case we can assume that if both are provided then it is either static/named sharding, and just send the query.

Yes, let's not try to be smarter than the user. If the user meant to use autosharding they should just use content topics anyway.

One more question that comes to me is how do we support default-pubsubTopic then? Maybe we will have to ask users to explicitly specify it.

I don't think it needs special treatment, it's just a topic like any other. The user would do; pubsub (default topic) and content for filtering.

@chaitanyaprem
Copy link
Collaborator Author

Weekly Update

  • achieved: API updates for autosharding

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
E:1.2: Autosharding for autoscaling See https://github.com/waku-org/pm/issues/65 for details
Projects
Archived in project
Development

No branches or pull requests

2 participants