-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(drand): StateGetBeaconEntry
uses chain beacons for historical epochs
#12428
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the PR title to match https://github.com/filecoin-project/lotus/blob/master/CONTRIBUTING.md#pr-title-conventions
StateGetBeaconEntry
uses chain beacons for historical epochs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the PR title to match https://github.com/filecoin-project/lotus/blob/master/CONTRIBUTING.md#pr-title-conventions
b40852d
to
8c1d45f
Compare
I added a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot, this should significantly reduce the quantity of nodes trying to query the old defunct incentinet network we lost 👍🏻
@@ -647,6 +647,11 @@ func (fr *fakeRand) GetChainRandomness(ctx context.Context, randEpoch abi.ChainE | |||
return *(*[32]byte)(out), nil | |||
} | |||
|
|||
func (fr *fakeRand) GetBeaconEntry(ctx context.Context, randEpoch abi.ChainEpoch) (*types.BeaconEntry, error) { | |||
r, _ := fr.GetChainRandomness(ctx, randEpoch) | |||
return &types.BeaconEntry{Round: 10, Data: r[:]}, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised that the fakeRand
returning always the same round value, 10, rather than the expected one for the given randEpoch
isn't breaking anything.
for i := 0; i < 20; i++ { | ||
cbe := randTs.Blocks()[0].BeaconEntries | ||
for _, v := range cbe { | ||
if v.Round == round { | ||
return &v, nil | ||
} | ||
} | ||
|
||
next, err := sr.cs.LoadTipSet(ctx, randTs.Parents()) | ||
if err != nil { | ||
return nil, xerrors.Errorf("failed to load parents when searching back for beacon entry: %w", err) | ||
} | ||
|
||
randTs = next |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the need for this loop here is because they might be null rounds?
Why 20? Is there any protocol guarantee that there won't be more than 20 null ones?
Furthermore I thought a given tipset was a set of blocks with the same "epoch"?
The drand beacons for a given epoch are univocally determined by the epoch timestamp.
But then loading the tipset's parents and trying to fetch their beacon entries should yield a beacon meant for the previous epoch, no? e.g. if my epoch is 42 and the corresponding drand epoch is 66 and no beacon entries match the round I've queried, it doesn't make sense to go see at the past 20 epochs since these epochs should have beacon rounds that are strictly inferiors to 66.
Let say I have tipsets for epochs:
40 -> 41 -> 42 ->43
and let's say the corresponding drand rounds would be:
64 -> 65 -> 66 -> 67
It seems to me that querying getBeaconEntryV3
on epoch 43 might go and look at these past epochs to check if they contains the right drand round 67 in case the first block of the tipset 43 doesn't contains round 67 in its beacons entries.
But I don't understand how a block on epoch 40 could have drand round 67 in it, since round 67 shouldn't have been produced yet by the drand network when epoch 40 occurred, it only could have known the rounds up to round 64 at the time, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see this is old code you moved around, but it still looks weird.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it is weird, and I was tempted to rip out the loop and return an error if it's not found in the next non-null epoch. It came in #7376 and I suspect it was just a rough copy of the logic in GetLatestBeaconEntry
which searches 20 tipsets for one with a beacon in it and fails after it goes through 20 without finding a single beacon.
With the current logic, even that search should fail, but historically maybe that was not unreasonable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the interests of scope, and the fact that this check probably never iterates more than once, I decided to leave it as is and cleanup can be done later so as not to complicated this PR and get it over the line
wait: true, | ||
}, | ||
{ | ||
// After v14, if the tipset corresponding to round X is null, we still fetch the randomness for X (from the next non-null tipset) but can get the exact round |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The next or the previous?
It seems from the above code you're getting it from its parent, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, that's the 20 iterations thing that needs cleaning up:
- call
getBeaconEntryV3
with the filecoin epoch we want - call
GetBeaconRandomnessTipset
on that epoch withlookback
set tofalse
, so it'll skip ahead to the next non-null tipset - look in the beacon entries in that tipset for the precise drand round number that we want
- if we don't find it, walk back up the chain ... but we're not going to find it up the chain because we're matching the exact drand round number so it'll always end in an error, just after a 20 tipset walkback
Opened #12452 to capture some thoughts about the 20 tipset walkback(s) in the code that I think are unnecessary. I just don't want to adjust the current get-beacon-from-chain logic in this PR so I left it alone. |
8c1d45f
to
b5c951a
Compare
Calling for objections before I land this. It would be great to get some more review eyes but I understand it's niche. |
…pochs Fixes: #12414 Previously StateGetBeaconEntry would always try and use a drand beacon to get the appropriate round. But as drand has shut down old beacons and we've removed client details from Lotus, it has stopped working for historical beacons. This fix restores historical beacon entries by using the on-chain lookup, however it now follows the rules used by StateGetRandomnessFromBeacon and the get_beacon_randomness syscall which has some quirks with null rounds prior to nv14. See #12414 (comment) for specifics. StateGetBeaconEntry still blocks for future epochs and uses live drand beacon clients to wait for and fetch rounds as they are available.
b5c951a
to
92818c1
Compare
Per 2024-09-17 FilOz meeting, @jennijuju asked that @Stebalien or @Kubuxu take this to get this landed. @Stebalien @Kubuxu : see @rvagg 's offer for a sync discussion in https://filecoinproject.slack.com/archives/CP50PPW2X/p1726550218788619 if that is helpful Also, this one is needed for #12464 which we want to start this week. |
Notes from 2024-09-18 Lotus standup conversation: even though this has been opened for a couple of weeks, it affects pre-quicknet, and it was desired to get in as part of Lotus v1.29.2 this week, we won't merge it until it gets reviewed properly. We know @Stebalien and @Kubuxu are more tied up this week with a colo, but we assume (and we'll followup) that they'll engage next week. |
I will try to fetch in the context. I looked at it few weeks ago but I could not reach a conclusion at that time. |
Fixes: #12414
Previously StateGetBeaconEntry would always try and use a drand beacon to get the appropriate round. But as drand has shut down old beacons and we've removed client details from Lotus, it has stopped working for historical beacons.
This fix restores historical beacon entries by using the on-chain lookup, however it now follows the rules used by StateGetRandomnessFromBeacon and the get_beacon_randomness syscall which has some quirks with null rounds prior to nv14. See #12414 (comment) for specifics.
StateGetBeaconEntry still blocks for future epochs and uses live drand beacon clients to wait for and fetch rounds as they are available.
Notes for review: the existing logic in
chain/rand/rand.go
should not change but I've had to do some refactoring to be able to return the beacon entries as well and not just the randomness. So the file looks a bit different. Please pay special attention to that because it's used for theget_beacon_randomness
syscall and theStateGetRandomnessFromBeacon
API. Aside from that, this should only impactStateGetBeaconEntry
.