Merge pull request #5196 from oasisprotocol/peternose/feature/master-…
…keys-forward-secrecy

keymanager/src/runtime: Support master secret rotations
peternose committed Jul 7, 2023
2 parents 346c0f8 + c069bb3 commit 1e8c200
Showing 65 changed files with 4,364 additions and 1,391 deletions.
4 changes: 2 additions & 2 deletions .buildkite/code.pipeline.yml
@@ -218,7 +218,7 @@ steps:
  - "build-rust-runtime-loader"
  - "build-rust-runtimes"
  branches: "!master !stable/*"
- parallelism: 3
+ parallelism: 4
  timeout_in_minutes: 20
  command:
  - .buildkite/scripts/download_e2e_test_artifacts.sh
@@ -227,7 +227,7 @@ steps:
  - export CFLAGS_x86_64_fortanix_unknown_sgx="-isystem/usr/include/x86_64-linux-gnu -mlvi-hardening -mllvm -x86-experimental-lvi-inline-asm-hardening"
  - export CC_x86_64_fortanix_unknown_sgx=clang-11
  # Only run runtime scenarios as others do not use SGX.
- - .buildkite/scripts/test_e2e.sh --scenario e2e/runtime/runtime-encryption --scenario e2e/runtime/trust-root/.+ --scenario e2e/runtime/keymanager-ephemeral-keys
+ - .buildkite/scripts/test_e2e.sh --scenario e2e/runtime/runtime-encryption --scenario e2e/runtime/trust-root/.+ --scenario e2e/runtime/keymanager-.+
  artifact_paths:
  - coverage-merged-e2e-*.txt
  - /tmp/e2e/**/*.log
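
The widened pattern `e2e/runtime/keymanager-.+` appears intended to select every key manager scenario instead of only the ephemeral-keys one. A minimal Go sketch of that selection follows. It assumes each `--scenario` value is treated as a regular expression anchored to the whole scenario name; the anchoring, the helper `matchesScenario`, and the scenario name `keymanager-master-secrets` are assumptions made for illustration and are not taken from the test script.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesScenario reports whether a scenario name is selected by a pattern.
// Assumption: the pattern is anchored to the full name, as done here.
func matchesScenario(pattern, name string) bool {
	re := regexp.MustCompile("^(?:" + pattern + ")$")
	return re.MatchString(name)
}

func main() {
	pattern := "e2e/runtime/keymanager-.+"
	for _, name := range []string{
		"e2e/runtime/keymanager-ephemeral-keys", // named explicitly in the old command
		"e2e/runtime/keymanager-master-secrets", // hypothetical scenario name
		"e2e/runtime/runtime-encryption",        // selected by a separate --scenario flag
	} {
		fmt.Printf("%-40s %v\n", name, matchesScenario(pattern, name))
	}
}
```

Under these assumptions the first two names match the new pattern and the third does not, which is why it keeps its own `--scenario` flag.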
58 changes: 58 additions & 0 deletions .changelog/5196.feature.md
@@ -0,0 +1,58 @@
keymanager/src/runtime: Support master secret rotations

Key managers now have the ability to rotate the master secret
at predetermined intervals. Each rotation introduces a new generation,
or version, of the master secret that is sequentially numbered, starting
from zero. These rotations occur during key manager status updates, which
typically happen during epoch transitions. To perform a rotation,
one of the key manager enclaves must publish a proposal for the next
generation of the master secret, which must then be replicated by
the majority of enclaves. If the replication process is not completed
by the end of the epoch, the proposal can be replaced with a new one.
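
The numbering and replacement rule in the paragraph above can be sketched in Go. This is a minimal illustration, not the code from this commit: the `status` and `proposal` types are simplified stand-ins, and the rule that numbering starts at zero while no secret has been accepted (signalled here by an empty checksum) is an assumption about how the next generation is derived. The generation and epoch comparison follows the check that `generateStatus` applies to the stored proposal.

```go
package main

import "fmt"

// status is a pared-down, hypothetical stand-in for the key manager status.
type status struct {
	Generation uint64 // generation of the latest accepted master secret
	Checksum   []byte // empty until the first master secret is accepted
}

// nextGeneration returns the generation the next proposal must carry.
// Assumption: with no accepted secret, numbering starts at zero.
func (s status) nextGeneration() uint64 {
	if len(s.Checksum) == 0 {
		return 0
	}
	return s.Generation + 1
}

// proposal is a hypothetical stand-in for a published master secret proposal.
type proposal struct {
	Generation uint64
	Epoch      uint64
}

// isCurrent reports whether the proposal targets the next generation and was
// published in the given epoch. A stale proposal is ignored and can be replaced.
func isCurrent(s status, p *proposal, epoch uint64) bool {
	return p != nil && p.Generation == s.nextGeneration() && p.Epoch == epoch
}

func main() {
	s := status{Generation: 3, Checksum: []byte{0x01}}
	p := &proposal{Generation: 4, Epoch: 10}
	fmt.Println(isCurrent(s, p, 10)) // proposal from the current epoch
	fmt.Println(isCurrent(s, p, 11)) // same proposal one epoch later
}
```

In this sketch a proposal that was not replicated in time simply stops being current once the epoch advances, which leaves room for a new proposal with the same generation number.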

The following metrics have been added:

- `oasis_worker_keymanager_consensus_ephemeral_secret_epoch_number`
is the epoch number of the latest ephemeral secret.

- `oasis_worker_keymanager_consensus_master_secret_generation_number`
is the generation number of the latest master secret.

- `oasis_worker_keymanager_consensus_master_secret_rotation_epoch_number`
is the epoch number of the latest master secret rotation.

- `oasis_worker_keymanager_consensus_master_secret_proposal_generation_number`
is the generation number of the latest master secret proposal.

- `oasis_worker_keymanager_consensus_master_secret_proposal_epoch_number`
is the epoch number of the latest master secret proposal.

- `oasis_worker_keymanager_enclave_ephemeral_secret_epoch_number`
is the epoch number of the latest ephemeral secret loaded into the enclave.

- `oasis_worker_keymanager_enclave_master_secret_generation_number`
is the generation number of the latest master secret as seen by the enclave.

- `oasis_worker_keymanager_enclave_master_secret_proposal_generation_number`
is the generation number of the latest master secret proposal loaded
into the enclave.

- `oasis_worker_keymanager_enclave_master_secret_proposal_epoch_number`
is the epoch number of the latest master secret proposal loaded
into the enclave.

- `oasis_worker_keymanager_enclave_generated_master_secret_generation_number`
is the generation number of the latest master secret generated
by the enclave.

- `oasis_worker_keymanager_enclave_generated_master_secret_epoch_number`
is the epoch number of the latest master secret generated by the enclave.

- `oasis_worker_keymanager_enclave_generated_ephemeral_secret_epoch_number`
is the epoch number of the latest ephemeral secret generated by the enclave.

The following metrics have had runtime labels added:

- `oasis_worker_keymanager_compute_runtime_count`,

- `oasis_worker_keymanager_policy_update_count`.
16 changes: 14 additions & 2 deletions docs/oasis-node/metrics.md
@@ -101,9 +101,21 @@ oasis_worker_executor_liveness_live_ratio | Gauge | Ratio between live and total
oasis_worker_executor_liveness_live_rounds | Gauge | Number of live rounds in last epoch. | runtime | [worker/common/committee](https://github.com/oasisprotocol/oasis-core/tree/master/go/worker/common/committee/node.go)
oasis_worker_executor_liveness_total_rounds | Gauge | Number of total rounds in last epoch. | runtime | [worker/common/committee](https://github.com/oasisprotocol/oasis-core/tree/master/go/worker/common/committee/node.go)
oasis_worker_failed_round_count | Counter | Number of failed roothash rounds. | runtime | [worker/common/committee](https://github.com/oasisprotocol/oasis-core/tree/master/go/worker/common/committee/node.go)
- oasis_worker_keymanager_compute_runtime_count | Counter | Number of compute runtimes using the key manager. | | [worker/keymanager](https://github.com/oasisprotocol/oasis-core/tree/master/go/worker/keymanager/metrics.go)
+ oasis_worker_keymanager_compute_runtime_count | Counter | Number of compute runtimes using the key manager. | runtime | [worker/keymanager](https://github.com/oasisprotocol/oasis-core/tree/master/go/worker/keymanager/metrics.go)
+ oasis_worker_keymanager_consensus_ephemeral_secret_epoch_number | Gauge | Epoch number of the latest ephemeral secret. | runtime | [worker/keymanager](https://github.com/oasisprotocol/oasis-core/tree/master/go/worker/keymanager/metrics.go)
+ oasis_worker_keymanager_consensus_master_secret_generation_number | Gauge | Generation number of the latest master secret. | runtime | [worker/keymanager](https://github.com/oasisprotocol/oasis-core/tree/master/go/worker/keymanager/metrics.go)
+ oasis_worker_keymanager_consensus_master_secret_proposal_epoch_number | Gauge | Epoch number of the latest master secret proposal. | runtime | [worker/keymanager](https://github.com/oasisprotocol/oasis-core/tree/master/go/worker/keymanager/metrics.go)
+ oasis_worker_keymanager_consensus_master_secret_proposal_generation_number | Gauge | Generation number of the latest master secret proposal. | runtime | [worker/keymanager](https://github.com/oasisprotocol/oasis-core/tree/master/go/worker/keymanager/metrics.go)
+ oasis_worker_keymanager_consensus_master_secret_rotation_epoch_number | Gauge | Epoch number of the latest master secret rotation. | runtime | [worker/keymanager](https://github.com/oasisprotocol/oasis-core/tree/master/go/worker/keymanager/metrics.go)
+ oasis_worker_keymanager_enclave_ephemeral_secret_epoch_number | Gauge | Epoch number of the latest ephemeral secret loaded into the enclave. | runtime | [worker/keymanager](https://github.com/oasisprotocol/oasis-core/tree/master/go/worker/keymanager/metrics.go)
+ oasis_worker_keymanager_enclave_generated_ephemeral_secret_epoch_number | Gauge | Epoch number of the latest ephemeral secret generated by the enclave. | runtime | [worker/keymanager](https://github.com/oasisprotocol/oasis-core/tree/master/go/worker/keymanager/metrics.go)
+ oasis_worker_keymanager_enclave_generated_master_secret_epoch_number | Gauge | Epoch number of the latest master secret generated by the enclave. | runtime | [worker/keymanager](https://github.com/oasisprotocol/oasis-core/tree/master/go/worker/keymanager/metrics.go)
+ oasis_worker_keymanager_enclave_generated_master_secret_generation_number | Gauge | Generation number of the latest master secret generated by the enclave. | runtime | [worker/keymanager](https://github.com/oasisprotocol/oasis-core/tree/master/go/worker/keymanager/metrics.go)
+ oasis_worker_keymanager_enclave_master_secret_generation_number | Gauge | Generation number of the latest master secret as seen by the enclave. | runtime | [worker/keymanager](https://github.com/oasisprotocol/oasis-core/tree/master/go/worker/keymanager/metrics.go)
+ oasis_worker_keymanager_enclave_master_secret_proposal_epoch_number | Gauge | Epoch number of the latest master secret proposal loaded into the enclave. | runtime | [worker/keymanager](https://github.com/oasisprotocol/oasis-core/tree/master/go/worker/keymanager/metrics.go)
+ oasis_worker_keymanager_enclave_master_secret_proposal_generation_number | Gauge | Generation number of the latest master secret proposal loaded into the enclave. | runtime | [worker/keymanager](https://github.com/oasisprotocol/oasis-core/tree/master/go/worker/keymanager/metrics.go)
oasis_worker_keymanager_enclave_rpc_count | Counter | Number of remote Enclave RPC requests via P2P. | method | [worker/keymanager/p2p](https://github.com/oasisprotocol/oasis-core/tree/master/go/worker/keymanager/p2p/metrics.go)
- oasis_worker_keymanager_policy_update_count | Counter | Number of key manager policy updates. | | [worker/keymanager](https://github.com/oasisprotocol/oasis-core/tree/master/go/worker/keymanager/metrics.go)
+ oasis_worker_keymanager_policy_update_count | Counter | Number of key manager policy updates. | runtime | [worker/keymanager](https://github.com/oasisprotocol/oasis-core/tree/master/go/worker/keymanager/metrics.go)
oasis_worker_node_registered | Gauge | Is oasis node registered (binary). | | [worker/registration](https://github.com/oasisprotocol/oasis-core/tree/master/go/worker/registration/worker.go)
oasis_worker_node_registration_eligible | Gauge | Is oasis node eligible for registration (binary). | | [worker/registration](https://github.com/oasisprotocol/oasis-core/tree/master/go/worker/registration/worker.go)
oasis_worker_node_status_frozen | Gauge | Is oasis node frozen (binary). | | [worker/registration](https://github.com/oasisprotocol/oasis-core/tree/master/go/worker/registration/worker.go)
95 changes: 77 additions & 18 deletions go/consensus/cometbft/apps/keymanager/keymanager.go
@@ -10,6 +10,7 @@ import (

beacon "github.com/oasisprotocol/oasis-core/go/beacon/api"
"github.com/oasisprotocol/oasis-core/go/common/cbor"
+ "github.com/oasisprotocol/oasis-core/go/common/crypto/signature"
"github.com/oasisprotocol/oasis-core/go/common/node"
"github.com/oasisprotocol/oasis-core/go/consensus/api/transaction"
tmapi "github.com/oasisprotocol/oasis-core/go/consensus/cometbft/api"
@@ -21,8 +22,9 @@ import (
registry "github.com/oasisprotocol/oasis-core/go/registry/api"
)

- // maxEphemeralSecretAge is the maximum age of an ephemeral secret in the number of epochs.
- const maxEphemeralSecretAge = 20
+ // minProposalReplicationPercent is the minimum percentage of enclaves in the key manager committee
+ // that must replicate the proposal for the next master secret before it is accepted.
+ const minProposalReplicationPercent = 66

var emptyHashSha3 = sha3.Sum256(nil)

@@ -79,6 +81,12 @@ func (app *keymanagerApplication) ExecuteTx(ctx *tmapi.Context, tx *transaction.
return api.ErrInvalidArgument
}
return app.updatePolicy(ctx, state, &sigPol)
+ case api.MethodPublishMasterSecret:
+ var sigSec api.SignedEncryptedMasterSecret
+ if err := cbor.Unmarshal(tx.Body, &sigSec); err != nil {
+ return api.ErrInvalidArgument
+ }
+ return app.publishMasterSecret(ctx, state, &sigSec)
case api.MethodPublishEphemeralSecret:
var sigSec api.SignedEncryptedEphemeralSecret
if err := cbor.Unmarshal(tx.Body, &sigSec); err != nil {
@@ -173,12 +181,23 @@ func (app *keymanagerApplication) onEpochChange(ctx *tmapi.Context, epoch beacon
return fmt.Errorf("failed to query key manager status: %w", err)
}

- newStatus := app.generateStatus(ctx, rt, oldStatus, nodes, params, epoch)
+ secret, err := state.MasterSecret(ctx, rt.ID)
+ if err != nil && err != api.ErrNoSuchMasterSecret {
+ ctx.Logger().Error("failed to query key manager master secret",
+ "id", rt.ID,
+ "err", err,
+ )
+ return fmt.Errorf("failed to query key manager master secret: %w", err)
+ }
+
+ newStatus := app.generateStatus(ctx, rt, oldStatus, secret, nodes, params, epoch)
if forceEmit || !bytes.Equal(cbor.Marshal(oldStatus), cbor.Marshal(newStatus)) {
ctx.Logger().Debug("status updated",
"id", newStatus.ID,
"is_initialized", newStatus.IsInitialized,
"is_secure", newStatus.IsSecure,
+ "generation", newStatus.Generation,
+ "rotation_epoch", newStatus.RotationEpoch,
"checksum", hex.EncodeToString(newStatus.Checksum),
"rsk", newStatus.RSK,
"nodes", newStatus.Nodes,
@@ -190,15 +209,6 @@ func (app *keymanagerApplication) onEpochChange(ctx *tmapi.Context, epoch beacon
}
toEmit = append(toEmit, newStatus)
}
-
- // Clean ephemeral secrets.
- // TODO: use max ephemeral secret age from the key manager policy
- if epoch > maxEphemeralSecretAge {
- expiryEpoch := epoch - maxEphemeralSecretAge
- if err = state.CleanEphemeralSecrets(ctx, rt.ID, expiryEpoch); err != nil {
- return fmt.Errorf("failed to clean ephemeral secrets: %w", err)
- }
- }
}

// Note: It may be a good idea to sweep statuses that don't have runtimes,
@@ -214,10 +224,11 @@ func (app *keymanagerApplication) onEpochChange(ctx *tmapi.Context, epoch beacon
return nil
}

- func (app *keymanagerApplication) generateStatus(
+ func (app *keymanagerApplication) generateStatus( // nolint: gocyclo
ctx *tmapi.Context,
kmrt *registry.Runtime,
oldStatus *api.Status,
+ secret *api.SignedEncryptedMasterSecret,
nodes []*node.Node,
params *registry.ConsensusParameters,
epoch beacon.EpochTime,
@@ -226,10 +237,25 @@ func (app *keymanagerApplication) generateStatus(
ID: kmrt.ID,
IsInitialized: oldStatus.IsInitialized,
IsSecure: oldStatus.IsSecure,
+ Generation: oldStatus.Generation,
+ RotationEpoch: oldStatus.RotationEpoch,
Checksum: oldStatus.Checksum,
Policy: oldStatus.Policy,
}

+ // Data needed to count the nodes that have replicated the proposal for the next master secret.
+ var (
+ nextGeneration uint64
+ nextChecksum []byte
+ nextRSK *signature.PublicKey
+ updatedNodes []signature.PublicKey
+ )
+ nextGeneration = status.NextGeneration()
+ if secret != nil && secret.Secret.Generation == nextGeneration && secret.Secret.Epoch == epoch {
+ nextChecksum = secret.Secret.Secret.Checksum
+ }
+
+ // Compute the policy hash to reject nodes that are not up-to-date.
var rawPolicy []byte
if status.Policy != nil {
rawPolicy = cbor.Marshal(status.Policy)
@@ -251,10 +277,11 @@ nextNode:
continue
}

+ secretReplicated := true
isInitialized := status.IsInitialized
isSecure := status.IsSecure
- checksum := status.Checksum
RSK := status.RSK
+ nRSK := nextRSK

var numVersions int
for _, nodeRt := range n.Runtimes {
@@ -306,15 +333,20 @@ nextNode:
// The first version gets to be the source of truth.
isInitialized = true
isSecure = initResponse.IsSecure
- checksum = initResponse.Checksum
}

// Skip nodes with mismatched status fields.
if initResponse.IsSecure != isSecure {
ctx.Logger().Error("Security status mismatch for runtime", vars...)
continue nextNode
}
- if !bytes.Equal(initResponse.Checksum, checksum) {
+
+ // Skip nodes with mismatched checksum.
+ // Note that a node needs to register with an empty checksum if no master secrets
+ // have been generated so far. Otherwise, if secrets have been generated, the node
+ // needs to register with a checksum computed over all the secrets generated so far
+ // since the key manager's checksum is updated after every master secret rotation.
+ if !bytes.Equal(initResponse.Checksum, status.Checksum) {
ctx.Logger().Error("Checksum mismatch for runtime", vars...)
continue nextNode
}
@@ -332,6 +364,18 @@ nextNode:
continue nextNode
}

+ // Check if all versions have replicated the last master secret,
+ // derived the same RSK and are ready to move to the next generation.
+ if !bytes.Equal(initResponse.NextChecksum, nextChecksum) {
+ secretReplicated = false
+ }
+ if nRSK == nil {
+ nRSK = initResponse.NextRSK
+ }
+ if initResponse.NextRSK != nil && !initResponse.NextRSK.Equal(*nRSK) {
+ secretReplicated = false
+ }
+
numVersions++
}

@@ -341,19 +385,34 @@ nextNode:
if !isInitialized {
panic("the key manager must be initialized")
}
+ if secretReplicated {
+ nextRSK = nRSK
+ updatedNodes = append(updatedNodes, n.ID)
+ }

// If the key manager is not initialized, the first verified node gets to be the source
// of truth, every other node will sync off it.
if !status.IsInitialized {
status.IsInitialized = true
status.IsSecure = isSecure
- status.Checksum = checksum
}
status.RSK = RSK

status.Nodes = append(status.Nodes, n.ID)
}

+ // Accept the proposal if the majority of the nodes have replicated
+ // the proposal for the next master secret.
+ if numNodes := len(status.Nodes); numNodes > 0 && nextChecksum != nil {
+ percent := len(updatedNodes) * 100 / numNodes
+ if percent >= minProposalReplicationPercent {
+ status.Generation = nextGeneration
+ status.RotationEpoch = epoch
+ status.Checksum = nextChecksum
+ status.RSK = nextRSK
+ status.Nodes = updatedNodes
+ }
+ }

return status
}
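
The acceptance check at the end of `generateStatus` uses integer division, so the effective threshold is worth spelling out. The following standalone Go sketch reproduces only that arithmetic; the function `accepted` and the constant name are illustrative, and the real check additionally requires a non-nil `nextChecksum`.

```go
package main

import "fmt"

// minReplicationPercent mirrors the value of minProposalReplicationPercent in the diff.
const minReplicationPercent = 66

// accepted reports whether enough nodes have replicated the proposal.
// The percentage is truncated by integer division, as in the diff.
func accepted(updated, total int) bool {
	if total <= 0 {
		return false
	}
	return updated*100/total >= minReplicationPercent
}

func main() {
	for _, c := range [][2]int{{1, 2}, {2, 3}, {3, 5}, {4, 6}, {65, 100}, {66, 100}} {
		fmt.Printf("%d of %d nodes: accepted=%v\n", c[0], c[1], accepted(c[0], c[1]))
	}
}
```

Because 2*100/3 truncates to 66, two of three nodes are enough, while three of five (60 percent) are not. A simple majority such as one of two or three of five does not meet the threshold.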
