Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tunneled routing support for boundary services. #381

Merged
merged 1 commit into from
Jan 23, 2024

Conversation

rcgoodfellow
Copy link
Contributor

@rcgoodfellow rcgoodfellow commented Jun 27, 2023

Introduces tunnel routing as described in RFD 404. The primary changes here are the following.

  • Removes the assumption of a single boundary services endpoint address.
  • Adds a virt-to-boundary table (V2B) that maps destination prefixes on a VPC to a boundary services address on the physical/underlay network.
  • Implements a longest-prefix-match routing table for V2B using a poptrie

Fixes

@rcgoodfellow rcgoodfellow force-pushed the boundary-services-overlay-flow-pinning branch from d0a9cf6 to 9c83e09 Compare June 30, 2023 07:56
@rcgoodfellow rcgoodfellow changed the title Pin boundary services overlay flows to underlay path Tunneled routing support for boundary services. Jun 30, 2023
@rcgoodfellow rcgoodfellow force-pushed the boundary-services-overlay-flow-pinning branch 3 times, most recently from 45e9db7 to e0cf827 Compare August 4, 2023 17:23
@rcgoodfellow rcgoodfellow force-pushed the boundary-services-overlay-flow-pinning branch 2 times, most recently from 05a0042 to fe70f18 Compare August 9, 2023 22:20
@rcgoodfellow rcgoodfellow marked this pull request as ready for review August 9, 2023 22:52
Copy link
Collaborator

@FelixMcFelix FelixMcFelix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! A handful of nits/potential oversights, otherwise I think it's clear and looks to be a reasonable approach to store and use V2B mappings.

Comment on lines 202 to 210
#[repr(C)]
pub enum krw_type_t {
RW_DRIVER = 2,
RW_DEFAULT = 4,
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this is a better construction than a newtype struct as before.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this probably wouldn't be an issue for this specific type but the newtype construction is to avoid an unfortunate footgun with matching on repr C enums, see #322

crates/opte-api/src/cmd.rs Outdated Show resolved Hide resolved
crates/opte-api/src/ip.rs Outdated Show resolved Hide resolved
bin/opteadm/src/bin/opteadm.rs Outdated Show resolved Hide resolved
RouterTargetInternal::InternetGateway => {
match self.v2b.get(&flow_id.dst_ip) {
Some(phys) => PhysNet {
ether: MacAddr::ZERO,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my own understanding/confirmation, this is because of the move to an anycast gateway IP?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been reworked and this code no longer exists. Instead of doing anycast at the gateways, we're doing unicast and handling multipath decisions in OPTE based on flow hashing.

lib/oxide-vpc/src/engine/print.rs Outdated Show resolved Hide resolved
lib/oxide-vpc/tests/common/mod.rs Outdated Show resolved Hide resolved
Comment on lines 202 to 210
#[repr(C)]
pub enum krw_type_t {
RW_DRIVER = 2,
RW_DEFAULT = 4,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this probably wouldn't be an issue for this specific type but the newtype construction is to avoid an unfortunate footgun with matching on repr C enums, see #322

lib/opte/src/ddi/sync.rs Outdated Show resolved Hide resolved
lib/oxide-vpc/src/engine/overlay.rs Outdated Show resolved Hide resolved
#[repr(C)]
#[derive(Debug, Deserialize, Serialize)]
pub struct DumpVirt2BoundaryReq {
pub unused: u64,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did the ioctl machinery not like this zero-sized?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is the case. I'm following suit with what's in DumpVirt2PhysReq.

Some(phys) => PhysNet {
ether: MacAddr::ZERO,
ip: phys.ip,
vni: Vni::new(BOUNDARY_SERVICES_VNI).unwrap(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would phys.vni be different than BOUNDARY_SERVICES_VNI? And speaking of which, do we even need a fixed BOUNDARY_SERVICES_VNI in opte anymore with the v2b map (besides tests)? I imagine the contract is now between the p4 and sled-agent who'll be setting these mappings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the latter part, we're still using 99 for boundary services egress traffic - even though the tunnel endpoint addresses are now dynamic and determined through advertisements.

Comment on lines 627 to 626
let e = self.ip4.lock().insert(ip4, tep);
self.update_poptrie_v4();
e
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The self.ip4 lock afaik will get dropped at the end of that first statement and so there's a chance of a race before update_poptrie_v4 tries to grab the lock again. In fact, by the time we're returning the inserted TunnelEndpoint e, it might've already been removed.

Perhaps just hold both structures (ip4,pt4/ip6,pt6) behind the same RwLock. Or at the very least perhaps make update_poptrie_v{4,6} take a MutexGuard

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Done.

@@ -95,6 +97,12 @@ pub const VPC_LAYERS: [&str; 5] =
pub const GW_MAC_ADDR: MacAddr =
MacAddr::from_const([0xA8, 0x40, 0x25, 0xFF, 0x77, 0x77]);

pub const BS_MAC_ADDR: MacAddr =
MacAddr::from_const([0xA8, 0x40, 0x25, 0x77, 0x77, 0x77]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This falls into the physical devices space of our MAC address space. We also have some further constraints on the software-defined/virtual space noted in omicron.

This is just for tests but FF:00:00-FF:7F:FF would be where we can draw from for fixed addresses (i.e. GW_MAC_ADDR).

I also realize now that the boundary services mac addr we're using also conflicts :/ It falls into the guest addresses range (F0:00:00-FE:FF:FF) so an instance could potentially get assigned the same address. Opened issue for that oxidecomputer/omicron#3984

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking on it a lil more, can we move GW_MAC_ADDR, BS_MAC_ADDR, BS_IP_ADDR, BOUNDARY_SERVICES_VNI to be exported from oxide-vpc to omicron can also use them leaving us with less places to have to keep in sync.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I've moved GW_MAC_ADDR to oxide_vpc::api
  • Redefined BS_MAC_ADDR in terms of oxide_vpc::engine::overlay::INTERNET_GATEWAY_MAC
  • BS_IP_ADDR is staying put, as that's no longer a static entity, but we need to choose something for the purpose of this test.

lib/oxide-vpc/tests/common/mod.rs Outdated Show resolved Hide resolved
xde/src/xde.rs Outdated Show resolved Hide resolved
xde/src/xde.rs Outdated Show resolved Hide resolved
@rcgoodfellow rcgoodfellow force-pushed the boundary-services-overlay-flow-pinning branch 3 times, most recently from 216453b to 82ea7ca Compare January 3, 2024 08:37
@rcgoodfellow rcgoodfellow marked this pull request as draft January 4, 2024 20:36
@rcgoodfellow rcgoodfellow force-pushed the boundary-services-overlay-flow-pinning branch from 8c6326c to 760aaee Compare January 4, 2024 20:38
@rcgoodfellow rcgoodfellow self-assigned this Jan 8, 2024
@rcgoodfellow rcgoodfellow marked this pull request as ready for review January 13, 2024 00:23
@rcgoodfellow rcgoodfellow force-pushed the boundary-services-overlay-flow-pinning branch from 71eb2d3 to 73d4669 Compare January 13, 2024 02:17
Copy link
Collaborator

@FelixMcFelix FelixMcFelix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Ry! I'm happy on the whole with this, just some nits.


impl PartialOrd for TunnelEndpoint {
fn partial_cmp(&self, other: &Self) -> Option<core::cmp::Ordering> {
Some(self.ip.cmp(&other.ip))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you impl Ord, you can rely on that instead:

Suggested change
Some(self.ip.cmp(&other.ip))
Some(self.cmp(other))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}

pub const BOUNDARY_SERVICES_VNI: u32 = 99u32;
pub const INTERNET_GATEWAY_MAC: [u8; 6] = [0xA8, 0x40, 0x25, 0x77, 0x77, 0x77];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems identical to the new oxide_vpc::api::GW_MAC_ADDR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are slightly different.

pub const INTERNET_GATEWAY_MAC: [u8; 6] = [0xA8, 0x40, 0x25, 0x77, 0x77, 0x77];
pub const GW_MAC_ADDR: MacAddr =
    MacAddr::from_const([0xA8, 0x40, 0x25, 0xFF, 0x77, 0x77]);

The former is meant to represent the MAC of a physical gateway / tunnel endpoint, e.g. a sidecar. INTERNET_GATEWAY_MAC is not a great name for that, so I've renamed to TUNNEL_ENDPOINT_MAC.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I'd missed that distinction. Thanks for clarifying!

use opte::engine::print::*;

cfg_if::cfg_if! {
if #[cfg(all(not(feature = "std"), not(test)))] {
use alloc::string::ToString;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can always just use alloc::string::ToString, it doesn't need a cfg_if!{ ... } after #391. std re-exports the exact same types as alloc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

xde/src/xde.rs Outdated
// TODO
// - ddm integration to choose correct underlay device (currently just using
// first device)
#![allow(clippy::arc_with_non_send_sync)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What type is tripping this up? Do we have the wrong bounds on KRwLock/KMutex?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are several of these in xde

warning: usage of an `Arc` that is not `Send` or `Sync`
   --> xde/src/xde.rs:244:20
    |
244 |         let ectx = Arc::new(ExecCtx { log: Box::new(opte::KernelLog {}) });
    |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = note: the trait `Send` is not implemented for `ExecCtx`
    = note: the trait `Sync` is not implemented for `ExecCtx`
    = note: required for `Arc<ExecCtx>` to implement `Send` and `Sync`
    = help: consider using an `Rc` instead or wrapping the inner type with a `Mutex`
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#arc_with_non_send_sync
    = note: `#[warn(clippy::arc_with_non_send_sync)]` on by default

warning: usage of an `Arc` that is not `Send` or `Sync`
   --> xde/src/xde.rs:991:14
    |
991 |     let u1 = Arc::new(create_underlay_port(u1_name, "xdeu0")?);
    |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = note: the trait `Send` is not implemented for `xde_underlay_port`
    = note: the trait `Sync` is not implemented for `xde_underlay_port`
    = note: required for `Arc<xde_underlay_port>` to implement `Send` and `Sync`
    = help: consider using an `Rc` instead or wrapping the inner type with a `Mutex`
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#arc_with_non_send_sync

warning: usage of an `Arc` that is not `Send` or `Sync`
   --> xde/src/xde.rs:992:14
    |
992 |     let u2 = Arc::new(create_underlay_port(u2_name, "xdeu1")?);
    |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = note: the trait `Send` is not implemented for `xde_underlay_port`
    = note: the trait `Sync` is not implemented for `xde_underlay_port`
    = note: required for `Arc<xde_underlay_port>` to implement `Send` and `Sync`
    = help: consider using an `Rc` instead or wrapping the inner type with a `Mutex`
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#arc_with_non_send_sync

warning: usage of an `Arc` that is not `Send` or `Sync`
    --> xde/src/xde.rs:2024:8
     |
2024 |     Ok(Arc::new(pb.create(net, limit, limit)?))
     |        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     |
     = note: the trait `Send` is not implemented for `Port<VpcNetwork>`
     = note: the trait `Sync` is not implemented for `Port<VpcNetwork>`
     = note: required for `Arc<Port<VpcNetwork>>` to implement `Send` and `Sync`
     = help: consider using an `Rc` instead or wrapping the inner type with a `Mutex`
     = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#arc_with_non_send_sync

warning: `xde` (lib) generated 5 warnings (1 duplicate)
    Finished dev [unoptimized + debuginfo] target(s) in 25.69s

@rcgoodfellow rcgoodfellow linked an issue Jan 18, 2024 that may be closed by this pull request
@rcgoodfellow rcgoodfellow force-pushed the boundary-services-overlay-flow-pinning branch from 3739ce6 to 464938d Compare January 23, 2024 20:10
@rcgoodfellow rcgoodfellow merged commit 1d29ef6 into master Jan 23, 2024
9 checks passed
@rcgoodfellow rcgoodfellow deleted the boundary-services-overlay-flow-pinning branch January 23, 2024 20:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Multiple boundary services targets per VPC
3 participants