
Promote Traceflow API to v1beta1 #5108

Merged: 1 commit merged into antrea-io:main on Jul 28, 2023

Conversation

@luolanzone (Contributor) commented on Jun 12, 2023:

  1. Promote Traceflow API to v1beta1.
  2. Remove the srcIP field in IPHeader and IPv6Header, since it is never used and duplicates information in the parent struct.
  3. Change IPHeader field to a pointer in Packet struct.
  4. Change Flags field to a pointer in the TCPHeader struct.
  5. Change Timeout field in TraceflowSpec from uint16 to int32.
  6. Change Length field in Packet from uint16 to int32.

Note: Given the limited nature of the changes between the 2 API versions, the default None conversion strategy can be used.
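For reference, the field-level effect of the changes listed above looks roughly like the trimmed sketch below; only the touched fields are shown and the helper types are reduced to placeholders, so this is not the full generated API.

```go
package v1beta1

// Placeholder header types; the real structs carry the usual header fields.
type IPHeader struct{}
type IPv6Header struct{}

type TraceflowSpec struct {
	// Timeout is now a signed int32 (it was uint16 in v1alpha1).
	Timeout int32 `json:"timeout,omitempty"`
}

type Packet struct {
	// IPHeader is now a pointer, so an unset IPv4 header can be told apart
	// from an all-zero one.
	IPHeader *IPHeader `json:"ipHeader,omitempty"`
	// IPv6Header stays a pointer; its unused srcIP field is removed.
	IPv6Header *IPv6Header `json:"ipv6Header,omitempty"`
	// Length is now a signed int32 (it was uint16 in v1alpha1).
	Length int32 `json:"length,omitempty"`
}

type TCPHeader struct {
	// Flags is now a pointer, so "no flags specified" is distinguishable
	// from an explicit zero value.
	Flags *int32 `json:"flags,omitempty"`
}
```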

@luolanzone marked this pull request as draft on June 12, 2023.
@luolanzone force-pushed the traceflow-api-promotion branch 2 times, most recently from 1349008 to 801a062 on June 20, 2023.
@luolanzone marked this pull request as ready for review on June 20, 2023.
@antoninbas (Contributor) left a comment:

@luolanzone I don't see the CRD conversion webhook as part of this PR?

pkg/apis/crd/v1beta1/types.go:
DroppedOnly bool `json:"droppedOnly,omitempty"`
// Timeout specifies the timeout of the Traceflow in seconds. Defaults
// to 20 seconds if not set.
Timeout uint16 `json:"timeout,omitempty"`
Contributor:

The K8s API design guide actually discourages the use of unsigned integers:

Do not use unsigned integers, due to inconsistent support across languages and libraries. Just validate that the integer is non-negative if that's the case.

Source: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md

As a rule, I would say we should never use unsigned integers in APIs. There are only a few scenarios in which it may make sense, for example for packet field values, but even then I see that we use int32 for the IP protocol.

Note that the same document recommends using int32 / int64. I don't see int16s in the core K8s API. I feel like we could make this field an int32.

This change should also be mentioned in the commit message / PR description, once it is implemented.

@luolanzone (Contributor Author):

Done, message updated.
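(For illustration only, the shape that comes out of this suggestion, with the non-negativity that uint16 used to guarantee now checked explicitly, could look roughly like the sketch below; it is not Antrea's actual validation code.)

```go
package v1beta1

import "fmt"

type TraceflowSpec struct {
	// Timeout specifies the timeout of the Traceflow in seconds.
	// Defaults to 20 seconds if not set.
	Timeout int32 `json:"timeout,omitempty"`
}

// validateTimeout rejects negative values, which the old uint16 type
// ruled out by construction.
func validateTimeout(spec *TraceflowSpec) error {
	if spec.Timeout < 0 {
		return fmt.Errorf("timeout must be non-negative, got %d", spec.Timeout)
	}
	return nil
}
```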

pkg/apis/crd/v1beta1/types.go:
// See https://github.com/kubernetes/kubernetes/issues/86811
StartTime *metav1.Time `json:"startTime,omitempty"`
// DataplaneTag is a tag to identify a traceflow session across Nodes.
DataplaneTag uint8 `json:"dataplaneTag,omitempty"`
Contributor:

Even in this case, I think we can easily use a signed integer. This is entirely managed by Antrea.

@gran-vmv (Contributor), Jun 26, 2023:

We use the lower 6 bits of this value as DSCP, so using int8 can work.

@luolanzone (Contributor Author):

Done
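(Side note on the 6-bit point above: the DSCP portion of the tag is at most 0x3F, i.e. 63, so it always fits in a signed 8-bit value. A hypothetical helper, not Antrea code:)

```go
package main

import "fmt"

// dscpFromTag keeps only the lower 6 bits of the dataplane tag, the part
// carried as DSCP. The result is in [0, 63], so it fits in an int8.
func dscpFromTag(tag int32) int8 {
	return int8(tag & 0x3F)
}

func main() {
	fmt.Println(dscpFromTag(0x3F)) // 63, the largest possible DSCP value
}
```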

@antoninbas added the api-review (Categorizes an issue or PR as actively needing an API review) and action/release-note (Indicates a PR that should be included in release notes) labels on Jun 24, 2023.
@luolanzone added this to the Antrea v1.13 release milestone on Jun 26, 2023.
@luolanzone (Contributor Author) commented on Jun 26, 2023:

@luolanzone I don't see the CRD conversion webhook as part of this PR?

Hi @antoninbas, I didn't see that the API change in this PR requires a CRD conversion webhook. I think the default None conversion strategy will be used, and only the apiVersion field will be modified when serving different versions.
Let me know if I misunderstood it. Thanks.
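(For context, with the None strategy the CRD's conversion stanza stays trivial. An abbreviated sketch, with the per-version openAPIV3Schema sections omitted and details simplified:)

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: traceflows.crd.antrea.io
spec:
  group: crd.antrea.io
  scope: Cluster
  names:
    kind: Traceflow
    plural: traceflows
    singular: traceflow
  conversion:
    strategy: None          # no webhook; only the apiVersion field is rewritten
  versions:
    - name: v1alpha1
      served: true          # old version can still be read and written
      storage: false
      # schema: ... (omitted)
    - name: v1beta1
      served: true
      storage: true         # new objects are persisted as v1beta1
      # schema: ... (omitted)
```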

@antoninbas (Contributor):

I think you're correct. It would be good to mention it briefly in the commit message / PR description:

Given the limited nature of the changes between the 2 API versions, the default None conversion strategy can be used.

@antoninbas (Contributor) left a comment:

Overall the API changes look good to me.
However, we should use the OpenAPI schema to validate that integer fields are in a valid range. For example:

                ttl:
                  type: integer
                  minimum: 0
                  maximum: 255

@luolanzone (Contributor Author):
The commit message now mentions the default None conversion strategy, and OpenAPI schema validation has been added for a few integer fields.

@antoninbas (Contributor) commented on Jul 3, 2023:


I'm confused by the latest version of the changes:

  • unless I'm reading it wrong, it seems that you have edited the openapi schema, for version v1alpha1 of the API, which I am not sure we should be doing
  • the integer type constraints have been added to the v1alpha1 schema instead of the v1beta1 schema?

@luolanzone force-pushed the traceflow-api-promotion branch 2 times, most recently from 668e5b5 to 3875ac3 on July 7, 2023.
@luolanzone (Contributor Author):

Hi @antoninbas, thanks for pointing that out. I made the change by mistake; I have reverted it and kept the validation checks in v1beta1 only.

@antoninbas (Contributor) left a comment:

I think there are more integer fields for which we should ideally constrain the range of supported values using the OpenAPI schema.

I have a question for @tnqn: if we restrict a field to a smaller range of values than before, does that break backwards-compatibility, and what should we do in that case?

I'll take 2 concrete examples:

  • The IP length field used to be a uint16. In this API change, we are transforming it to an int32, as per K8s API best practices. We can add OpenAPI constraints to restrict the range to [0, 65535]. In theory, the new range of possible values matches the old range of possible values. In practice, would it have been possible for someone to store a larger integer value in etcd by providing a large value in the YAML / JSON?
  • The TTL field is an int32. In this API change, we are adding OpenAPI constraints to restrict the range to [0, 255] (which is the valid range for the field). In this case, it is definitely possible for an earlier object to have been stored in etcd with a value > 255. Is this a breaking change?

On a more general note: should we mandate minimum and maximum constraints for all integer fields moving forward?

@luolanzone (Contributor Author):
I added validation for the IP Length field to [0, 65535] as well. I feel this should be fine considering a Traceflow CR is short-lived; a failed or succeeded CR won't be processed again. Let me know if you have any concerns @antoninbas @tnqn @gran-vmv.
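(The added constraint follows the same pattern as the ttl example earlier in the review, roughly:)

```yaml
# Sketch only; the exact indentation and placement in the chart's CRD differ.
length:
  type: integer
  minimum: 0
  maximum: 65535
```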

@antoninbas (Contributor) left a comment:

LGTM, but we should get another review as this is an API change

@luolanzone (Contributor Author):
@tnqn @gran-vmv could you take another look? Thanks.

@shi0rik0 (Contributor):
I have some opinions about the API.

  1. The Packet struct in types.go is poorly defined. Currently it serves two roles: describing the traceflow spec (as in TraceflowSpec struct) and traceflow results (as in TraceflowStatus struct). They are two different things. We shouldn't mix them up. For example, there's a Length field in the Packet struct, but this field is only used in traceflow results. I think we can define two structs.
  2. The IP and Service fields in the Source and Destination struct may be redundant. For the IP field, the SrcIP and DstIP in the Packet struct have the same effect. For the Service field, it seems that it's only used to be translated into an IP. I think we can remove this syntax sugar, or move it to the Packet struct for integrity.

@luolanzone

@luolanzone (Contributor Author):
@shi0rik0, thanks for the input.
For option 1, I don't think it's necessary to use two structs if only the Length field is redundant.
For option 2, we allow users to configure a Service as the Destination in a Traceflow spec. With the Service field, users don't have to figure out the real IP of a Service; it's a user-friendly field rather than mere syntax sugar.
The IP field may seem redundant, but the Packet struct is more about header info when it's used in TraceflowSpec, while the full struct is used in TraceflowStatus. You can check the Traceflow CRD: SrcIP, DstIP and Length are actually not defined in the spec schema (https://github.com/antrea-io/antrea/blob/main/build/charts/antrea/crds/traceflow.yaml#L86-L138), so users won't be able to set them via a Traceflow CR.

@gran-vmv, your input is welcome.
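(For illustration, a Traceflow that targets a Service as its destination looks roughly like the following; the resource names are hypothetical, and the Antrea Traceflow guide is the authoritative reference for the format.)

```yaml
# Hypothetical example: trace a TCP packet from a client Pod to a Service,
# without having to look up the Service's ClusterIP.
apiVersion: crd.antrea.io/v1beta1
kind: Traceflow
metadata:
  name: tf-client-to-web        # hypothetical name
spec:
  source:
    namespace: default
    pod: client-pod             # hypothetical Pod name
  destination:
    namespace: default
    service: web-svc            # hypothetical Service name
  packet:
    ipHeader:
      protocol: 6               # TCP
    transportHeader:
      tcp:
        dstPort: 80
```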

@gran-vmv (Contributor) left a comment:

LGTM.
Do we need to keep at least 1 e2e case to check old traceflow API?

@gran-vmv (Contributor):

Agree with @luolanzone.
For option 1, the current Packet struct only supports very limited parameters. If we plan to add more new parameters to this struct, we can separate it or use an inline struct then.

@luolanzone (Contributor Author):
LGTM. Do we need to keep at least 1 e2e case to check old traceflow API?

What kind of e2e case? To check old API availability? I verified it manually before; it looks good, both versions are served, and users can get CRs via either version.

@luolanzone (Contributor Author):
@antoninbas @tnqn please take a look, thanks.

@gran-vmv (Contributor):

Yes, we can have one e2e case to check old API availability, to avoid an unexpected breaking change in the future.
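(Illustration only: such an availability check could be written against the old group/version with the client-go dynamic client, rather than Antrea's actual e2e helpers; the function name below is hypothetical.)

```go
package e2e

import (
	"context"
	"testing"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
)

// checkV1alpha1TraceflowServed verifies that the deprecated v1alpha1
// Traceflow API is still served: a simple LIST against the old
// group/version must succeed.
func checkV1alpha1TraceflowServed(t *testing.T, ctx context.Context, client dynamic.Interface) {
	gvr := schema.GroupVersionResource{
		Group:    "crd.antrea.io",
		Version:  "v1alpha1",
		Resource: "traceflows",
	}
	if _, err := client.Resource(gvr).List(ctx, metav1.ListOptions{}); err != nil {
		t.Fatalf("v1alpha1 Traceflow API is no longer served: %v", err)
	}
}
```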

@luolanzone (Contributor Author):

I am not sure it's worth doing, considering we simply changed storage to false for the old API. I don't see any way that would cause the old API to become unavailable. What kind of unexpected breaking change are you referring to?

@shi0rik0 (Contributor) commented on Jul 21, 2023:

Another suggestion: we can group all live-traffic-related specs together to a LiveTraffic struct, and make this struct a pointer, like this:

type TraceflowSpec struct {
	Source      Source      `json:"source,omitempty"`
	Destination Destination `json:"destination,omitempty"`
	Packet      Packet      `json:"packet,omitempty"`

	LiveTraffic *LiveTraffic `json:"liveTraffic,omitempty"`

	Timeout uint16 `json:"timeout,omitempty"`
}

type LiveTraffic struct {
	DroppedOnly bool `json:"droppedOnly,omitempty"`
	// Other fields like sampling parameters.
}

This can tell users what fields are related to live-traffic traceflows, and it can avoid problems like what if we specify DroppedOnly == true && LiveTraffic == false.

@luolanzone (Contributor Author) commented on Jul 21, 2023:


@shi0rik0, thanks for the input. I would like to avoid big changes in this PR; if your suggestion is for live sampling, I'd suggest including the change in your PR for live sampling.

@tnqn @gran-vmv @antoninbas @jianjuns since there is a task for the firstN sampling feature in Traceflow, which is taken by @shi0rik0, there might be some big changes to Traceflow. I am wondering whether we should hold this PR and wait for the firstN sampling feature, or promote the API to v1beta1 with the small changes first?

Length int32 `json:"length,omitempty"`
IPHeader *IPHeader `json:"ipHeader,omitempty"`
IPv6Header *IPv6Header `json:"ipv6Header,omitempty"`
TransportHeader TransportHeader `json:"transportHeader,omitempty"`
Contributor:

Why is this not a pointer?


// TraceflowSpec describes the spec of the traceflow.
type TraceflowSpec struct {
Source Source `json:"source,omitempty"`
Contributor:

Should all Source, Destination, Packet be pointers?

@shi0rik0 (Contributor), Jul 22, 2023:

Should all Source, Destination, Packet be pointers?

The pros and cons of setting the types to pointers:

  • Pros: we can distinguish whether the field was specified by the user, because a nil value is allowed.
  • Cons: we have to check whether the field is nil every time we access it; if we forget, a panic may be raised (a small sketch of this trade-off follows below).
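(A small sketch of that trade-off, with hypothetical simplified types: a nil pointer cleanly marks "not specified", but every consumer must nil-check before dereferencing.)

```go
package main

import "fmt"

// Hypothetical, simplified types just to illustrate the trade-off.
type IPHeader struct {
	TTL int32
}

type Packet struct {
	IPHeader *IPHeader // nil means the user did not specify an IPv4 header
}

func effectiveTTL(p Packet) int32 {
	const defaultTTL = 64 // illustrative default
	if p.IPHeader == nil {
		return defaultTTL // field omitted: fall back to the default
	}
	return p.IPHeader.TTL // field set: honor it, even if it is zero
}

func main() {
	fmt.Println(effectiveTTL(Packet{}))                            // 64: unset
	fmt.Println(effectiveTTL(Packet{IPHeader: &IPHeader{TTL: 0}})) // 0: explicit zero
}
```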

@luolanzone (Contributor Author):

I suppose all of them, including TransportHeader, can be pointers; there are indeed some pros and cons, as @shi0rik0 mentioned above. Let me know if you feel they should be pointers. I will also double-check with @gran-vmv in case I missed anything about the original design.

@jianjuns (Contributor), Jul 24, 2023:

If we need to check individual fields in a struct to know whether the struct was specified by the user, I think a pointer would be better.
Also, I feel we should have a consistent style: we should not have some fields be pointers and others not, with no reason.

@luolanzone (Contributor Author):

I would like to defer the related changes to the next release, considering we may need more effort for verification.
@shi0rik0, please check whether you can take care of this when you make the firstN sampling changes, including the suggestions you raised in the comments. Thanks.

Contributor:

I don't have a strong opinion about changing these fields to pointers, but if we do make that change, it should be done in its own PR (not when implementing a new feature like firstN sampling).

@shi0rik0 (Contributor):
The current TCPHeader definition is as follows:

// TCPHeader describes spec of a TCP header.
type TCPHeader struct {
	// SrcPort is the source port.
	SrcPort int32 `json:"srcPort,omitempty"`
	// DstPort is the destination port.
	DstPort int32 `json:"dstPort,omitempty"`
	// Flags are flags in the header.
	Flags int32 `json:"flags,omitempty"`
}

I think the Flags field is not very user-friendly. Perhaps we can split it into more fields, like ACK, SYN, etc.?
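(A hypothetical sketch of that alternative, individual boolean flag fields instead of a raw bitmask; this is not the shape adopted in v1beta1, only an illustration of the suggestion.)

```go
package v1beta1

// TCPHeader, hypothetically reworked with per-flag fields; nil means
// "don't care" for the corresponding flag.
type TCPHeader struct {
	// SrcPort is the source port.
	SrcPort int32 `json:"srcPort,omitempty"`
	// DstPort is the destination port.
	DstPort int32 `json:"dstPort,omitempty"`
	SYN     *bool `json:"syn,omitempty"`
	ACK     *bool `json:"ack,omitempty"`
	FIN     *bool `json:"fin,omitempty"`
	RST     *bool `json:"rst,omitempty"`
}
```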

@shi0rik0 (Contributor) commented on Jul 24, 2023:

I think we can group IPHeader and IPv6Header into a struct, like what we do in TransportHeader. In this way, we can use schema validation oneOf to constrain users to specify only one of IPHeader and IPv6Header.

// TransportHeader describes spec of a TransportHeader.
type TransportHeader struct {
	ICMP *ICMPEchoRequestHeader `json:"icmp,omitempty" yaml:"icmp,omitempty"`
	UDP  *UDPHeader             `json:"udp,omitempty" yaml:"udp,omitempty"`
	TCP  *TCPHeader             `json:"tcp,omitempty" yaml:"tcp,omitempty"`
}

@tnqn (Member) commented on Jul 24, 2023:

I think let's only make the necessary changes in this PR. For semantic improvements and UX enhancements, let's discuss and implement each of them separately, taking compatibility, cost, and importance into consideration.

@luolanzone (Contributor Author):
Code conflicts resolved.

@luolanzone (Contributor Author):
@jianjuns @tnqn @antoninbas could you take a look again? The code conflicts are resolved, and I feel we can defer some of the comments to the next release, with more discussion and verification. Thanks.

@luolanzone (Contributor Author):
/test-all

@antoninbas (Contributor) left a comment:

Overall LGTM; I had one last question, which I left as a comment.

You need to rebase the PR.

// IPv6Header describes spec of an IPv6 header.
type IPv6Header struct {
// NextHeader is the IPv6 protocol.
NextHeader *int32 `json:"nextHeader,omitempty" yaml:"nextHeader,omitempty"`
Contributor:

Quick question: why do we use a pointer for NextHeader, but not for Protocol?
Sorry if this was discussed before.

@luolanzone (Contributor Author):

This wasn't discussed before for this field. I don't know whether there is a specific reason to use a pointer for NextHeader but not for Protocol; @gran-vmv may know the history.
There were a few discussions about other fields as well; I feel we may go through the whole spec, in a separate PR with more discussion and verification, to make the field types consistent if pointers are more appropriate.

Commit message:

1. Promote Traceflow API to v1beta1.
2. Remove the srcIP field in IPHeader and IPv6Header since it is never used
and duplicates information in the parent struct.
3. Change IPHeader field to a pointer in Packet struct.
4. Change Flags field to a pointer in the TCPHeader struct.
5. Change Timeout field in TraceflowSpec from uint16 to int32.
6. Change Length field in Packet from uint16 to int32.
7. Add OpenAPISpec schema validation for a few integer type of fields.

Note: Given the limited nature of the changes between the 2 API versions,
the default None conversion strategy can be used.

Signed-off-by: Lan Luo <luola@vmware.com>
@luolanzone (Contributor Author):
Code conflicts are resolved.
/test-all

@tnqn merged commit 4dcd030 into antrea-io:main on Jul 28, 2023.
42 of 43 checks passed.
Labels: action/release-note (Indicates a PR that should be included in release notes), api-review (Categorizes an issue or PR as actively needing an API review).