Title : P4Runtime Specification
Title Note: version 1.4.0-dev
Title Footer: &date;
Author: The P4.org API Working Group
Heading depth: 5
Pdf Latex: xelatex
Document Class: [11pt]article
Package: [top=1in, bottom=1.25in, left=1in, right=1in]geometry
Package: fancyhdr
Tex Header:
\setlength{\headheight}{30pt}
\setlength{\emergencystretch}{2em}
Bib: references.bib
Bib Search Url:
Script: https://ajax.googleapis.com/ajax/libs/jquery/2.1.3/jquery.min.js
Script: bibref_no_title.js
@if html {
body.madoko {
font-family: utopia-std, serif;
}
title,titlenote,titlefooter,authors,h1,h2,h3,h4,h5 {
font-family: helvetica, sans-serif;
font-weight: bold;
}
pre, code {
font-family: monospace;
font-size: 10pt;
}
}
@if tex {
body.madoko {
font-family: UtopiaStd-Regular;
}
title,titlenote,titlefooter,authors {
font-family: sans-serif;
font-weight: bold;
}
pre, code {
font-family: LuxiMono;
font-size: 75%;
}
}
Colorizer: p4
Colorizer: proto
Colorizer: prototext
Colorizer: cpp
.token.keyword {
font-weight: bold;
}
@if html {
p4example {
replace: "~ Begin P4ExampleBlock&nl;\
````p4&nl;&source;&nl;````&nl;\
~ End P4ExampleBlock";
padding:6pt;
margin-top: 6pt;
margin-bottom: 6pt;
border: solid;
background-color: #ffffdd;
border-width: 0.5pt;
}
}
@if tex {
p4example {
replace: "~ Begin P4ExampleBlock&nl;\
````p4&nl;&source;&nl;````&nl;\
~ End P4ExampleBlock";
breakable: true;
padding: 6pt;
margin-top: 6pt;
margin-bottom: 6pt;
border: solid;
background-color: #ffffdd;
border-width: 0.5pt;
}
}
@if html {
pseudo {
replace: "~ Begin PseudoBlock&nl;\
````&nl;&source;&nl;````&nl;\
~ End PseudoBlock";
padding: 6pt;
margin-top: 6pt;
margin-bottom: 6pt;
border: solid;
background-color: #e9fce9;
border-width: 0.5pt;
}
}
@if tex {
pseudo {
replace: "~ Begin PseudoBlock&nl;\
````&nl;&source;&nl;````&nl;\
~ End PseudoBlock";
breakable : true;
padding: 6pt;
margin-top: 6pt;
margin-bottom: 6pt;
background-color: #e9fce9;
border: solid;
border-width: 0.5pt;
}
}
@if html {
cpp {
replace: "~ Begin CPPblock&nl;\
````cpp&nl;&source;&nl;````&nl;\
~ End CPPblock";
border: solid;
margin-top: 6pt;
margin-bottom: 6pt;
padding: 6pt;
background-color: #e9fce9;
border-width: 0.5pt;
}
}
@if tex {
cpp {
replace: "~ Begin CPPblock&nl;\
````cpp&nl;&source;&nl;````&nl;\
~ End CPPblock";
breakable: true;
margin-top: 6pt;
margin-bottom: 6pt;
padding: 6pt;
background-color: #e9fce9;
border: solid;
border-width: 0.5pt;
}
}
@if html {
proto {
replace: "~ Begin Protoblock&nl;\
````proto&nl;&source;&nl;````&nl;\
~ End Protoblock";
border: solid;
margin-top: 6pt;
margin-bottom: 6pt;
padding: 6pt;
background-color: #e6ffff;
border-width: 0.5pt;
}
}
@if tex {
proto {
replace: "~ Begin Protoblock&nl;\
````proto&nl;&source;&nl;````&nl;\
~ End Protoblock";
breakable: true;
margin-top: 6pt;
margin-bottom: 6pt;
padding: 6pt;
background-color: #e6ffff;
border: solid;
border-width: 0.5pt;
}
}
@if html {
prototext {
replace: "~ Begin Prototextblock&nl;\
````prototext&nl;&source;&nl;````&nl;\
~ End Prototextblock";
border: solid;
margin-top: 6pt;
margin-bottom: 6pt;
padding: 6pt;
background-color: #e6ffff;
border-width: 0.5pt;
}
}
@if tex {
prototext {
replace: "~ Begin Prototextblock&nl;\
````prototext&nl;&source;&nl;````&nl;\
~ End Prototextblock";
breakable: true;
margin-top: 6pt;
margin-bottom: 6pt;
padding: 6pt;
background-color: #e6ffff;
border: solid;
border-width: 0.5pt;
}
}
[TITLE]
~ Begin Abstract
P4 is a language for programming the data plane of network devices. The
P4Runtime API is a control plane specification for controlling the data plane
elements of a device defined or described by a P4 program. This document
provides a precise definition of the P4Runtime API. The target audience for this
document includes developers who want to write controller applications for P4
devices or switches.
~ End Abstract
[TOC]
[TOC=figures]
[TOC=tables]
# Introduction and Scope
This document is published by the *P4.org API Working Group*, which was
chartered [@P4APIWGCharter] to design and standardize vendor-independent,
protocol-independent runtime APIs for P4-defined or P4-described data
planes. This document specifies one such API, called *P4Runtime*. It is meant to
disambiguate and augment the programmatic API definition expressed in Protobuf
format and available at
[https://github.com/p4lang/p4runtime/tree/main/proto](https://github.com/p4lang/p4runtime/tree/main/proto).
## P4 Language Version Applicability
P4Runtime is designed to be implemented in conjunction with the P4~16~ language
version or later. P4~14~ programs should be translated into P4~16~ to be made
compatible with P4Runtime. This version of P4Runtime utilizes features which are
not in P4~16~ 1.0, but were introduced in P4~16~ 1.2.4 [@P4Revisions124]. For
this version of P4Runtime, we recommend using P4~16~ 1.2.4 [@P4Revisions124].
This version of the P4Runtime specification does not yet explicitly
address compatibility with the following P4~16~ language features
introduced in versions 1.2.2 or 1.2.4 of the language specification:
* Added support for generic structures [@P4Revisions122].
* Added support for additional enumeration types [@P4Revisions122].
* Added support for 0-width bitstrings and varbits [@P4Revisions122].
* Clarified restrictions for parameters with default values
[@P4Revisions124].
* Allow ranges to be specified by serializable enums
[@P4Revisions124].
* Added `list` type [@P4Revisions124].
* Clarified behavior of table with no `key` property, or if its list
of keys is empty [@P4Revisions124].
## In Scope
This specification document defines the *semantics* of *P4Runtime* messages,
whose syntax is defined in Protobuf format. The following are in scope of
P4Runtime:
* Runtime control of P4 built-in objects (tables and Value Sets) and Portable
Switch Architecture (PSA) [@PSA] externs (&eg; Counters, Meters, Action
Profiles, ...). We recommend that this version of P4Runtime be used with
targets that are compliant with PSA version 1.1.0.
* Runtime control of architecture-specific (non-PSA) externs, through an
extension mechanism.
* Basic session management for Software-Defined Networking (SDN) use-cases,
including support for controller replication to enable control plane
redundancy.
* Partition of the P4 forwarding elements into different roles, which can be
assigned to different control entities.
* Packet I/O to enable streaming packets to & from the control plane.
* Batching support, with different atomicity guarantees.
* In-the-field device-reconfiguration with a new P4 data plane.
The following are in the scope of this specification document:
* Rationale for the P4Runtime design.
* Reference architecture and use-cases for deploying a P4Runtime service.
* Detailed description of the API semantics.
* Requirements for conformant implementations of the API.
## Not In Scope
The following are not in scope of P4Runtime:
* Runtime control of elements outside the P4 language. For example,
architecture-dependent elements such as ports, traffic management, etc. are
outside of the P4 language and are thus not covered by P4Runtime. Efforts are
underway to standardize the control of these via gNMI and gNOI APIs, using
description models defined and maintained by the OpenConfig project
[@OpenConfig]. An open source implementation of these APIs is also in progress
as part of the Stratum project [@Stratum].
* Protobuf message definitions for runtime control of non-PSA externs. While
P4Runtime includes an extension mechanism to support additional P4
architectures, it does not define the syntax or semantics of any additional
control message for externs introduced by non-PSA architectures.
The following are not in scope of this specification document:
* Description of the P4 programming language; it is assumed that the reader is
already familiar with P4~16~ [@P4Spec].
* Descriptions of gRPC and Protobuf files in general.
* Controller [role](#sec-arbitration-role-config) definition (for partition of
P4 entities); the P4.org API Working Group may publish a companion document in
the future describing one possible role definition scheme.
# Terms and Definitions
* arbitration
: Refers to the process through which P4Runtime ensures that at any given
time, there is a single primary controller (&ie; a client with write access)
for a given role. Also referred to as "client arbitration".
* client
: The gRPC client is the software entity which controls the P4 target or
device by communicating with the gRPC agent or server. The client may be
local (within the device) or remote (for example, an SDN controller).
* COS
: Class of Service.
* device
: Synonymous with target, although device usually connotes a physical
appliance or other hardware, whereas target can signify hardware or
software.
* entity
: An instantiated P4 program object such as a table or an extern (from PSA or
any other architecture).
* gRPC
: gRPC Remote Procedure Calls, an open-source client-server RPC framework. See
[@gRPC].
* HA
: High-Availability. Refers to a redundancy architecture.
* Instrumentation
: The part of the P4Runtime server which implements the calls to the device or
target native "SDK" or backend.
* IPC
: Inter-Process Communication.
* P4 Blob
: A more colloquial term for P4 Device Config (Blob = Binary Large Object).
* P4 Device Config
: The output of the P4 compiler backend, which is included in the Forwarding
Pipeline Config. This is opaque, architecture- and target-specific binary
data which can be loaded onto the device to change its "program."
* P4Info
: Metadata which specifies the P4 entities which can be accessed via
P4Runtime. These entities have a one-for-one correspondence with
instantiated objects in the P4 source code.
* P4RT
: Abbreviation for P4Runtime.
* Protobuf (Protocol Buffers)
: The wire serialization format for P4Runtime. Protobuf version 3 (proto3) is
used to define the P4Runtime interface. See [@Proto].
* PSA
: Portable Switch Architecture [@PSA]; a target architecture that describes
common capabilities of network switch devices that process and forward
packets across multiple interface ports.
* RPC
: Remote Procedure Call.
* RTT
: Round-trip time.
* SDN
: Software-Defined Networking, an approach to networking that advocates the
separation of the control and forwarding planes, as well as the abstraction
of the networking infrastructure, in order to promote programmability of the
network control. SDN is often associated with OpenFlow, a communications
protocol that enables remote control of the network infrastructure through a
programmable, centralized network *controller*.
* SDN port
: A 32-bit port number defined by a remote Software-Defined Network (SDN)
controller. The SDN port number maps to a unique device port id, which may
be in a different number space.
* server
: The gRPC server which accepts P4Runtime requests on the device or target. It
uses instrumentation to translate P4Runtime API calls into target-specific
actions.
* stream
: Refers to a gRPC Stream, which is an RPC on which several messages can be
sent and received. P4Runtime defines one Stream RPC (`StreamChannel`), which
is a bidirectional stream (both the client and the server can send messages)
which is used for packet I/O and client arbitration, among other things.
* switch config
: Refers to the non-forwarding config (different from the P4 Forwarding
Pipeline Config) that is delivered to the switch via a different
interface. For example, the switch config may be captured using OpenConfig
models and delivered through a gNMI interface.
* target
: The hardware or software entity which "executes" the P4 pipeline and hosts
the P4Runtime Service; often used interchangeably with "device".
* URI
: Uniform Resource Identifier; a string of characters designed for unambiguous
identification of resources.
# Reference Architecture { #sec-reference-architecture}
Figure [#fig-reference-architecture] represents the P4Runtime Reference
Architecture. The device or target to be controlled is at the bottom, and one or
more controllers are shown at the top. P4Runtime only grants write access to a
single primary controller for each read/write entity. A role defines a grouping
of P4 entities. P4Runtime allows for a primary controller for each role, and a
role-based client arbitration scheme ensures only one controller has
write access to each read/write entity, or the pipeline config itself. Any
controller may perform read access to any entity or the pipeline config. Later
sections describe this in detail. For the sake of brevity, the term controller
may refer to one or more controllers.
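The access rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `AccessPolicy`, the string role names, the controller names, and the entity-to-role map are all hypothetical, since role definition is outside the scope of this document.

```python
class AccessPolicy:
    """Toy model: one primary controller per role; reads are open to all."""

    def __init__(self, entity_roles):
        # Entity name -> role that owns it (hypothetical role definition).
        self.entity_roles = dict(entity_roles)
        # Role -> name of the current primary controller, if any.
        self.primaries = {}

    def set_primary(self, role, controller):
        self.primaries[role] = controller

    def can_write(self, controller, entity):
        # Only the primary of the role that owns the entity may write it.
        role = self.entity_roles.get(entity)
        primary = self.primaries.get(role)
        return primary is not None and primary == controller

    def can_read(self, controller, entity):
        # Any controller may read any known entity.
        return entity in self.entity_roles


policy = AccessPolicy({"ipv4_lpm": "routing", "port_counter": "stats"})
policy.set_primary("routing", "ctrl-1")
policy.set_primary("stats", "ctrl-2")
```

With this setup, `ctrl-1` may write `ipv4_lpm` but not `port_counter`, while both controllers may read either entity.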
The P4Runtime API defines the messages and semantics of the interface between
the client(s) and the server. The API is specified by the p4runtime.proto
Protobuf file, which is available on GitHub as part of the standard
[@P4RuntimeRepo]. It may be compiled via protoc --- the Protobuf compiler ---
to produce both client and server implementation stubs in a variety of
languages. It is the responsibility of target implementers to instrument the
server.
Reference implementations of P4 targets supporting P4Runtime, as well as sample
clients, may be available on the p4lang/PI GitHub repository [@PIRepo]. A future
goal may be to produce a reference gRPC server which can be instrumented in a
generic way, &eg; via callbacks, thus reducing the burden of implementing
P4Runtime.
The controller can access the P4 entities which are declared in the P4Info
metadata. The P4Info structure is defined by p4info.proto, another Protobuf file
available as part of the standard.
The controller can also set the `ForwardingPipelineConfig`, which amounts to
installing and running the compiled P4 program output, which is included in the
`p4_device_config` Protobuf message field, and installing the associated P4Info
metadata. Furthermore, the controller can query the target for the
`ForwardingPipelineConfig` to retrieve the device config and the P4Info.
~ Figure { #fig-reference-architecture; \
caption: "P4Runtime Reference Architecture." }
![reference-architecture]
~
[reference-architecture]: build/reference-architecture.[svg,png] \
{ height: 7cm; page-align: here }
## P4Runtime Service Implementation
The P4Runtime API is implemented by a program that runs a gRPC server which
binds an implementation of the auto-generated P4Runtime Service interface. This
program is called the "P4Runtime server." The server must listen on TCP port
9559 by default, which is the port that has been allocated by IANA for the
P4Runtime service. Servers should allow users to override the default port
using a configuration file or flag when starting the server. Uses of other
port numbers as the default should be discontinued.
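As a minimal sketch of the default-port rule, the following Python fragment resolves the listening port from a command-line flag and falls back to 9559. The flag name `--p4runtime-port` and the function `resolve_port` are hypothetical, chosen only for illustration; this specification does not mandate any particular flag or configuration format.

```python
import argparse

# IANA-allocated default port for the P4Runtime service.
DEFAULT_P4RUNTIME_PORT = 9559


def resolve_port(argv):
    """Return the TCP port to listen on, honoring a user override if given."""
    parser = argparse.ArgumentParser(
        description="Hypothetical P4Runtime server launcher")
    # Hypothetical flag name; a real server could use a config file instead.
    parser.add_argument("--p4runtime-port", type=int,
                        default=DEFAULT_P4RUNTIME_PORT)
    args = parser.parse_args(argv)
    if not 0 < args.p4runtime_port < 65536:
        raise ValueError(f"invalid TCP port: {args.p4runtime_port}")
    return args.p4runtime_port
```

Calling `resolve_port([])` yields the default, while `resolve_port(["--p4runtime-port", "50051"])` yields the override.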
### Security concerns
Appropriate measures and security best practices must be in place to protect
the P4Runtime server and client, and the communication channel between the two.
For example, firewalling and authenticating the incoming connections to the
P4Runtime server can prevent a malicious actor from taking over the switch.
Similarly, using TLS to authenticate and encrypt the gRPC channel can prevent
man-in-the-middle attacks between the server and client. Mutual TLS (mTLS) may
be used to facilitate the authentication of the client by the server and
vice-versa.
## Idealized Workflow
In the idealized workflow, a P4 source program is compiled to produce both a P4
device config and P4Info metadata. These comprise the `ForwardingPipelineConfig`
message. A P4Runtime controller chooses a configuration appropriate to a
particular target and installs it via a `SetForwardingPipelineConfig`
RPC. Metadata in the P4Info describes both the overall program itself
(`PkgInfo`) as well as all entity instances derived from the P4 program ---
tables and extern instances. Each entity instance has an associated numeric ID
assigned by the P4 compiler which serves as a concise "handle" used in API
calls.
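To illustrate how a numeric ID acts as a handle, the sketch below models a tiny subset of P4Info with plain Python dataclasses. The class names, field names, entity names, and ID values are hypothetical stand-ins; the real structure is defined by p4info.proto, and real IDs are assigned by the P4 compiler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityInfo:
    id: int    # numeric handle (made-up values in this sketch)
    name: str  # name of the entity instance in the P4 program
    kind: str  # "table", "counter", ...


class P4InfoIndex:
    """Lookup helper mapping names to IDs and IDs back to entities."""

    def __init__(self, entities):
        self._by_id = {e.id: e for e in entities}
        self._by_name = {e.name: e for e in entities}

    def id_of(self, name):
        # The controller uses this ID, not the name, in API calls.
        return self._by_name[name].id

    def by_id(self, entity_id):
        return self._by_id[entity_id]


index = P4InfoIndex([
    EntityInfo(1001, "ingress.ipv4_lpm", "table"),
    EntityInfo(2001, "ingress.port_counter", "counter"),
])
table_id = index.id_of("ingress.ipv4_lpm")
```

A controller would typically resolve names to IDs once, after loading the P4Info, and then use the IDs in subsequent requests.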
In this workflow, P4 compiler backends are developed for each unique type of
target and produce P4Info and a target-specific device config. The P4Info schema
is designed to be target and architecture-independent, although the specific
contents are likely to be architecture-dependent. The compiler ensures the code
is compatible with the specific target and rejects code which is incompatible.
In some use cases, it is expected that a controller will store a
collection of multiple P4 "packages", where each package consists of
the P4 device config and P4Info, and install them at will onto the target. A
controller can also query the `ForwardingPipelineConfig` from the target via the
`GetForwardingPipelineConfig` RPC. This can be useful to obtain the pipeline
configuration from a running device to synchronize the controller to its current
state.
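A controller-side package store of the kind described above might look like the following sketch. The names `Package` and `PackageStore`, the `target` label used as a key, and the placeholder byte strings are assumptions made for illustration. Installing a package on a device would go through the `SetForwardingPipelineConfig` RPC, which is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    name: str
    target: str              # hypothetical label for the target type
    p4info: bytes            # serialized P4Info (placeholder bytes here)
    p4_device_config: bytes  # opaque, target-specific compiler output


class PackageStore:
    """Keeps several P4 packages so one can be chosen per target."""

    def __init__(self):
        self._packages = {}

    def add(self, package):
        self._packages[(package.target, package.name)] = package

    def for_target(self, target):
        # Names of all packages that were built for the given target.
        return sorted(n for (t, n) in self._packages if t == target)

    def get(self, target, name):
        return self._packages[(target, name)]


store = PackageStore()
store.add(Package("l3_router", "asic-a", b"info-1", b"config-1"))
store.add(Package("l2_switch", "asic-a", b"info-2", b"config-2"))
store.add(Package("l3_router", "soft-switch", b"info-3", b"config-3"))
```

Keying on both target and name reflects that the same P4 program generally compiles to a different device config for each target.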
## P4 as a Behavioral Description Language { #sec-p4-as-behavioral-description-language}
P4 can be considered a behavioral description of a switching device which may or
may not execute "P4" natively. There is no requirement that a P4 compiler be
used in the production of either the P4 device config or the P4Info. There is no
absolute requirement that the target accept a `SetForwardingPipelineConfig` RPC to
change its pipeline "program", as some devices may be fixed in function, or
configured via means other than P4 programs. Furthermore, a controller can run
without a P4 source program, since the P4Info file provides all of the
information necessary to describe the P4Runtime API messages needed to configure
such a device.
While a P4 program does provide a precise description of the data plane
behavior, and this can prove invaluable in writing correct control plane
software, in some cases it is enough for a control plane software developer to
have the control plane API, plus good documentation of the data plane
behavior. Some device vendors may wish to keep their P4 source code private. The
minimum requirement for the controller and device to communicate properly is a
P4Info file that can be loaded by a controller in order to render the correct
P4Runtime API.
In such scenarios, it is crucial to have detailed documentation, perhaps
included in the P4Info file itself, specifically the metadata in the `PkgInfo`
message as well as the embedded `doc` fields. Nevertheless, a P4 program which
describes the pipeline is ideally available. The contents of the P4Info file
will be described in later sections.
## Alternative Workflows
Given the notions above concerning P4 code as behavioral description and P4Info
as API metadata, some other workflows are possible. The scenarios below are just
examples and actual situations may vary.
### P4 Source Available, Compiled into P4Info but not Compiled into P4 Device Config
In this situation, P4 source code is available mainly as a behavioral model and
compiled to produce P4Info, but it is not compiled to produce the
`p4_device_config`. The device's configuration might be derived via some other
means to implement the P4 source code's intentions. The P4 code, if available,
can be studied to understand the pipeline, and the P4Info can be used to
implement the control plane.
### No P4 Source Available, P4Info Available
In this situation, P4Info is available but no P4 source is available for any
number of reasons, the most likely of which are:
1. The vendor or organization does not wish to divulge the P4 source code, to
protect intellectual property or maintain security.
2. The target was not implemented using P4 code to begin with, although it still
obeys the control plane API specified in the P4Info.
As discussed in Section [#sec-p4-as-behavioral-description-language], in the
absence of a P4 program describing the data plane behavior, the detailed
knowledge required to write correct control plane code must come from other
sources, &eg; documentation.
### Partial P4Info and P4 Source are Available
In this situation, a subset of the target's pipeline configuration is exposed as
P4 source code and P4Info. The complete device behavior might be expressed as a
larger P4 program and P4Info, but these are not exposed to everybody. This
limits API access to only certain functions and behaviors. The hidden functions
and APIs might be available to select users who would have access to the
complete P4Info and possibly P4 source code.
### P4Info Role-Based Subsets
In this situation, P4Info is selectively packaged into role-based subsets to
allow some controllers access to just the functionality required. For example, a
controller may only need read access to statistics counters and nothing more.
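One way to picture role-based packaging is as a filter over the entity list. The sketch below is hypothetical throughout: P4Info is reduced to a list of dictionaries, and the role names and the role-to-kind mapping are invented, because this document does not define a role scheme.

```python
def subset_for_role(entities, allowed_kinds):
    """Return only the entities whose kind is visible to a role."""
    return [e for e in entities if e["kind"] in allowed_kinds]


# Made-up entities standing in for a complete P4Info.
FULL_P4INFO = [
    {"id": 1001, "name": "ingress.ipv4_lpm", "kind": "table"},
    {"id": 2001, "name": "ingress.port_counter", "kind": "counter"},
    {"id": 3001, "name": "ingress.rate_meter", "kind": "meter"},
]

# Hypothetical roles and the entity kinds each one is allowed to see.
ROLE_VISIBILITY = {
    "stats_reader": {"counter"},
    "router": {"table", "meter"},
}

stats_view = subset_for_role(FULL_P4INFO, ROLE_VISIBILITY["stats_reader"])
router_view = subset_for_role(FULL_P4INFO, ROLE_VISIBILITY["router"])
```

Here the statistics-only controller receives a P4Info subset containing just the counter, which matches the read-only example in the text.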
## P4Runtime State Across Restarts { #sec-restarts }
All targets support full restarts, where all forwarding state is reset and the
P4Runtime server starts with a clean state. Some targets may also support
In-Service Software Upgrade (ISSU), where the software on the target can be
restarted while traffic is being forwarded. In this case, the P4Runtime server
may have the ability to access information from memory before the upgrade.
# Controller Use-cases
P4Runtime allows for more than one controller. The mechanisms and semantics are
described in a later
[section](#sec-client-arbitration-and-controller-replication). Here we
present a number of use-cases. Each use-case highlights a particular aspect of
P4Runtime's flexibility and is not intended to be exhaustive. Real-world
use-cases may combine various techniques and be more complex.
## Single Embedded Controller
Figure [#fig-single-embedded-controller] shows perhaps the simplest use-case. A
device or target has an embedded controller which communicates to an on-board
switch via P4Runtime. This might be appropriate for an embedded appliance which
is not intended for SDN use-cases.
P4Runtime was designed to be a viable embedded API. Complex controller
architectures typically feature multiple processes communicating with some sort
of IPC (Inter-Process Communication). P4Runtime can thus serve both as an RPC
interface and as an IPC mechanism.
~ Figure { #fig-single-embedded-controller; \
caption: "Use-Case: Single Embedded Controller" }
![single-embedded-controller]
~
[single-embedded-controller]: build/single-embedded-controller.[svg,png] \
{ height: 6cm; page-align: forcehere }
## Single Remote Controller
Figure [#fig-single-remote-controller] shows a single remote Controller in
charge of the P4 target. In this use-case, the device has no control of the
pipeline; it just hosts the server. While this is possible, it is probably more
practical to have a hybrid use-case as described in subsequent sections.
~ Figure { #fig-single-remote-controller; \
caption: "Use-Case: Single Remote Controller" }
![single-remote-controller]
~
[single-remote-controller]: build/single-remote-controller.[svg,png] \
{ height: 7cm; page-align: forcehere }
## Embedded + Single Remote Controller
Figure [#fig-embedded-plus-single-remote-controller] illustrates the use-case of
an embedded controller plus a single remote controller. Both controllers are
clients of the single server. The embedded controller is in charge of one set of
P4 entities plus the pipeline configuration. The remote controller is in charge
of the remainder of the P4 entities. An equally valid alternative use-case
could assign the pipeline configuration to the remote controller.
For example, to minimize round-trip times (RTT) it might make sense for the
embedded controller to manage the contents of a fast-failover table. The remote
controller might manage the contents of routing tables.
~ Figure { #fig-embedded-plus-single-remote-controller; \
caption: "Use-Case: Embedded Plus Single Remote Controller" }
![embedded-plus-single-remote-controller]
~
[embedded-plus-single-remote-controller]: \
build/embedded-plus-single-remote-controller.[svg,png] \
{ height: 7cm; page-align: forcehere }
## Embedded + Two Remote Controllers
Figure [#fig-embedded-plus-two-remote-controllers] illustrates the case of an
embedded controller similar to the previous use-case, and two remote
controllers. One of the remote controllers is responsible for some entities,
&eg; routing tables, and the other remote controller is responsible for other
entities, perhaps statistics tables. Role-based access divides the ownership.
~ Figure { #fig-embedded-plus-two-remote-controllers; \
caption: "Use-Case: Embedded Plus Two Remote Controllers" }
![embedded-plus-two-remote-controllers]
~
[embedded-plus-two-remote-controllers]: \
build/embedded-plus-two-remote-controllers.[svg,png] \
{ height: 7cm; page-align: forcehere }
## Embedded Controller + Two High-Availability Remote Controllers
Figure [#fig-embedded-plus-two-remote-ha-controllers] illustrates a single
embedded controller plus two remote controllers in an active-standby (&ie;
primary-backup) HA (High-Availability) configuration. Controller #1 is the
active controller and is in charge of some entities. If it fails, Controller #2
takes over and manages the tables formerly owned by Controller #1. The mechanics
of HA architectures are beyond the scope of this document, but the P4Runtime
role-based client arbitration scheme supports it.
~ Figure { #fig-embedded-plus-two-remote-ha-controllers; \
caption: "Use-Case: Embedded Plus Two Remote High-Availability Controllers" }
![embedded-plus-two-remote-ha-controllers]
~
[embedded-plus-two-remote-ha-controllers]: \
build/embedded-plus-two-remote-ha-controllers.[svg,png] \
{ height: 7cm; page-align: forcehere }
# Client Arbitration and Controller Replication {\
#sec-client-arbitration-and-controller-replication}
The P4Runtime interface allows multiple clients (&ie; controllers) to be
connected to the P4Runtime server running on the device at the same time for the
following reasons:
1. Partitioning of the control plane: Multiple controllers may have orthogonal,
non-overlapping, "roles" (or "realms") and should be able to push forwarding
entities simultaneously. The control plane can be partitioned into multiple
roles and each role will have a set of controllers, one of which is the
primary and the rest are backups. Role definition, &ie; how P4 entities get
assigned to each role, is **out-of-scope** of this document.
2. Redundancy and fault tolerance: Supporting multiple controllers allows having
one or more standby backup controllers. These can already have a connection
open, which can help them become primary more quickly, especially in the case
where the control-plane traffic is in-band and connection setup might be more
involved.
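The workflow that follows selects one primary per (`device_id`, `role`) pair by comparing `election_id` values. The Python sketch below shows one possible form of the server-side bookkeeping. It rests on several assumptions: roles are plain strings, the class `ArbitrationTable` and its method names are hypothetical (they do not come from p4runtime.proto), and no primary exists once the holder of the highest `election_id` ever received has disconnected.

```python
class ArbitrationTable:
    """Tracks live clients and the primary for each (device_id, role)."""

    def __init__(self):
        self._live = {}          # (device_id, role) -> {election_id: client}
        self._highest_seen = {}  # (device_id, role) -> highest id ever received

    def update(self, client, device_id, role, election_id):
        """Record an arbitration update; reject a duplicate 3-tuple."""
        key = (device_id, role)
        clients = self._live.setdefault(key, {})
        holder = clients.get(election_id)
        if holder is not None and holder != client:
            raise ValueError("election_id already used by a live client")
        # In this sketch a client holds one election_id per (device_id, role).
        for eid in [e for e, c in clients.items() if c == client]:
            del clients[eid]
        clients[election_id] = client
        if election_id > self._highest_seen.get(key, -1):
            self._highest_seen[key] = election_id

    def disconnect(self, client, device_id, role):
        clients = self._live.get((device_id, role), {})
        for eid in [e for e, c in clients.items() if c == client]:
            del clients[eid]

    def primary(self, device_id, role):
        # The primary is the live client holding the highest id ever received.
        key = (device_id, role)
        return self._live.get(key, {}).get(self._highest_seen.get(key))
```

Because the table is keyed by (`device_id`, `role`), two controllers can use the same numeric `election_id` for different roles without conflict, while a second live client presenting an identical 3-tuple is rejected.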
To support multiple controllers, P4Runtime uses the streaming channel (available
via `StreamChannel` RPC) for session management. The workflow is described as
follows:
* Each controller instance (&eg; a controller process) can participate in one or
more roles. For each (`device_id`, `role`), the controller receives an
`election_id`. This `election_id` can be the same for different roles and/or
devices, as long as the tuple (`device_id`, `role`, `election_id`) is
unique among live controllers, as defined below. For each (`device_id`,
`role`) that the controller wishes to control, it establishes a
`StreamChannel` with the P4Runtime server responsible for that device, and
sends a `MasterArbitrationUpdate` message containing that tuple of
(`device_id`, `role`, `election_id`) values. The P4Runtime server selects a
primary independently for each (`device_id`, `role`) pair. The primary is the
client that has the highest `election_id` that the device has ever received
for the same (`device_id`, `role`) values. A connection between a controller
instance and a device id --- which involves a persistent `StreamChannel` ---
can be referred to as a P4Runtime client.
Note that the P4Runtime server does not assign a `role` or `election_id` to
any controller. It is up to an arbitration mechanism outside of the server to
decide on the controller roles, and the `election_id` values used for each
`StreamChannel`. The P4Runtime server only keeps track of the (`device_id`,
`role`, `election_id`) of each `StreamChannel` that has sent a successful
`MasterArbitrationUpdate` message, and maintains the invariant that all such
3-tuples are unique among live controllers. A server must use all three of
these values from a `WriteRequest` message to identify which client is making
the `WriteRequest`, not only the `election_id`. This enables controllers to
re-use the same numeric `election_id` values across different (`device_id`,
`role`) pairs. P4Runtime does not require `election_id` values be reused
across such different (`device_id`, `role`) pairs; it allows it.
* To start a controller session, a controller first opens a bidirectional stream
channel to the server via the `StreamChannel` RPC for each device. This stream
will be used for two purposes:
* **Session management:** As soon as the controller opens the stream
channel, it sends a `StreamMessageRequest` message to the switch. The
controller populates the `MasterArbitrationUpdate` field in this message
using its `role` and `election_id`, as well as the `device_id` of the
device. Note that the `status` field in the `MasterArbitrationUpdate` is
not populated by the controller. This field is populated by the P4Runtime
server when it sends a response back to the client, as explained below.
* **Streaming of notifications (&eg; digests) and packet I/O:** The same
streaming channel will be used for streaming notifications, as well as for
packet-in and packet-out messages. Note that unless specified otherwise by
the role definitions, only the primary controller can participate in
packet I/O. This feature is explained in more detail in the [Packet
I/O](#sec-packet-i_o) section.
Note that a controller session is only required if the controller wants to do
Packet I/O, or modify the forwarding state.
* Note that the stream is opened per device. In case a switching platform has
multiple devices (&eg; multi-ASIC line card) which are all controlled via the
same P4Runtime server, it is possible to have different primary clients for
different devices. In this case, it is the responsibility of the P4Runtime
server to keep track of the primary for each device (and role). More
specifically, the P4Runtime server will know which stream corresponds to the
primary controller for each pair of (`device_id`, `role`) at any point of
time.
* The streaming channel between the controller and the server defines the
liveness of the controller session. The controller is considered "offline",
"disconnected", or "dead" as soon as its stream channel to the switch is
broken. When a primary channel gets broken:
1. An advisory message is sent to all other controllers for that `device_id`
and `role`, as described in a
[later section](#sec-arbitration-notification); and
2. The P4Runtime server will be without a primary controller, until a client
sends a successful `MasterArbitrationUpdate` (as per the rules in a
[later section](#sec-arbitration-updates)).
* The mechanism through which the controller receives the P4Runtime server
details is implementation specific and beyond the scope of this
specification. This includes the `device_id`, `ip` and `port`, as
well as the Forwarding Pipeline Config. Similarly, the mechanism through
which the P4Runtime server receives its switch config (which notably includes
the `device_id`) is beyond the scope of this specification. Nevertheless, if
the server details or switch config are transferred via the network, it is
recommended to use TLS or similar encryption and authentication mechanisms to
prevent eavesdropping attacks.
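
The bookkeeping described in the workflow above can be illustrated with a small model. This is only a sketch and not part of the protocol: the `SessionTable` class and its method names are hypothetical, `election_id` is modeled as a plain Python integer, the role is modeled as a string, and a stream is represented by an opaque string handle.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# (device_id, role); modeling the role as a string is an assumption of this sketch.
Key = Tuple[int, str]


@dataclass
class SessionTable:
    """Hypothetical model of the per-(device_id, role) state kept by a server."""

    # Live streams: (device_id, role) -> {election_id: stream handle}
    live: Dict[Key, Dict[int, str]] = field(default_factory=dict)
    # Highest election_id ever received for each (device_id, role)
    highest_seen: Dict[Key, int] = field(default_factory=dict)

    def add_stream(self, device_id: int, role: str, election_id: int, stream: str) -> None:
        key = (device_id, role)
        streams = self.live.setdefault(key, {})
        # The 3-tuple must be unique among live controllers.
        if election_id in streams:
            raise ValueError("3-tuple already used by a live controller")
        streams[election_id] = stream
        previous = self.highest_seen.get(key, election_id)
        self.highest_seen[key] = max(previous, election_id)

    def remove_stream(self, device_id: int, role: str, election_id: int) -> None:
        # A broken stream means the controller is no longer live.
        self.live.get((device_id, role), {}).pop(election_id, None)

    def primary(self, device_id: int, role: str) -> Optional[str]:
        # The primary is the live stream holding the highest election_id ever
        # received; if that stream is gone, there is no primary.
        key = (device_id, role)
        highest = self.highest_seen.get(key)
        return self.live.get(key, {}).get(highest)
```

In this model the same `election_id` can be reused across different (`device_id`, `role`) pairs, and removing the primary's stream leaves the pair without a primary until another arbitration update is accepted.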
gRPC enables the server to identify which client originated each message in the
`StreamChannel` stream. For example, the C++ gRPC library [@gRPCStreamC] in
synchronous mode enables a server process to cause a function to be called when
a new client creates a `StreamChannel` stream. This function should not return
until the stream is closed and the server has completed any cleanup required
when a `StreamChannel` is closed normally (or broken, &eg; because a client
process unexpectedly terminated). Thus the server can easily associate all
`StreamChannel` messages received from the same client, because they are
processed within the context of the same function call.
A P4Runtime implementation need not rely on the gRPC library providing
information with unary RPC messages that identify which client they came from.
Unary RPC messages include requests to write table entries in the data plane, or
read state from the data plane, among others described later. P4Runtime relies
on clients identifying themselves in every write request, by including the
values `device_id`, `role`, and `election_id` in all write requests. The
server trusts clients not to use a triple of values other than their own in
their write requests. gRPC provides authentication methods [@gRPCAuth] that
should be deployed to prevent untrusted clients from creating channels, and thus
from making changes or even reading the state of the server.
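
As an illustration of why all three values matter, the following sketch checks a write request against the current primary for its (`device_id`, `role`). The names `ClientId` and `check_write` are hypothetical, the default role is modeled as an empty string, and `election_id` is modeled as a plain integer. The sketch assumes that a write from any client other than the primary is rejected with `PERMISSION_DENIED`, as in the arbitration rules later in this section.

```python
from typing import Dict, NamedTuple, Tuple

# Numeric values of google.rpc.Code
OK = 0
PERMISSION_DENIED = 7


class ClientId(NamedTuple):
    """The triple a client includes in every write request (hypothetical type)."""
    device_id: int
    role: str          # "" models the default role (assumption of this sketch)
    election_id: int


def check_write(primaries: Dict[Tuple[int, str], int], req: ClientId) -> int:
    """Return OK only if the request's triple identifies the current primary.

    `primaries` maps (device_id, role) to the election_id of the current
    primary; a missing entry means there is no primary for that pair.
    """
    primary_election_id = primaries.get((req.device_id, req.role))
    if primary_election_id is None or primary_election_id != req.election_id:
        return PERMISSION_DENIED
    return OK
```

Because the lookup key includes `device_id` and `role`, two clients that use the same numeric `election_id` for different pairs are never confused with each other.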
## Default Role
A controller can omit the role message in `MasterArbitrationUpdate`. This
implies the "default role", which corresponds to "full pipeline access".
This also implies that a default role has a `role_id` of `""` (default).
If using a default role, all RPCs from the controller (&eg; `Write`) must
leave the `role` unset.
## Role Config { #sec-arbitration-role-config}
The `role.config` field in the `MasterArbitrationUpdate` message sent by the
controller describes the role configuration, &ie; which operations are in the
scope of a given role. In particular, the definition of a role may include the
following:
* A list of P4 entities for which the controller may issue `Write` updates and
receive notification messages (&eg; `DigestList` and
`IdleTimeoutNotification`).
* Whether the controller is able to receive `PacketIn` messages, along with a
filtering mechanism based on the values of the `PacketMetadata` fields to
select which `PacketIn` messages should be sent to the controller.
* Whether the controller is able to send `PacketOut` messages, along with a
filtering mechanism based on the values of the `PacketMetadata` fields to
select which `PacketOut` messages are allowed to be sent by the controller.
An unset `role.config` implies "full pipeline access" (similar to the default
role explained above). In order to support different role definition schemes,
`role.config` is defined as an `Any` Protobuf message [@ProtoAny]. Such schemes
are out-of-scope of this document. When partitioning of the control plane is
desired, the P4Runtime client(s) and server need to agree on a role definition
scheme in an out-of-band fashion.
It is the job of the P4Runtime server to remember the `role.config` for every
`device_id` and `role` pair.
## Rules for Handling `MasterArbitrationUpdate` Messages Received from Controllers { #sec-arbitration-updates }
1. If the `MasterArbitrationUpdate` message is received for the first time on
this particular channel (&ie; for a newly connected controller):
1. If `device_id` does not match any of the devices known to the P4Runtime
server, the server shall terminate the stream by returning a
`NOT_FOUND` error.
2. If the `election_id` is set and is already used by another live
controller for the same (`device_id`, `role`), the P4Runtime server shall
terminate the stream by returning an `INVALID_ARGUMENT` error.
3. If `role.config` does not match the "out-of-band" scheme previously
agreed upon, the server must return an `INVALID_ARGUMENT` error.
4. If the number of open streams for the given (`device_id`, `role`)
exceeds the supported limit, the P4Runtime server shall terminate the
stream by returning a `RESOURCE_EXHAUSTED` error.
5. Otherwise, the controller is added to a list of live controllers for
the given (`device_id`, `role`) and the server remembers the
controller's `device_id`, `role` and `election_id` for this gRPC
channel. See below for the rules to determine if this controller becomes
a primary or backup, and what notifications are sent as a consequence.
2. Otherwise, if the `MasterArbitrationUpdate` message is received from an
already live controller:
1. If the `device_id` does not match the one already assigned to this
stream, the P4Runtime server shall terminate the stream by returning a
`FAILED_PRECONDITION` error.
2. If the `role` does not match the current `role` assigned to this
stream, the P4Runtime server shall terminate the stream by returning a
`FAILED_PRECONDITION` error. If the controller wishes to change its role,
it must close the current stream channel and open a new one.
3. If `role.config` does not match the "out-of-band" scheme previously
agreed upon, the server must return an `INVALID_ARGUMENT` error.
4. If the `election_id` is set and is already used by another live
controller (excluding the controller making the request) for the same
(`device_id`, `role`), the P4Runtime server shall terminate the stream
by returning an `INVALID_ARGUMENT` error.
5. Otherwise, the server updates the `election_id` it has stored for this
controller. This change might cause a change in the primary client (this
controller might become primary, or the controller might have downgraded
itself to a backup, see below), as well as notifications being sent to
one or more controllers.
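
The validation order in the two cases above can be sketched as follows. This is an illustrative model rather than a reference implementation: the `Stream` and `Server` classes, the `max_streams` limit, and the `config_ok` flag (standing in for the out-of-band `role.config` check) are all hypothetical, and the function returns the name of the status code instead of actually terminating a stream.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Stream:
    """State the server remembers for one StreamChannel (hypothetical)."""
    device_id: Optional[int] = None
    role: Optional[str] = None
    election_id: Optional[int] = None
    live: bool = False


@dataclass
class Server:
    known_devices: Set[int]
    max_streams: int = 2  # hypothetical per-(device_id, role) limit
    streams: List[Stream] = field(default_factory=list)

    def _id_in_use(self, stream, device_id, role, election_id) -> bool:
        # An unset election_id (None) never conflicts; the requesting
        # stream itself is excluded from the comparison.
        return election_id is not None and any(
            s is not stream and s.live
            and (s.device_id, s.role, s.election_id) == (device_id, role, election_id)
            for s in self.streams
        )

    def handle_update(self, stream, device_id, role, election_id, config_ok=True) -> str:
        if not stream.live:
            # Case 1: first update on this stream.
            if device_id not in self.known_devices:
                return "NOT_FOUND"
            if self._id_in_use(stream, device_id, role, election_id):
                return "INVALID_ARGUMENT"
            if not config_ok:
                return "INVALID_ARGUMENT"
            open_count = sum(
                1 for s in self.streams
                if s.live and (s.device_id, s.role) == (device_id, role)
            )
            if open_count >= self.max_streams:
                return "RESOURCE_EXHAUSTED"
            stream.device_id, stream.role = device_id, role
            stream.election_id, stream.live = election_id, True
            self.streams.append(stream)
            return "OK"
        # Case 2: update from an already live controller.
        if device_id != stream.device_id:
            return "FAILED_PRECONDITION"
        if role != stream.role:
            return "FAILED_PRECONDITION"
        if not config_ok:
            return "INVALID_ARGUMENT"
        if self._id_in_use(stream, device_id, role, election_id):
            return "INVALID_ARGUMENT"
        stream.election_id = election_id
        return "OK"
```

A result of `"OK"` corresponds to cases 1.5 and 2.5; whether the controller then becomes primary or backup is decided separately.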
If the `MasterArbitrationUpdate` is accepted by either of the two steps above
(cases 1.5 and 2.5), then the server determines if there are changes in
the primary client. Let `election_id_past` be the highest election ID the server
has ever seen for the given `device_id` and `role` (including the one of the
current primary if there is one).
1. If `election_id` is greater than or equal to `election_id_past`, then the
controller becomes, or stays, primary. The server updates the role
configuration to `role.config` for the given `role`. Furthermore:
1. If there was no primary for this `device_id` and `role` before and
there are no `Write` requests still processing from a previous primary,
then the server immediately sends an advisory notification to all
controllers for this `device_id` and `role`. See the
[following section](#sec-arbitration-notification) for the format of the
advisory message.
2. If there was a previous primary, including this controller, or `Write`
requests in flight, then the server carries out the following steps
(in this order):
1. The server stops accepting `Write` requests from the previous primary
(if there is one). At this point, the server will reject all `Write`
requests with `PERMISSION_DENIED`.
2. The server notifies all controllers other than the new primary client
of the change by sending the advisory notification described in
the [following section](#sec-arbitration-notification).
3. The server will finish processing any `Write` requests that have
already started. If there are errors, they are reported as usual to
the previous primary. If the previous primary has already
disconnected, any possible errors are dropped and not reported.
4. The server now accepts the current controller as the new primary,
thus accepting `Write` requests from this controller. The server
updates the highest election ID (&ie; `election_id_past`) it has seen
for this `device_id` and `role` to `election_id`.
5. The server notifies the new primary by sending the advisory message
described in the [following section](#sec-arbitration-notification).
2. Otherwise, the controller becomes a backup. If the controller was previously
a primary (and downgraded itself), then an advisory message is sent to all
controllers for this `device_id` and `role`. Otherwise, the advisory
message is only sent to the controller that sent the initial
`MasterArbitrationUpdate`. See the
[following section](#sec-arbitration-notification) for the format of the
advisory message.
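
The outcome of an accepted update can be summarized in a short sketch. The function names `decide` and `advisory_recipients` are hypothetical, `election_id` is modeled as a plain integer that is always set, and `election_id_past` is `None` when the server has never seen an election ID for the pair (an assumption of this sketch, in which case any election ID wins).

```python
from typing import List, Optional, Tuple


def decide(election_id: int, election_id_past: Optional[int]) -> Tuple[str, int]:
    """Return the controller's new status and the updated election_id_past."""
    if election_id_past is None or election_id >= election_id_past:
        # The controller becomes, or stays, primary.
        return "primary", election_id
    # Otherwise it is a backup and the highest ID seen is unchanged.
    return "backup", election_id_past


def advisory_recipients(status: str, sender: str, was_primary: bool,
                        controllers: List[str]) -> List[str]:
    """Return the controllers to notify, in notification order."""
    if status == "primary":
        # All controllers are notified. During a handover the new primary is
        # notified last, after in-flight Write requests have completed.
        return [c for c in controllers if c != sender] + [sender]
    if was_primary:
        # A primary downgraded itself: everyone needs to know.
        return list(controllers)
    # A plain backup update only concerns the sender.
    return [sender]
```

Note that an `election_id` equal to `election_id_past` still yields primary status, which is what allows the current primary to resend its own ID, for example to update `role.config`.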
## Client Arbitration Notifications { #sec-arbitration-notification}
For any given `device_id` and `role`, any time a new primary is chosen, a
primary downgrades its status to a backup, a primary disconnects, or the
`role.config` is updated by the primary, the server informs all controllers for
that (`device_id`, `role`) by sending a `StreamMessageResponse`. The
`MasterArbitrationUpdate` is populated as follows:
* `device_id` and `role` as given.
* `role.config` is set to the role configuration the server received most
recently in a `MasterArbitrationUpdate` from a primary.
* `election_id` is populated as follows:
* If there has not been any primary at all, the `election_id` is left unset.
* Otherwise, `election_id` is set to the highest election ID that the server
has seen for this `device_id` and `role` (which is the `election_id` of
the current primary if there is any).
* `status` is set differently based on whether the notification is sent to the
primary or a backup controller:
* If there is a primary:
* For the primary, `status` is OK (with `status.code` set to
`google.rpc.OK`).
* For all backup controllers, `status` is set to non-OK (with
`status.code` set to `google.rpc.ALREADY_EXISTS`).
* Otherwise, if there is no primary currently, for all backup controllers,
`status` is set to non-OK (with `status.code` set to
`google.rpc.NOT_FOUND`).
Note that on primary client changes with outstanding `Write` requests, some
notifications might be delayed; see the
[previous section](#sec-arbitration-updates) for details.
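
The population rules above can be sketched as a function that builds the notification for one recipient. The function name and the dictionary layout are hypothetical stand-ins for the Protobuf message, and `election_id` values are modeled as plain integers with `None` meaning unset. The numeric codes are those of `google.rpc.Code`.

```python
from typing import Any, Dict, Optional

# Numeric values of google.rpc.Code
OK = 0
NOT_FOUND = 5
ALREADY_EXISTS = 6


def build_notification(device_id: int, role: str, role_config: Any,
                       highest_election_id: Optional[int],
                       primary_election_id: Optional[int],
                       recipient_election_id: Optional[int]) -> Dict[str, Any]:
    """Build the advisory message for one recipient (illustrative only).

    highest_election_id is None if there has never been a primary;
    primary_election_id is None if there is no primary right now.
    """
    if primary_election_id is None:
        code = NOT_FOUND          # no primary currently: every recipient is a backup
    elif recipient_election_id == primary_election_id:
        code = OK                 # the recipient is the primary
    else:
        code = ALREADY_EXISTS     # a primary exists and it is someone else
    return {
        "device_id": device_id,
        "role": {"name": role, "config": role_config},
        "election_id": highest_election_id,
        "status": {"code": code},
    }
```

For example, after a primary with election ID 20 disconnects, a backup with election ID 10 would receive `election_id` 20 together with a `NOT_FOUND` status.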
# The P4Info Message
The purpose of P4Info was described under
[Reference Architecture](#sec-reference-architecture).
Here we describe the various
components.
## Common Messages
These messages appear nested within many other messages.
### `Documentation` Message
`Documentation` is used to carry both brief and long descriptions of something.
Good content within a documentation field is extremely helpful to P4Runtime
application developers.
~ Begin Proto
message Documentation {
// A brief description of something, e.g. one sentence
string brief = 1;
// A more verbose description of something.
// Multiline is accepted. Markup format (if any) is TBD.
string description = 2;
}
~ End Proto
### `Preamble` Message
The preamble serves as the "descriptor" for each entity and contains the unique
instance ID, name, alias, annotations and documentation.
~ Begin Proto
message Preamble {
// ids share the same number-space; e.g. table ids cannot overlap with counter
// ids. Even though this is irrelevant to this proto definition, the ids are
// allocated in such a way that it is possible based on an id to deduce the
// resource type (e.g. table, action, counter, ...). This means that code