This section discusses the motivation to introduce new options for enhanced network diagnostics.

## Passive OAM

Mechanisms which add tracing information to the regular data traffic, sometimes also referred to as "in-band" or "passive OAM", can …

… probe traffic is handled differently (and potentially forwarded differently) by a router than regular data traffic.

## Overlay and underlay correlation
Several network deployments leverage tunneling mechanisms to create overlay or service-layer networks. Examples include VXLAN, GRE, or …

… mechanisms, to, for example, achieve path symmetry for the traffic between two endpoints. [lisp-sr] is an example of how these methods can be applied to LISP.
## Analytics and diagnostics

Network planners and operators benefit from knowledge of the actual traffic distribution in the network. To derive an overall network traffic matrix, one typically needs to correlate data gathered from each individual device in the network. If the path of a packet is recorded while the packet is forwarded, the entire path that a packet took through the network is available to the egress system. This obviates the need to retrieve individual traffic statistics from every device in the network and correlate them, or to employ other mechanisms, such as traffic engineering with null-bandwidth tunnels, just to retrieve the statistics needed to generate the traffic matrix.

In addition, with individual path recording, information is available at packet-level granularity, rather than only at aggregate level, as is usually the case with IPFIX-style methods which employ flow filters at the network elements. Data-center networks with heavy use of equal-cost multipath (ECMP) forwarding are one example where detailed statistics on flow distribution in the network are highly desired. If a network supports ECMP, one can create detailed statistics for the different paths packets take through the network at the egress system, without a need to correlate/aggregate statistics from every router in the system. Transit devices are off-loaded from the task of gathering packet statistics.
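
As a sketch of how an egress system could exploit recorded paths, both the traffic matrix and per-ECMP-path counts fall out of a single pass over the packets. The record format assumed below (a plain list of node identifiers from ingress to egress) and all names are illustrative, not defined by this document:

```python
from collections import Counter

def tally(packet_paths):
    """Aggregate per-packet recorded paths into a traffic matrix and
    per-path (ECMP) packet counts at the egress system."""
    matrix = Counter()   # (ingress, egress) -> packet count
    paths = Counter()    # exact node sequence -> packet count
    for path in packet_paths:
        matrix[(path[0], path[-1])] += 1
        paths[tuple(path)] += 1
    return matrix, paths

# Three packets between the same pair of edge nodes, split across two
# ECMP paths through different spines.
records = [
    ["leaf1", "spine1", "leaf3"],
    ["leaf1", "spine2", "leaf3"],
    ["leaf1", "spine1", "leaf3"],
]
matrix, paths = tally(records)
assert matrix[("leaf1", "leaf3")] == 3            # traffic-matrix entry
assert paths[("leaf1", "spine1", "leaf3")] == 2   # ECMP path distribution
```

No per-device polling or cross-device correlation is needed: the egress system alone sees every complete path.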

## Proof of Transit

Several deployments use traffic engineering, policy routing, segment routing or service function chaining (SFC) to steer packets through a specific set of nodes. In certain cases, regulatory obligations or a compliance policy require proof that all packets that are supposed to follow a specific path are indeed being forwarded across the exact set of nodes specified. That is, if a packet flow is supposed to go through a series of service functions or network nodes, it has to be proven that all packets of the flow actually went through the service chain or collection of nodes specified by the policy. In case the packets of a flow were not appropriately processed, a verification device would be required to identify the policy violation and take corresponding actions (e.g. drop or redirect the packet, send an alert, etc.). In today's deployments, the proof that a packet traversed a particular service chain is typically delivered in an indirect way: service appliances and network forwarding are in different trust domains, and physical hand-off points (i.e. physical interfaces) are defined between these trust domains. In other terms, in the "network forwarding domain" things are wired up so that traffic is delivered to the ingress interface of a service appliance and received back from an egress interface of the service appliance. This "wiring" is verified and trusted. The evolution to Network Function Virtualization (NFV) and modern service chaining concepts (using technologies such as LISP, NSH, Segment Routing, etc.) blurs the line between the different trust domains, because the hand-off points are no longer clearly defined physical interfaces but virtual interfaces. For that very reason, network operators require that different trust layers not be mixed in the same device; for an NFV scenario, a different kind of proof is required. Offering proof that a packet traversed a specific set of service functions would allow network operators to move away from the indirect methods described above for proving that a service chain is in place for a particular application.

A solution approach is based on meta-data which is added to every packet. The meta-data is updated at every hop and is used to verify whether a packet traversed all required nodes. A particular path is either described by a set of secret keys, or by a set of shares of a single secret. Nodes on the path retrieve their individual keys or shares of a key (using, for example, Shamir's Secret Sharing scheme) from a central controller. The complete key set is only known to the verifier, which is typically the ultimate node on a path that requires verification. Each node on the path uses its secret or share of the secret to update the meta-data of the packets as they pass through the node. When the verifier receives a packet, it can use its key(s) along with the meta-data to validate whether the packet traversed the service chain correctly. The detailed mechanisms used for path verification, along with the procedures applied to the meta-data carried in the packet, are beyond the scope of this document and will be addressed in a separate document.
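
The share-based idea can be illustrated with a minimal sketch. This is a deliberate simplification, not the actual mechanism referenced above: here each node simply stamps its Shamir share into the packet meta-data in the clear, whereas a real scheme would update an accumulated value so that shares are never exposed. All names and parameters are assumptions for the demo:

```python
import random

PRIME = 2**31 - 1  # field modulus; a small Mersenne prime chosen for the demo

def make_shares(secret, k, n):
    """Split `secret` into n Shamir shares; any k of them reconstruct it."""
    coeffs = [secret] + [random.randrange(1, PRIME) for _ in range(k - 1)]
    return [(x, sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME)
            for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for xi, yi in shares:
        num = den = 1
        for xj, _ in shares:
            if xj != xi:
                num = num * -xj % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

# The controller provisions one share per path node; only the verifier
# knows the secret itself.
secret = 424242
node_shares = make_shares(secret, k=3, n=3)

packet_meta = []                 # meta-data carried in-band with the packet
for share in node_shares:        # each node on the path stamps its share
    packet_meta.append(share)

# Verifier: reconstruction matches only if every node updated the packet.
# With fewer shares than the threshold k, the interpolated value differs
# from the secret with overwhelming probability.
assert reconstruct(packet_meta) == secret
```

The design point this illustrates: no single node (or subset smaller than the threshold) can forge a valid proof, because the complete secret is only recoverable from all required shares.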

## Frame replication/elimination decision for bi-casting/active-active networks

Bandwidth- and power-constrained, time-sensitive, or loss-intolerant networks (e.g. networks for industrial automation/control or health care) require efficient OAM methods to decide when to replicate packets to a secondary path in order to keep the loss/error rate for the receiver at a tolerable level, and also when to stop replication and eliminate the redundant flow. Many IoT networks are time-sensitive and cannot leverage automatic repeat requests (ARQ) to cope with transmission errors or lost packets. Transmitting the data over multiple disparate paths (often called bi-casting or live-live) is a method used to reduce the error rate observed by the receiver. Time-sensitive networks (TSN) receive a lot of attention from the manufacturing industry, as shown by the various standardization activities and industry forums being formed (see e.g. IETF 6TiSCH, IEEE P802.1CB, AVnu).
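
The replicate-and-eliminate pattern can be sketched with sequence-number-based duplicate elimination at the receiver, in the spirit of IEEE P802.1CB-style frame replication and elimination. The frame format and function names below are hypothetical:

```python
def replicate(frames):
    """Sender side of bi-casting: duplicate each (seq, payload) frame
    onto two disparate paths."""
    return list(frames), list(frames)

def eliminate(arrivals):
    """Receiver side: deliver each sequence number once, drop replicas."""
    seen, delivered = set(), []
    for seq, payload in arrivals:
        if seq not in seen:      # first copy wins; later replicas are dropped
            seen.add(seq)
            delivered.append((seq, payload))
    return delivered

frames = [(1, "a"), (2, "b"), (3, "c")]
path_a, path_b = replicate(frames)
del path_a[1]                    # simulate loss of frame 2 on path A

delivered = eliminate(path_a + path_b)
assert {p for _, p in delivered} == {"a", "b", "c"}  # no end-to-end loss
assert len(delivered) == len(frames)                 # replicas eliminated
```

A real implementation would also bound the `seen` state with a sequence-recovery window and handle out-of-order delivery; the per-path loss statistics that in-band OAM provides are what drive the decision of when replication is worth its bandwidth and power cost.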

## Example use-cases of iOAM6

<table border="3" align="left">
<tr>
<td><b>Use Case</b></td>
<td><b>Description</b></td>
</tr>
<tr>
<td>Traffic Matrix</td>
<td>Derive the network traffic matrix: traffic for a given time interval between any two edge nodes of a given domain. Could be performed for all traffic or per QoS class.</td>
</tr>
<tr>
<td>Flow Debugging</td>
<td>Discover which path(s) a particular set of traffic (identified by an n-tuple) takes in the network. Especially useful when traffic is balanced across multiple paths, as with link aggregation (LACP) or equal-cost multi-pathing (ECMP).</td>
</tr>
<tr>
<td>Loss statistics per path</td>
<td>Retrieve loss statistics per flow and path in the network.</td>
</tr>
<tr>
<td>Path Heat Maps</td>
<td>Discover highly utilized links in the network.</td>
</tr>
<tr>
<td>Trend analysis on traffic patterns</td>
<td>Analyze if (and if so, how) the forwarding path for a specific set of traffic changes over time (can give hints to routing issues, unstable links, etc.).</td>
</tr>
<tr>
<td>Network delay distribution</td>
<td>Show the delay distribution across the network by node or link. If enabled per application or for a specific flow, display the path taken along with the delay at each node.</td>
</tr>
<tr>
<td>Low-Power networks</td>
<td>Include application-level OAM information (e.g. battery charge level) in data traffic to avoid sending extra OAM traffic, which incurs an extra cost on the devices. Using the battery charge level as an example, extra OAM packets sent just to communicate battery health could be avoided, saving battery on sensors.</td>
</tr>
<tr>
<td>Path verification or service chain verification</td>
<td>Proof and verification of packets traversing check points in the network, where check points can be nodes in the network or service functions.</td>
</tr>
</table>