-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathccp.tex
319 lines (288 loc) · 17.3 KB
/
ccp.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
\section{Writing Algorithms in CCP}
%
\begin{figure}[t]
\centering
\includegraphics[width=\columnwidth]{img/ccp_design_sigcomm}
\caption{Congestion control algorithms in CCP are distinct from the application and datapath.
Users specify an \texttt{onCreate()} handler which CCP calls when a new flow begins.
In this handler, algorithms install (1) a datapath program.
This datapath program aggregates incoming measurements (2) using user-defined fold functions and occasionally sends reports (3) to CCP, which calls the \texttt{onReport()} handler.
The \texttt{onReport()} handler can update (4) the datapath program, which uses its defined control patterns to enforce (5) a congestion window or pacing rate.
}\label{fig:design}
\end{figure}
%
\label{sec:ccp}
\begin{table}
\centering
\footnotesize
\begin{tabular}{@{} p{0.35\columnwidth}p{0.5\columnwidth}}
\hline
\hline
\multicolumn{2}{c}{Primitive congestion signals} \\
\hline
\hline
\textbf{Signal} & \textbf{Definition} \\
\texttt{Ack.bytes\_acked}, \texttt{Ack.packets\_acked} & In-order acknowledged \\
\texttt{Ack.bytes\_misordered}, \texttt{Ack.packets\_misordered} & Out-of-order acknowledged \\
\texttt{Ack.ecn\_bytes}, \texttt{Ack.ecn\_packets} & ECN-marked \\
\texttt{Ack.lost\_pkts\_sample} & Number of lost packets \\
\texttt{Ack.now} & Datapath time (e.g., Linux \texttt{jiffies})\\
\texttt{Flow.was\_timeout} & Did a timeout occur? \\
\texttt{Flow.rtt\_sample\_us} & A recent sample RTT \\
\texttt{Flow.rate\_outgoing} & Outgoing sending rate \\
\texttt{Flow.rate\_incoming} & Receiver-side receiving rate \\
\texttt{Flow.bytes\_in\_flight}, \texttt{Flow.packets\_in\_flight} & Sent but not yet acknowledged \\
& \\
\hline
\hline
\multicolumn{2}{c}{Operators} \\
\hline
\hline
\textbf{Class} & \textbf{Operations} \\
Arithmetic & \texttt{+}, \texttt{-}, \texttt{*}, \texttt{~/} \\
Assignment & \texttt{:=} \\
Comparison & \texttt{==}, \texttt{<}, \texttt{>}, \texttt{or}, \texttt{and} \\
Conditionals & \texttt{If} (branching) \\
& \\
\hline
\hline
\multicolumn{2}{c}{Variable Scopes} \\
\hline
\hline
\textbf{Scope} & \textbf{Description} \\
\texttt{Ack} & Signals measured per packet \\
\texttt{Flow} & Signals measured per connection \\
%% \texttt{Control} & Never reset except by \texttt{update\_fields()} in user-space \\
%% \texttt{Report} & Maintained until a call to \texttt{(report)}, then reset to defaults \\
\texttt{Timer} & Multi-resolution timer that can be zeroed by a call to \ct{reset} \\
\end{tabular}
%\vspace{0.075in}
\caption{Datapath language: congestion signals, operators, and scopes.}\label{tab:api}
\end{table}
Figure~\ref{fig:design} shows the control loop of a congestion control algorithm in CCP.
Users implement two callback handlers (\ct{onCreate()} and \ct{onReport()}) in the CCP agent
and one or more datapath programs.
When a new flow is created, CCP's datapath component invokes the \ct{onCreate()} handler.
The implementation of \ct{onCreate()} must install an initial datapath program for that flow.
%
Datapath programs could compute summaries over per-packet
congestion signals (such as a minimum packet delay or a moving average of packet
delivery rate) and report summaries or high priority conditions
(such as loss) to the CCP agent.
On a report, the CCP agent invokes the \ct{onReport()} handler
which contains the bulk of the logic of the congestion control algorithm.
The \ct{onReport()} function computes and installs the flow's congestion
window or sending rate using the signals from the datapath report.
It may also replace the datapath program entirely with different logic.
\subsection{Datapath Program Abstractions}
\label{s:ccp:datapath_programs}
CCP's datapath programs are written in a simple domain specific language.
These programs exist in order to provide a per ACK execution environment, where algorithms can define and update variables per ACK and perform \textit{control actions}, in response to the values of these variables.
Figure~\ref{lst:design:simple_datapath_prog} shows a program that counts the cumulative number of packets acknowledged and lost and reports these counters immediately upon a loss.
\begin{figure}[t]
{\footnotesize
\begin{minted}[xleftmargin=\parindent,linenos]{lisp}
(def (Report (volatile acked 0) (volatile lost 0)))
(when true
(:= Report.acked (+ Report.acked Ack.bytes_acked))
(:= Report.lost (+ Report.lost Ack.lost_pkts_sample))
(fallthrough))
(when (> Report.lost 0) (report))
\end{minted}
\caption{A simple datapath program to count bytes acked and report on losses.} \label{lst:design:simple_datapath_prog}
}
\end{figure}
The first statement of the program allows users to define custom variables.
The ``Report'' block signifies that these variables should be included in the report message sent to the CCP agent.
The \ct{volatile} marker means that these variables should be reset to their initial values, 0, after every report to the CCP agent.
Following the \texttt{def} block, \textit{fold functions} provide custom summaries over primitive congestion signals.
Datapath programs have read access to these \textit{primitive congestion signals} (prefixed with ``\ct{Ack.}'' or ``\ct{Flow.}'' to specify their measurement period), which are exposed by the datapath on every incoming packet. Such signals include the round trip delay sample, the number of bytes the datapath believes have been dropped by the network, and the delivery rates of packets. Table ~\ref{tab:api} enumerates the primitive congestion signals we support.
Users can write simple mathematical summaries over these primitive signals, as shown in Lines 3-4 of Figure~\ref{lst:design:simple_datapath_prog}.
Finally, algorithms can perform control actions in response to conditions defined by the fold function variables, \eg updating a rate or cwnd or reporting the user defined variables to the CCP agent.
As shown in Figure~\ref{lst:design:simple_datapath_prog}, the program defines a series of \ct{when} clauses, and performs the following block only if the condition was evaluated to true.
CCP's datapath program language provides an event driven programming model.
The condition \ct{(when true...)} signifies that the body should be evaluated on every packet.
This is where the program might calculate fold function summaries.
The \ct{when} clauses have access to all the fold function variables, as well as timing related counters.
The \ct{report} instruction causes the datapath to transmit the \ct{acked} and \ct{lost} counters to the CCP agent.
By default, the program evaluates until one \ct{when} clause evaluates to true; the \ct{(fallthrough)} instruction at the end of the first \ct{when} indicates that subsequent \ct{when} clauses should also be evaluated.
% CCP's datapath programs are written in a restrictive
% domain-specific language.
% Conceptually, datapath programs contain state initializations, \ie a \ct{def}
% clause, and event specifications, \ie a series of \ct{when} clauses.
% The datapath checks event conditions on every incoming packet and timeout event.
% If an event condition evaluates to true, the body of the \ct{when} clause is executed as the corresponding
% action.
% For example, the following program counts the cumulative
% number of packets acknowledged and lost, and it reports these counters immediately upon a loss:
% {\footnotesize
% \begin{minted}
% {lisp}
% (def (Report (volatile acked 0) (volatile lost 0)))
% (when true
% (:= Report.acked (+ Report.acked Ack.bytes_acked))
% (:= Report.lost (+ Report.lost Ack.lost_pkts_sample))
% (fallthrough))
% (when (> Report.lost 0) (report))
% \end{minted}
% }
% Here, \texttt{(when true ...)} signifies that the body should be evaluated on every packet.
% Correspondingly, \texttt{(fallthrough)} indicates that subsequent \texttt{when} clauses should be evaluated.
% The \ct{report} instruction causes the datapath to
% transmit the \ct{acked} and \ct{lost} byte counters to the slow path whenever
% the \ct{lost} counter becomes greater than zero.
% Since \ct{acked} and \ct{lost} are marked \ct{volatile}, these values are reset to their initial value, 0,
% after every call to \ct{report}.
% Datapath programs have read access to {\em primitive congestion signals}
% (prefixed with ``\ct{Ack.}'' or ``\ct{Flow.}'' to specify their measurement period) that are
% exposed by the datapath on every incoming packet.
% Such signals include statistics such as the round trip delay sample obtained
% from the packet, the number of bytes that the stack believes have been dropped
% by the network, the delivery rate of packets at the receiver, and so on.
% Table~\ref{tab:api} enumerates the primitive congestion signals we support.
\subsection{CCP Algorithm Logic}
\label{s:ccp:ccp_algorithm_logic}
The \ct{onReport()} handler provides a way to
implement congestion control actions in \userspace in reaction to reports from the datapath.
For example, a simple additive-increase
multiplicative-decrease (AIMD) algorithm could be implemented in Python\footnote{Our CCP implementation is in Rust and exposes Python bindings (\S\ref{s:datapath}).} using
the \ct{acked} and \ct{lost} bytes reported every round-trip time from the datapath:
{\footnotesize
\begin{minted}{python}
def onReport(self, report):
if report["lost"] > 0:
self.cwnd = self.cwnd / 2
else:
acked = report["acked"]
self.cwnd = self.cwnd + acked*MSS/self.cwnd
self.update("cwnd", self.cwnd/MSS)
\end{minted}
}
We have implemented complex functionality within congestion control algorithms
by leveraging slow-path logic, for example, a congestion control
algorithm that uses Fast Fourier Transform (FFT) operations~\cite{nimbus}.
If the round-trip time of the network is a few milliseconds or more,
it is possible to locate congestion control algorithm logic entirely within CCP
with high fidelity relative to a per-packet update algorithm, as we show in \S\ref{sec:eval:fidelity}.
\subsection{Example: BBR}
\label{s:ccp:bbr}
As a more involved example, we show below how various components of TCP BBR~\cite{bbr} are implemented using the CCP API.
A BBR sender estimates the rate of packets delivered to the receiver, and
sets its sending rate to the maximum delivered rate (over a sliding time window),
which is believed to be the rate of the bottleneck link between the sender and the receiver.
This filter over the received rate is expressed simply in a fold function:
{\footnotesize
\begin{minted}{lisp}
(when true
(:= minrtt (min minrtt Ack.rtt_sample_us))
(:= curr_btl_est (max curr_btl_est Flow.rate_incoming))
(fallthrough))
\end{minted}
}
To determine whether a connection can send more than its current sending rate,
BBR probes for additional available bandwidth by temporarily increasing its
sending rate by a factor (1.25$\times$) of its current sending rate.
%
To drain a queue that may have been created in the process, it also reduces its
rate by a reciprocal factor (0.75$\times$) before starting to send at the new estimated
bottleneck link rate.
The following excerpt expresses this sending pattern (for simplicity, we show only 2 transitions):
{\footnotesize
\begin{minted}{lisp}
(when (== pulseState 0)
(:= Rate (* 1.25 curr_btl_est))
(:= pulseState 1))
(when (&& (== pulseState 1)
(> Timer.micros Flow.rtt_sample_us))
(:= Rate (* 0.75 curr_btl_est))
(:= pulseState 2))
\end{minted}
}
Here, the variable \ct{pulseState} denotes the state of the sender's
bandwidth probing: probing with high sending rate (0) and draining queues with low
sending rate (1).
Each \texttt{when} clause represents a pulse state transition and is conditioned on the
resettable timer \texttt{Timer.micros}.
Upon the transition, the handler sets the \texttt{Rate} and advances \texttt{pulseState}.
After the last phase of the pulse, the handler would reset the timer and \ct{pulseState} to restart the sending pattern (not shown).
%\ma{is this too complicated to show? it would be nice to show a complete example.}
Figure~\ref{fig:ccp:bbr} shows
the impact of BBR's bandwidth probing\footnote{We only implement BBR's PROBE\_BW and PROBE\_RTT. Our implementation is here: \url{github.com/ccp-project/bbr}.} on the
achieved goodput and queueing delays when a single flow runs over a 48 Mbit/s
bottleneck link with a 20 ms round trip propagation delay.
%
BBR's windowed min/max operations and the RTT probing phase (showing steep rate
dips every 10 seconds) are implemented in the slow path's \ct{onReport()}
handler by installing a new fold function.
%
CCP's split programming model enables this flexible partitioning of functionality.
\begin{figure}[t]
\centering
\includegraphics[width=\columnwidth]{img/bbr}
\caption{
Our CCP implementation of BBR used for a bulk transfer over a 48 Mbit/s link with a 20 ms RTT and 2 BDPs of buffering. The bandwidth probe phase can be seen in the oscillation of the queueing delay, and the RTT probe phase can be seen in the periodic dips in throughput.
}\label{fig:ccp:bbr}
\end{figure}
\subsection{Case Study: Slow Start}
\label{s:ccp:ss}
Because algorithms no longer make decisions upon every ACK, CCP changes the way in which developers should think about congestion control, and correspondingly provides multiple implementation choices. As a result, new issues arise about where to place algorithm functionality. We discuss the involved trade-offs with an illustrative example: slow start.
Slow start is a widely used congestion control module in which a connection probes for bandwidth by multiplicatively increasing its congestion window (\texttt{cwnd}) every RTT. Most implementations increment \texttt{cwnd} per ACK, either by the number of bytes acknowledged in the ACK, or by 1 MSS. One way to implement slow start is to retain the logic entirely in CCP, and measure the size of the required window update from datapath reports. We show an example in Figure~\ref{lst:ccp:ss}. This implementation strategy is semantically closest in behavior to native datapath implementations.
\begin{figure}[t]
{\footnotesize
\begin{minted}{rust}
fn create(...) {
datapath.install("
(def (Report (volatile acked 0) (volatile loss 0)))
(when true
(:= Report.acked (+ Report.acked Ack.bytes_acked)))
(when (> Micros Flow.rtt_sample_us) (report) (reset))");
}
fn onReport(...) {
if report.get_field("Report.loss") == 0 {
let acked = report.get_field("Report.acked");
self.cwnd += acked;
datapath.update_field(&[("Cwnd", self.cwnd)]);
} else { /* exit slow start */ }
}
\end{minted}
}
\vspace{-10pt}
\caption{A CCP implementation of slow start.} \label{lst:ccp:ss}
\vspace{-15pt}
\end{figure}
\begin{figure}
\centering
\includegraphics[width=\columnwidth]{img/ss-evo}
\caption{Different implementations of slow start have different window update characteristics. The control pattern implementation is rate-based, so we show the congestion window corresponding to the achieved throughput over each RTT.}
\label{fig:ccp:ss}
\end{figure}
For some workloads this approach may prove problematic, depending on the parameters of the algorithm. If the reporting period defined is large, then infrequent slow start updates can cause connections to lose throughput.
Figure~\ref{fig:ccp:ss} demonstrates that, on a $48$ Mbps, $100$ ms RTT link, different implementations of slow start exhibit differing window updates relative to the Linux kernel baseline.
A version with a 1-RTT reporting period lags behind the native datapath implementation. It is also possible to implement slow start within the datapath either by using congestion window increase (Figure~\ref{lst:ccp:ssfold}), or by using rate based control:
{\footnotesize
\begin{minted}{lisp}
(when (> Timer.Micros Flow.rtt_sample_us)
(:= Rate (* Rate 2))
(:= Timer.Micros 0))
\end{minted}
}
\paragrapha{Take-away} As outlined in \S\ref{s:design}, the programming model of datapath programs is deliberately limited.
First, we envision that in the future, CCP will support low-level hardware datapaths---the simpler the fold function execution environment is, the easier these hardware implementations will be. Second, algorithms able to make complex decisions on longer time-scales will naturally do so to preserve cycles for the application and datapath; as a result, complex logic inside the fold function may not be desirable.
More broadly, developers may choose among various points in the algorithm design space.
On one extreme, algorithms may be implemented almost entirely in CCP, using the fold function as a simple measurement query language.
On the other extreme, CCP algorithms may merely specify transitions between in-datapath fold functions implementing the primary control logic of the algorithm.
Ultimately, users are able to choose the algorithm implementation best suited to their congestion control logic and application needs.
\begin{figure}[t]
{\footnotesize
\begin{minted}{rust}
fn create(...) {
datapath.install("
(def (volatile Report.loss 0))
(when true (:= Cwnd (+ Cwnd Ack.bytes_acked)))
(when (> Ack.lost_pkts_sample 0) (report))");
}
fn onReport(...) { /* exit slow start */ }
\end{minted}
}
\caption{A within-fold implementation of slow start. Note that CCP algorithm code is not invoked at all until the connection experiences its first loss.} \label{lst:ccp:ssfold}
\end{figure}