@@ -158,22 +158,31 @@ init process will block waiting for the parent to finish setup.
### IntelRdt

Intel platforms with newer Xeon CPUs support Resource Director Technology (RDT).
- Cache Allocation Technology (CAT) and Memory Bandwidth Allocation (MBA) are
- two sub-features of RDT.
+ Cache Allocation Technology (CAT), Cache Monitoring Technology (CMT),
+ Memory Bandwidth Allocation (MBA) and Memory Bandwidth Monitoring (MBM) are
+ four sub-features of RDT.

Cache Allocation Technology (CAT) provides a way for the software to restrict
cache allocation to a defined 'subset' of L3 cache which may overlap
with other 'subsets'. The different subsets are identified by class of
service (CLOS) and each CLOS has a capacity bitmask (CBM).
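
To make the CBM idea concrete, here is a minimal Python sketch (not runc code) that treats each CLOS's CBM as an integer bitmask and reports which cache ways two classes share. The helper names `cbm_ways` and `cbm_overlap` are hypothetical, and the two masks are made-up example values.

```python
def cbm_ways(cbm):
    """Return the bit positions (cache ways) that are set in a CBM."""
    return [bit for bit in range(cbm.bit_length()) if cbm & (1 << bit)]


def cbm_overlap(cbm_a, cbm_b):
    """Return the bitmask of cache ways shared by two CBMs (0 if disjoint)."""
    return cbm_a & cbm_b


# Two hypothetical classes of service (example values only).
clos1 = 0x7F0  # ways 4..10
clos2 = 0x0FF  # ways 0..7

shared = cbm_overlap(clos1, clos2)
print(hex(shared), cbm_ways(shared))
```

A non-zero result means the two subsets overlap, which CAT allows.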

+ Cache Monitoring Technology (CMT) supports monitoring of the last-level cache (LLC) occupancy
+ for each running thread simultaneously.
+
Memory Bandwidth Allocation (MBA) provides an indirect and approximate throttle
over memory bandwidth for the software. A user controls the resource by
- indicating the percentage of maximum memory bandwidth or memory bandwidth limit
- in MBps unit if MBA Software Controller is enabled.
+ indicating a percentage of the maximum memory bandwidth or, if the MBA
+ Software Controller is enabled, a memory bandwidth limit in MBps.
+
+ Memory Bandwidth Monitoring (MBM) supports monitoring of total and local memory bandwidth
+ for each running thread simultaneously.

- It can be used to handle L3 cache and memory bandwidth resources allocation
- for containers if hardware and kernel support Intel RDT CAT and MBA features.
+ More details about Intel RDT CAT and MBA can be found in sections 17.18 and 17.19 of Volume 3
+ of the Intel Software Developer's Manual:
+ https://software.intel.com/en-us/articles/intel-sdm

+ About the Intel RDT kernel interface:
In Linux kernel 4.10 or newer, the interface is defined and exposed via the
"resource control" filesystem, which is a "cgroup-like" interface.

@@ -194,22 +203,43 @@ tree /sys/fs/resctrl
  |   |   |-- cbm_mask
  |   |   |-- min_cbm_bits
  |   |   |-- num_closids
+ |   |-- L3_MON
+ |   |   |-- max_threshold_occupancy
+ |   |   |-- mon_features
+ |   |   |-- num_rmids
  |   |-- MB
  |       |-- bandwidth_gran
  |       |-- delay_linear
  |       |-- min_bandwidth
  |       |-- num_closids
- |-- ...
+ |-- mon_groups
+     |-- <container_id>
+         |-- ...
+         |-- mon_data
+             |-- mon_L3_00
+                 |-- llc_occupancy
+                 |-- mbm_local_bytes
+                 |-- mbm_total_bytes
+             |-- ...
+         |-- tasks
  |-- schemata
  |-- tasks
  |-- <container_id>
      |-- ...
-     |-- schemata
+     |-- mon_data
+         |-- mon_L3_00
+             |-- llc_occupancy
+             |-- mbm_local_bytes
+             |-- mbm_total_bytes
+         |-- ...
      |-- tasks
+     |-- schemata
+ |-- ...
```

For runc, we can make use of the `tasks` and `schemata` configuration for L3
- cache and memory bandwidth resources constraints.
+ cache and memory bandwidth resource constraints, and the `mon_data` directory
+ for CMT and MBM statistics.
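
As an illustration of how the `mon_data` statistics could be collected, the following Python sketch walks a group's `mon_data` directory and reads each counter file into a dictionary. It is not how runc implements this; the function name `read_mon_data` is hypothetical, and skipping non-numeric file content is an assumption made to keep the sketch robust.

```python
from pathlib import Path


def read_mon_data(group_dir):
    """Collect monitoring counters from <group_dir>/mon_data.

    Returns {domain: {counter_name: value}}, for example
    {"mon_L3_00": {"llc_occupancy": 1024}}.
    """
    stats = {}
    mon_data = Path(group_dir) / "mon_data"
    for domain in sorted(mon_data.iterdir()):
        if not domain.is_dir():
            continue
        counters = {}
        for counter in sorted(domain.iterdir()):
            text = counter.read_text().strip()
            # Assumption: anything that is not a plain integer is skipped.
            if text.isdigit():
                counters[counter.name] = int(text)
        stats[domain.name] = counters
    return stats
```

On a host with resctrl mounted, the argument would be the container's group directory under `/sys/fs/resctrl`.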

The file `tasks` has a list of tasks that belong to this group (e.g.,
"<container_id>" group). Tasks can be added to a group by writing the task ID
@@ -251,7 +281,7 @@ that is allocated is also dependent on the CPU model and can be looked up at
min_bw + N * bw_gran. Intermediate values are rounded to the next control
step available on the hardware.
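
The rounding rule can be sketched in Python as follows. This is only an approximation of the behavior described above: it assumes "next control step" means rounding up, assumes a 100% ceiling, and uses the hypothetical name `round_to_control_step`; the real result depends on the kernel and hardware.

```python
def round_to_control_step(requested, min_bw, bw_gran, max_bw=100):
    """Round a requested bandwidth percentage up to min_bw + N * bw_gran."""
    if requested <= min_bw:
        return min_bw
    # Ceiling division: number of granularity steps needed above min_bw.
    steps = -(-(requested - min_bw) // bw_gran)
    return min(min_bw + steps * bw_gran, max_bw)


# With min_bw = 10 and bw_gran = 10, the valid steps are 10, 20, ..., 100.
print(round_to_control_step(35, min_bw=10, bw_gran=10))
```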

- If MBA Software Controller is enabled through mount option "-o mba_MBps"
+ If the MBA Software Controller is enabled through the mount option "-o mba_MBps":
mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl
we can specify memory bandwidth in "MBps" (megabytes per second) units
instead of "percentages". The kernel underneath would use a software feedback
@@ -263,11 +293,12 @@ For example, on a two-socket machine, the schema line could be
"MB:0=5000;1=7000" which means 5000 MBps memory bandwidth limit on socket 0
and 7000 MBps memory bandwidth limit on socket 1.
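
For readers who want to see the schema line format taken apart, here is a small Python sketch. The function name `parse_schema_line` is hypothetical; values are kept as strings because their meaning (percentage, MBps, or a hexadecimal CBM) depends on the resource and mount options.

```python
def parse_schema_line(line):
    """Parse a schema line such as "MB:0=5000;1=7000".

    Returns (resource, {domain_id: value_string}).
    """
    resource, _, body = line.strip().partition(":")
    if not resource or not body:
        raise ValueError(f"malformed schema line: {line!r}")
    domains = {}
    for entry in body.split(";"):
        domain_id, sep, value = entry.partition("=")
        if not sep:
            raise ValueError(f"malformed entry: {entry!r}")
        domains[int(domain_id)] = value.strip()
    return resource.strip(), domains


resource, limits = parse_schema_line("MB:0=5000;1=7000")
print(resource, limits)
```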

- For more information about Intel RDT kernel interface:
+ For more information about Intel RDT kernel interface:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt

- ```
+
An example for runc:
+ ```
Consider a two-socket machine with two L3 caches where the default CBM is
0x7ff and the max CBM length is 11 bits, and minimum memory bandwidth of 10%
with a memory bandwidth granularity of 10%.
@@ -278,10 +309,18 @@ maximum memory bandwidth of 20% on socket 0 and 70% on socket 1.

  "linux": {
      "intelRdt": {
-         "closID": "guaranteed_group",
          "l3CacheSchema": "L3:0=7f0;1=1f",
          "memBwSchema": "MB:0=20;1=70"
-     }
+     }
+ }
+ ```
+ Another example:
+ ```
+ We only want to monitor memory bandwidth and LLC occupancy.
+ "linux": {
+     "intelRdt": {
+         "monitoring": true
+     }
  }
```
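
A configuration like the `l3CacheSchema` value above could be sanity-checked before it is written to `schemata`. The Python sketch below is one possible check, not runc's validation logic: the name `is_valid_cbm` is hypothetical, the default mask `0x7ff` is taken from the example machine, and the requirement that the set bits form one contiguous run is an assumption based on the kernel resctrl documentation for Intel hardware.

```python
def is_valid_cbm(cbm_hex, full_mask=0x7FF):
    """Check that a CBM is non-zero, fits in full_mask, and is one run of 1s."""
    cbm = int(cbm_hex, 16)
    if cbm == 0 or (cbm & ~full_mask) != 0:
        return False
    # Drop trailing zeros; a contiguous run then has the form 0b0111...1.
    while (cbm & 1) == 0:
        cbm >>= 1
    return (cbm & (cbm + 1)) == 0


# The two masks from the example schema "L3:0=7f0;1=1f".
print(is_valid_cbm("7f0"), is_valid_cbm("1f"))
```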