Commit cda96b5

tonyhb authored and joaofnfernandes committed
Add GC guide to DTR (docker#1767)

* Add GC guide to DTR
* Add border to images
1 parent c35a42a commit cda96b5

5 files changed: +103 -0 lines changed

_data/toc.yaml

Lines changed: 2 additions & 0 deletions
@@ -1323,6 +1323,8 @@ manuals:
       title: Set up vulnerability scans
     - path: /datacenter/dtr/2.2/guides/admin/configure/deploy-a-cache/
       title: Deploy a cache
+    - path: /datacenter/dtr/2.2/guides/admin/configure/garbage-collection/
+      title: Garbage collection
   - sectiontitle: Manage users
     section:
     - path: /datacenter/dtr/2.2/guides/admin/manage-users/
Lines changed: 101 additions & 0 deletions
@@ -0,0 +1,101 @@
---
description: Configure garbage collection in Docker Trusted Registry
keywords: docker, registry, garbage collection, gc, space, disk space
title: Docker Trusted Registry 2.2 Garbage Collection
---

#### TL;DR

1. Garbage Collection (GC) reclaims disk space from your storage by deleting
   unused layers.
2. GC can be configured to run automatically on a cron schedule, and can also
   be run manually. Only admins can configure either.
3. While GC runs, DTR is placed in read-only mode. Pulls work but
   pushes fail.
4. The UI shows when GC is running, and an admin can stop GC from within the UI.

**Important notes**

The GC cron schedule runs in **UTC time**. Containers typically run in
UTC time (unless the system time is mounted), so remember that the schedule
you configure is interpreted as UTC, not as your local time.

GC puts DTR into read-only mode; pulls succeed while pushes fail. Pushing an
image while GC runs may lead to undefined behaviour and data loss, so
pushes are disabled for safety. For this reason it's generally best practice to
schedule GC for a quiet period, such as the early hours of a Saturday or Sunday.


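Because the schedule is interpreted as UTC, a quiet local time has to be converted before it is entered. The following is a minimal sketch of that conversion, not part of DTR: the helper name `local_to_utc_cron`, the fixed UTC offset (which ignores daylight saving changes), and the five-field `minute hour * * day-of-week` layout are all assumptions made for illustration. Check the cron format that your DTR version accepts before using the result.

```python
from datetime import datetime, timedelta, timezone


def local_to_utc_cron(weekday: int, hour: int, minute: int,
                      utc_offset_hours: float) -> str:
    """Return 'minute hour * * day-of-week' in UTC for a weekly local time.

    weekday uses cron numbering: 0 = Sunday ... 6 = Saturday.
    utc_offset_hours is the local zone's fixed offset from UTC (assumption:
    no daylight saving adjustment is applied).
    """
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    # 2017-01-01 was a Sunday, so adding `weekday` days lands on the requested day.
    local = datetime(2017, 1, 1, hour, minute, tzinfo=local_tz) + timedelta(days=weekday)
    utc = local.astimezone(timezone.utc)
    # isoweekday(): Monday = 1 ... Sunday = 7; cron numbering uses Sunday = 0.
    cron_dow = utc.isoweekday() % 7
    return f"{utc.minute} {utc.hour} * * {cron_dow}"


# 02:00 on Sunday in a UTC-8 zone is 10:00 on Sunday in UTC.
print(local_to_utc_cron(weekday=0, hour=2, minute=0, utc_offset_hours=-8))  # 0 10 * * 0
```

Note that the day of the week can shift as well as the hour: 22:00 on Saturday at UTC-8 falls on Sunday in UTC.
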
## Setting up garbage collection

If you're an admin, you can set up GC by clicking "Settings" in the UI and then
choosing "Garbage Collection". By default, GC is disabled and you see this
screen:

![](../../images/garbage-collection-1.png){: .with-border}

Here you can configure GC to run **until it's done** or **with a timeout**.
The timeout ensures that your registry stays in read-only mode for no longer
than a set amount of time.

Select an option (either "Until done" or "For N minutes") and you'll have the
option to configure GC to run via a cron job, with several default cron
schedules provided:

![](../../images/garbage-collection-2.png){: .with-border}

You can also choose "Do not repeat" to disable the cron schedule entirely.

Once the cron schedule has been configured (or disabled), you have the option to
save the schedule ("Save") or save the schedule *and* start GC immediately ("Save
& Start").

## Stopping GC while it's running

While GC runs, the garbage collection settings page looks as follows:

![](../../images/garbage-collection-3.png){: .with-border}

Note the global banner visible to all users, ensuring everyone knows that GC is
running.

An admin can stop the current GC process by clicking "Stop". This safely shuts
down the running GC job and returns the registry to read-write mode, so that
pushes work as expected.

## How does garbage collection work?

### Background: how images are stored

Each image stored in DTR is made up of multiple files:

- A list of "layers", which represent the image's filesystem
- The "config" file, which dictates the OS, architecture and other image
  metadata
- The "manifest", which is pulled first and lists all layers and the config file
  for the image

All of these files are stored in a content-addressable manner. We take the
sha256 hash of the file's content and use the hash as the filename. This means
that if tags `example.com/user/blog:1.11.0` and `example.com/user/blog:latest`
use the same layers we only store them once.

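The content-addressable idea can be illustrated with a small sketch. This is not DTR's storage code: the helper `store_blob` and the flat one-directory layout are hypothetical, chosen only to show that identical content hashes to the same filename and is therefore written once.

```python
import hashlib
import os
import tempfile


def store_blob(root: str, content: bytes) -> str:
    """Write content under its sha256 hex digest and return that digest.

    Hypothetical flat layout: one file per digest directly under `root`.
    """
    digest = hashlib.sha256(content).hexdigest()
    path = os.path.join(root, digest)
    # Identical content always maps to the same path, so it is stored only once.
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(content)
    return digest


with tempfile.TemporaryDirectory() as root:
    layer = b"example layer bytes"
    first = store_blob(root, layer)   # e.g. pushed as blog:1.11.0
    second = store_blob(root, layer)  # e.g. pushed again as blog:latest
    print(first == second, len(os.listdir(root)))  # True 1
```
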
### How this impacts GC

Let's continue from the above example, where `example.com/user/blog:latest` and
`example.com/user/blog:1.11.0` point to the same image and use the same layers.
If we delete `example.com/user/blog:latest` but *not*
`example.com/user/blog:1.11.0` we expect that `example.com/user/blog:1.11.0`
can still be pulled.

This means that we can't simply delete layers when tags or manifests are deleted.
Instead, we need to pause writing and take reference counts to see how many
times a file is used. Only if a file is never used is it safe to delete.

This is the basis of our "mark and sweep" collection:

1. Iterate over all manifests in the registry and record all files that are
   referenced
2. Iterate over all files stored and check if each file is referenced by any
   manifest
3. If the file is *not* referenced, delete it
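
The three steps above can be sketched in a few lines. This is an illustrative model under stated assumptions, not DTR's implementation: manifests are represented as a plain dictionary from a manifest name to the files it lists, the names `mark` and `sweep` are hypothetical, and "deleting" is modelled by returning the set of unreferenced files.

```python
from typing import Dict, List, Set


def mark(manifests: Dict[str, List[str]]) -> Set[str]:
    """Step 1: record every file referenced by any manifest."""
    referenced: Set[str] = set()
    for manifest, files in manifests.items():
        referenced.add(manifest)   # the manifest itself is still in use
        referenced.update(files)   # its config file and layers
    return referenced


def sweep(stored: Set[str], referenced: Set[str]) -> Set[str]:
    """Steps 2 and 3: return the stored files that no manifest references."""
    return {name for name in stored if name not in referenced}


# One manifest is still tagged (as blog:1.11.0); "layer-old" belonged to a
# manifest that has since been deleted, so nothing references it any more.
manifests = {"manifest-a": ["config-a", "layer-1", "layer-2"]}
stored = {"manifest-a", "config-a", "layer-1", "layer-2", "layer-old"}

to_delete = sweep(stored, mark(manifests))
print(sorted(to_delete))  # ['layer-old']
```

Writes have to be paused while this runs: a layer pushed between the mark and sweep phases would not yet be recorded as referenced and could be deleted by mistake.
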