---
description: Configure garbage collection in Docker Trusted Registry
keywords: docker, registry, garbage collection, gc, space, disk space
title: Docker Trusted Registry 2.2 Garbage Collection
---

#### TL;DR

1. Garbage collection (GC) reclaims disk space from your storage by deleting
unused layers.
2. GC can be configured to run automatically on a cron schedule, and can also
be run manually. Only admins can configure these settings.
3. While GC runs, DTR is placed in read-only mode. Pulls work, but pushes
fail.
4. The UI shows when GC is running, and an admin can stop GC from the UI.

**Important notes**

The GC cron schedule runs in **UTC**. Containers typically run in UTC (unless
the host's system time is mounted into them), so make sure you express the
cron schedule in UTC when configuring it.
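Because the schedule is evaluated in UTC, a time you have in mind locally has to be shifted by your UTC offset first. The following is a minimal sketch that assumes a standard five-field cron expression (minute, hour, day of month, month, day of week, with 0 as Sunday) and a fixed UTC offset, ignoring daylight saving time. `local_weekly_to_utc_cron` is a hypothetical helper and is not part of DTR, so check which cron format the DTR UI accepts before pasting its output.

```python
MINUTES_PER_DAY = 24 * 60
MINUTES_PER_WEEK = 7 * MINUTES_PER_DAY


def local_weekly_to_utc_cron(weekday: int, hour: int, minute: int,
                             utc_offset_minutes: int) -> str:
    """Convert a weekly local time into a five-field cron expression in UTC.

    weekday uses cron numbering (0 = Sunday ... 6 = Saturday).
    utc_offset_minutes is the local offset from UTC, e.g. -300 for UTC-5.
    Assumption: the offset is fixed (no daylight saving handling).
    """
    if not (0 <= weekday <= 6 and 0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("weekday, hour or minute out of range")
    # Local time = UTC + offset, so UTC = local time - offset.
    local_minutes = weekday * MINUTES_PER_DAY + hour * 60 + minute
    # Wrap around the week so e.g. early Sunday can become late Saturday.
    utc_minutes = (local_minutes - utc_offset_minutes) % MINUTES_PER_WEEK
    utc_weekday, rest = divmod(utc_minutes, MINUTES_PER_DAY)
    utc_hour, utc_minute = divmod(rest, 60)
    # Fields: minute, hour, day of month, month, day of week.
    return f"{utc_minute} {utc_hour} * * {utc_weekday}"


# Sunday 01:00 local time in UTC+2 is still Saturday 23:00 in UTC.
print(local_weekly_to_utc_cron(0, 1, 0, 120))  # 0 23 * * 6
```

Working in "minutes since the start of the week" keeps the day-of-week shift and the hour shift in one modular subtraction instead of handling day boundaries separately.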

GC puts DTR into read-only mode: pulls succeed while pushes fail. Pushing an
image while GC runs could lead to undefined behaviour and data loss, so pushes
are disabled for safety. For this reason, it's best practice to schedule GC
for a low-traffic period, such as the early hours of a Saturday or Sunday.
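In terms of the Docker Registry HTTP API V2, pulls only need `GET` and `HEAD` requests, while pushes upload blobs and manifests with `POST`, `PATCH`, and `PUT`. A minimal sketch of a read-only gate follows. The function name `gate_request` and the use of `405` as the rejection status are assumptions for illustration, not DTR's implementation.

```python
# Hypothetical read-only gate; not DTR's actual code.
# Pulls only use these methods, so they are let through during GC.
READ_METHODS = frozenset({"GET", "HEAD"})


def gate_request(method: str, gc_running: bool) -> int:
    """Return an HTTP status for a registry request.

    200 stands in for "pass the request through"; 405 is an assumed
    rejection code chosen only for illustration.
    """
    if gc_running and method.upper() not in READ_METHODS:
        return 405  # writes (pushes, deletes) are refused while GC runs
    return 200


print(gate_request("GET", gc_running=True))  # 200: pull allowed
print(gate_request("PUT", gc_running=True))  # 405: push rejected
```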


## Setting up garbage collection

If you're an admin, you can set up GC by clicking "Settings" in the UI and
then choosing "Garbage Collection". By default, GC is disabled, and the page
looks like this:

{: .with-border}

Here you can configure GC to run **until it's done** or **with a timeout**.
The timeout caps how long your registry stays in read-only mode.
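The difference between the two modes can be sketched as a loop that checks a deadline between deletion steps. This is a hypothetical illustration rather than DTR's code: `run_gc_with_timeout` and its step callables are made up, `timeout_s=None` stands in for "Until done", and a number stands in for "For N minutes" converted to seconds.

```python
import time
from typing import Callable, Iterable, Optional


def run_gc_with_timeout(deletions: Iterable[Callable[[], None]],
                        timeout_s: Optional[float],
                        clock: Callable[[], float] = time.monotonic) -> bool:
    """Run GC deletion steps until done or until the time budget is spent.

    Returns True if every step ran, False if the timeout stopped GC early.
    """
    deadline = None if timeout_s is None else clock() + timeout_s
    for delete in deletions:
        # Check the budget between steps so no deletion is cut off midway.
        if deadline is not None and clock() >= deadline:
            return False  # budget spent: the registry can leave read-only mode
        delete()
    return True


# "For 30 minutes": any steps that don't fit in the budget are skipped.
finished = run_gc_with_timeout([lambda: None] * 3, timeout_s=30 * 60)
print(finished)
```

Injecting `clock` keeps the function testable with a fake clock; `time.monotonic` is used by default because it is not affected by wall-clock adjustments.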

After you select an option ("Until done" or "For N minutes"), you can
configure GC to run on a cron schedule. Several default schedules are
provided:

{: .with-border}

You can also choose "Do not repeat" to disable the cron schedule entirely.

Once the cron schedule has been configured (or disabled), you can either save
the schedule ("Save") or save the schedule *and* start GC immediately ("Save
& Start").

## Stopping GC while it's running

While GC is running, the garbage collection settings page looks like this:

{: .with-border}

Note the global banner, visible to all users, which lets everyone know that GC
is running.

An admin can stop the current GC process by clicking "Stop". This safely shuts
down the running GC job and returns the registry to read-write mode, so pushes
work as expected again.
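A safe stop can be sketched as cooperative cancellation: the GC loop checks a flag between steps and always restores read-write mode on the way out. The `Registry` class and `run_gc` function here are hypothetical stand-ins, not DTR's code. In this sketch, the handler behind the "Stop" button would call `stop.set()` while `run_gc` runs on a worker thread.

```python
import threading
from typing import Callable, Iterable


class Registry:
    """Toy stand-in for the registry's read-only switch (hypothetical)."""

    def __init__(self) -> None:
        self.read_only = False


def run_gc(registry: Registry, deletions: Iterable[Callable[[], None]],
           stop: threading.Event) -> bool:
    """Run GC steps until finished or until `stop` is set.

    Returns True if GC completed, False if it was stopped early.
    """
    registry.read_only = True
    try:
        for delete in deletions:
            # Only check between steps, so a deletion is never interrupted.
            if stop.is_set():
                return False
            delete()
        return True
    finally:
        # Whether GC finished, was stopped, or failed, re-enable pushes.
        registry.read_only = False


registry = Registry()
stop = threading.Event()
worker = threading.Thread(target=run_gc, args=(registry, [lambda: None] * 100, stop))
worker.start()
stop.set()  # what the "Stop" button would do
worker.join()
print(registry.read_only)  # False
```

Putting the mode switch in `finally` is what makes the stop "safe" in this sketch: there is no exit path that leaves the registry stuck in read-only mode.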

## How does garbage collection work?

### Background: how images are stored

Each image stored in DTR is made up of multiple files:

- A list of "layers", which represent the image's filesystem
- The "config" file, which specifies the OS, architecture, and other image
metadata
- The "manifest", which is pulled first and lists all layers and the config
file for the image
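For reference, an image manifest in the Image Manifest V2, Schema 2 format is a small JSON document whose `config` entry and `layers` entries each carry a `digest`. The sketch below extracts every file a manifest refers to. The manifest is abbreviated (the `size` fields are omitted), the digests are placeholders rather than real hashes, and `referenced_digests` is a hypothetical helper.

```python
import json
from typing import Set

# Abbreviated Image Manifest V2, Schema 2 (sizes omitted, placeholder digests).
EXAMPLE_MANIFEST = """
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "config": {
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "digest": "sha256:ccc"
  },
  "layers": [
    {"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
     "digest": "sha256:aaa"},
    {"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
     "digest": "sha256:bbb"}
  ]
}
"""


def referenced_digests(manifest_json: str) -> Set[str]:
    """Return the digests of the config file and of every layer."""
    manifest = json.loads(manifest_json)
    digests = {manifest["config"]["digest"]}
    digests.update(layer["digest"] for layer in manifest["layers"])
    return digests


print(sorted(referenced_digests(EXAMPLE_MANIFEST)))
```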

All of these files are stored in a content-addressable manner: we take the
sha256 hash of the file's content and use the hash as the filename. This means
that if the tags `example.com/user/blog:1.11.0` and `example.com/user/blog:latest`
use the same layers, those layers are stored only once.
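The deduplication can be sketched with a toy in-memory store keyed by sha256 digest. `ContentStore` is hypothetical and far simpler than DTR's storage backend. It only shows why identical content ends up stored once.

```python
import hashlib
from typing import Dict


class ContentStore:
    """Toy content-addressable store: each blob is keyed by its sha256 digest."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        digest = "sha256:" + hashlib.sha256(data).hexdigest()
        # Identical content always hashes to the same key, so uploading the
        # same layer again does not store a second copy.
        self.blobs.setdefault(digest, data)
        return digest


store = ContentStore()
layer = b"example layer contents"
d1 = store.put(layer)  # pushed as part of blog:1.11.0
d2 = store.put(layer)  # pushed again as part of blog:latest
print(d1 == d2, len(store.blobs))  # True 1
```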

### How this impacts GC

Let's continue the example above, where `example.com/user/blog:latest` and
`example.com/user/blog:1.11.0` point to the same image and use the same layers.
If we delete `example.com/user/blog:latest` but *not*
`example.com/user/blog:1.11.0`, we expect that `example.com/user/blog:1.11.0`
can still be pulled.

This means that we can't simply delete an image's layers when one of its tags
or manifests is deleted. Instead, we need to pause writes and take reference
counts to see how many times each file is used. Only when a file is not used
at all is it safe to delete.
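The reference-counting idea can be sketched with `collections.Counter`, using a hypothetical model in which each tag points to a manifest digest and each manifest lists the files it uses. Counting references per tag is a simplification for illustration. After the `latest` tag is deleted, the shared layers still have a non-zero count because `1.11.0` uses them.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical model: tag -> manifest digest, manifest digest -> file digests.
tags: Dict[str, str] = {
    "example.com/user/blog:1.11.0": "sha256:m1",
    "example.com/user/blog:latest": "sha256:m1",
}
manifests: Dict[str, List[str]] = {
    "sha256:m1": ["sha256:config", "sha256:layer-a", "sha256:layer-b"],
}


def reference_counts(tag_map: Dict[str, str],
                     manifest_map: Dict[str, List[str]]) -> Counter:
    """Count how many tags (via their manifests) reference each file."""
    counts: Counter = Counter()
    for manifest_digest in tag_map.values():
        counts.update(manifest_map[manifest_digest])
    return counts


before = reference_counts(tags, manifests)
del tags["example.com/user/blog:latest"]
after = reference_counts(tags, manifests)
# Each file's count drops from 2 to 1, so nothing is safe to delete yet.
print(before["sha256:layer-a"], after["sha256:layer-a"])  # 2 1
```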

This is the basis of our "mark and sweep" collection:

1. Iterate over all manifests in the registry and record every file they
reference
2. Iterate over all stored files and check whether each file is referenced by
any manifest
3. If the file is *not* referenced, delete it
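The three steps can be sketched as a toy in-memory mark and sweep. `manifests` maps each manifest digest to the digests of the files it references, and `stored` is the set of files in storage. Both are hypothetical stand-ins for the registry's storage backend. Keeping each manifest's own file is an assumption of the sketch, since the list above only describes the files that manifests reference.

```python
from typing import Dict, Iterable, List, Set


def mark(manifests: Dict[str, Iterable[str]]) -> Set[str]:
    """Step 1: record every file referenced by any manifest."""
    referenced: Set[str] = set()
    for manifest_digest, file_digests in manifests.items():
        # Assumption: a manifest's own file is kept as long as the manifest exists.
        referenced.add(manifest_digest)
        referenced.update(file_digests)
    return referenced


def sweep(stored: Set[str], referenced: Set[str]) -> List[str]:
    """Steps 2 and 3: delete every stored file that no manifest references."""
    deleted: List[str] = []
    # Iterate over a sorted copy so files can be removed from `stored` safely.
    for digest in sorted(stored):
        if digest not in referenced:
            stored.discard(digest)  # stands in for deleting the file from storage
            deleted.append(digest)
    return deleted


manifests = {
    "sha256:m1": ["sha256:config", "sha256:layer-a", "sha256:layer-b"],
}
stored = {"sha256:m1", "sha256:config", "sha256:layer-a",
          "sha256:layer-b", "sha256:orphan-layer"}

deleted = sweep(stored, mark(manifests))
print(deleted)  # ['sha256:orphan-layer']
```

Writes are paused during both phases because a push that lands between `mark` and `sweep` could upload a file that `sweep` would wrongly treat as unreferenced.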