Skip to content

Commit c3459ae

Browse files
pks-tgitster
authored andcommittedSep 4, 2024
refs/files: use heuristic to decide whether to repack with --auto
The `--auto` flag for git-pack-refs(1) allows the ref backend to decide whether or not a repack is in order. This switch has been introduced mostly with the "reftable" backend in mind, which already knows to auto-compact its tables during normal operations. When the flag is set, then it will use the same auto-compaction mechanism and thus end up doing nothing in most cases. The "files" backend does not have any such heuristic yet and instead packs any loose references unconditionally. So we rewrite the complete "packed-refs" file even if there's only a single loose reference to be packed. Even worse, starting with 9f6714a (builtin/gc: pack refs when using `git maintenance run --auto`, 2024-03-25), `git pack-refs --auto` is unconditionally executed via our auto maintenance, so we end up repacking references every single time auto maintenance kicks in. And while that commit already mentioned that the "files" backend unconditionally packs refs now, the author obviously didn't quite think about the consequences thereof. So while the idea was sound, we really should have added a heuristic to the "files" backend before implementing it. Introduce a heuristic that decides whether or not it is worth to pack loose references. The important factors to decide here are the number of loose references in comparison to the overall size of the "packed-refs" file. The bigger the "packed-refs" file, the longer it takes to rewrite it and thus we scale up the limit of allowed loose references before we repack. As is the nature of heuristics, this mechansim isn't obviously "correct", but should rather be seen as a tradeoff between how much resources we spend packing refs and how inefficient the ref store becomes. For all I can say, we have successfully been using the exact same heuristic in Gitaly for several years by now. Signed-off-by: Patrick Steinhardt <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
1 parent bd51dca commit c3459ae

File tree

4 files changed

+165
-10
lines changed

4 files changed

+165
-10
lines changed
 

‎refs/files-backend.c

+65
Original file line numberDiff line numberDiff line change
@@ -1300,6 +1300,68 @@ static int should_pack_ref(struct files_ref_store *refs,
13001300
return 0;
13011301
}
13021302

1303+
static int should_pack_refs(struct files_ref_store *refs,
1304+
struct pack_refs_opts *opts)
1305+
{
1306+
struct ref_iterator *iter;
1307+
size_t packed_size;
1308+
size_t refcount = 0;
1309+
size_t limit;
1310+
int ret;
1311+
1312+
if (!(opts->flags & PACK_REFS_AUTO))
1313+
return 1;
1314+
1315+
ret = packed_refs_size(refs->packed_ref_store, &packed_size);
1316+
if (ret < 0)
1317+
die("cannot determine packed-refs size");
1318+
1319+
/*
1320+
* Packing loose references into the packed-refs file scales with the
1321+
* number of references we're about to write. We thus decide whether we
1322+
* repack refs by weighing the current size of the packed-refs file
1323+
* against the number of loose references. This is done such that we do
1324+
* not repack too often on repositories with a huge number of
1325+
* references, where we can expect a lot of churn in the number of
1326+
* references.
1327+
*
1328+
* As a heuristic, we repack if the number of loose references in the
1329+
* repository exceeds `log2(nr_packed_refs) * 5`, where we estimate
1330+
* `nr_packed_refs = packed_size / 100`, which scales as following:
1331+
*
1332+
* - 1kB ~ 10 packed refs: 16 refs
1333+
* - 10kB ~ 100 packed refs: 33 refs
1334+
* - 100kB ~ 1k packed refs: 49 refs
1335+
* - 1MB ~ 10k packed refs: 66 refs
1336+
* - 10MB ~ 100k packed refs: 82 refs
1337+
* - 100MB ~ 1m packed refs: 99 refs
1338+
*
1339+
* We thus allow roughly 16 additional loose refs per factor of ten of
1340+
* packed refs. This heuristic may be tweaked in the future, but should
1341+
* serve as a sufficiently good first iteration.
1342+
*/
1343+
limit = log2u(packed_size / 100) * 5;
1344+
if (limit < 16)
1345+
limit = 16;
1346+
1347+
iter = cache_ref_iterator_begin(get_loose_ref_cache(refs, 0), NULL,
1348+
refs->base.repo, 0);
1349+
while ((ret = ref_iterator_advance(iter)) == ITER_OK) {
1350+
if (should_pack_ref(refs, iter->refname, iter->oid,
1351+
iter->flags, opts))
1352+
refcount++;
1353+
if (refcount >= limit) {
1354+
ref_iterator_abort(iter);
1355+
return 1;
1356+
}
1357+
}
1358+
1359+
if (ret != ITER_DONE)
1360+
die("error while iterating over references");
1361+
1362+
return 0;
1363+
}
1364+
13031365
static int files_pack_refs(struct ref_store *ref_store,
13041366
struct pack_refs_opts *opts)
13051367
{
@@ -1312,6 +1374,9 @@ static int files_pack_refs(struct ref_store *ref_store,
13121374
struct strbuf err = STRBUF_INIT;
13131375
struct ref_transaction *transaction;
13141376

1377+
if (!should_pack_refs(refs, opts))
1378+
return 0;
1379+
13151380
transaction = ref_store_transaction_begin(refs->packed_ref_store, &err);
13161381
if (!transaction)
13171382
return -1;

‎refs/packed-backend.c

+18
Original file line numberDiff line numberDiff line change
@@ -1250,6 +1250,24 @@ int packed_refs_is_locked(struct ref_store *ref_store)
12501250
return is_lock_file_locked(&refs->lock);
12511251
}
12521252

1253+
int packed_refs_size(struct ref_store *ref_store,
1254+
size_t *out)
1255+
{
1256+
struct packed_ref_store *refs = packed_downcast(ref_store, REF_STORE_READ,
1257+
"packed_refs_size");
1258+
struct stat st;
1259+
1260+
if (stat(refs->path, &st) < 0) {
1261+
if (errno != ENOENT)
1262+
return -1;
1263+
*out = 0;
1264+
return 0;
1265+
}
1266+
1267+
*out = st.st_size;
1268+
return 0;
1269+
}
1270+
12531271
/*
12541272
* The packed-refs header line that we write out. Perhaps other traits
12551273
* will be added later.

‎refs/packed-backend.h

+7
Original file line numberDiff line numberDiff line change
@@ -27,6 +27,13 @@ int packed_refs_lock(struct ref_store *ref_store, int flags, struct strbuf *err)
2727
void packed_refs_unlock(struct ref_store *ref_store);
2828
int packed_refs_is_locked(struct ref_store *ref_store);
2929

30+
/*
31+
* Obtain the size of the `packed-refs` file. Reports `0` as size in case there
32+
* is no packed-refs file. Returns 0 on success, negative otherwise.
33+
*/
34+
int packed_refs_size(struct ref_store *ref_store,
35+
size_t *out);
36+
3037
/*
3138
* Return true if `transaction` really needs to be carried out against
3239
* the specified packed_ref_store, or false if it can be skipped

‎t/t0601-reffiles-pack-refs.sh

+75-10
Original file line numberDiff line numberDiff line change
@@ -362,21 +362,86 @@ test_expect_success 'pack-refs does not drop broken refs during deletion' '
362362

363363
for command in "git pack-refs --all --auto" "git maintenance run --task=pack-refs --auto"
364364
do
365-
test_expect_success "$command unconditionally packs loose refs" '
365+
test_expect_success "$command does not repack below 16 refs without packed-refs" '
366366
test_when_finished "rm -rf repo" &&
367367
git init repo &&
368368
(
369369
cd repo &&
370370
git config set maintenance.auto false &&
371-
test_commit initial &&
372-
373-
git update-ref refs/heads/something HEAD &&
374-
test_path_is_file .git/refs/heads/something &&
375-
git rev-parse refs/heads/something >expect &&
376-
$command &&
377-
test_path_is_missing .git/refs/heads/something &&
378-
git rev-parse refs/heads/something >actual &&
379-
test_cmp expect actual
371+
git commit --allow-empty --message "initial" &&
372+
373+
# Create 14 additional references, which brings us to
374+
# 15 together with the default branch.
375+
printf "create refs/heads/loose-%d HEAD\n" $(test_seq 14) >stdin &&
376+
git update-ref --stdin <stdin &&
377+
test_path_is_missing .git/packed-refs &&
378+
git pack-refs --auto --all &&
379+
test_path_is_missing .git/packed-refs &&
380+
381+
# Create the 16th reference, which should cause us to repack.
382+
git update-ref refs/heads/loose-15 HEAD &&
383+
git pack-refs --auto --all &&
384+
test_path_is_file .git/packed-refs
385+
)
386+
'
387+
388+
test_expect_success "$command does not repack below 16 refs with small packed-refs" '
389+
test_when_finished "rm -rf repo" &&
390+
git init repo &&
391+
(
392+
cd repo &&
393+
git config set maintenance.auto false &&
394+
git commit --allow-empty --message "initial" &&
395+
396+
git pack-refs --all &&
397+
test_line_count = 2 .git/packed-refs &&
398+
399+
# Create 15 loose references.
400+
printf "create refs/heads/loose-%d HEAD\n" $(test_seq 15) >stdin &&
401+
git update-ref --stdin <stdin &&
402+
git pack-refs --auto --all &&
403+
test_line_count = 2 .git/packed-refs &&
404+
405+
# Create the 16th loose reference, which should cause us to repack.
406+
git update-ref refs/heads/loose-17 HEAD &&
407+
git pack-refs --auto --all &&
408+
test_line_count = 18 .git/packed-refs
409+
)
410+
'
411+
412+
test_expect_success "$command scales with size of packed-refs" '
413+
test_when_finished "rm -rf repo" &&
414+
git init repo &&
415+
(
416+
cd repo &&
417+
git config set maintenance.auto false &&
418+
git commit --allow-empty --message "initial" &&
419+
420+
# Create 99 packed refs. This should cause the heuristic
421+
# to require more than the minimum amount of loose refs.
422+
test_seq 99 |
423+
while read i
424+
do
425+
printf "create refs/heads/packed-%d HEAD\n" $i || return 1
426+
done >stdin &&
427+
git update-ref --stdin <stdin &&
428+
git pack-refs --all &&
429+
test_line_count = 101 .git/packed-refs &&
430+
431+
# Create 24 loose refs, which should not yet cause us to repack.
432+
printf "create refs/heads/loose-%d HEAD\n" $(test_seq 24) >stdin &&
433+
git update-ref --stdin <stdin &&
434+
git pack-refs --auto --all &&
435+
test_line_count = 101 .git/packed-refs &&
436+
437+
# Create another handful of refs to cross the border.
438+
# Note that we explicitly do not check for strict
439+
# boundaries here, as this also depends on the size of
440+
# the object hash.
441+
printf "create refs/heads/addn-%d HEAD\n" $(test_seq 10) >stdin &&
442+
git update-ref --stdin <stdin &&
443+
git pack-refs --auto --all &&
444+
test_line_count = 135 .git/packed-refs
380445
)
381446
'
382447
done

0 commit comments

Comments
 (0)
Please sign in to comment.