Replace experimental path walk feature with upstream version #5689

Open
wants to merge 23 commits into base: main

Commits
4bc0ba0 pack-objects: extract should_attempt_deltas() (derrickstolee, May 16, 2025)
70664d2 pack-objects: add --path-walk option (derrickstolee, May 16, 2025)
9fcfe12 pack-objects: update usage to match docs (derrickstolee, May 16, 2025)
3ce9e5f p5313: add performance tests for --path-walk (derrickstolee, May 16, 2025)
861d4bc pack-objects: introduce GIT_TEST_PACK_PATH_WALK (derrickstolee, May 16, 2025)
6e95bf8 t5538: add tests to confirm deltas in shallow pushes (derrickstolee, May 16, 2025)
5f71150 repack: add --path-walk option (derrickstolee, May 16, 2025)
4f7f571 pack-objects: enable --path-walk via config (derrickstolee, May 16, 2025)
4933152 scalar: enable path-walk during push via config (derrickstolee, May 16, 2025)
206a1bb pack-objects: refactor path-walk delta phase (derrickstolee, May 16, 2025)
e539479 pack-objects: thread the path-based compression (derrickstolee, May 16, 2025)
4705889 path-walk: add new 'edge_aggressive' option (derrickstolee, May 16, 2025)
c178b02 pack-objects: allow --shallow and --path-walk (derrickstolee, May 16, 2025)
48fc0dd fixup! pack-objects: thread the path-based compression (derrickstolee, Jun 20, 2025)
b3cf5e2 fixup! pack-objects: refactor path-walk delta phase (derrickstolee, Jun 20, 2025)
f561554 fixup! scalar: enable path-walk during push via config (derrickstolee, Jun 20, 2025)
0257d18 fixup! pack-objects: enable --path-walk via config (derrickstolee, Jun 20, 2025)
3dcbb2e fixup! repack: add --path-walk option (derrickstolee, Jun 20, 2025)
35a2f21 fixup! pack-objects: introduce GIT_TEST_PACK_PATH_WALK (derrickstolee, Jun 20, 2025)
a85c9c1 fixup! pack-objects: add --path-walk option (derrickstolee, Jun 20, 2025)
0ea858d fixup! pack-objects: extract should_attempt_deltas() (derrickstolee, Jun 20, 2025)
6aa34fd fixup! revision: create mark_trees_uninteresting_dense() (derrickstolee, Jun 20, 2025)
80c1092 Replace path walk feature with upstream version (derrickstolee, Jun 20, 2025)
8 changes: 2 additions & 6 deletions Documentation/config/pack.adoc
@@ -156,12 +156,8 @@ pack.useSparse::
`true`.

pack.usePathWalk::
When true, git will default to using the '--path-walk' option in
'git pack-objects' when the '--revs' option is present. This
algorithm groups objects by path to maximize the ability to
compute delta chains across historical versions of the same
object. This may disable other options, such as using bitmaps to
enumerate objects.
Enable the `--path-walk` option by default for `git pack-objects`
processes. See linkgit:git-pack-objects[1] for full details.

pack.preferBitmapTips::
When selecting which commits will receive bitmaps, prefer a
33 changes: 17 additions & 16 deletions Documentation/git-pack-objects.adoc
@@ -10,13 +10,13 @@ SYNOPSIS
--------
[verse]
'git pack-objects' [-q | --progress | --all-progress] [--all-progress-implied]
[--no-reuse-delta] [--delta-base-offset] [--non-empty]
[--local] [--incremental] [--window=<n>] [--depth=<n>]
[--revs [--unpacked | --all]] [--keep-pack=<pack-name>]
[--cruft] [--cruft-expiration=<time>]
[--stdout [--filter=<filter-spec>] | <base-name>]
[--shallow] [--keep-true-parents] [--[no-]sparse]
[--name-hash-version=<n>] [--path-walk] < <object-list>
[--no-reuse-delta] [--delta-base-offset] [--non-empty]
[--local] [--incremental] [--window=<n>] [--depth=<n>]
[--revs [--unpacked | --all]] [--keep-pack=<pack-name>]
[--cruft] [--cruft-expiration=<time>]
[--stdout [--filter=<filter-spec>] | <base-name>]
[--shallow] [--keep-true-parents] [--[no-]sparse]
[--name-hash-version=<n>] [--path-walk] < <object-list>


DESCRIPTION
@@ -376,15 +376,16 @@ when writing reachability bitmap files with `--write-bitmap-index` and it
will be automatically changed to version `1`.

--path-walk::
By default, `git pack-objects` walks objects in an order that
presents trees and blobs in an order unrelated to the path they
appear relative to a commit's root tree. The `--path-walk` option
enables a different walking algorithm that organizes trees and
blobs by path. This has the potential to improve delta compression
especially in the presence of filenames that cause collisions in
Git's default name-hash algorithm. Due to changing how the objects
are walked, this option is not compatible with `--delta-islands`,
`--shallow`, or `--filter`.
Perform compression by first organizing objects by path, then a
second pass that compresses across paths as normal. This has the
potential to improve delta compression especially in the presence
of filenames that cause collisions in Git's default name-hash
algorithm.
+
Incompatible with `--delta-islands`, `--shallow`, or `--filter`. The
`--use-bitmap-index` option will be ignored in the presence of
`--path-walk.`


DELTA ISLANDS
-------------
13 changes: 2 additions & 11 deletions Documentation/git-repack.adoc
@@ -259,17 +259,8 @@ linkgit:git-multi-pack-index[1]).
See linkgit:git-pack-objects[1] for full details.

--path-walk::
This option passes the `--path-walk` option to the underlying
`git pack-options` process (see linkgit:git-pack-objects[1]).
By default, `git pack-objects` walks objects in an order that
presents trees and blobs in an order unrelated to the path they
appear relative to a commit's root tree. The `--path-walk` option
enables a different walking algorithm that organizes trees and
blobs by path. This has the potential to improve delta compression
especially in the presence of filenames that cause collisions in
Git's default name-hash algorithm. Due to changing how the objects
are walked, this option is not compatible with `--delta-islands`
or `--filter`.
Pass the `--path-walk` option to the underlying `git pack-objects`
process. See linkgit:git-pack-objects[1] for full details.

CONFIGURATION
-------------
10 changes: 9 additions & 1 deletion Documentation/technical/api-path-walk.adoc
@@ -56,6 +56,14 @@ better off using the revision walk API instead.
the revision walk so that the walk emits commits marked with the
`UNINTERESTING` flag.

`edge_aggressive`::
For performance reasons, usually only the boundary commits are
explored to find UNINTERESTING objects. However, in the case of
shallow clones it can be helpful to mark all trees and blobs
reachable from UNINTERESTING tip commits as UNINTERESTING. This
matches the behavior of `--objects-edge-aggressive` in the
revision API.

`pl`::
This pattern list pointer allows focusing the path-walk search to
a set of patterns, only emitting paths that match the given
@@ -69,5 +77,5 @@ Examples

See example usages in:
`t/helper/test-path-walk.c`,
`builtin/pack-objects.c`,
`builtin/backfill.c`
`builtin/pack-objects.c`
75 changes: 47 additions & 28 deletions builtin/pack-objects.c
@@ -44,6 +44,7 @@
#include "blob.h"
#include "tree.h"
#include "path-walk.h"
#include "trace2.h"

/*
* Objects we are going to pack are collected in the `to_pack` structure.
@@ -187,8 +188,14 @@ static inline void oe_set_delta_size(struct packing_data *pack,
#define SET_DELTA_SIBLING(obj, val) oe_set_delta_sibling(&to_pack, obj, val)

static const char *const pack_usage[] = {
N_("git pack-objects --stdout [<options>] [< <ref-list> | < <object-list>]"),
N_("git pack-objects [<options>] <base-name> [< <ref-list> | < <object-list>]"),
N_("git pack-objects [-q | --progress | --all-progress] [--all-progress-implied]\n"
" [--no-reuse-delta] [--delta-base-offset] [--non-empty]\n"
" [--local] [--incremental] [--window=<n>] [--depth=<n>]\n"
" [--revs [--unpacked | --all]] [--keep-pack=<pack-name>]\n"
" [--cruft] [--cruft-expiration=<time>]\n"
" [--stdout [--filter=<filter-spec>] | <base-name>]\n"
" [--shallow] [--keep-true-parents] [--[no-]sparse]\n"
" [--name-hash-version=<n>] [--path-walk] < <object-list>"),
NULL
};

@@ -203,6 +210,7 @@ static int keep_unreachable, unpack_unreachable, include_tag;
static timestamp_t unpack_unreachable_expiration;
static int pack_loose_unreachable;
static int cruft;
static int shallow = 0;
static timestamp_t cruft_expiration;
static int local;
static int have_non_local_packs;
@@ -3291,6 +3299,9 @@ static int add_ref_tag(const char *tag UNUSED, const char *referent UNUSED, cons
static int should_attempt_deltas(struct object_entry *entry)
{
if (DELTA(entry))
/* This happens if we decided to reuse existing
* delta from a pack. "reuse_delta &&" is implied.
*/
return 0;

if (!entry->type_valid ||
@@ -3315,16 +3326,16 @@ static int should_attempt_deltas(struct object_entry *entry)
return 1;
}

static void find_deltas_for_region(struct object_entry *list UNUSED,
static void find_deltas_for_region(struct object_entry *list,
struct packing_region *region,
unsigned int *processed)
{
struct object_entry **delta_list;
uint32_t delta_list_nr = 0;
unsigned int delta_list_nr = 0;

ALLOC_ARRAY(delta_list, region->nr);
for (uint32_t i = 0; i < region->nr; i++) {
struct object_entry *entry = to_pack.objects + region->start + i;
for (size_t i = 0; i < region->nr; i++) {
struct object_entry *entry = list + region->start + i;
if (should_attempt_deltas(entry))
delta_list[delta_list_nr++] = entry;
}
@@ -3336,10 +3347,10 @@ static void find_deltas_for_region(struct object_entry *list UNUSED,

static void find_deltas_by_region(struct object_entry *list,
struct packing_region *regions,
uint32_t start, uint32_t nr)
size_t start, size_t nr)
{
unsigned int processed = 0;
uint32_t progress_nr;
size_t progress_nr;

if (!nr)
return;
@@ -3422,7 +3433,10 @@ static void ll_find_deltas_by_region(struct object_entry *list,
}

if (progress > pack_to_stdout)
fprintf_ln(stderr, _("Path-based delta compression using up to %d threads"),
fprintf_ln(stderr,
Q_("Path-based delta compression using up to %d thread",
"Path-based delta compression using up to %d threads",
delta_search_threads),
delta_search_threads);
CALLOC_ARRAY(p, delta_search_threads);

@@ -4489,11 +4503,11 @@ static void mark_bitmap_preferred_tips(void)
}
}

static inline int is_oid_interesting(struct repository *repo,
struct object_id *oid)
static inline int is_oid_uninteresting(struct repository *repo,
struct object_id *oid)
{
struct object *o = lookup_object(repo, oid);
return o && !(o->flags & UNINTERESTING);
return !o || (o->flags & UNINTERESTING);
}
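
Review note on the rename above: by De Morgan's law, `!(o && !(flags & UNINTERESTING))` and `!o || (flags & UNINTERESTING)` are the same condition, so the `exclude` value computed by the caller should be unchanged. The sketch below checks that with stand-in types; `demo_object`, `DEMO_UNINTERESTING` and the `demo_*` helpers are hypothetical names for illustration and are not Git's real definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Git's UNINTERESTING flag bit. */
#define DEMO_UNINTERESTING (1u << 1)

/* Hypothetical stand-in for struct object; only the flags matter here. */
struct demo_object {
	unsigned flags;
};

/* Old form: interesting means "known object that is not flagged". */
static int demo_is_interesting(const struct demo_object *o)
{
	return o && !(o->flags & DEMO_UNINTERESTING);
}

/* New form: unknown objects and flagged objects are both uninteresting. */
static int demo_is_uninteresting(const struct demo_object *o)
{
	return !o || (o->flags & DEMO_UNINTERESTING);
}

/* Returns 1 when both forms produce the same exclusion decision for o. */
static int demo_forms_agree(const struct demo_object *o)
{
	return !demo_is_interesting(o) == !!demo_is_uninteresting(o);
}

/* Covers the three possible cases: missing, unflagged, flagged. */
static void demo_check_equivalence(void)
{
	struct demo_object plain = { 0 };
	struct demo_object flagged = { DEMO_UNINTERESTING };

	assert(demo_forms_agree(NULL));
	assert(demo_forms_agree(&plain));
	assert(demo_forms_agree(&flagged));
}
```

The new name also reads more directly at the call site, since the caller assigns the result to `exclude` without a negation.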

static int add_objects_by_path(const char *path,
@@ -4521,7 +4535,7 @@ static int add_objects_by_path(const char *path,
OBJECT_INFO_FOR_PREFETCH) < 0)
continue;

exclude = !is_oid_interesting(the_repository, oid);
exclude = is_oid_uninteresting(the_repository, oid);

if (exclude && !thin)
continue;
@@ -4553,11 +4567,11 @@ static void get_object_list_path_walk(struct rev_info *revs)
{
struct path_walk_info info = PATH_WALK_INFO_INIT;
unsigned int processed = 0;
int result;

info.revs = revs;
info.path_fn = add_objects_by_path;
info.path_fn_data = &processed;
revs->tag_objects = 1;

/*
* Allow the --[no-]sparse option to be interesting here, if only
@@ -4566,8 +4580,13 @@ static void get_object_list_path_walk(struct rev_info *revs)
* base objects.
*/
info.prune_all_uninteresting = sparse;
info.edge_aggressive = shallow;

trace2_region_enter("pack-objects", "path-walk", revs->repo);
result = walk_objects_by_path(&info);
trace2_region_leave("pack-objects", "path-walk", revs->repo);

if (walk_objects_by_path(&info))
if (result)
die(_("failed to pack objects via path-walk"));
}

@@ -4617,7 +4636,7 @@ static void get_object_list(struct rev_info *revs, int ac, const char **av)

warn_on_object_refname_ambiguity = save_warning;

if (use_bitmap_index && !path_walk && !get_object_list_from_bitmap(revs))
if (use_bitmap_index && !get_object_list_from_bitmap(revs))
return;

if (use_delta_islands)
@@ -4767,7 +4786,6 @@ int cmd_pack_objects(int argc,
struct repository *repo UNUSED)
{
int use_internal_rev_list = 0;
int shallow = 0;
int all_progress_implied = 0;
struct strvec rp = STRVEC_INIT;
int rev_list_unpacked = 0, rev_list_all = 0, rev_list_reflog = 0;
@@ -4947,17 +4965,18 @@ int cmd_pack_objects(int argc,

strvec_push(&rp, "pack-objects");

if (path_walk && filter_options.choice) {
warning(_("cannot use --filter with --path-walk"));
path_walk = 0;
}
if (path_walk && use_delta_islands) {
warning(_("cannot use delta islands with --path-walk"));
path_walk = 0;
}
if (path_walk && shallow) {
warning(_("cannot use --shallow with --path-walk"));
path_walk = 0;
if (path_walk) {
const char *option = NULL;
if (filter_options.choice)
option = "--filter";
else if (use_delta_islands)
option = "--delta-islands";

if (option) {
warning(_("cannot use %s with %s"),
option, "--path-walk");
path_walk = 0;
}
}
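
Review note on the consolidated check above: the first matching option wins, so when both `--filter` and delta islands are in effect only `--filter` is named in the warning. A minimal sketch of that selection order follows; `demo_path_walk_conflict` and its boolean parameters are hypothetical and only mirror the shape of the `if`/`else if` chain.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the name of the first option that conflicts with --path-walk,
 * or NULL when there is no conflict. The order matches the chain in the
 * hunk: --filter is checked before --delta-islands.
 */
static const char *demo_path_walk_conflict(int has_filter,
					   int has_delta_islands)
{
	if (has_filter)
		return "--filter";
	if (has_delta_islands)
		return "--delta-islands";
	return NULL;
}

/* Exercises the no-conflict case and the first-match-wins ordering. */
static void demo_check_conflicts(void)
{
	assert(!demo_path_walk_conflict(0, 0));
	assert(!strcmp(demo_path_walk_conflict(1, 1), "--filter"));
}
```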
if (path_walk) {
strvec_push(&rp, "--boundary");
2 changes: 1 addition & 1 deletion builtin/repack.c
@@ -1188,7 +1188,7 @@ int cmd_repack(int argc,
OPT_INTEGER(0, "name-hash-version", &po_args.name_hash_version,
N_("specify the name hash version to use for grouping similar objects by path")),
OPT_BOOL(0, "path-walk", &po_args.path_walk,
N_("(EXPERIMENTAL!) pass --path-walk to git-pack-objects")),
N_("pass --path-walk to git-pack-objects")),
OPT_NEGBIT('n', NULL, &run_update_server_info,
N_("do not run git-update-server-info"), 1),
OPT__QUIET(&po_args.quiet, N_("be quiet")),
1 change: 0 additions & 1 deletion ci/run-build-and-tests.sh
@@ -26,7 +26,6 @@ linux-TEST-vars)
export GIT_TEST_NO_WRITE_REV_INDEX=1
export GIT_TEST_CHECKOUT_WORKERS=2
export GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL=1
export GIT_TEST_PACK_PATH_WALK=1
;;
linux-clang)
export GIT_TEST_DEFAULT_HASH=sha1
6 changes: 3 additions & 3 deletions pack-objects.h
@@ -125,8 +125,8 @@ struct object_entry {
* as given by a starting index and a number of elements.
*/
struct packing_region {
uint32_t start;
uint32_t nr;
size_t start;
size_t nr;
};
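
Review note on the region layout: a region is a (start, count) window into the object list, which is how the path-walk delta phase can process one path's objects at a time. The sketch below shows one way such windows could be derived from a list that is already grouped by path. It is an illustration under assumptions, not the code in `builtin/pack-objects.c`: `demo_region`, `demo_build_regions` and the integer `path_ids` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the shape of struct packing_region after this change. */
struct demo_region {
	size_t start;
	size_t nr;
};

/*
 * Split path_ids[0..n) into runs of equal ids and write one region per
 * run into out. path_ids is a hypothetical stand-in for "which path an
 * object was found at". out must have room for n entries.
 * Returns the number of regions written.
 */
static size_t demo_build_regions(const int *path_ids, size_t n,
				 struct demo_region *out)
{
	size_t nr_regions = 0;

	assert(out || !n);
	for (size_t i = 0; i < n; i++) {
		/* Same path as the previous object: extend the open region. */
		if (nr_regions && path_ids[i] == path_ids[i - 1]) {
			out[nr_regions - 1].nr++;
			continue;
		}
		/* New path: open a region starting at this index. */
		out[nr_regions].start = i;
		out[nr_regions].nr = 1;
		nr_regions++;
	}
	return nr_regions;
}
```

Using `size_t` for both fields keeps the index arithmetic in `list + region->start + i` in a single unsigned type, which matches the loop counter type chosen in `find_deltas_for_region()`.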

struct packing_data {
@@ -135,7 +135,7 @@ struct packing_data {
uint32_t nr_objects, nr_alloc;

struct packing_region *regions;
uint32_t nr_regions, nr_regions_alloc;
size_t nr_regions, nr_regions_alloc;

int32_t *index;
uint32_t index_size;
6 changes: 5 additions & 1 deletion path-walk.c
@@ -503,7 +503,11 @@ int walk_objects_by_path(struct path_walk_info *info)
if (prepare_revision_walk(info->revs))
die(_("failed to setup revision walk"));

/* Walk trees to mark them as UNINTERESTING. */
/*
* Walk trees to mark them as UNINTERESTING.
* This is particularly important when 'edge_aggressive' is set.
*/
info->revs->edge_hint_aggressive = info->edge_aggressive;
edge_repo = info->revs->repo;
edge_tree_list = root_tree_list;
mark_edges_uninteresting(info->revs, show_edge,
7 changes: 7 additions & 0 deletions path-walk.h
@@ -50,6 +50,13 @@ struct path_walk_info {
*/
int prune_all_uninteresting;

/**
* When 'edge_aggressive' is set, then the revision walk will use
* the '--object-edge-aggressive' option to mark even more objects
* as uninteresting.
*/
int edge_aggressive;

/**
* Specify a sparse-checkout definition to match our paths to. Do not
* walk outside of this sparse definition. If the patterns are in
15 changes: 0 additions & 15 deletions revision.c
@@ -212,21 +212,6 @@ static void add_children_by_path(struct repository *r,
free_tree_buffer(tree);
}

void mark_trees_uninteresting_dense(struct repository *r,
struct oidset *trees)
{
struct object_id *oid;
struct oidset_iter iter;

oidset_iter_init(trees, &iter);
while ((oid = oidset_iter_next(&iter))) {
struct tree *tree = lookup_tree(r, oid);

if (tree->object.flags & UNINTERESTING)
mark_tree_contents_uninteresting(r, tree);
}
}

void mark_trees_uninteresting_sparse(struct repository *r,
struct oidset *trees)
{
1 change: 0 additions & 1 deletion revision.h
@@ -486,7 +486,6 @@ void put_revision_mark(const struct rev_info *revs,

void mark_parents_uninteresting(struct rev_info *revs, struct commit *commit);
void mark_tree_uninteresting(struct repository *r, struct tree *tree);
void mark_trees_uninteresting_dense(struct repository *r, struct oidset *trees);
void mark_trees_uninteresting_sparse(struct repository *r, struct oidset *trees);

/**
2 changes: 2 additions & 0 deletions t/helper/test-path-walk.c
@@ -82,6 +82,8 @@ int cmd__path_walk(int argc, const char **argv)
N_("toggle inclusion of tree objects")),
OPT_BOOL(0, "prune", &info.prune_all_uninteresting,
N_("toggle pruning of uninteresting paths")),
OPT_BOOL(0, "edge-aggressive", &info.edge_aggressive,
N_("toggle aggressive edge walk")),
OPT_BOOL(0, "stdin-pl", &stdin_pl,
N_("read a pattern list over stdin")),
OPT_END(),