Cloning the Linux kernel in under a minute #579
Byron
started this conversation in Show and tell
TLDR
Using `gitoxide` with default settings, we can now clone the Linux kernel repository (receiving a pack, resolving it, and checking out the working tree) in 43s using all cores of an M1 Pro. Canonical `git` (with default settings) finishes the same clone in 115s, making `gitoxide` ~2.7x faster.

On a 16-core AMD workstation we can achieve the same clone in 30s, while canonical `git` takes 141s. Putting it into a number, `gitoxide` is able to outperform `git` by a factor of ~4.8.

This will make a difference on CI and locally, saving time and memory, once it's ready for prime time early next year.
For reproduction, please see the Reproduction section at the bottom of the document, or keep going for all the details.

The Results
We see that `gix` is ~1.4x faster than `git` on a single core, and ~2.6x faster with all cores of the test system.

It's notable that the default settings of `gix`, compared to the ones of `git`, allow it to reach ~2.7x of `git`'s speed, as it will use all cores of the test system (M1 Pro).

Raw benchmark results
```
gix -c pack.threads=1 -c checkout.workers=1 clone ./linux ./linux-clone
git -c pack.threads=1 -c checkout.workers=1 clone file://$PWD/linux ./linux-clone
gix -c pack.threads=1 -c checkout.workers=4 clone ./linux ./linux-clone
git -c pack.threads=1 -c checkout.workers=4 clone file://$PWD/linux ./linux-clone
gix -c pack.threads=3 -c checkout.workers=1 clone ./linux ./linux-clone
git -c pack.threads=3 -c checkout.workers=1 clone file://$PWD/linux ./linux-clone
gix -c pack.threads=3 -c checkout.workers=4 clone ./linux ./linux-clone
git -c pack.threads=3 -c checkout.workers=4 clone file://$PWD/linux ./linux-clone
gix -c pack.threads=10 -c checkout.workers=1 clone ./linux ./linux-clone
git -c pack.threads=10 -c checkout.workers=1 clone file://$PWD/linux ./linux-clone
gix -c pack.threads=10 -c checkout.workers=4 clone ./linux ./linux-clone
git -c pack.threads=10 -c checkout.workers=4 clone file://$PWD/linux ./linux-clone
```

Bonus Round: AMD64 (or how to clone in ~30s)
The test system at hand is a custom-built PC with an AMD Ryzen™ 9 3950X, 64GB of 3200MHz DDR4, and an M.2 PCIe 4 NVMe SSD, running Fedora 36 Workstation.
It's notable how much better a standard build of `gix` performs compared to a standard build of `git`: with just a single thread it is 2.9x faster. With default settings, `gix` using all cores outperforms `git`, which effectively uses only 3 cores, by a factor of ~4.8x.

Note that even though `gitoxide` scales nicely with additional cores, the absolute time saved has diminishing returns, as the pack transfer already takes ~23s, while the checkout takes 1.7s and is limited by the SSD and filesystem.

Raw benchmark results
```
gix -c pack.threads=1 -c checkout.workers=4 clone ./linux ./linux-clone
git -c pack.threads=1 -c checkout.workers=4 clone file://$PWD/linux ./linux-clone
gix -c pack.threads=3 -c checkout.workers=4 clone ./linux ./linux-clone
git -c pack.threads=3 -c checkout.workers=4 clone file://$PWD/linux ./linux-clone
gix -c pack.threads=8 -c checkout.workers=4 clone ./linux ./linux-clone
git -c pack.threads=8 -c checkout.workers=4 clone file://$PWD/linux ./linux-clone
gix -c pack.threads=16 -c checkout.workers=4 clone ./linux ./linux-clone
git -c pack.threads=16 -c checkout.workers=4 clone file://$PWD/linux ./linux-clone
gix -c pack.threads=32 -c checkout.workers=4 clone ./linux ./linux-clone
git -c pack.threads=32 -c checkout.workers=4 clone file://$PWD/linux ./linux-clone
```

Breakdown of clone time with default settings
The times shown here may overlap due to parallel execution and do not add up to the total runtime (30 seconds).
Bonus Round: Memory Consumption
All this performance offered by `gix` might come at the expense of memory consumption, so here are two measurements, taken by hand with default options, to represent typical usage:

```
/usr/bin/time -lp gix -v clone ./linux ./linux-clone
/usr/bin/time -lp git clone file://$PWD/linux ./linux-clone
```

`gix` can do the same work faster and with nearly half the memory.
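As an aside, `/usr/bin/time -lp` is the BSD/macOS variant of the tool. On a Linux box such as the AMD workstation above, GNU time reports peak memory with its verbose flag; a sketch of the equivalent measurement, assuming GNU time is installed at `/usr/bin/time`:

```sh
# GNU time: -v prints "Maximum resident set size" (peak RSS) among other stats
/usr/bin/time -v gix clone ./linux ./linux-clone
/usr/bin/time -v git clone file://$PWD/linux ./linux-clone
```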
Raw benchmark results

Bonus: Racing `git`

See how `gix` tries to beat `git` in cloning the Linux kernel over the network on a beefy machine.

Conclusions
`gitoxide` has the potential to substantially speed up `clone` operations by scaling with today's multi-core CPUs, and it will keep scaling with every new hardware generation. `git` currently does not scale well with cores, and it will be a major undertaking to change that.

Special Thanks
I am grateful for the help of Pascal Kuthe, who generously gave his time to review this post and improve it tremendously in the process. He is also responsible for the graphs, which make this post so much more accessible, and prettier too. Thank you!
FAQ
Can I use it now?
Yes, if the checkout does not involve submodules or rely on filters (like line-feed conversions or `git-lfs`). These features are expected to be fully implemented early next year (2023).

Can I post my own results here?
Yes, please: test it on your 128-core machine to see how low these numbers can go. Please note, though, that the runtime is dominated by the transfer time, which clocks in at about 28s.
How can `gitoxide` be that fast?

`gitoxide` has been built from the ground up for performance. It doesn't use the heap generously and reuses allocations wherever feasible.

On top of that, the most time-consuming stage of a clone, the pack index creation, is algorithmically optimal: a data structure is built that knows exactly which delta to apply to which base, effectively representing the delta tree in memory. With it one can resolve the pack, that is, decompress every object, without requiring any other caches and without wasting any work or memory.
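To make the idea concrete, here is a minimal Rust sketch of such a delta tree, not gitoxide's actual implementation: each entry records which entries are deltas against it, so one traversal starting from the base objects resolves every object exactly once. The names and the `read_entry`/`apply_delta` helpers are illustrative stand-ins.

```rust
// Minimal sketch (not gitoxide's code): an in-memory delta tree for pack resolution.
struct Entry {
    /// Where this entry's compressed data lives in the pack (illustrative).
    pack_offset: u64,
    /// Indices of entries that are deltas against this entry.
    children: Vec<usize>,
}

/// Resolve every object exactly once by walking from the base objects downwards.
/// `read_entry` decompresses an entry's data, `apply_delta` patches a base with a
/// delta; both are stand-ins for the real pack machinery.
fn resolve(
    entries: &[Entry],
    base_indices: &[usize],
    read_entry: &dyn Fn(u64) -> Vec<u8>,
    apply_delta: &dyn Fn(&[u8], &[u8]) -> Vec<u8>,
) {
    // Start with the non-delta (base) objects, fully decompressed.
    let mut stack: Vec<(usize, Vec<u8>)> = base_indices
        .iter()
        .map(|&ix| (ix, read_entry(entries[ix].pack_offset)))
        .collect();

    while let Some((ix, data)) = stack.pop() {
        // `data` is the fully resolved object at `ix`; hash/verify it here.
        for &child in &entries[ix].children {
            let delta = read_entry(entries[child].pack_offset);
            stack.push((child, apply_delta(&data, &delta)));
        }
        // Once all children have been resolved on top of `data`, it can be dropped.
    }
}
```

Since each base object's subtree is independent, subtrees can be handed to different threads, which is what lets the resolution step scale with cores.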
Thanks to the Rust ecosystem, it's easy to get the best-performing zlib implementation and the fastest SHA1 hash implementation for most platforms, which affects this workload a lot. With the right hardware, this step can now scale linearly with each core, yielding ~38GB/s decompression speed on a recent AMD Ryzen.

All of the above wouldn't be possible without Rust, the key enabler for all of these optimizations and for fearless concurrency.
Why is canonical `git` slower on an AMD workstation than on an M1 MacBook?

We found this surprising as well, but after rerunning the benchmarks multiple times, the results turned out to be consistent. Some component of canonical `git` is probably much better optimized on AArch64, which greatly improves performance there.

As `git` does not scale nearly as well as `gitoxide` across multiple cores, it's not able to capitalize on the higher core count, which increases the gap even further.

Reproduction
The test setup
We will use the Linux kernel as a benchmark. To get a reliable benchmark, we exclude the network by using a local copy of the repository. To get clone performance similar to that of an optimized server, we also enable some caches.
```
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
cd linux
git checkout v6.0-rc3
git repack --write-bitmap-index -a
```

Now we can direct our git clients, `gix` and `git`, towards the local copy and perform a clone using the same machinery that would normally run over the network.

By default, `git` will just hardlink the relevant files when performing local clones. However, we can nudge it into performing a proper clone by using `git clone file://./linux linux-clone`. `gitoxide` currently does not utilize hardlinking, so just calling `gix clone ./linux linux-clone` is enough.
Since `git` now treats the clone as 'remote' with limited trust, it would force a connectivity check on the received data to assure it's not garbage, which takes time and (a lot of) memory, so we disable it with the following patch on top of this commit.

The patch can be applied with `git apply <PATCH_FILE>`, and with that, we get:

```
make # add -j10 for using 10 cores
./git --version
git version 2.38.1.381.gc03801e19c.dirty
```

The experimental `gitoxide` CLI `gix` can be installed using cargo:
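Assuming the CLI is published as the `gitoxide` crate on crates.io, the install would look something like this:

```sh
# Install the experimental `gix` CLI from crates.io (crate name assumed); --locked pins dependency versions
cargo install gitoxide --locked
```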
Understanding Performance Options

The unit of data transport in git is a pack, which is a set of highly compressed objects. Together these objects make up the object graph of the repository, which contains all commits and files tracked by git.
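To see what such a pack looks like for the repository prepared in the test setup above, a couple of plumbing commands can inspect it; the pack path depends on what `git repack` produced on your machine:

```sh
cd linux
git count-objects -v                                        # object count and on-disk pack size
git verify-pack -v .git/objects/pack/pack-*.idx | head -20  # per-object type, size and delta depth
cd ..
```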
When cloning, a pack with all objects required by the client is created by the server and streamed to the client. This process can be broken down into the following steps:
- the server prepares the pack and streams it to the client
- the received pack is resolved by building an index for it; the number of threads used for this is controlled by `pack.threads`, with a conservative default for `git` and 'all-CPUs' for `gix`
- the working tree is checked out; the number of parallel workers is controlled by `checkout.workers`, defaulting to a single worker for `git` and 'all-CPUs' for `gix`

Thus we have two parameters that affect the last two stages of the clone operation, with the last stage being the fastest one, and the second-to-last being the one that has the biggest impact on performance.
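Both knobs can be passed per invocation, as in the benchmark commands above, or persisted so every future clone picks them up; the values below are only illustrative and should be tuned to your core count and disk:

```sh
# Persist more aggressive parallelism for canonical git; values are examples, not recommendations
git config --global pack.threads 8
git config --global checkout.workers 8
```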
Gathering Results
We use `hyperfine` for obtaining the results and run it over the parameter matrix shown in the raw benchmark listings above; a sketch of such an invocation follows below. Note that the checkout is done onto an actual disk (SSD) to represent typical usage.
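A `hyperfine` run of roughly this shape covers the same matrix as the M1 listings; the parameter values and the cleanup step are assumptions:

```sh
# Benchmark both clients over a threads x workers matrix, removing the clone between runs
hyperfine \
  -L threads 1,3,10 \
  -L workers 1,4 \
  --prepare 'rm -rf linux-clone' \
  'gix -c pack.threads={threads} -c checkout.workers={workers} clone ./linux ./linux-clone' \
  'git -c pack.threads={threads} -c checkout.workers={workers} clone file://$PWD/linux ./linux-clone'
```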
Our parameters have been chosen so that they reflect typical usage:
- The highest `pack.threads` value uses all cores for `git` if the host has that many CPUs. It's chosen because it's known that higher numbers yield greatly diminished returns or even reduce performance due to lock contention.
- `checkout.workers=1` is the default of `git`, meaning only one file will be written at a time.
- The remaining values are applied identically whether the clone is performed with `gix` or `git`.

Detailed test data
Output of test runs
- `gix clone`
- `git clone`

Output of memory consumption tests

- `gix`
- `git`