Git 2.26: Up to 3.3x faster pattern searches in git-grep
Tags: gsoc, git, thesis, parallelism
Quick Links: Final Essay | Timing Results | Poster
Intro
Two weeks ago, Git 2.26.0 was released! Among its many great changes, I'm very happy to say that you can expect faster multithreaded git-grep searches! This is the project I worked on for about a year as my Google Summer of Code project and undergraduate thesis at the University of São Paulo.
I’ve been posting about the project’s development here. But with the 2.26.0 release, the project is finally concluded, and I think it’s a great time to wrap it all up and present the final results. For those interested in learning more about the development process, the final thesis is also linked in the Additional Resources section and at the top of the page.
Motivation and Goal
Git has become the most popular1 version control system for software development. Being used to manage a large variety of projects, with different magnitudes (in both content and history sizes), it must be built to scale. With this in mind, Git’s grep command – which searches for text patterns in the tracked files – was made parallel in 2010 using a producer-consumer mechanism. However, when operating in Git’s internal object store (e.g. for a search in older revisions), the multithreaded version became slower than the sequential code. For this reason, threads were later disabled in this case.
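The producer-consumer mechanism can be pictured with a short sketch. This is a hypothetical Python analogy of the pattern (not Git's actual C code, and all names here are invented): one producer thread queues up work items while several consumer threads take items off the queue and run the search.

```python
import queue
import threading

def grep_parallel(paths, pattern, read_file, n_workers=4):
    """Toy producer-consumer grep: one producer queues work items,
    several consumers search them. Illustrates the pattern only."""
    tasks = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def producer():
        for p in paths:
            tasks.put(p)
        for _ in range(n_workers):
            tasks.put(None)  # sentinel: one per worker, signals "no more work"

    def consumer():
        while True:
            path = tasks.get()
            if path is None:
                break
            for lineno, line in enumerate(read_file(path), 1):
                if pattern in line:
                    with results_lock:
                        results.append((path, lineno, line))

    workers = [threading.Thread(target=consumer) for _ in range(n_workers)]
    for w in workers:
        w.start()
    producer()
    for w in workers:
        w.join()
    return results

# Tiny in-memory "repository" just for demonstration.
files = {"a.txt": ["static int x;", "hello"], "b.txt": ["extern int y;"]}
hits = grep_parallel(files, "int", lambda p: files[p])
print(sorted(hits))
```

In the real git-grep, the consumers' file/object reading and decompression are where the thread-safety problems discussed below come in.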
The main goal of this work was to improve the parallelization of the grep command and re-enable threads for all its use cases. In the process, we also wanted to implement optimized and thread-safe access to the object reading functions, which could later be used to parallelize other sections of the codebase.
Development
The initial phase of the project involved running several tests to determine
which sections of the code were the most time-consuming. For this task, gprof
and perf
were used in conjunction with visualization tools such as
gprof2dot and
FlameGraph. The results showed
that, in some cases, the object decompression routines accounted for up to
one-third of git-grep’s total execution time. These routines, despite being
thread-safe, had to be serialized due to the surrounding thread-unsafe object
reading machinery.
With improvements to the object reading code and to the parallelism of git-grep, it was possible to allow safe parallel access to the decompression functions. This, in turn, resulted in an acceleration of over 3x, with 8 threads on a 4-core processor with hyper-threading. Some of the changes, such as reducing the code inside critical sections, also brought speedups for the working tree searches, as we will see in the next section. In addition, the process of studying git-grep’s code and its call graph allowed us to find and fix some race condition cases and a bug regarding searches in submodules.
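The core idea of shrinking the critical section can be pictured like this. This is a simplified Python sketch with invented names, not Git's code: instead of holding one lock around the whole read-plus-decompress operation, the lock protects only the thread-unsafe lookup, letting the CPU-heavy (and thread-safe) decompression run in parallel.

```python
import threading
import zlib

obj_read_lock = threading.Lock()

def read_object_serialized(store, oid):
    # Before: everything sits inside the critical section, so the
    # expensive but thread-safe decompression is serialized too.
    with obj_read_lock:
        raw = store[oid]
        return zlib.decompress(raw)

def read_object_parallel(store, oid):
    # After: only the thread-unsafe part is locked;
    # decompression can now run concurrently across threads.
    with obj_read_lock:
        raw = store[oid]
    return zlib.decompress(raw)

store = {"oid1": zlib.compress(b"blob content")}
assert read_object_serialized(store, "oid1") == read_object_parallel(store, "oid1")
print(read_object_parallel(store, "oid1"))
```

With decompression taking up to a third of the total runtime, moving it out of the critical section is what makes the extra threads pay off.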
Results
The following plots compare git-grep’s execution times before and after the
proposed changes. The results presented are all means of 30 executions with a
95% confidence interval. Each plot corresponds to a different machine where the
tests were executed (grenoble
has an HDD and mango
an SSD). Also, note that
the original code didn’t allow multiple threads for object store searches, but
we enabled them just for comparison. See more info at the
Tests Methodology Annex.
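For reference, a mean with a 95% confidence interval can be computed from a list of timings roughly as follows. This is a generic sketch, not the actual analysis script; it uses the normal-approximation z value (with only 30 samples, a Student-t value of about 2.045 would give a slightly wider interval).

```python
import math
import statistics

def mean_ci95(samples):
    """Return (mean, half-width of the 95% confidence interval),
    assuming approximately normal noise (z = 1.96)."""
    m = statistics.mean(samples)
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return m, 1.96 * sem

# Made-up timings, in seconds, just to show the computation.
timings = [10.2, 9.8, 10.5, 10.1, 9.9, 10.3]
m, hw = mean_ci95(timings)
print(f"{m:.2f} ± {hw:.2f} s")
```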
In the cached searches, we observed speedups of up to 3.34x over the original code, and almost 5x over the original code with threads re-enabled but without the improvements. Additionally, the working tree searches also got faster with our changes, showing speedups of up to 1.53x.
Additional Resources
Final Essay (and other documents)
Below are some additional materials which describe the development process in much more detail. The first (and most complete) one is the final undergraduate thesis; the second one is the poster presented at the university in an open session; and the third one is the initial proposal (which is considerably outdated, since the project’s main idea changed quite a bit during the development course).
Note: Both the Final Essay and the Poster mention that there was still a race condition case to fix. But, as we later discovered, the code was already protected. To understand how, please check this patch.
Code
Below is the list of code patches that originated from this work and were sent to the Git project. All of them have already been incorporated into the upstream Git repository.
Solo patch “grep: fix worktree case in submodules” [version 1]
Merged, part of Git version 2.24.0.
Patchset “grep: improve threading and fix race conditions” [version 3]
Merged, part of Git version 2.26.0.
- [01/12] grep: fix race conditions on userdiff calls
- [02/12] grep: fix race conditions at grep_submodule()
- [03/12] grep: fix racy calls in grep_objects()
- [04/12] replace-object: make replace operations thread-safe
- [05/12] object-store: allow threaded access to object reading
- [06/12] grep: replace grep_read_mutex by internal obj read lock
- [07/12] submodule-config: add skip_if_read option to repo_read_gitmodules()
- [08/12] grep: allow submodule functions to run in parallel
- [09/12] grep: protect packed_git [re-]initialization
- [10/12] grep: re-enable threads in non-worktree case
- [11/12] grep: move driver pre-load out of critical section
- [12/12] grep: use no. of cores as the default no. of threads
Annex: Tests Methodology
The plots presented in the Results section were generated with
timings of the command below (adding the --cached
option when testing searches
in the object store). The regular expression chosen represents a realistic and
time-consuming search case.
$ git --no-pager grep --color=never --extended-regexp \
--threads=<number of threads> \
'(static|extern) (int|double) \*'
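A benchmark loop along these lines can collect the raw timings. This is a hedged Python sketch (the measurements for the plots were not necessarily taken this way); with a real repository, `cmd` would be the git-grep invocation above, while here a no-op command keeps the sketch self-contained.

```python
import subprocess
import time

def time_command(cmd, runs=30):
    """Run `cmd` repeatedly and return per-run wall-clock times in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=True)
        timings.append(time.perf_counter() - start)
    return timings

# For the real benchmark, cmd would look like:
#   ["git", "--no-pager", "grep", "--color=never", "--extended-regexp",
#    "--threads=8", r"(static|extern) (int|double) \*"]
timings = time_command(["true"], runs=5)
print(len(timings))
```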
The Chromium repository2 was used as the test data, as it is a relatively large repository in both content and history size. Also, Chromium’s developers had previously reported difficulties with a couple of slow Git commands in their daily usage.
As the problem we were trying to solve is related to I/O, the tests were repeated on two different machines: one with an HDD (grenoble) and one with an SSD (mango). Both are powered by quad-core CPUs with hyper-threading and run a Linux distribution. More information about the machines and tests can be found in the Final Essay.
Footnotes
1. According to the Stack Overflow Developer Survey Results from 2018. ↩
2. Downloaded from https://chromium.googlesource.com/chromium/src/ at commit 03ae96f (“Add filters testing at DSF=2”, 04-06-2019). ↩
Til next time,
Matheus