Git 2.26: Up to 3.3x faster pattern searches in git-grep
Tags: gsoc, git, thesis, parallelism
Quick Links: Final Essay | Timing Results | Poster
Intro
Two weeks ago, Git 2.26.0 was released! Among its many great changes, I'm very happy to say that you can expect faster multithreaded git-grep searches! This is the project I worked on for about a year as my Google Summer of Code project and undergraduate thesis at the University of São Paulo.
I’ve been posting about the project’s development here. But with the 2.26.0 release, the project is finally concluded, and I think it’s a great time to wrap it all up and present the final results. For those interested in learning more about the development process, the final thesis is also linked in the Additional Resources section and at the top of the page.
Motivation and Goal
Git has become the most popular1 version control system for software development. Being used to manage a large variety of projects, with different magnitudes (in both content and history sizes), it must be built to scale. With this in mind, Git’s grep command – which searches for text patterns in the tracked files – was made parallel in 2010 using a producer-consumer mechanism. However, when operating in Git’s internal object store (e.g. for a search in older revisions), the multithreaded version became slower than the sequential code. For this reason, threads were later disabled in this case.
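The producer-consumer mechanism can be pictured with a short sketch. This is a hypothetical Python analogy of the pattern (not Git's actual C code, and all names here are invented): one producer thread queues up work items while several consumer threads take items off the queue and run the search.

```python
import queue
import threading

def grep_parallel(paths, pattern, read_file, n_workers=4):
    """Toy producer-consumer grep: one producer queues work items,
    several consumers search them. Illustrates the pattern only."""
    tasks = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def producer():
        for p in paths:
            tasks.put(p)
        for _ in range(n_workers):
            tasks.put(None)  # sentinel: one per worker, signals "no more work"

    def consumer():
        while True:
            path = tasks.get()
            if path is None:
                break
            for lineno, line in enumerate(read_file(path), 1):
                if pattern in line:
                    with results_lock:
                        results.append((path, lineno, line))

    workers = [threading.Thread(target=consumer) for _ in range(n_workers)]
    for w in workers:
        w.start()
    producer()
    for w in workers:
        w.join()
    return results

# Tiny in-memory "repository" just for demonstration.
files = {"a.txt": ["static int x;", "hello"], "b.txt": ["extern int y;"]}
hits = grep_parallel(files, "int", lambda p: files[p])
print(sorted(hits))
```

In the real git-grep, the consumers' file/object reading and decompression are where the thread-safety problems discussed below come in.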
The main goal of this work was to improve the parallelization of the grep command and re-enable threads for all its use cases. In the process, we also wanted to implement optimized and thread-safe access to the object reading functions, which could later be used to parallelize other sections of the codebase.
Development
The initial phase of the project involved running several tests to determine
which sections of the code were the most time-consuming. For this task, gprof
and perf
were used in conjunction with visualization tools such as
gprof2dot and
FlameGraph. The results showed
that, in some cases, the object decompression routines accounted for up to
one-third of git-grep’s total execution time. These routines, despite being
thread-safe, had to be serialized due to the surrounding thread-unsafe object
reading machinery.
With improvements to the object reading code and to the parallelism of git-grep, it was possible to allow safe parallel access to the decompression functions. This, in turn, resulted in an acceleration of over 3x, with 8 threads on a 4-core processor with hyper-threading. Some of the changes, such as reducing the code inside critical sections, also brought speedups for the working tree searches, as we will see in the next section. In addition, the process of studying git-grep’s code and its call graph allowed us to find and fix some race condition cases and a bug regarding searches in submodules.
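The core idea of shrinking the critical section can be pictured like this. This is a simplified Python sketch with invented names, not Git's code: instead of holding one lock around the whole read-plus-decompress operation, the lock protects only the thread-unsafe lookup, letting the CPU-heavy (and thread-safe) decompression run in parallel.

```python
import threading
import zlib

obj_read_lock = threading.Lock()

def read_object_serialized(store, oid):
    # Before: everything sits inside the critical section, so the
    # expensive but thread-safe decompression is serialized too.
    with obj_read_lock:
        raw = store[oid]
        return zlib.decompress(raw)

def read_object_parallel(store, oid):
    # After: only the thread-unsafe part is locked;
    # decompression can now run concurrently across threads.
    with obj_read_lock:
        raw = store[oid]
    return zlib.decompress(raw)

store = {"oid1": zlib.compress(b"blob content")}
assert read_object_serialized(store, "oid1") == read_object_parallel(store, "oid1")
print(read_object_parallel(store, "oid1"))
```

With decompression taking up to a third of the total runtime, moving it out of the critical section is what makes the extra threads pay off.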
Results
The following plots compare git-grep’s execution times before and after the
proposed changes. The results presented are all means of 30 executions with a
95% confidence interval. Each plot corresponds to a different machine where the
tests were executed (grenoble
has an HDD and mango
an SSD). Also, note that
the original code didn’t allow multiple threads for object store searches, but
we enabled them just for comparison. See more info at the
Tests Methodology Annex.
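For reference, a mean with a 95% confidence interval can be computed from a list of timings roughly as follows. This is a generic sketch, not the actual analysis script; it uses the normal-approximation z value (with only 30 samples, a Student-t value of about 2.045 would give a slightly wider interval).

```python
import math
import statistics

def mean_ci95(samples):
    """Return (mean, half-width of the 95% confidence interval),
    assuming approximately normal noise (z = 1.96)."""
    m = statistics.mean(samples)
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return m, 1.96 * sem

# Made-up timings, in seconds, just to show the computation.
timings = [10.2, 9.8, 10.5, 10.1, 9.9, 10.3]
m, hw = mean_ci95(timings)
print(f"{m:.2f} ± {hw:.2f} s")
```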
In the cached searches, we observed speedups of up to 3.34x over the original code, and almost 5x over the original code with threads re-enabled but without the improvements. Additionally, the working tree searches also got faster with our changes, showing speedups of up to 1.53x.
Additional Resources
Final Essay (and other documents)
Below are some additional materials which describe the development process in much more detail. The first (and most complete) one is the final undergraduate thesis; the second one is the poster presented at the university in an open session; and the third one is the initial proposal (which is considerably outdated, since the project’s main idea changed quite a bit during the development course).
Note: Both the Final Essay and the Poster mention that there was still a race condition case to fix. But, as we later discovered, the code was already protected. To understand how, please check this patch.
Code
Below is the list of code patches that originated from this work and were sent to the Git project. All of them have already been incorporated into the upstream Git repository.
Solo patch “grep: fix worktree case in submodules” [version 1]
Merged, part of Git version 2.24.0.
Patchset “grep: improve threading and fix race conditions” [version 3]
Merged, part of Git version 2.26.0.
- [01/12] grep: fix race conditions on userdiff calls
- [02/12] grep: fix race conditions at grep_submodule()
- [03/12] grep: fix racy calls in grep_objects()
- [04/12] replace-object: make replace operations thread-safe
- [05/12] object-store: allow threaded access to object reading
- [06/12] grep: replace grep_read_mutex by internal obj read lock
- [07/12] submodule-config: add skip_if_read option to repo_read_gitmodules()
- [08/12] grep: allow submodule functions to run in parallel
- [09/12] grep: protect packed_git [re-]initialization
- [10/12] grep: re-enable threads in non-worktree case
- [11/12] grep: move driver pre-load out of critical section
- [12/12] grep: use no. of cores as the default no. of threads
Annex: Tests Methodology
The plots presented in the Results section were generated with
timings of the command below (adding the --cached
option when testing searches
in the object store). The regular expression chosen represents a realistic and
time-consuming search case.
$ git --no-pager grep --color=never --extended-regexp \
--threads=<number of threads> \
'(static|extern) (int|double) \*'
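A benchmark loop along these lines can collect the raw timings. This is a hedged Python sketch (the measurements for the plots were not necessarily taken this way); with a real repository, `cmd` would be the git-grep invocation above, while here a no-op command keeps the sketch self-contained.

```python
import subprocess
import time

def time_command(cmd, runs=30):
    """Run `cmd` repeatedly and return per-run wall-clock times in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=True)
        timings.append(time.perf_counter() - start)
    return timings

# For the real benchmark, cmd would look like:
#   ["git", "--no-pager", "grep", "--color=never", "--extended-regexp",
#    "--threads=8", r"(static|extern) (int|double) \*"]
timings = time_command(["true"], runs=5)
print(len(timings))
```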
The Chromium repository2 was used as the test data, as it is a relatively large repository in both content and history size. Also, Chromium’s developers had previously reported difficulties with a couple of slow Git commands in their daily usage.
As the problem we were trying to solve is related to I/O, the tests were repeated on two different machines: one with an HDD (grenoble) and one with an SSD (mango). Both are powered by quad-core CPUs with hyper-threading and run a Linux distribution. More information about the machines and tests can be found in the Final Essay.
Footnotes
1. According to the Stack Overflow Developer Survey Results from 2018. ↩
2. Downloaded from https://chromium.googlesource.com/chromium/src/ at commit 03ae96f (“Add filters testing at DSF=2”, 04-06-2019). ↩
Til next time,
Matheus