GSoC Final Report | Matheus Tavares

Note: I wrote a small follow-up to this post talking about what I’ve done in my project after GSoC has finished. You can check it here.

So, this has been the final week of GSoC. It passed so fast! Let’s look back, summarizing what was done, the current state of the project and what’s next.

You may also want to check out my weekly posts on the project, which can be found here.

The project

You can check my project proposal here. The central idea behind it was to allow threading to more Git commands by making the pack access code thread-safe. We initially planned to protect the pack access functions before working to parallelize any specific Git command. However, without a use of the pack access API in mind, it was difficult to know which snippets really required thread-safety. In fact, I could potentially be working on snippets that wouldn’t necessarily need to be called in parallel. Also, we weren’t able to fully test and validate the changes (or its real benefits).

So we changed the approach and decided to work on the pack access code from the perspective of git-grep. This command was already parallel, except when grepping the index (with --cached) or given trees. And the reason for that was due to poor performance when the object store was involved, because it had to be handled sequentially. So git-grep was really showing itself as the perfect candidate for the project.

Most of GSoC’s first half, I invested in profiling and code analysis (also in my microproject, as I’ll talk about in the next section). I was trying to validate the hypothesis that allowing parallel object reading would indeed speedup git-grep. Even more, I wanted to know which sections of object reading would be more suitable for parallelization. And that’s when I started to investigate zlib inflation. You can check some of the plots and analysis made in this blog post.

With this patch, I measured the time git-grep spends on git_inflate() alone. And in my test case¹ it accounted for 49.14% of the total execution time! This finding defined which way the project would go from there. From this point, we focused on allowing parallel inflation and making other improvements to git-grep’s thread mechanics.

The patches

As my first patch in the project, I tried to remove function-scope static variables from thread-unsafe pack access functions. We ended up not sending this patch as we changed the project approach, but you can check it here:

sha1-file: convert some static variables to local

For my microproject, I worked improving the internal dir-iterator API and using it in place of readdir/opendir/closedir at builtin/clone.c (to copy or link objects in a local clone). Although I started this patchset before GSoC began, there were 8 iterations and the final version was only sent in the middle of July (it wasn’t so “micro” anymore haha). Here are v8’s patches:

All of the above were already merged into master and were part of Git v2.23.0.

Working at git-grep, I fixed a bug which would make git grep --recurse-submodule always grep the index for each submodule (whilst grepping the worktree should be the default). Here is the final v3:

grep: fix worktree case in submodules

This was merged into master (but not before v2.23.0, so it should be in the next release).

The next patchset was the series where I added internal locks to object reading, releasing it before inflation and reacquiring right after. This way we could have thread-safe object reading with the benefit of parallel inflation. In the same series, I also changed git-grep to use this lock instead of its own, which lead to a 3.00x speedup, as we’ll see later.

These were sent and merged into pu.

The above series re-enabled threading when grepping the index with a good speedup, but it didn’t support some flags: --recurse-submodules and --textconv. This was due to the thread-unsafeness of functions used when these options are enabled. So I kept working on a version to protect them and allow threads in these cases as well. The following patchset is still quite raw and thus, not sent yet. But it’s possible to see the ideas involved:

Note: You can check all the merged patches it the official repository, searching for “Matheus Tavares” in the respective branches. For example, here are the searches in master and pu.

Time Results

To see how well our optimizations to the cached git-grep have gone, I tested it at chromium’s repo⁴, mainly for two reasons:

It has a big object store (around 15GB), which really makes some operations slow
Chromium’s developers already reported some difficulties with slow Git commands⁵

The regexes used for the tests were:

Regex 1: abcd[02]
Regex 2: (static|extern) (int|double) \*

For each of them, the command git --no-pager grep --color=never --threads=<THREADS> <REGEX> HEAD was repeated 30 times after 2 warmup runs and the average execution time was taken. This was done to mitigate system fluctuations. All tests were executed on an i7-7700HQ with 16GB of RAM and SSD. Finally, here are the results:

Threads	Regex 1	Regex 2
1	17.3557s	20.8410s
2	9.7170s	11.2415s
8	6.1723s	6.9378s

This represents a 3.00x speedup, in the best case, which is very nice! But to make sure the optimization also performs well on HDD and/or older machines, the tests were repeated on an AMD Turion 64 X2 TL-62 (dual-core) with 4GB of RAM and HDD (SATA-150, 5400 rpm):

Threads	Regex 1	Regex 2
1	40.3347s	47.6173s
2	27.6547s	35.1797s

And again, we got a quite good time reduction!

What’s next

We already have a good speedup, but the project isn’t finished yet. In this last couple weeks, I’ve been struggling to allow threads when grepping the index with --recurse-submodules or --textconv. The problem is that the function calls involved in these options are not thread-safe yet and they might as well conflict with the now protected object reading functions.

So, after GSoC, I plan to continue this project tackling this issue. I already have some drafts on how to do it. For example the last patchset on The Patches section (which wasn’t sent due to some yet present data races).

However, I’m planning a more incremental process, seeking the best performance gain without changing too much of the code at a time. Certainly, the solution won’t be ideal, but it should bring some improvement in the short term, and be much less risky.

So, what I have in mind is to use a single recursive mutex (which Git already supports for Linux and Windows), to protect both the object reading and submodules’ functions. With this, conflicting operations would be avoided through mutual exclusion and submodule’s functions would still be able to load objects without re-locking errors.

This will hopefully allow us to re-enable threadings with --recurse-submodules getting a good speedup and no data races. The downside is that parallelizable tasks such as object inflation on submodule’s initializations would be serialized. But aiming for incremental changes, that’s an improvement we can think of in the future. As my mentors suggested, I can also write a document at Documentation/technical explaining the parallel git-grep mechanics and how it could be further improved.

Finally, there are some extra possible improvements to tackle at git-grep:

Stop adding subrepo’s object directories to the in-memory alternates list. (This is done at builtin/grep.c::grep_submodule())
Move already thread-safe functions out of git-grep’s critical sections, for better performance. One example is the driver pre-loading at builtin/grep.c::add_work().
Protect lazy initializations and refine the big object reading lock.
Once git-grep already supports threads with --recurse-submodules, do the same for --textconv.
[extra] Refactor the call stack originated at submodule-config.c::config_from_gitmodules() so that submodules don’t need to be added to the in-memory alternates list. This should also bring better performance to object reading operations. I already developed two patches⁶ during GSoC to tackle this issue, but I have to revisit and test them before sending.

Adjacent achievements

Because of GSoC and my work on Git, I also got the opportunity to do some additional related activities:

Callpath

Working on my project, I often needed to check if some thread-unsafe function could be present in a call stack or not. GNU cflow is a good alternative. But it only performs static analysis. So I wrote a simple script called callpath which uses GDB + dot to plot all the paths that lead to a specific function in an execution.

Talk at linuxdev-br

I got to present at the linuxdev-br conference, showing a little bit of Git’s code. Renato and I talked about object-oriented techniques in C using Linux and Git as study cases. I focused mostly on the dir-iterator API (which I worked on as my GSoC’s microproject). Our slides are available here and the lecture recording here. Bellow are some pictures of the talk (click to enlarge):

(Pictures by Eduardo Silva)

Git group at FLUSP

I’m part of FLUSP, a group of students at the University of São Paulo focused on contributing to FLOSS projects. This year, some of us started meeting weekly to exchange Git learnings and tips on how to contribute to it. We’ll hopefully see some patches from the group in the ML soon :)

Thanks

I want to thank my mentors, Christian and Olga, for all their support and always encouraging comments. Also, I want to thank Duy for helping me investigate the possible threading improvements in git-grep. And finally, I thank Junio and the community for the reviews and support.

Being part of GSoC on Git was amazing. I certainly want to keep contributing to Git with whatever I can :)

Footnotes

The test was made running time git --no-pager grep --color=never --threads=1 -E '(static|extern) (int|double) \*' HEAD on chromium’s repo⁴ 5 times and taking the means. The results were:
```
Total time:       10.3599s
Time in inflate:  20.3682s
Percentage:       49.14%
```
↩
Patch by Ævar Arnfjörð. I made some small changes and re-sent. ↩
Patch by Daniel Ferreira. I made some small changes and re-sent. ↩
chromium’s repo at commit 03ae96f (“Add filters testing at DSF=2”, 04-06-2019), after a ‘git gc’ execution. ↩ ↩²
https://groups.google.com/a/chromium.org/forum/#!topic/chromium-dev/oYe69KzyG_U ↩
The patches removing the add_to_alternates_memory() call from submodule-config.c::config_from_gitmodules():
- config: add repo_config_with_options()
- submodule-config: don’t add subrepo to alternates
↩