Note: I wrote a small follow-up to this post talking about what I’ve done in my project after GSoC has finished. You can check it here.
So, this has been the final week of GSoC. It passed so fast! Let’s look back, summarizing what was done, the current state of the project and what’s next.
You may also want to check out my weekly posts on the project, which can be found here.
The project
You can check my project proposal here. The central idea behind it was to allow threading to more Git commands by making the pack access code thread-safe. We initially planned to protect the pack access functions before working to parallelize any specific Git command. However, without a use of the pack access API in mind, it was difficult to know which snippets really required thread-safety. In fact, I could potentially be working on snippets that wouldn’t necessarily need to be called in parallel. Also, we weren’t able to fully test and validate the changes (or its real benefits).
So we changed the approach and decided to work on the pack access code from the
perspective of git-grep
. This command was already parallel, except when
grepping the index (with --cached
) or given trees. And the reason for that was
due to poor performance when the object store was involved, because it had to be
handled sequentially. So git-grep
was really showing itself as the perfect
candidate for the project.
Most of GSoC’s first half, I invested in profiling and code analysis (also in
my microproject, as I’ll talk about in the next section). I was trying to
validate the hypothesis that allowing parallel object reading would indeed
speedup git-grep
. Even more, I wanted to know which sections of object reading
would be more suitable for parallelization. And that’s when I started to
investigate zlib inflation. You can check some of the plots and analysis made
in this blog post.
With this patch, I measured the time
git-grep
spends on git_inflate()
alone. And in my test case1 it accounted
for 49.14%
of the total execution time! This finding defined which way the
project would go from there. From this point, we focused on allowing parallel
inflation and making other improvements to git-grep
’s thread mechanics.
The patches
As my first patch in the project, I tried to remove function-scope static variables from thread-unsafe pack access functions. We ended up not sending this patch as we changed the project approach, but you can check it here:
For my microproject, I worked improving the internal dir-iterator API and using it in place of readdir/opendir/closedir at builtin/clone.c (to copy or link objects in a local clone). Although I started this patchset before GSoC began, there were 8 iterations and the final version was only sent in the middle of July (it wasn’t so “micro” anymore haha). Here are v8’s patches:
- 1/10: clone: test for our behavior on odd objects/* content 2
- 2/10: clone: better handle symlinked files at .git/objects/
- 3/10: dir-iterator: add tests for dir-iterator API 3
- 4/10: dir-iterator: use warning_errno when possible
- 5/10: dir-iterator: refactor state machine model
- 6/10: dir-iterator: add flags parameter to dir_iterator_begin
- 7/10: clone: copy hidden paths at local clone
- 8/10: clone: extract function from copy_or_link_directory
- 9/10: clone: use dir-iterator to avoid explicit dir traversal
- 10/10: clone: replace strcmp by fspathcmp
All of the above were already merged into master
and were part of Git v2.23.0.
Working at git-grep, I fixed a bug which would make git grep --recurse-submodule
always grep the index for each submodule (whilst grepping the worktree should be
the default). Here is the final v3:
This was merged into master
(but not before v2.23.0, so it should be in the
next release).
The next patchset was the series where I added internal locks to object reading,
releasing it before inflation and reacquiring right after. This way we could
have thread-safe object reading with the benefit of parallel inflation. In the
same series, I also changed git-grep to use this lock instead of its own, which
lead to a 3.00x
speedup, as we’ll see later.
- 1/4: object-store: add lock to read_object_file_extended()
- 2/4: grep: allow locks to be enabled individually
- 3/4: grep: disable grep_read_mutex when possible
- 4/4: grep: re-enable threads in some non-worktree cases
These were sent and merged into pu
.
The above series re-enabled threading when grepping the index with a good
speedup, but it didn’t support some flags: --recurse-submodules
and
--textconv
. This was due to the thread-unsafeness of functions used when these
options are enabled. So I kept working on a version to protect them and allow
threads in these cases as well. The following patchset is still quite raw and
thus, not sent yet. But it’s possible to see the ideas involved:
- 1/10: grep: remove racy call to repo_clear()
- 2/10: grep: move parse_object_or_die() into critical section
- 3/10: object: add repo_parse_object_or_die()
- 4/10: grep: don’t add submodules to the alternates list
- 5/10: config: add repo_config_with_options()
- 6/10: submodule-config: don’t add subrepo to alternates
- 7/10: object-store: allow threaded access to object reading
- 8/10: grep: replace grep_read_mutex by obj_read_locks
- 9/10: grep: re-enable threads in some non-worktree cases
- 10/10: grep: move driver pre-load out of critical section
Note: You can check all the merged patches it the official repository, searching for “Matheus Tavares” in the respective branches. For example, here are the searches in master and pu.
Time Results
To see how well our optimizations to the cached git-grep have gone, I tested it at chromium’s repo4, mainly for two reasons:
- It has a big object store (around 15GB), which really makes some operations slow
- Chromium’s developers already reported some difficulties with slow Git commands5
The regexes used for the tests were:
- Regex 1:
abcd[02]
- Regex 2:
(static|extern) (int|double) \*
For each of them, the command git --no-pager grep --color=never
--threads=<THREADS> <REGEX> HEAD
was repeated 30 times after 2 warmup runs and
the average execution time was taken. This was done to mitigate system
fluctuations. All tests were executed on an i7-7700HQ with 16GB of RAM and SSD.
Finally, here are the results:
Threads | Regex 1 | Regex 2 |
---|---|---|
1 | 17.3557s | 20.8410s |
2 | 9.7170s | 11.2415s |
8 | 6.1723s | 6.9378s |
This represents a 3.00x
speedup, in the best case, which is very nice! But to
make sure the optimization also performs well on HDD and/or older machines, the
tests were repeated on an AMD Turion 64 X2 TL-62 (dual-core) with 4GB of RAM and
HDD (SATA-150, 5400 rpm):
Threads | Regex 1 | Regex 2 |
---|---|---|
1 | 40.3347s | 47.6173s |
2 | 27.6547s | 35.1797s |
And again, we got a quite good time reduction!
What’s next
We already have a good speedup, but the project isn’t finished yet. In this last
couple weeks, I’ve been struggling to allow threads when grepping the index
with --recurse-submodules
or --textconv
. The problem is that the function
calls involved in these options are not thread-safe yet and they might as well
conflict with the now protected object reading functions.
So, after GSoC, I plan to continue this project tackling this issue. I already have some drafts on how to do it. For example the last patchset on The Patches section (which wasn’t sent due to some yet present data races).
However, I’m planning a more incremental process, seeking the best performance gain without changing too much of the code at a time. Certainly, the solution won’t be ideal, but it should bring some improvement in the short term, and be much less risky.
So, what I have in mind is to use a single recursive mutex (which Git already supports for Linux and Windows), to protect both the object reading and submodules’ functions. With this, conflicting operations would be avoided through mutual exclusion and submodule’s functions would still be able to load objects without re-locking errors.
This will hopefully allow us to re-enable threadings with --recurse-submodules
getting a good speedup and no data races. The downside is that parallelizable
tasks such as object inflation on submodule’s initializations would be
serialized. But aiming for incremental changes, that’s an improvement we can
think of in the future. As my mentors suggested, I can also write a document at
Documentation/technical
explaining the parallel git-grep mechanics and how it
could be further improved.
Finally, there are some extra possible improvements to tackle at git-grep:
- Stop adding subrepo’s object directories to the in-memory alternates list.
(This is done at
builtin/grep.c::grep_submodule()
) - Move already thread-safe functions out of
git-grep
’s critical sections, for better performance. One example is the driver pre-loading atbuiltin/grep.c::add_work()
. - Protect lazy initializations and refine the big object reading lock.
- Once
git-grep
already supports threads with--recurse-submodules
, do the same for--textconv
. - [extra] Refactor the call stack originated at
submodule-config.c::config_from_gitmodules()
so that submodules don’t need to be added to the in-memory alternates list. This should also bring better performance to object reading operations. I already developed two patches6 during GSoC to tackle this issue, but I have to revisit and test them before sending.
Adjacent achievements
Because of GSoC and my work on Git, I also got the opportunity to do some additional related activities:
Callpath
Working on my project, I often needed to check if some thread-unsafe function could be present in a call stack or not. GNU cflow is a good alternative. But it only performs static analysis. So I wrote a simple script called callpath which uses GDB + dot to plot all the paths that lead to a specific function in an execution.
Talk at linuxdev-br
I got to present at the linuxdev-br conference, showing a little bit of Git’s code. Renato and I talked about object-oriented techniques in C using Linux and Git as study cases. I focused mostly on the dir-iterator API (which I worked on as my GSoC’s microproject). Our slides are available here and the lecture recording here. Bellow are some pictures of the talk (click to enlarge):
(Pictures by Eduardo Silva)
Git group at FLUSP
I’m part of FLUSP, a group of students at the University of São Paulo focused on contributing to FLOSS projects. This year, some of us started meeting weekly to exchange Git learnings and tips on how to contribute to it. We’ll hopefully see some patches from the group in the ML soon :)
Thanks
I want to thank my mentors, Christian and Olga, for all their support and always
encouraging comments. Also, I want to thank Duy for helping me investigate the
possible threading improvements in git-grep
. And finally, I thank Junio and
the community for the reviews and support.
Being part of GSoC on Git was amazing. I certainly want to keep contributing to Git with whatever I can :)
Footnotes
-
The test was made running
time git --no-pager grep --color=never --threads=1 -E '(static|extern) (int|double) \*' HEAD
on chromium’s repo4 5 times and taking the means. The results were:Total time: 10.3599s Time in inflate: 20.3682s Percentage: 49.14%
-
Patch by Ævar Arnfjörð. I made some small changes and re-sent. ↩
-
Patch by Daniel Ferreira. I made some small changes and re-sent. ↩
-
chromium’s repo at commit 03ae96f (“Add filters testing at DSF=2”, 04-06-2019), after a ‘git gc’ execution. ↩ ↩2
-
https://groups.google.com/a/chromium.org/forum/#!topic/chromium-dev/oYe69KzyG_U ↩
-
The patches removing the
add_to_alternates_memory()
call fromsubmodule-config.c::config_from_gitmodules()
:
Til next time,
Matheus