The git-grep –recurse-submodules fix
This week I finished and sent the patch fixing the worktree case in git-grep –recurse-submodules. After the reviews I also made some revisions. The last version can be seen here.
“Fixed” race conditions on textconv and submodules
Back to the parallel inflation patchset, the major leftover task was to protect the textconv and submodules operations on git-grep. My first approach to solve this was to incrementally navigate through the call stack looking for unprotected global states. As might be expected, this solution has proved too laborious, so I decided to try a more empirical one:
- Firstly, I prepared a big (but not too big) repository with lots of textconv
operations and a submodule of reasonable size as well. Basically I added Git’s
repository as a submodule of the Linux repo. And to force
git-grep
to perform a lot of textconv’s, I added these lines to.gitattributes
:*.c diff=cat-textconv *.h diff=cat-textconv
And this to my
~/.gitconfig
:[diff "cat-textconv"] textconv=cat
- Then I executed the following command looking for
helgrind
’s complains:valgrind --tool=helgrind ~/bin/git -C my-prepared-repo \ --no-pager grep --recurse-submodule --color=never \ --textconv --threads=8 abc[0-9] HEAD
- For each “
Possible data race
” message, I tried to examine the respective code snippet and protect that. To perform this, I used a lot of the code Duy used on the hacky patch he sent me as a testing example.
Being an empirical process, this is not the perfect solution as we may miss some
possible racy situations. But I did manage to get to a point with no warnings
from helgrind
. The resulting patchset can be seen here.
Please note that it still needs a lot of cleaning.
There are still two problems with it (at least):
- When using
--textconv
the execution time is hugely affected (like 2s to 5m!) - The thread protection at
use_pack()
must be refactored. Currently I’m using a singleuse_pack_mutex
arround the function’s code, but we should use thewindows_lock
added to thestruct packed_git
to increase the parallel performance. The problem with using it now is thathelgrind
warns us of possible race conditions on this line:win->last_used = pack_used_ctr++;
.
For the first one, I need to profile the code to find where I may have
introduced a bottleneck. The second one is weird to me as it seems a little
counterintuitive. As we have one mutex per packed_git
, the only possible way
of getting race conditions on that line would be if two calls to use_pack()
,
with different packed_git
parameters, were using the same window. But, by what
I understood so far, packed_git
is a representation of one packfile and each
window belong to an specific packfile. So the race condition doesn’t make sense
to me yet.
Other activities
This weekend I’ve attended linuxdev-br, an amazing conference that brings together developers of the linux kernel and other FLOSS projects. It was just wonderful! There were many great talks and welcoming people! I also got the chance to co-present two speeches: the first on “How to create a local FLOSS study group”; and the second on “Object-Oriented techniques in C: a case study on Linux and Git”. I’m very happy to be able to talk a little about the Git codebase and share some techniques used there :)
The conference also had a “workshop day” on Friday which we (FLUSP) helped co-organize at our university. In total, there were 5 workshops and almost 80 people. We also hosted one of the sessions, helping people who wanted to start contributing to the linux kernel. It was a lot of work, but very rewarding!
Next steps
- Write and run a battery of tests to:
- Ensure the current version still produces only correct results.
- Check the execution time in comparison to the previous version
(with the unprotected textconv and submodule operations). Also compare how
--recurse-submodules
affects the time in this new version.
- Profile the code from my patches and check why
--textconv
is running so slow. - Try to figure out why we cannot use the
windows_lock
ofstruct packed_git
inuse_pack()
without race conditions. - Refactor and clean the code. The second commit on my patchset is very inflated and, thus, should be splitted. I also need to consider destroying the introduced mutexes after use, if possible.
- Try to remove the
obj_read_lock
. Now that we have locks for the windows atpacked_git
and thepacked_git_mru
list ofraw_object_store
we may be closer to the possibility of removingobj_read_lock
. But, that won’t be so simple as we still need to protect thedelta_base_cache
and other structures.
Til next time,
Matheus