Yesterday I got home after 10 days of DebConf. It was an amazing experience! I thought I would be able to work more in my project during the conference, but the schedule was very intense. Unfortunatelly, I also got a sore throat and fever for a few days :( I’m still recovering from it, but I could still enjoy most of the speeches, nevertheless.
Regarding the parallel zlib RFC, I couldn’t advance much during the conference.
But I managed to repeat the tests in an HDD machine, as we’ll discuss later in
this report. I also took some time to work on a potential bug at git-grep
. It
was reported earlier this month at the mailing list. And since it’s located in
the code I’m currently working on, I thought I should take a look and try to fix
it during GSoC, as well. Without further ado, let’s get to those topics:
Repeating tests using HDD
I finally repeated the tests from week-8 in an machine1 with HDD to check if we have similar gains to those seen in SSD. Here you can see the means of 30 executions after 2 warmups:
Times of git-grep at chromium repo (with HDD)
==============================
Threads 1
Regex 1 Regex 2
Original: 18.4512s 22.5032s
Threads 2
Regex 1 Regex 2
Original: 19.3351s 19.9966s
Parallel inflate: 12.4539s 14.6662s
Reduction: 35.59% 26.66%
Threads 8
Regex 1 Regex 2
Original: 22.7605s 20.8892s
Parallel inflate: 10.7737s 12.6165s
Reduction: 52.66% 39.60%
As we can see, the time reductions here are pretty good, as well. So it seems that we are in the right direction :)
Grep submodules bug: ignoring worktree
On July 8th, Daniel reported
on the mailing list a weird git-grep
behavior regarding submodules. Even when
--cached
was omitted, git-grep --recurse-submodules
was only greping the
submodules’ objects and not the worktree. As a simple test, I did the following:
cd /tmp
mkdir sub parent
cd sub
git init
echo "A" > A
git add A && git commit -m "Add A"
cd ../parent
git init
echo "parent-A" > parent-A
git add parent-A && git commit -m "Add parent-A"
git submodule add ../sub
git add . && git commit -m "Add submodule 'sub'"
echo "Adding a string containing 'parent'" >> sub/A
git grep --recurse-submodules 'parent'
The output was:
parent-A:1:parent-A
And it should have been:
parent-A:1:parent-A
sub/A:2:Adding a string containing 'parent'
It seems that git-grep was taking the worktree modifications into account before f9ee2fc (“grep: recurse in-process using ‘struct repository’”, 02-08-2017). So it could have been either a project decision to change the behavior or a problem during the conversion to recurse in-process. Analysing the code and commit history, it seems, thought, that the second option is what happened. So I focused on trying to find a way to re-implement the old behavior.
Althought not obvious to me at first look, the fix wasn’t complex at all.
Basically, the problem was at this line inside grep_submodule()
:
hit = grep_cache(&subopt, pathspec, 1);
grep_cache()
’s third parameter is called cached
. It tells the function
whether greping must be performed in the cache or worktree. And at the code
above, grep_cache()
is always been called with cached=1
, thus, ignoring the
worktree. What I did to fix that is add an aditional cached
parameter to
grep_submodule()
so that it may known the desired behavior and call
grep_cache()
properly. I also added a test to detect possible behavior changes
regarding this case in the future. The final commits can be seen here.
I will ask for a review from my mentors and send it to the mailing list when
ready :)
This small patchset was also a nice learning opportunity for me to get to know more about the index file and how it’s maintained. Here are some references that may be useful to others as well:
- About the index: https://hackernoon.com/understanding-git-index-4821a0765cf
- About general git’s internals and others: https://www.youtube.com/watch?v=xbLVvrb2-fY (really good talk!)
- It’s possible to inspect the index with
git ls-files --stage
.
Other activities
At DebConf I had the opportunity to pass on some of what I learned about Git and Linux so far. In particular, I helped two guys from India who got interested in contributing to those projects. We set the environments and discussed a little about the codebases and tests. For the kernel, there was even one patch already sent! You can check it here.
Next steps
For this week, I plan to send the patchset fixing the submodules’ worktree case on git-grep and continue the work on parallel inflation. The main tasks for that are the ones listed on the previous post.
Footnotes
-
An Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz, with 8GB of RAM, HDD and running Manjaro Linux (the same OS where previous tests were runned). ↩
Til next time,
Matheus