git, with abandon

I really enjoyed reading "The Biggest and Weirdest Commits in Linux Kernel Git History" this week. If you haven't read it yet, it describes some of the edge-case merges in the Linux kernel, including one octopus merge that has 66 parent commits.

the kernel developers are expert git users and tend to use its features with abandon

With abandon: To use git with abandon, one has to have already embraced it.

I've been teaching my open source students how to use git this term and, more importantly, how to use it fearlessly. I used to teach it quickly, but experience has taught me to slow down, and I now spend a lot more time on the concepts. I think it's important to feel in control of git, rather than constantly fearing for your safety as you copy and paste commands into your terminal without understanding what they will do. Git isn't easy, but it's not as hard as it seems when you first begin, either: you just have to take your time and learn how to handle it.

In an effort to walk the talk, I decided to take on a fairly large merge that I'd been putting off. Two years ago I forked Adobe's Brackets repo and started reworking it so it would run in a browser (we use it in Thimble). For a long time I kept rebasing our work on Adobe's master branch. Eventually, though, I decided to keep things stable on our side and let the two repos drift apart. Since then we (Mozilla, Seneca, and I) have made ~500 commits on our fork. In the same period, Adobe released 4 new versions of Brackets and added 1,000+ new commits. In other words, our branches have diverged!

There hasn't been any pressing reason for me to pull in what's upstream. Everything we want Brackets to do in Thimble, it does now, and does well. However, I knew that eventually I'd want their bug fixes, their dependency updates, and any cool new features.

So this week I decided to dive in. I'm also working on other bugs in the same tree, and I had no idea how long it would take to resolve the inevitable merge conflicts. So I decided to re-clone my repo and do the merge there. This removed a lot of pressure, since I could take as long as I needed without leaving my working tree in a half-merged state. I think this was a great strategy, and I'd recommend it.
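The separate-clone approach is easy to script. Below is a minimal sketch of one way to set it up, assuming the fork is cloned fresh, the original project is added as a remote named `upstream`, and its branch is `master`. The function names, remote name, branch names, and fork URL are hypothetical illustrations, not necessarily what I typed. By default it only prints the commands (a dry run).

```python
import subprocess
from typing import List


def fresh_clone_merge_plan(fork_url: str, upstream_url: str, workdir: str,
                           branch: str = "upstream-merge",
                           upstream_branch: str = "master") -> List[List[str]]:
    """Build the git commands for merging upstream inside a throwaway clone.

    Hypothetical helper: the remote name "upstream" and the branch names
    are assumptions chosen for illustration.
    """
    return [
        # A brand-new clone, so the everyday working tree (and its
        # in-progress bug fixes) is never left half-merged.
        ["git", "clone", fork_url, workdir],
        # Add the original project as a second remote and fetch its history.
        ["git", "-C", workdir, "remote", "add", "upstream", upstream_url],
        ["git", "-C", workdir, "fetch", "upstream"],
        # Merge on a dedicated branch so the result can be pushed or discarded.
        ["git", "-C", workdir, "checkout", "-b", branch],
        ["git", "-C", workdir, "merge", f"upstream/{upstream_branch}"],
    ]


def run_plan(plan: List[List[str]], dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in plan:
        print(" ".join(cmd))
        if dry_run:
            continue
        is_merge = cmd[1] == "-C" and cmd[3] == "merge"
        # A merge that stops on conflicts exits non-zero. That is expected,
        # so only failures of the setup commands raise CalledProcessError.
        subprocess.run(cmd, check=not is_merge)


if __name__ == "__main__":
    plan = fresh_clone_merge_plan(
        "https://github.com/YOUR-ORG/brackets.git",  # hypothetical fork URL
        "https://github.com/adobe/brackets.git",
        "brackets-upstream-merge",
    )
    run_plan(plan)  # dry run: prints the commands only
```

The point of the design is isolation. If the merge turns into a multi-day job, the original clone is untouched and you can keep working on other bugs there. If the merge goes badly, you can delete the whole directory.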

For the most part, the merge went smoothly. This was both surprising and expected: I knew git could do this, but I also feared things might get really messy when the two lines of development came back together. In the end, the merge added 40K lines of code and removed another 27K, across ~700 files. Of these, only 45 files had merge conflicts. At first that felt like a lot, and not something I was going to enjoy. But by spending the better part of a day going through them all, I was able to sort everything out without too much trouble. Very few were huge problems; it was just work I had to do.
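When a merge stops on conflicts, git writes `<<<<<<<`, `=======`, and `>>>>>>>` markers into each conflicted file. `git status` (or `git diff --name-only --diff-filter=U`) lists the unmerged paths. When working through dozens of files, it can also help to double-check that no markers were left behind in a file you already staged. The sketch below does that, assuming text files and a JavaScript-only glob pattern (Brackets is mostly JavaScript). The helper names are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def find_conflict_hunks(text: str) -> List[Tuple[int, int]]:
    """Return 1-indexed (start, end) line pairs for each conflict hunk in text."""
    hunks: List[Tuple[int, int]] = []
    start = None
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.startswith("<<<<<<<"):
            start = lineno                 # beginning of "our" side
        elif line.startswith(">>>>>>>") and start is not None:
            hunks.append((start, lineno))  # end of "their" side
            start = None
    return hunks


def conflicted_files(root: str, pattern: str = "**/*.js") -> Dict[str, List[Tuple[int, int]]]:
    """Map each file under root matching pattern to its remaining conflict hunks.

    Hypothetical helper; the default "**/*.js" pattern is an assumption.
    """
    report: Dict[str, List[Tuple[int, int]]] = {}
    for path in Path(root).glob(pattern):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="replace")
            hunks = find_conflict_hunks(text)
            if hunks:
                report[str(path)] = hunks
    return report
```

Running `conflicted_files(".")` from the root of the merge clone gives a per-file list of line ranges still to resolve. An empty result means no markers remain in the matched files.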

On reflection, I actually found the experience very rewarding. It was like reading a history of the development we'd done: every conflict was a feature or fix we'd layered on, and seeing them all in quick succession made the totality of the work come into focus.

I've had to carry big patches before, especially when doing new DOM features in Firefox and Gecko. It's not a ton of fun having to constantly update against a fast-moving upstream target. I think the lesson from this merge is that you're almost better off waiting and doing it down the road. Since you're going to have to deal with the conflicts anyway, why not queue them up and resolve them all at once? I'm not convinced that dealing with all these merge conflicts spread out over weeks and months would have been any less work. It almost made it easier for me to do them as a set, since so many of them were related, often in subtle ways.

Beyond the expected commands (add, commit, status, checkout, etc.), I relied on all of the following to get this done:

blame: to see who did what, and why (especially when trying to find the upstream commits behind changes I needed to understand)
show: to examine commits in their entirety
bisect: to track down regressions introduced by the merge. I knew something used to work and now doesn't, and I let git guide me to the problem commit. I've written about bisect before. It's one of the greatest tools available to you as a git user.
grep: to search the files in my tree (as opposed to searching commits). I'm amazed how many people don't seem to know about it; I use it all the time, and it saved me hours of frustration with this merge.
diff: to see how our code differed from upstream across more than a single commit.
log: to follow the flow of a few commits at a time and trace things back to issues.
revert: I had to reverse some of the commits that were done upstream because they didn't make sense with what I was doing. I could have hand-edited the files, but it was a lot nicer to let git do it, and also record a commit that explained why I was doing it.
GitHub: I also found it extremely helpful to have two browser windows open on GitHub, one for each repo. Being able to jump quickly to files, compare things across commits and branches, and move between code changes and the discussions in issues and pull requests was all great.
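Of the tools above, bisect is the one that "guides" you, and the idea behind it is a binary search. You mark one commit bad (`git bisect bad`) and an older one good (`git bisect good <commit>`). Git then checks out a commit roughly halfway between them, and you test it and mark it, repeating until one commit is left. The sketch below shows that idea over a simple oldest-to-newest list of commits with a hypothetical `is_good` test. Real git bisect also handles non-linear history, like the merge described here, by picking commits that split the remaining candidates roughly in half. This linear version is only an illustration, not git's implementation.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(commits: Sequence[T], is_good: Callable[[T], bool]) -> T:
    """Return the first commit in an oldest-to-newest list where is_good fails.

    Assumes commits[0] is known good, commits[-1] is known bad, and that once
    a commit is bad every later one is bad too (the same assumption bisect
    relies on). is_good is a hypothetical test, e.g. "does this feature work?".
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid  # the regression came later
        else:
            hi = mid  # the regression is here or earlier
    return commits[hi]
```

Each test halves the remaining range, so even a thousand candidate commits need only about ten checks. `git bisect run <script>` automates the `is_good` step: an exit code of 0 marks a commit good, and most non-zero codes mark it bad.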

The more big merges you do (and this wasn't even that big compared to some I've seen!), the more comfortable you'll become making big changes and branching off from the main line of development. Git was designed for this, and it's worth spending the time to become comfortable with what it can do.