Adventures in Colemak

01 Jul 2017

Without much warning I recently decided to learn Colemak.

What?

Colemak is an alternative layout for keyboards. It aims to improve on both the traditional QWERTY and the only slightly better-known Dvorak by placing the commonest keys on the home row, along with certain other considerations, to improve ergonomics and comfort while typing.

Why?

This came as a bit of a surprise to me as I have always felt somewhat opposed to learning a new keyboard layout. This may have stemmed from my own frustration in the past in doubling on Clarinet and Saxophone. While the two are keyed similarly, the same fingerings correspond to different written notes. Though it is very common for people to double like this, I really don’t enjoy the feeling of disorientation at all.

The drawbacks I identified were:

  • the initial effort of learning
  • having to “double” when confronted with a QWERTY keyboard
  • really, having to collaborate with anyone on anything ever again

I never saw the supposed benefits of faster typing speed and RSI prevention as a guaranteed net gain. Which is not to say that I don’t care about those things (I take injury prevention very seriously, having blogged about this before). It’s just such an inexact science that I would welcome both of those benefits if they came, but couldn’t reasonably count on them.

But I think there was one other factor that has completely swung this for me that has probably not been present at any other time that I’ve been thinking about this. It is that I am incredibly bored. So bored that I don’t want to learn anything exciting like a new programming language, or even a new natural language, or how to ride a unicycle or spin poi. I’ve been craving the dull repetition that I’ve felt as a musician, a quiet confidence that if I do this dance with my hands slowly and correctly enough times, I’ll program myself to perform a new trick. I’ve been actually longing for the brain ache you get when you’re trying to do something different and your muscle memory won’t quit.

How?

There are many typing tutors online, but I found The Typing Cat particularly good for getting started. Not wanting to take the plunge straight away, I used it to emulate the new layout while I went through the exercises, preserving QWERTY for everything else. For the first couple of weeks I’d type QWERTY during the day and practice 1-2 hours of Colemak in the evening, until I got up to an acceptable typing speed (for me, 30 wpm: still very slow, but not so slow as to interfere too much).

Once I was ready to take the leap, I was confronted by a great number of ways to do this: reconfiguring the keyboard at the system level (useless, since X ignores it), configuring X from the command line (annoying, because those changes aren’t preserved when I make any customizations in the Gnome Tweak Tool), and, as I eventually discovered, adjusting settings in the UI, which covers most of it. I’ll describe only what I eventually settled on in detail, in case you are trying to do this yourself and are running a similar setup to me (Debian 9/Stretch, Gnome 3, US keyboard).

To set up Colemak, simply open Settings, go to Region & Language, hit the + under Input Sources, click English then English (Colemak) and you’re done. You should now see a new thing on the top right that you can click on and select the input source you wish to use. You can also rotate input sources by hitting Super (aka Windows key) and Space.

Unfortunately I wasn’t done there because I had a few issues with some of the design choices in the only variant of Colemak offered. Namely, I didn’t want Colemak to reassign my Caps Lock key to Backspace (as I was already reassigning it to Escape), and I wanted to use my right Alt key as Meta, something I use all the time in Emacs and pretty much everything that supports the basic Emacs keybindings (see: everything worth using). While there may have been a way to customize this from the command line, I never found out what that was, and besides I wanted to find a solution that jelled as much as possible with the general solution I’ve outlined above. It was with this spirit that I decided to add my own, customized keyboard layout. If you’re having similar grumbles, read on.

First, a word of caution. You’re going to have to edit some configuration files that live in /usr/share. If that makes you queasy, I understand. I don’t especially love this solution, but I think it is the best of all solutions known to me. Either way, as a precautionary measure, I’d go ahead and back up the files we’re going to touch:

sudo cp /usr/share/X11/xkb/symbols/us{,.backup}
sudo cp /usr/share/X11/xkb/rules/evdev.xml{,.backup}

Next we’re going to add a keyboard layout to the /usr/share/X11/xkb/symbols/us file. It’ll be an edited version of the X.Org configuration which you can find here. It can probably go anywhere, but I inserted it immediately after the existing entry for Colemak:

// /usr/share/X11/xkb/symbols/us

partial alphanumeric_keys
xkb_symbols "colemak-custom" {

    include "us"
    name[Group1]= "English (Colemak Custom)";

    key <TLDE> { [        grave,   asciitilde ] };
    key <AE01> { [            1,       exclam ] };
    key <AE02> { [            2,           at ] };
    key <AE03> { [            3,   numbersign ] };
    key <AE04> { [            4,       dollar ] };
    key <AE05> { [            5,      percent ] };
    key <AE06> { [            6,  asciicircum ] };
    key <AE07> { [            7,    ampersand ] };
    key <AE08> { [            8,     asterisk ] };
    key <AE09> { [            9,    parenleft ] };
    key <AE10> { [            0,   parenright ] };
    key <AE11> { [        minus,   underscore ] };
    key <AE12> { [        equal,         plus ] };

    key <AD01> { [            q,            Q ] };
    key <AD02> { [            w,            W ] };
    key <AD03> { [            f,            F ] };
    key <AD04> { [            p,            P ] };
    key <AD05> { [            g,            G ] };
    key <AD06> { [            j,            J ] };
    key <AD07> { [            l,            L ] };
    key <AD08> { [            u,            U ] };
    key <AD09> { [            y,            Y ] };
    key <AD10> { [    semicolon,        colon ] };
    key <AD11> { [  bracketleft,    braceleft ] };
    key <AD12> { [ bracketright,   braceright ] };
    key <BKSL> { [    backslash,          bar ] };

    key <AC01> { [            a,            A ] };
    key <AC02> { [            r,            R ] };
    key <AC03> { [            s,            S ] };
    key <AC04> { [            t,            T ] };
    key <AC05> { [            d,            D ] };
    key <AC06> { [            h,            H ] };
    key <AC07> { [            n,            N ] };
    key <AC08> { [            e,            E ] };
    key <AC09> { [            i,            I ] };
    key <AC10> { [            o,            O ] };
    key <AC11> { [   apostrophe,     quotedbl ] };

    key <AB01> { [            z,            Z ] };
    key <AB02> { [            x,            X ] };
    key <AB03> { [            c,            C ] };
    key <AB04> { [            v,            V ] };
    key <AB05> { [            b,            B ] };
    key <AB06> { [            k,            K ] };
    key <AB07> { [            m,            M ] };
    key <AB08> { [        comma,         less ] };
    key <AB09> { [       period,      greater ] };
    key <AB10> { [        slash,     question ] };

    key <LSGT> { [        minus,   underscore ] };
    key <SPCE> { [        space,        space ] };
};

Next you need to register it as a variant of the US keyboard layout:

<!-- /usr/share/X11/xkb/rules/evdev.xml -->
<xkbConfigRegistry version="1.1">
  <!-- ... -->
  <layoutList>
    <layout>
      <!-- ... -->
      <configItem>
        <name>us</name>
        <!-- ... -->
      </configItem>
      <variantList>
        <!-- Insert this stuff: -->
        <variant>
          <configItem>
            <name>colemak-custom</name>
            <description>English (Colemak Custom)</description>
          </configItem>
        </variant>

Finally, you’ll need to bust the xkb cache. I read about how to do this here, but it didn’t seem to work for me (most likely differences between Ubuntu and Debian, or different versions). So to spare you the same disappointment, I’m going to tell you the one way to get this done that is sure to work: restart your damn computer. If you can figure out a better way, that’s great.

Having done all the above, you should now be able to select your Colemak (Custom) layout in the same way by going through the settings in the UI.

Since I’ve made the switch, I’ve seen my speed steadily increasing up to 50-60 wpm. That’s still kind of slow for me, but I have every confidence that it will continue to increase. I think doing drills has helped with that. Since I have no need for emulation anymore, I’ve found the CLI utility gtypist to be particularly good. I try to do the “Lesson C16/Frequent Words” exercises for Colemak every day.


Factories aren't Fixtures

20 Feb 2017

As someone who learned both to program and to test for the first time with Rails, I was quickly exposed to a lot of opinions about testing at once, with a lot of hand-waving. One of these was, as I remember it, that Rails tests with fixtures by default, that fixtures are problematic, that Factory Girl is a solution to those problems, so we just use Factory Girl. I probably internalized this at the time as “use Factory Girl to build objects in tests” without really questioning why.

Some years later now, I sincerely regret not learning to use fixtures first, to experience those pains for myself (or not), to find out to what problem exactly Factory Girl was a solution. For, I’ve come to discover, Factory Girl doesn’t prevent you from having some of the same issues that you’d find with fixtures.

To understand this a bit better, let’s do a simple refactoring from fixtures to factories to demonstrate what problems we are solving along the way.

Consider the following:

# app/models/user.rb
class User < ApplicationRecord
  validates :name, presence: true
  validates :date_of_birth, presence: true

  def adult?
    date_of_birth + 21.years <= Date.today
  end
end
# spec/fixtures/users.yml
Alice:
  name: "Alice"
  date_of_birth: <%= 21.years.ago %>
Bob:
  name: "Bob"
  date_of_birth: <%= 21.years.ago + 1.day %>
# spec/models/user_spec.rb
specify "a person of >= 21 years is an adult" do
  user = users(:Alice)
  expect(user).to be_adult
end

specify "a person of < 21 years is not an adult" do
  user = users(:Bob)
  expect(user).not_to be_adult
end

Here we have two fixtures that contrast two different kinds of user. If done well, your fixtures will be a set of objects that live in the database that together weave a kind of narrative that is revealed in tiny installments through your unit tests. Elsewhere in our test suite, we’d continue with this knowledge that Alice is an adult and Bob is a minor.

So what’s the problem? Well, one is what Meszaros calls the “mystery guest”, a kind of “obscure test” smell. What that means is that the main players in our tests - Alice and Bob - are defined far off in the spec/fixtures/users.yml file. Just looking at the test body, it’s hard to know exactly what it was about Alice and Bob that made one an adult and the other not. (Sure, we should know the rules about adulthood in whatever country we’re in, but it’s easy to see how a slightly more complicated example might not be so clear).

Let’s try to address that concern head on by removing the fixtures:

# spec/models/user_spec.rb
specify "a person of >= 21 years is an adult" do
  user = User.create!(name: "Alice", date_of_birth: 21.years.ago)
  expect(user).to be_adult
end

specify "a person of < 21 years is not an adult" do
  user = User.create!(name: "Bob", date_of_birth: 21.years.ago + 1.day)
  expect(user).not_to be_adult
end

We’ve solved the mystery guest problem! Now we can see at a glance what the relationship is between the attributes of each user and the behavior exhibited by them.

Unfortunately, we have a new problem. Because a user requires a :name attribute, we have to specify a name in order to build a valid user object in each test (we might in certain instances be able to get away with using invalid objects, but it is probably not a good idea). Here, the fact that we’ve had to give our users names has given us another obscure test smell - we have introduced some noise in that it’s not clear at a glance which attributes were relevant to the behavior that’s getting exercised.

Another problem might emerge if we added a new attribute to User that was validated against: every test that builds a user could fail for reasons wholly unrelated to the behavior it is trying to exercise.

Let’s try this again, extracting out a factory method:

# spec/models/user_spec.rb
specify "a person of >= 21 years is an adult" do
  user = create_user(date_of_birth: 21.years.ago)
  expect(user).to be_adult
end

specify "a person of < 21 years is not an adult" do
  user = create_user(date_of_birth: 21.years.ago + 1.day)
  expect(user).not_to be_adult
end

def create_user(attributes = {})
  User.create!({name: "Alice", date_of_birth: 30.years.ago}.merge(attributes))
end

Problem solved! We have some sensible defaults in the factory method, meaning that we don’t have to specify attributes that are not relevant in every test, and we’ve overridden the one that we’re testing - date_of_birth - in those tests on adulthood. If new validations are added, we have one place to update to make our tests pass again.

I’m going to pause here for some reflection before we complete our refactoring. There is another thing that I regret about the way I learned to test. And it is simply not using my own factory methods as I have above, before finding out what problem Factory Girl was trying to address with doing that. Nothing about the code above strikes me yet as needing a custom DSL, or a gem to extract. Ruby already does a great job of making this stuff easy.
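To see just how little machinery the pattern needs, here’s a self-contained sketch in plain Ruby - no Rails, with a hypothetical Struct standing in for the model, and a simple counter doing the job Factory Girl’s sequences would do:

```ruby
require "date"

# Hypothetical stand-in for an ActiveRecord model - no Rails needed for the sketch.
User = Struct.new(:name, :email, :date_of_birth, keyword_init: true)

@user_count = 0

# Sensible defaults, overridable per test; the counter keeps emails unique.
def create_user(attributes = {})
  @user_count += 1
  defaults = {
    name: "Alice",
    email: "user#{@user_count}@example.com",
    date_of_birth: Date.today << (12 * 30), # roughly 30 years ago
  }
  User.new(**defaults.merge(attributes))
end

create_user.name              # => "Alice"
create_user(name: "Bob").name # => "Bob"
```

Defaults, per-test overrides and unique values - everything we’ve needed from a factory so far - in a dozen lines of plain Ruby.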

Sure, the above is a deliberately simple and contrived example. If we find ourselves doing more complicated logic inside a factory method, maybe a well-maintained and feature-rich gem such as Factory Girl can help us there. Let’s assume that we’ve reached that point and plough on so we can complete the refactoring.

# spec/factories/user.rb
FactoryGirl.define do
  factory :user do
    name "Alice"
    date_of_birth 30.years.ago
  end
end
# spec/models/user_spec.rb
specify "a person of >= 21 years is an adult" do
  user = create(:user, date_of_birth: 21.years.ago)
  expect(user).to be_adult
end

specify "a person of < 21 years is not an adult" do
  user = create(:user, date_of_birth: 21.years.ago + 1.day)
  expect(user).not_to be_adult
end

This is fine. Our tests look pretty much the same as before, but instead of a factory method we have a Factory Girl factory. We haven’t solved any immediate problems in this last step, but if our User model gets more complicated to set up, Factory Girl will be there with lots more features for handling just about anything we might want to throw at it.

It seems clear to me now that the problem that Factory Girl solved wasn’t anything to do with fixtures, since it’s straightforward to create your own factory methods. It was presumably the problem of having cumbersome factory methods that you had to write yourself.

However. This is not quite the end of the story for some folks, as there’s a further refactoring we can seize upon:

# spec/factories/user.rb
FactoryGirl.define do
  factory :user do
    name "Alice"
    date_of_birth 30.years.ago

    trait :adult do
      date_of_birth 21.years.ago
    end

    trait :minor do
      date_of_birth 21.years.ago + 1.day
    end
  end
end
# spec/models/user_spec.rb
specify "a person of >= 21 years is an adult" do
  user = create(:user, :adult)
  expect(user).to be_adult
end

specify "a person of < 21 years is not an adult" do
  user = create(:user, :minor)
  expect(user).not_to be_adult
end

Here, we’ve used Factory Girl’s traits API to define what it means to be both an adult and a minor in the factory itself, so if we ever have to use that concept again the knowledge for how to do that is contained in one place. Well done to us!

But hang on. Haven’t we just reintroduced the mystery guest smell that we were trying so hard to get away from? You might observe that these tests look fundamentally the same as the ones that we started out with.

Used in this way, factories are just a different kind of shared fixture. We have the same drawback of having test obscurity, and we’ve taken the penalty of slower tests because these objects have to be built afresh for every single example. What was the point?

Okay, okay. Traits are more of an advanced feature in Factory Girl. They might be useful, but they don’t solve any problems that we have at this point. How about we just keep things simple:

# spec/factories/user.rb
FactoryGirl.define do
  factory :user do
    name "Alice"
    date_of_birth 30.years.ago
  end
end
# spec/models/user_spec.rb
it "tests adulthood" do
  user = create(:user)
  expect(user).to be_adult
end

This example is actually worse, and is quite a popular anti-pattern. An obvious problem is that if I needed to change one of the factory default values, tests are going to break, which should never happen. The goal of factories is to build an object that passes validation with the minimum number of required attributes, so you don’t have to keep specifying every required attribute in every single test you write. But if you’re depending on the specific value of any of those attributes set in the factory in your test, you’re Doing It Wrong ™️.

You’ll also notice that the test provides little value because it doesn’t test around the edges (in this case, dates of birth close to 21 years ago).

Let’s compare with our earlier example (the one before things started to go wrong):

# spec/factories/user.rb
FactoryGirl.define do
  factory :user do
    name "Alice"
    date_of_birth 30.years.ago
  end
end
# spec/models/user_spec.rb
specify "a person of >= 21 years is an adult" do
  user = create(:user, date_of_birth: 21.years.ago)
  expect(user).to be_adult
end

specify "a person of < 21 years is not an adult" do
  user = create(:user, date_of_birth: 21.years.ago + 1.day)
  expect(user).not_to be_adult
end

Crucially we don’t use the default date_of_birth value in any of our tests that exercise it. This means that if I changed the default value to literally anything else that still resulted in a valid user object, my tests would still pass. By using specific values for date_of_birth around the edge of adulthood, I know that I have better tests. And by providing those values in the test body, I can see the direct relationship between those values and the behavior exercised.

Like a lot of sharp tools in Ruby, Factory Girl is rich with features that are very powerful and expressive. But in my opinion, its more advanced features are prone to overuse. It’s also easy to confuse Factory Girl for a library for creating shared fixtures - Rails already comes with one, and it’s better at doing that. Neither of these are faults of Factory Girl, rather I believe they are faults in the way we teach testing.

So don’t use Factory Girl to create shared fixtures - if that’s the style you like then you may want to consider going back to Rails’ fixtures instead.


Testing JSON APIs with RSpec Composable Matchers

01 Aug 2016

Testing JSON structures with arbitrarily deep nesting can be hard. Fortunately RSpec comes with some lesser-known composable matchers that not only make for some very readable expectations but can be built up quite arbitrarily too, mirroring the structure of your JSON. They can provide you with a single expectation on your response body that is diffable and will give you a pretty decent report on what failed.

While I don’t necessarily recommend you test every aspect of your API through full-stack request specs, you are probably going to have to write a few of them, and they can be painful to write. Fortunately RSpec offers a few ways to make your life easier.

First, though, I’d like to touch on a couple of other things I do when writing request specs to get the best possible experience when working with these slow, highly integrated tests.

Order of expectations

Because request specs are expensive, you’ll often want to combine a few expectations into a single example if they are essentially testing the same behavior. You’ll commonly see expectations on the response body, headers and status within a single test. If you do this, however, it’s important to bear in mind that the first expectation to fail will short circuit the others by default. So you’ll want to put the expectations that provide the best feedback on what went wrong first. I’ve found the expectation on the status to be least useful, so always put this last. I’m usually most interested in the response body, so I’ll put that first.

Using failure aggregation

One way to get around the expectation order problem is to use failure aggregation, a feature first introduced in RSpec 3.3. Examples that are configured to aggregate failures will execute all the expectations and report on all the failures so you aren’t stuck with just the rather opaque “expected 200, got 500”. You can enable this in a few ways, including in the example itself:

it "will report on both these expectations should they fail", aggregate_failures: true do
  expect(response.parsed_body).to eq("foo" => "bar")
  expect(response).to have_http_status(:ok)
end

Or in your RSpec configuration. Here’s how to enable it for all your API specs:

# spec/rails_helper.rb

RSpec.configure do |c|
  c.define_derived_metadata(:file_path => %r{spec/api}) do |meta|
    meta[:aggregate_failures] = true
  end
end

Using response.parsed_body

Since I’ve been testing APIs I’ve always written my own JSON parsing helper. But in version 5.0.0.beta3 Rails added a method to the response object to do this for you. You’ll see me using response.parsed_body throughout the examples below.
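If you’re on an older Rails, the hand-rolled helper is a one-liner anyway. Roughly speaking (and with a made-up response body), response.parsed_body for a JSON response boils down to this:

```ruby
require "json"

# What response.parsed_body amounts to for a JSON response: the raw body
# parsed into plain Ruby hashes and arrays, with string keys throughout.
body = '{"data":[{"type":"posts","id":"1"}]}'
parsed = JSON.parse(body)

parsed["data"].first["id"] # => "1"
```

The string keys are worth remembering: all the expectations below use "data", not :data.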

Using RSpec composable matchers to test nested structures

I’ve outlined a few common scenarios below, indicating which matchers to use when they come up.

Use eq when you want to verify everything

expected = {
  "data" => [
    {
      "type" => "posts",
      "id" => "1",
      "attributes" => {
        "title" => "Post the first"
      },
      "links" => {
        "self" => "http://example.com/posts/1"
      }
    }
  ],
  "links" => {
    "self" => "http://example.com/posts",
    "next" => "http://example.com/posts?page[offset]=2",
    "last" => "http://example.com/posts?page[offset]=10"
  },
  "included" => [
    {
      "type" => "comments",
      "id" => "1",
      "attributes" => {
        "body" => "Comment the first"
      },
      "relationships" => {
        "author" => {
          "data" => { "type" => "people", "id" => "2" }
        }
      },
      "links" => {
        "self" => "http://example.com/comments/1"
      }
    }
  ]
}
expect(response.parsed_body).to eq(expected)

Not a composable matcher, but shown here to contrast with the examples that follow. I typically don’t want to use this - it can make for some painfully long-winded tests. If I wanted to check every aspect of the serialization, I’d probably want to write a unit test on the serializer anyway. Most of the time I just want to check that a few things are there in the response body.


Use match when you want to be more flexible

expected = {
  "data" => kind_of(Array),
  "links" => kind_of(Hash),
  "included" => anything
}
expect(response.parsed_body).to match(expected)

match is a bit fuzzier than eq, but not as fuzzy as include (below). match verifies that the expected values are not only correct but also that they are sufficient - any superfluous attributes will fail the above example.

Note that match allows us to start composing expectations out of other matchers such as kind_of and anything (see below), something we couldn’t do with eq.


Use include/a_hash_including when you want to verify certain key/value pairs, but not all

expected = {
  "data" => [
    a_hash_including(
      "attributes" => a_hash_including(
        "title" => "Post the first"
      )
    )
  ]
}
expect(response.parsed_body).to include(expected)

include is similar to match but doesn’t care about superfluous attributes. As we’ll see, it’s incredibly flexible and is my go-to matcher for testing JSON APIs.

a_hash_including is just an alias for include added for readability. It will probably make most sense to use include at the top level, and a_hash_including for things inside it, as above.


Use include/a_hash_including when you want to verify certain keys are present

expect(response.parsed_body).to include("links", "data", "included")

The include matcher will happily take a list of keys instead of key/value pairs.


Use a hash literal when you want to verify everything at that level

expected = {
  "data" => [
    {
      "type" => "posts",
      "id" => "1",
      "attributes" => {
        "title" => "Post the first"
      },
      "links" => {
        "self" => "http://example.com/posts/1"
      }
    }
  ]
}
expect(response.parsed_body).to include(expected)

Here we only care about the root node "data" since we are using the include matcher, but want to verify everything explicitly under it.


Use a_collection_containing_exactly when you have an array, but can’t determine the order of elements

expected = {
  "data" => a_collection_containing_exactly(
    a_hash_including("id" => "1"),
    a_hash_including("id" => "2")
  )
}
expect(response.parsed_body).to include(expected)


Use a_collection_including when you have an array, but don’t care about all the elements

expected = {
  "data" => a_collection_including(
    a_hash_including("id" => "1"),
    a_hash_including("id" => "2")
  )
}
expect(response.parsed_body).to include(expected)

Guess what? a_collection_including is just another alias for the incredibly flexible include, but can be used to indicate an array for expressiveness.


Use an array literal when you care about the order of elements

expected = {
  "data" => [
    a_hash_including("id" => "1"),
    a_hash_including("id" => "2")
  ]
}
expect(response.parsed_body).to include(expected)


Use all when you want to verify that each thing in a collection conforms to a certain structure

expected = {
  "data" => all(a_hash_including("type" => "posts"))
}
expect(response.parsed_body).to include(expected)

Here we don’t have to say how many elements "data" contains, but we do want to make sure they all have some things in common.


Use anything when you don’t care about some of the values, but do care about the keys

expected = {
  "data" => [
    {
      "type" => "posts",
      "id" => "1",
      "attributes" => {
        "title" => "Post the first"
      },
      "links" => {
        "self" => "http://example.com/posts/1"
      }
    }
  ],
  "links" => anything,
  "included" => anything
}
expect(response.parsed_body).to match(expected)


Use a_string_matching when you want to verify part of a string value, but don’t care about the rest

expected = {
  "links" => a_hash_including(
    "self" => a_string_matching(%r{/posts})
  )
}
expect(response.parsed_body).to include(expected)

Yep, a_string_matching is another alias, this time for match.


Use kind_of if you care about the type, but not the content

expected = {
  "data" => [
    a_hash_including(
      "id" => kind_of(String)
    )
  ]
}
expect(response.parsed_body).to include(expected)


That’s about it! Composable matchers are one of my favorite things about RSpec. I hope you will love them too!


The Hobgoblin

20 Jun 2016

For the uninitiated, The Moomins is a series of books and a comic strip by the wonderful Tove Jansson. These Moomins live in the fictional and idyllic Moominvalley set somewhere in the forests of Finland. It is a complex landscape rich in imagery, symbolism and archetypes, and their world has been reimagined many times since Jansson first wrote about it. One such reimagining was Moomin, a show from the 90s that fused the best of this Finnish folklore with zany Japanese animation. And it is my favorite from childhood.

So enamored was I with this show that I continue to watch it unironically to this day, and not just for the feeling of nostalgia. Though it is full of action and occasionally disturbing (the groke!), I nonetheless find it really calming to lose myself in the otherwise zen-like serenity of Moominvalley for 20 minutes or so.

Adventure is of course central to every episode, and sure enough the Moomins meet lots of interesting and occasionally magical creatures, and one of these is The Hobgoblin.

I didn’t remember much about the Hobgoblin from childhood, but I was struck watching it more recently with the following:

  • He is a powerful magician.
  • He collects Rubies.
  • He is in search of the King’s Ruby.
  • He rides a puma through the sky.

I am so surprised the Ruby community has not picked up on this yet!


Intermediate Git

24 Apr 2015

Git 101

My early professional career required that I know how to do six things in git: branch, stage, commit, merge, push and pull. Beyond that there was always google. And of course that stack overflow page that everyone stumbles on eventually: if I effed something up there was git reset --hard HEAD, and if I really effed it up I could do git reset --hard HEAD~. Or was it the other way round?
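(It was that way round, as it happens, and the difference is easy to see in a throwaway repo. This sketch runs entirely in a temp directory, so it’s safe to paste anywhere; the file and commit names are made up for the demo.)

```shell
# Scratch repo to demonstrate the two resets
cd "$(mktemp -d)" && git init -q
git config user.email demo@example.com && git config user.name demo

git commit -q --allow-empty -m "first"
echo "good" > file.txt && git add file.txt && git commit -q -m "second"

echo "oops" > file.txt    # an uncommitted mess

git reset -q --hard HEAD  # throws away the mess, keeps the "second" commit
cat file.txt              # => good

git reset -q --hard HEAD~ # throws away the "second" commit as well
git log --oneline         # only "first" remains
```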

To my surprise now, I got a lot of leverage out of just those six (or seven) commands. But that was probably because no-one else really minded what I was doing. We committed to master and dealt with problems as they came up. No-one read the history. We pushed to a gitolite server, which, as great as that is, is so far away from the world of GitHub that to any novice it was something of a black box. Code got committed and pushed. Who knows what happened after that? If something broke, it meant doing more committing and pushing.

Fortunately for me this didn’t last for too long. I decided at some point that I needed to understand git a little better.

Now, I still don’t consider myself an expert in any way. I did give a talk on the subject at work recently which I enjoyed, and wanted to summarize more formally the contents of that here. So here it is. Something like the guide I wish I had read a couple years ago to get me through the git 101 blues. It will cover:

  • Some standard and some not-so-standard terms
  • How to write a better commit message (that old chestnut)
  • How to make better commits
  • Some ways to configure git to make your life easier
  • What the hell rebasing is
  • A few odd parts of git’s syntax
  • Some lesser-used tools that you might like

Terms!

First of all, let’s define a few terms. I won’t define every term, just a few that are either vague or that I will use frequently throughout.

Private branch
A branch that is used by just you. Pushing it to a remote does not necessarily make it public.
Public branch
A branch that is shared (read: committed to) by many.
HEAD
I always wondered if you were supposed to scream this. I might less formally refer to it simply as the 'head' or 'tip'. It is the current revision of a given branch.
The graph

A lot can be said about the graph, and it's probably beyond the scope of this article to talk about this in any detail. Let's just say that a requirement for understanding git's internals is some rudimentary knowledge about graph theory. I really do mean rudimentary, so don't let that put you off. There is a great resource on explaining git in terms of graph theory here, which I would highly recommend.

In terms of graph theory, your git history is essentially a graph composed of commit 'nodes'. The commits at the HEAD of branches are your 'leaf' nodes. Your current revision in this sense refers to the series of changes (i.e. commit nodes) that are 'reachable' (i.e. pointed to by HEAD, or pointed to by commits that are pointed to by HEAD, and so on).

Merge bubble
When you merge two branches, you will get a merge 'bubble' by creating a new commit in the target branch that retains the integrity of both branches. This is a special 'merge commit', and it's special because it points to two different commits in the history - the tip of the target (typically `master`) branch, and the tip of the topic branch. You wouldn't create this commit by hand; it will happen automatically depending on how you've set up your `.gitconfig`. Typically, if you're working on a team and you haven't configured git at all, or if you're using the GitHub web interface to merge branches, you will end up with lots of merge bubbles.
Fast-forward
This is what happens when you merge without creating a merge bubble. Git will merge your changes in at the top of your target branch as if you had just been committing to it all along. No merge commit is created.
Squashing
This is a technique used for combining commits that have already been made into bigger, more consolidated ones.
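To make the merge-bubble idea concrete, here's a throwaway-repo sketch (names and messages are made up) showing that a merge commit points at two parents:

```shell
# Throwaway repo: diverge two branches, merge, and inspect the merge commit
set -e
cd "$(mktemp -d)"
git init -q
git config user.email you@example.com
git config user.name "You"
echo base > file; git add file; git commit -qm "Initial commit"
git checkout -qb topic
echo topic >> file; git commit -qam "Topic work"
git checkout -q -                      # back to the default branch
echo other > other; git add other; git commit -qm "Mainline moves on"
git merge -q --no-edit topic           # histories diverged: a merge bubble
git log -1 --format=%P                 # the merge commit lists two parent hashes
```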

Some committing anti-patterns

Here are some things that can generally go wrong:

  • Putting everything into one big commit
  • Writing an incomplete commit message
  • Breaking something. Committing. Fixing it later.
  • (More advanced) rebasing or committing in hunks without checking the state of each commit

One thing I learned early on was that it is a good idea to commit frequently. Unfortunately that’s not the whole story. Although it does address anti-pattern #1, it will often mean trading it for #2 or #3. Practicing TDD is actually conducive to making frequent, small commits because you’re concentrating on either getting to green (a requirement for a good commit) without getting distracted or writing more code than is needed, or refactoring in small steps. Essentially, it’s OK to do #2 or #3 as long as you’re working in a private branch and you squash or rewrite your commits before merging by performing an interactive rebase (more on this later).

Squashing everything isn’t necessarily a good idea either. The goal should be to be left with a small number of commits that each mark a distinct progression toward some goal (adding a new feature, refactoring, etc.). As you become more savvy with rebasing interactively you may fall prey to anti-pattern #4. In other words, when you’re rewriting history it’s important to check the integrity of each commit that you’re creating after the fact. If you really care about your history, and not just your HEAD, you’ll want every commit to be green and deployable.
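One way to check the integrity of every rewritten commit is git's `--exec` (`-x`) rebase flag, which runs a command after each replayed commit and halts the rebase if it fails. In this throwaway-repo sketch, `true` is just a stand-in for your project's real test command (e.g. `make test` or `rspec`):

```shell
# Throwaway repo: use --exec to run a check after every replayed commit.
# "true" is a placeholder for your project's real test command.
set -e
cd "$(mktemp -d)"
git init -q
git config user.email you@example.com
git config user.name "You"
echo base > file; git add file; git commit -qm "Initial commit"
git checkout -qb topic
echo a >> file; git commit -qam "Step one"
echo b >> file; git commit -qam "Step two"
# Replay the last two commits, running the check after each one;
# a failing check would stop the rebase at the offending commit
git rebase -q --exec true HEAD~2
```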

There are actually a few reasons why you might want to take such care of your history. The first that comes to mind is being able to use git’s bisect feature with more confidence. bisect is a tool used for examining a portion of your history, typically for locating a commit that introduced some regression. It is a very powerful and useful tool that I’ve personally seen rendered completely useless by careless committing. More on bisect later.

Another reason is being able to generate metrics for your application across a range of commits.

Another is simply being able to read your history with relative ease. This is more a comment on composing good commits with good commit messages. (Occasionally, for inspiration, I’ll go spelunking through the history of some open source software that I love, go right back to the first commit and rediscover the steps of creating its first complete feature.)

There are two rules I like to follow when composing a commit message. The first is to use the present tense imperative in the first line. The reason for this is that this is the tense/mood used in git’s generated messages such as on merge commits. A nice side effect of this is that you will probably find that your messages are shorter and more succinct. The second rule is never to use the -m flag. Trying to fit your entire message onto the first line is just way too much pressure! How formal you want to get with your message after that is up to you. Generally it’s a good idea to have a short, descriptive first line, followed by a longer description and a link to an issue number or ticket if one exists. I add thoughtbot’s template to help remind me:

# ~/.gitconfig
[commit]
  template = ~/.gitmessage

# ~/.gitmessage

# 50-character subject line
#
# 72-character wrapped longer description. This should answer:
#
# * Why was this change necessary?
# * How does it address the problem?
# * Are there any side effects?
#
# Include a link to the ticket, if any.

More on your gitconfig

There are a couple more things that you may want to consider adding or tweaking in your gitconfig. Often you’ll see official advice telling you to use the git command line interface to accomplish this, but I prefer to edit my ~/.gitconfig by hand.

Here are a few things I recommend playing with:

[alias]
  a = add
  br = branch
  ci = commit
  co = checkout
  st = status

These are a few simple and common aliases that have become more or less standard (see that kernel wiki article for others). I won’t enumerate all the ones I use here, but feel free to check out my dotfiles. Aliasing is essential to being productive if you’re interacting with git at the command line. Feel free to create aliases in your ~/.bashrc too. Alias git to g, and more common commands such as git status to gs. It might seem trivial at first, but if you type git status about 200 times a day, as I do, you are going to save quite a few keystrokes by the end of the week. And that’s time you could be spending thinking about your design, or even going for a walk in the park.
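For what it's worth, here's roughly what those shell-level aliases might look like (the names are just the convention I happen to like; pick whatever your fingers prefer):

```shell
# Candidate aliases for your ~/.bashrc (names are just my preference)
alias g='git'
alias gs='git status'
alias gd='git diff'
alias gl='git log --oneline --graph'
```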

[merge]
  ff = only

This is useful if you don’t want git to create a merge bubble unless specifically asked to do so. If your branch can’t be fast-forwarded, it won’t be merged either until you rebase, or you pass a flag overriding the above.

[branch]
  autosetuprebase = always

Useful if you are using a rebase-style workflow (more below). With this set, if you pull from an upstream on a branch where you have revisions that have not yet been pushed, your unpushed revisions will be replayed on top of the incoming changes, and no merge commit is made.

Rebasing

If you only learn one thing beyond the git 101 stage it should probably be this. Never rebase a public branch! Now, I don’t like making hard and fast rules with exclamatory remarks like that, particularly because I think they contribute to the fear and trepidation that surrounds rebasing, and the reluctance to use git’s most powerful feature. Please don’t let that put you off. It really is the only thing you need to remember. Everything else is easy to fix =)

Linus Torvalds has said that all of git can be understood in terms of rebase. But I think there’s another command that helps illuminate even further: the cherry-pick.

This is what a cherry-pick looks like:

$ git cherry-pick <commit>

What it does is take the changes introduced by a given commit - from anywhere in your history - and apply them to the tip of your current branch. You can tell it to apply them somewhere else if you want, but that’s what it does with no other args. If that sounds confusing, or if you’ve never really thought about git in those terms, go back and read that a couple of times.

cherry-pick is sort of the basic unit of a rebase. The difference is that with rebase you’re saying: take this series of commits and replay them all, starting from another point in history.

This is what a rebase looks like:

# rebase against local master
$ git rebase master

# rebase against remote master
$ git fetch origin
$ git rebase origin/master

With interactive rebasing you have even more control over how to rewrite history. You can take commits out, shuffle them around, squash commits into other commits, stop the replay right in the middle and change something and continue where you left off. Powerful stuff.

This is what an interactive rebase looks like:

$ git rebase -i master

There are (at least) two distinct benefits that you get from rebasing. One is that you can introduce any upstream changes into your code, address any breakages or refactoring that can be done, then merge all your changes directly onto the tip of master, without a merge ‘bubble’, as if you had just written them in some kind of coding frenzy. The other is that you can commit however you want while you’re developing, and then go back and recompose your commit history into a string of coding pearls, squashing smaller changes, typos and errors, and writing beautiful commit messages with love and care.

One thing you might notice is that if you were pushing your topic branch before you rebased, the remote will refuse when you try to push afterwards (and complain about it, too). This is normal and to be expected. It just means that you have to ‘force’ push your branch.

The reason for this is that you changed history by rebasing. Now, these words are often thrown around, but you might find that explanation to be a little vague. And rightfully so.

Here’s what’s really going on: when you rebase a branch onto another commit, you take that first commit you made when you first branched off and point it to a different commit. Doing so actually creates a new commit with a distinct SHA1 hash (what a commit points to is an essential part of the ‘content’ of a commit), and points HEAD to it. Your original commit is still there, it’s just not visible in your log any more because it’s not reachable from HEAD.

The next commit in your topic branch is now pointing at this ‘ghost’ commit. It needs to be updated to point to its new parent. The process begins again. A new commit is created, HEAD is moved, and on and on. As the rebase replays all your changes, it effectively changes every commit hash in the branch. Your local branch and origin now have two different copies of the same changes but none of the hashes is the same. This is why git gives you the somewhat confusing suggestion to pull your changes down before trying to push. What you need to do instead is tell the remote to forget everything and just accept your local branch in place of whatever it has. And that looks like this:

$ git push -f origin <branch>
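If you want to watch the hashes change with your own eyes, here's a throwaway-repo sketch (branch and file names are made up) that captures the topic branch's tip before and after a rebase:

```shell
# Throwaway repo: the topic branch's tip hash changes across a rebase
set -e
cd "$(mktemp -d)"
git init -q
git config user.email you@example.com
git config user.name "You"
default=$(git symbolic-ref --short HEAD)   # "master" or "main", depending on git version
echo base > file; git add file; git commit -qm "Initial commit"
git checkout -qb topic
echo topic > topic.txt; git add topic.txt; git commit -qm "Topic work"
git checkout -q "$default"
echo more >> file; git commit -qam "Mainline moves on"
git checkout -q topic
before=$(git rev-parse HEAD)
git rebase -q "$default"                   # replay "Topic work" onto the new tip
after=$(git rev-parse HEAD)
# Same change, different hash - which is why the remote refuses a plain push
[ "$before" != "$after" ] && echo "hash changed"
```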

Some useful things to know

Reflog

For the longest time I held the reflog at arm’s length. I knew it existed and that it could be of help if you were in serious trouble. Maybe there was some security in thinking that if I managed never to use it then I could never have done anything that bad.

But I was wrong. The reflog is actually exciting, powerful and pretty straightforward.

$ git reflog
$ git reflog show <branch>

This will show you something that looks like this:

e58096a HEAD@{0}: commit: Really committed now.
5a4acd2 HEAD@{1}: commit: Commitment issues.
6f10f0e HEAD@{2}: commit: Committing some more.
146778b HEAD@{3}: commit: The awkward second commit.
8838e8d HEAD@{4}: commit: Initial commit.

It’s possible that some of the commits the reflog will show you will no longer be reachable on the graph (such as after a rebase). Want to undo a rebase? Just point HEAD to where it was before you started by using reset (more below).
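As a sketch of that undo, here's a throwaway repo (names are made up) where we rebase a topic branch and then put it right back using the branch's reflog:

```shell
# Throwaway repo: rebase a topic branch, then undo it via the branch reflog
set -e
cd "$(mktemp -d)"
git init -q
git config user.email you@example.com
git config user.name "You"
default=$(git symbolic-ref --short HEAD)
echo base > file; git add file; git commit -qm "Initial commit"
git checkout -qb topic
echo topic > t.txt; git add t.txt; git commit -qm "Topic work"
git checkout -q "$default"
echo more >> file; git commit -qam "Mainline moves on"
git checkout -q topic
before=$(git rev-parse HEAD)
git rebase -q "$default"              # the rebase we wish we hadn't done
# topic@{1} is the branch's position just before the rebase moved it
git reset -q --hard 'topic@{1}'
[ "$(git rev-parse HEAD)" = "$before" ] && echo "rebase undone"
```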

Ranges

Ranges, which is to say the .. and ... syntax, can be pretty confusing because they can mean different things in different contexts. It’s important to know how to use them, though.

In the context of logs:

# git log
# commits that b has that a doesn't have
$ git log <commit a>..<commit b>
# commits in a and b but not both
$ git log <commit a>...<commit b>
# the last n commits
$ git log -<n>

In the context of diffs:

# git diff
# changes between commit a and commit b
$ git diff <commit a> <commit b>
# same
$ git diff <commit a>..<commit b>
# changes that occurred on b's branch since it branched off of a's
$ git diff <commit a>...<commit b>

in the context of checking out:

# git checkout
# checkout the merge base of a and b
$ git checkout <commit a>...<commit b>

Commit Parents

Sometimes it can be easier to refer to commits not by their SHA1 hash but by their relationship with another commit. This is especially so when dealing with recent history and your point of reference is HEAD. There are a number of different ways of saying the same thing, and you can combine them too:

# the current commit
$ HEAD
$ HEAD~0
# the 1st parent of the current commit
$ HEAD~
$ HEAD~1
# the 1st parent of the 1st parent of the current commit
$ HEAD~~
$ HEAD~2
$ HEAD~1~1
# the 2nd parent of the current commit
$ HEAD^2
# uh...
$ HEAD~2^2~5^2

Add

You already know how to do that. But have you tried adding in hunks? It looks like this:

# stage changes in hunks
$ git add -p

This allows you to add interactively. Git will try to present you with smaller ‘hunks’ of your code to stage one by one. If it’s not granular enough for you, you can just tell git to get more granular by splitting it. Here’s what it looks like:

Stage this hunk [y,n,q,a,d,/,j,J,g,s,e,?]?

The most useful options to remember are y for yes, n for no, and s for split.

Bisect

This takes a divide-and-conquer approach to locating the commit in your history that introduced some change (typically a regression). It requires only that you can identify some point in your history that you know was good, and another point that is bad. Working with bisect will typically look like this:

# start it all off
$ git bisect start

# mark a known good commit
$ git bisect good <commit>

# mark a known bad commit
$ git bisect bad <commit>

# tell bisect the commit it checked out is good
$ git bisect good

# tell bisect the commit it checked out is bad
$ git bisect bad

You then repeat the last two steps until you’re down to one commit.

You can even automate the process:

# automate it
$ git bisect run rspec path/to/broken_spec.rb

Great stuff!

Blame

My FAVORITE tool. Mwahaha! In all seriousness though (ahem), this can be useful in situations where you have some code you really don’t understand despite your best efforts, and you need to have a chat with its author. Alternatively, you may want to credit someone for a revision that was really good. It looks like this:

$ git blame path/to/file

Revert

Creates a ‘mirror image’ of another commit that backs out the changes it introduced:

# create a new commit reversing the changes
$ git revert <commit>

You can even revert a merge commit by passing the -m flag and the parent that you want to keep. Typically this will just be 1, indicating master in situations where you merged a topic branch into it. The topic branch would be 2:

# revert a merge
$ git revert -m 1 <merge commit>

Reset

Something you may have used in desperation. Like rebase, reset is a powerful tool and it’s worth knowing what a few of the options do. Something all resets have in common is that they move HEAD to a new, specified commit. Unless you’re resetting to a point way back in history, it’s usually easier to provide a commit relative to HEAD. Here are a few options you want in your tool-belt:

# leave changes not in target in staging area
$ git reset --soft HEAD~
# leave changes not in target in working tree (default)
$ git reset --mixed HEAD~
# destroy all changes not included in target
$ git reset --hard HEAD~
# reset to previous point in the reflog
$ git reset --hard <branch>@{<reflog entry>}
# reset to where you were last week (!!!)
$ git reset --hard <branch>@{one.week.ago}

Conclusion

That’s more or less everything I know about being a git. There are some great resources, linked below, that cover more advanced topics if you’re interested in learning more. Being an intermediate git only really requires some curiosity and practice using the tools and techniques above. Once you get them, you’ll want to use most of them every day, and you’ll have internalized everything. And being an intermediate git won’t merely bring you up to scratch - it will actually set you apart from the rest (most of the time).