01 Jul 2017
Without much warning I recently decided to learn Colemak.
What?
Colemak is an alternative layout for keyboards. It aims to improve on
both the traditional QWERTY and the only slightly better-known Dvorak
by placing the commonest keys on the home row, along with certain
other considerations, to improve ergonomics and comfort while typing.
Why?
This came as a bit of a surprise to me as I have always felt somewhat
opposed to learning a new keyboard layout. This may have stemmed from
my own frustration in the past with doubling on clarinet and
saxophone. While the two are keyed similarly, the same keys
correspond to different “notes” as they are written down. Though it is
very common for people to do this, I really don’t enjoy the feeling of
disorientation at all.
I identified the drawbacks as:
- the initial effort of learning
- having to “double” when confronted with a QWERTY keyboard
- really, having to collaborate with anyone on anything ever again
The supposed benefits of faster typing speed and prevention of RSI I
never saw as a net gain. Which is not to say that I don’t care about
those things (I take injury prevention very seriously, having blogged
about this before). It’s
just such an inexact science that I would welcome both of those
benefits if they came, but couldn’t reasonably expect them as
guaranteed.
But I think there was one other factor that has completely swung this
for me that has probably not been present at any other time that I’ve
been thinking about this. It is that I am incredibly bored. So bored
that I don’t want to learn anything exciting like a new programming
language, or even a new natural language, or how to ride a unicycle or
spin poi. I’ve been craving the dull repetition that I’ve felt as a
musician, a quiet confidence that if I do this dance with my hands
slowly and correctly enough times, I’ll program myself to perform a
new trick. I’ve been actually longing for the brain ache you get when
you’re trying to do something different and your muscle memory won’t
quit.
How?
There are many typing tutors online, but I found The Typing Cat
particularly good for getting started. Since I didn’t want to take
the plunge straight away, it let me emulate the new layout while I
went through the exercises, preserving QWERTY for everything
else. For the first couple of weeks I’d do QWERTY during the day and
practice 1-2 hours of Colemak in the evening, until I got up to an
acceptable typing speed (for me, 30 wpm, while still very slow, would
not interfere too much).
Once I was ready to take the leap, I was confronted by a great number
of ways to do this, ranging from reconfiguring the keyboard at the
system level (useless, since X ignores it), configuring X from the
command line (annoying, because those changes aren’t preserved when I
make any customizations in the Gnome Tweak Tool), to discovering I
could do most of this by adjusting settings in the UI. I’ll describe
only what I eventually settled on in detail, in case you are trying to
do this yourself and are running a similar setup to me (Debian
9/Stretch, Gnome 3, US keyboard).
To set up Colemak, simply open Settings, go to Region & Language,
hit the + under Input Sources, click English, then English (Colemak),
and you’re done. You should now see a new menu at the top right where
you can click and select the input source you wish to use. You can
also cycle through input sources by hitting Super (aka the Windows
key) and Space.
Unfortunately I wasn’t done there because I had a few issues with some
of the design choices in the only variant of Colemak offered. Namely,
I didn’t want Colemak to reassign my Caps Lock key to Backspace (as I
was already reassigning it to Escape), and I wanted to use my right
Alt key as Meta, something I use all the time in Emacs and pretty much
everything that supports the basic Emacs keybindings (see: everything
worth using). While there may have been a way to customize this from
the command line, I never found out what that was, and besides I
wanted to find a solution that jelled as much as possible with the
general solution I’ve outlined above. It was with this spirit that I
decided to add my own, customized keyboard layout. If you’re having
similar grumbles, read on.
First, a word of caution. You’re going to have to edit some
configuration files that live in /usr/share. If that makes you
queasy, I understand. I don’t especially love this solution, but I
think it is the best of all solutions known to me. Either way, as a
precautionary measure, I’d go ahead and back up the files we’re going
to touch:
sudo cp /usr/share/X11/xkb/symbols/us{,.backup}
sudo cp /usr/share/X11/xkb/rules/evdev.xml{,.backup}
Next we’re going to add a keyboard layout to the
/usr/share/X11/xkb/symbols/us file. It’ll be an edited version of the
X.Org configuration which you can find here. It can probably go
anywhere, but I inserted it immediately after the existing entry for
Colemak:
// /usr/share/X11/xkb/symbols/us

partial alphanumeric_keys
xkb_symbols "colemak-custom" {

    include "us"

    name[Group1]= "English (Colemak Custom)";

    key <TLDE> { [ grave, asciitilde ] };
    key <AE01> { [ 1, exclam ] };
    key <AE02> { [ 2, at ] };
    key <AE03> { [ 3, numbersign ] };
    key <AE04> { [ 4, dollar ] };
    key <AE05> { [ 5, percent ] };
    key <AE06> { [ 6, asciicircum ] };
    key <AE07> { [ 7, ampersand ] };
    key <AE08> { [ 8, asterisk ] };
    key <AE09> { [ 9, parenleft ] };
    key <AE10> { [ 0, parenright ] };
    key <AE11> { [ minus, underscore ] };
    key <AE12> { [ equal, plus ] };

    key <AD01> { [ q, Q ] };
    key <AD02> { [ w, W ] };
    key <AD03> { [ f, F ] };
    key <AD04> { [ p, P ] };
    key <AD05> { [ g, G ] };
    key <AD06> { [ j, J ] };
    key <AD07> { [ l, L ] };
    key <AD08> { [ u, U ] };
    key <AD09> { [ y, Y ] };
    key <AD10> { [ semicolon, colon ] };
    key <AD11> { [ bracketleft, braceleft ] };
    key <AD12> { [ bracketright, braceright ] };
    key <BKSL> { [ backslash, bar ] };

    key <AC01> { [ a, A ] };
    key <AC02> { [ r, R ] };
    key <AC03> { [ s, S ] };
    key <AC04> { [ t, T ] };
    key <AC05> { [ d, D ] };
    key <AC06> { [ h, H ] };
    key <AC07> { [ n, N ] };
    key <AC08> { [ e, E ] };
    key <AC09> { [ i, I ] };
    key <AC10> { [ o, O ] };
    key <AC11> { [ apostrophe, quotedbl ] };

    key <AB01> { [ z, Z ] };
    key <AB02> { [ x, X ] };
    key <AB03> { [ c, C ] };
    key <AB04> { [ v, V ] };
    key <AB05> { [ b, B ] };
    key <AB06> { [ k, K ] };
    key <AB07> { [ m, M ] };
    key <AB08> { [ comma, less ] };
    key <AB09> { [ period, greater ] };
    key <AB10> { [ slash, question ] };

    key <LSGT> { [ minus, underscore ] };
    key <SPCE> { [ space, space ] };
};
Next you need to register it as a variant of the US keyboard layout:
<!-- /usr/share/X11/xkb/rules/evdev.xml -->
<xkbConfigRegistry version="1.1">
  <!-- ... -->
  <layoutList>
    <layout>
      <!-- ... -->
      <configItem>
        <name>us</name>
        <!-- ... -->
      </configItem>
      <variantList>
        <!-- Insert this stuff: -->
        <variant>
          <configItem>
            <name>colemak-custom</name>
            <description>English (Colemak Custom)</description>
          </configItem>
        </variant>
        <!-- ... -->
      </variantList>
    </layout>
  </layoutList>
</xkbConfigRegistry>
Finally, you’ll need to bust the xkb cache. I read about how to do
this here, but it didn’t seem to work for me (most likely differences
between Ubuntu and Debian, or different versions). So to spare you
the same disappointment, I’m going to tell you the best way to get
this done that is sure to work: restart your damn computer. If you
can figure out a better way, that’s great.
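For the record, the cache-clearing approach I read about was
something like the following (it didn’t work for me, but your mileage
may vary):

# remove the compiled keymaps so they get rebuilt
sudo rm /var/lib/xkb/*.xkb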
Having done all the above, you should now be able to select your
English (Colemak Custom) layout in the same way, by going through the
settings in the UI.
Since I’ve made the switch, I’ve seen my speed steadily increase to
50-60 wpm. That’s still kind of slow for me, but I have every
confidence that it will continue to improve. I think doing drills has
helped with that. Since I have no need for emulation anymore, I’ve
found the CLI utility gtypist to be particularly good. I try to do
the “Lesson C16/Frequent Words” exercises for Colemak every day.
20 Feb 2017
As someone who learned both to program and to test for the first time
with Rails, I was quickly exposed to a lot of opinions about testing
at once, with a lot of hand-waving. One of these was, as I remember
it, that Rails tests with fixtures by default, that fixtures are
problematic, that Factory Girl is a solution to those problems, so we
just use Factory Girl. I probably internalized this at the time as
“use Factory Girl to build objects in tests” without really
questioning why.
Some years later now, I sincerely regret not learning to use
fixtures first, to experience those pains for myself (or not), to find
out to what problem exactly Factory Girl was a solution. For, I’ve
come to discover, Factory Girl doesn’t prevent you from having some of
the same issues that you’d find with fixtures.
To understand this a bit better, let’s do a simple refactoring from
fixtures to factories to demonstrate what problems we are solving
along the way.
Consider the following:
# app/models/user.rb
class User < ApplicationRecord
  validates :name, presence: true
  validates :date_of_birth, presence: true

  def adult?
    date_of_birth + 21.years <= Date.today
  end
end

# spec/fixtures/users.yml
Alice:
  name: "Alice"
  date_of_birth: <%= 21.years.ago %>

Bob:
  name: "Bob"
  date_of_birth: <%= 21.years.ago + 1.day %>

# spec/models/user_spec.rb
specify "a person of 21 years or more is an adult" do
  user = users(:Alice)
  expect(user).to be_adult
end

specify "a person of under 21 years is not an adult" do
  user = users(:Bob)
  expect(user).not_to be_adult
end
Here we have two fixtures that contrast two different kinds of
user. If done well, your fixtures will be a set of objects that live
in the database that together weave a kind of narrative that is
revealed in tiny installments through your unit tests. Elsewhere in
our test suite, we’d continue with this knowledge that Alice is an
adult and Bob is a minor.
So what’s the problem? Well, one is what Meszaros calls the “mystery
guest”, a kind of “obscure test” smell. What that means is that the
main players in our tests - Alice and Bob - are defined far off in
the spec/fixtures/users.yml file. Just looking at the test body, it’s
hard to know exactly what it was about Alice and Bob that made one an
adult and the other not. (Sure, we should know the rules about
adulthood in whatever country we’re in, but it’s easy to see how a
slightly more complicated example might not be so clear.)
Let’s try to address that concern head on by removing the fixtures:
# spec/models/user_spec.rb
specify "a person of 21 years or more is an adult" do
  user = User.create!(name: "Alice", date_of_birth: 21.years.ago)
  expect(user).to be_adult
end

specify "a person of under 21 years is not an adult" do
  user = User.create!(name: "Bob", date_of_birth: 21.years.ago + 1.day)
  expect(user).not_to be_adult
end
We’ve solved the mystery guest problem! Now we can see at a glance
what the relationship is between the attributes of each user and the
behavior exhibited by them.
Unfortunately, we have a new problem. Because a user requires a
:name attribute, we have to specify a name in order to build a valid
user object in each test (we might in certain instances be able to
get away with using invalid objects, but it is probably not a good
idea). Here, the fact that we’ve had to give our users names has
given us another obscure test smell - we have introduced some noise,
in that it’s not clear at a glance which attributes are relevant to
the behavior that’s getting exercised.
Another problem might emerge if we added a new attribute to User that
was validated against: every test that builds a user could fail for
reasons wholly unrelated to the behavior it is trying to exercise.
Let’s try this again, extracting out a factory method:
# spec/models/user_spec.rb
specify "a person of 21 years or more is an adult" do
  user = create_user(date_of_birth: 21.years.ago)
  expect(user).to be_adult
end

specify "a person of under 21 years is not an adult" do
  user = create_user(date_of_birth: 21.years.ago + 1.day)
  expect(user).not_to be_adult
end

def create_user(attributes = {})
  User.create!({ name: "Alice", date_of_birth: 30.years.ago }.merge(attributes))
end
Problem solved! We have some sensible defaults in the factory method,
meaning that we don’t have to specify attributes that are not relevant
in every test, and we’ve overridden the one that we’re testing -
date_of_birth - in those tests on adulthood. If new validations are
added, we have one place to update to make our tests pass again.
I’m going to pause here for some reflection before we complete our
refactoring. There is another thing that I regret about the way I
learned to test. It is simply that I never used my own factory
methods, as I have above, before finding out what problem Factory
Girl was trying to address. Nothing about the code above strikes me
yet as needing a custom DSL, or a gem to extract. Ruby already does a
great job of making this stuff easy.
Sure, the above is a deliberately simple and contrived example. If we
find ourselves doing more complicated logic inside a factory method,
maybe a well-maintained and feature-rich gem such as Factory Girl can
help us there. Let’s assume that we’ve reached that point and plough
on so we can complete the refactoring.
# spec/factories/user.rb
FactoryGirl.define do
  factory :user do
    name "Alice"
    date_of_birth 30.years.ago
  end
end

# spec/models/user_spec.rb
specify "a person of 21 years or more is an adult" do
  user = create(:user, date_of_birth: 21.years.ago)
  expect(user).to be_adult
end

specify "a person of under 21 years is not an adult" do
  user = create(:user, date_of_birth: 21.years.ago + 1.day)
  expect(user).not_to be_adult
end
This is fine. Our tests look pretty much the same as before, but
instead of a factory method we have a Factory Girl factory. We haven’t
solved any immediate problems in this last step, but if our User
model gets more complicated to set up, Factory Girl will be there with
lots more features for handling just about anything we might want to
throw at it.
It seems clear to me now that the problem that Factory Girl solved
wasn’t anything to do with fixtures, since it’s straightforward to
create your own factory methods. It was presumably the problem of
having cumbersome factory methods that you had to write yourself.
However. This is not quite the end of the story for some folks, for
there’s a further refactoring we can seize upon:
# spec/factories/user.rb
FactoryGirl.define do
  factory :user do
    name "Alice"
    date_of_birth 30.years.ago

    trait :adult do
      date_of_birth 21.years.ago
    end

    trait :minor do
      date_of_birth 21.years.ago + 1.day
    end
  end
end

# spec/models/user_spec.rb
specify "a person of 21 years or more is an adult" do
  user = create(:user, :adult)
  expect(user).to be_adult
end

specify "a person of under 21 years is not an adult" do
  user = create(:user, :minor)
  expect(user).not_to be_adult
end
Here, we’ve used Factory Girl’s traits API to define what it means to
be both an adult and a minor in the factory itself, so if we ever have
to use that concept again the knowledge for how to do that is
contained in one place. Well done to us!
But hang on. Haven’t we just reintroduced the mystery guest smell that
we were trying so hard to get away from? You might observe that these
tests look fundamentally the same as the ones that we started out
with.
Used in this way, factories are just a different kind of shared
fixture. We have the same drawback of having test obscurity, and we’ve
taken the penalty of slower tests because these objects have to be
built afresh for every single example. What was the point?
Okay, okay. Traits are more of an advanced feature in Factory
Girl. They might be useful, but they don’t solve any problems that we
have at this point. How about we just keep things simple:
# spec/factories/user.rb
FactoryGirl.define do
  factory :user do
    name "Alice"
    date_of_birth 30.years.ago
  end
end

# spec/models/user_spec.rb
it "tests adulthood" do
  user = create(:user)
  expect(user).to be_adult
end
This example is actually worse, and is quite a popular
anti-pattern. An obvious problem is that if I needed to change one of
the factory default values, tests are going to break, which should
never happen. The goal of factories is to build an object that passes
validation with the minimum number of required attributes, so you
don’t have to keep specifying every required attribute in every single
test you write. But if your test depends on the specific value of any
of those attributes set in the factory, you’re Doing It Wrong ™️.
You’ll also notice that the test provides little value because it
doesn’t test around the edges (in this case, dates of birth around 21
years ago).
Let’s compare with our earlier example (the one before things started
to go wrong):
# spec/factories/user.rb
FactoryGirl.define do
  factory :user do
    name "Alice"
    date_of_birth 30.years.ago
  end
end

# spec/models/user_spec.rb
specify "a person of 21 years or more is an adult" do
  user = create(:user, date_of_birth: 21.years.ago)
  expect(user).to be_adult
end

specify "a person of under 21 years is not an adult" do
  user = create(:user, date_of_birth: 21.years.ago + 1.day)
  expect(user).not_to be_adult
end
Crucially, we don’t use the default date_of_birth value in any of our
tests that exercise it. This means that if I changed the default
value to literally anything else that still resulted in a valid user
object, my tests would still pass. By using specific values for
date_of_birth around the edge of adulthood, I know that I have better
tests. And by providing those values in the test body, I can see the
direct relationship between those values and the behavior exercised.
Like a lot of sharp tools in Ruby, Factory Girl is rich with features
that are very powerful and expressive. But in my opinion, its more
advanced features are prone to overuse. It’s also easy to mistake
Factory Girl for a library for creating shared fixtures - Rails
already comes with one, and it’s better at doing that. Neither of
these is a fault of Factory Girl; rather, I believe they are faults
in the way we teach testing.
So don’t use Factory Girl to create shared fixtures - if that’s the
style you like then you may want to consider going back to Rails’
fixtures instead.
01 Aug 2016
Testing JSON structures with arbitrarily deep nesting can be
hard. Fortunately RSpec comes with some lesser-known composable
matchers that not only make for some very readable expectations but
can be built up quite arbitrarily too, mirroring the structure of your
JSON. They can provide you with a single expectation on your response
body that is diffable and will give you a pretty decent report on what
failed.
While I don’t necessarily recommend you test every aspect of your API
through full-stack request specs, you are probably going to have to
write a few of them, and they can be painful to write. Fortunately
RSpec offers a few ways to make your life easier.
First, though, I’d like to touch on a couple of other things I do when
writing request specs to get the best possible experience when working
with these slow, highly integrated tests.
Order of expectations
Because request specs are expensive, you’ll often want to combine a
few expectations into a single example if they are essentially testing
the same behavior. You’ll commonly see expectations on the response
body, headers and status within a single test. If you do this,
however, it’s important to bear in mind that the first expectation to
fail will short circuit the others by default. So you’ll want to put
the expectations that provide the best feedback on what went wrong
first. I’ve found the expectation on the status to be the least
useful, so I always put it last. I’m usually most interested in the
response body, so I’ll put that first.
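To illustrate, here’s a minimal sketch (the endpoint and response
shape are hypothetical):

it "creates a post" do
  post "/posts", params: { post: { title: "Post the first" } }

  # most informative expectation first: the response body...
  expect(response.parsed_body).to include("data")

  # ...least informative expectation last: the status
  expect(response).to have_http_status(:created)
end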
Using failure aggregation
One way to get around the expectation order problem is to use failure
aggregation, a feature first introduced in RSpec 3.3. Examples that
are configured to aggregate failures will execute all the expectations
and report on all the failures so you aren’t stuck with just the
rather opaque “expected 200, got 500”. You can enable this in a few
ways, including in the example itself:
it "will report on both these expectations should they fail", aggregate_failures: true do
expect(response.parsed_body).to eq("foo" => "bar")
expect(response).to have_http_status(:ok)
end
Or in your RSpec configuration. Here’s how to enable it for all your
API specs:
# spec/rails_helper.rb
RSpec.configure do |c|
  c.define_derived_metadata(:file_path => %r{spec/api}) do |meta|
    meta[:aggregate_failures] = true
  end
end
Using response.parsed_body
Since I’ve been testing APIs I’ve always written my own JSON parsing
helper. But in version 5.0.0.beta3 Rails added a method to the
response object to do this for you. You’ll see me using
response.parsed_body throughout the examples below.
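In other words, where I might once have defined a helper along these
lines, the method now comes for free:

# roughly the hand-rolled helper I used to write myself
def json
  JSON.parse(response.body)
end

# with Rails >= 5.0.0.beta3, this is built in
response.parsed_body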
Using RSpec composable matchers to test nested structures
I’ve outlined a few common scenarios below, indicating which matchers
to use when they come up.
Use eq when you want to verify everything
expected = {
  "data" => [
    {
      "type" => "posts",
      "id" => "1",
      "attributes" => {
        "title" => "Post the first"
      },
      "links" => {
        "self" => "http://example.com/posts/1"
      }
    }
  ],
  "links" => {
    "self" => "http://example.com/posts",
    "next" => "http://example.com/posts?page[offset]=2",
    "last" => "http://example.com/posts?page[offset]=10"
  },
  "included" => [
    {
      "type" => "comments",
      "id" => "1",
      "attributes" => {
        "body" => "Comment the first"
      },
      "relationships" => {
        "author" => {
          "data" => { "type" => "people", "id" => "2" }
        }
      },
      "links" => {
        "self" => "http://example.com/comments/1"
      }
    }
  ]
}

expect(response.parsed_body).to eq(expected)
Not a composable matcher, but shown here to contrast with the examples
that follow. I typically don’t want to use this - it can make for some
painfully long-winded tests. If I wanted to check every aspect of the
serialization, I’d probably want to write a unit test on the
serializer anyway. Most of the time I just want to check that a few
things are there in the response body.
Use match when you want to be more flexible
expected = {
  "data" => kind_of(Array),
  "links" => kind_of(Hash),
  "included" => anything
}

expect(response.parsed_body).to match(expected)
match is a bit fuzzier than eq, but not as fuzzy as include
(below). match verifies that the expected values are not only correct
but also that they are sufficient - any superfluous attributes will
fail the above example.
Note that match allows us to start composing expectations out of
other matchers such as kind_of and anything (see below), something we
couldn’t do with eq.
Use include/a_hash_including when you want to verify certain key/value pairs, but not all
expected = {
  "data" => [
    a_hash_including(
      "attributes" => a_hash_including(
        "title" => "Post the first"
      )
    )
  ]
}

expect(response.parsed_body).to include(expected)
include is similar to match but doesn’t care about superfluous
attributes. As we’ll see, it’s incredibly flexible and is my go-to
matcher for testing JSON APIs.
a_hash_including is just an alias for include added for
readability. It will probably make most sense to use include at the
top level, and a_hash_including for things inside it, as above.
Use include/a_hash_including when you want to verify certain keys are present
expect(response.parsed_body).to include("links", "data", "included")
The include matcher will happily take a list of keys instead of
key/value pairs.
Use a hash literal when you want to verify everything at that level
expected = {
  "data" => [
    {
      "type" => "posts",
      "id" => "1",
      "attributes" => {
        "title" => "Post the first"
      },
      "links" => {
        "self" => "http://example.com/posts/1"
      }
    }
  ]
}

expect(response.parsed_body).to include(expected)
Here we only care about the root node "data" since we are using the
include matcher, but want to verify everything explicitly under it.
Use a_collection_containing_exactly when you have an array, but can’t determine the order of elements
expected = {
  "data" => a_collection_containing_exactly(
    a_hash_including("id" => "1"),
    a_hash_including("id" => "2")
  )
}

expect(response.parsed_body).to include(expected)
Use a_collection_including when you have an array, but don’t care about all the elements
expected = {
  "data" => a_collection_including(
    a_hash_including("id" => "1"),
    a_hash_including("id" => "2")
  )
}

expect(response.parsed_body).to include(expected)
Guess what? a_collection_including is just another alias for the
incredibly flexible include, but can be used to indicate an array for
expressiveness.
Use an array literal when you care about the order of elements
expected = {
  "data" => [
    a_hash_including("id" => "1"),
    a_hash_including("id" => "2")
  ]
}

expect(response.parsed_body).to include(expected)
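Use all when you want to verify every element has something in common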
expected = {
  "data" => all(a_hash_including("type" => "posts"))
}

expect(response.parsed_body).to include(expected)
Here we don’t have to say how many elements "data" contains, but we
do want to make sure they all have some things in common.
Use anything when you don’t care about some of the values, but do care about the keys
expected = {
  "data" => [
    {
      "type" => "posts",
      "id" => "1",
      "attributes" => {
        "title" => "Post the first"
      },
      "links" => {
        "self" => "http://example.com/posts/1"
      }
    }
  ],
  "links" => anything,
  "included" => anything
}

expect(response.parsed_body).to match(expected)
Use a_string_matching when you want to verify part of a string value, but don’t care about the rest
expected = {
  "links" => a_hash_including(
    "self" => a_string_matching(%r{/posts})
  )
}

expect(response.parsed_body).to include(expected)
Yep, another alias for include.
Use kind_of if you care about the type, but not the content
expected = {
  "data" => [
    a_hash_including(
      "id" => kind_of(String)
    )
  ]
}

expect(response.parsed_body).to include(expected)
That’s about it! Composable matchers are one of my favorite things
about RSpec. I hope you will love them too!
20 Jun 2016
For the uninitiated, The Moomins is a series of books and a comic
strip by the wonderful Tove Jansson. These Moomins live in the
fictional and idyllic Moominvalley set somewhere in the forests of
Finland. It is a complex landscape rich in imagery, symbolism, and
archetypes, and their world has been reimagined many times since
Jansson first wrote about it. One such reimagining was Moomin, a show
from the 90s that fused the best of this Finnish folklore with zany
Japanese animation. And it is my favorite from childhood.
So enamored was I with this show that I continue to watch it
unironically to this day, and not just for the feeling of
nostalgia. Though it is full of action and occasionally disturbing
(the Groke!), I nonetheless find it really calming to lose myself in
the otherwise zen-like serenity of Moominvalley for 20 minutes or so.
Adventure is of course central to every episode, and sure enough the
Moomins meet lots of interesting and occasionally magical creatures, and
one of these is The Hobgoblin.
I didn’t remember much about the Hobgoblin from childhood, but I was
struck watching it more recently with the following:
- He is a powerful magician.
- He collects Rubies.
- He is in search of the King’s Ruby.
- He rides a puma through the sky.
I am so surprised the Ruby community has not picked up on this yet!
24 Apr 2015
Git 101
My early professional career required that I knew how to do six things
in git: branch, stage, commit, merge, push and pull. Beyond that there
was always google. And of course that stack overflow page that
everyone stumbles on eventually: if I effed something up there was
git reset --hard HEAD, and if I really effed it up I could do
git reset --hard HEAD~. Or was it the other way round?
Looking back, I’m surprised I got so much leverage out of just those
six (or seven) commands. But that was probably because no-one else really
minded what I was doing. We committed to master and dealt with
problems as they came up. No-one read the history. We pushed to a
gitolite server, which, as great as that is, is so far away from the
world of GitHub that to any novice it was something of a black
box. Code got committed and pushed. Who knows what happened after
that? If something broke, it meant doing more committing and
pushing.
Fortunately for me this didn’t last for too long. I decided at some
point that I needed to understand git a little better.
Now, I still don’t consider myself an expert in any way. But I did
recently give a talk on the subject at work, which I enjoyed, and
wanted to summarize its contents more formally here. So here it is:
something like the guide I wish I had read a couple of years ago to
get me through the git 101 blues. It will cover:
- Some standard and some not-so-standard terms
- How to write a better commit message (that old chestnut)
- How to make better commits
- Some ways to configure git to make your life easier
- What the hell rebasing is
- A few odd parts of git’s syntax
- Some lesser-used tools that you might like
Terms!
First of all, let’s define a few terms. I won’t define every term,
just a few that are either vague or that I will use frequently
throughout.
- Private branch
- A branch that is used by just you. Pushing it to a remote does not
necessarily make it public.
- Public branch
- A branch that is shared (read: committed to) by many.
- HEAD
- I always wondered if you were supposed to scream this. I might less
formally refer to it as simply the 'head' or 'tip'. It is the current
revision of a given branch.
- The graph
- A lot can be said about the graph, and it's probably beyond the
scope of this article to talk about it in any detail. Let's just say
that a requirement for understanding git's internals is some
rudimentary knowledge of graph theory. I really do mean rudimentary,
so don't let that put you off. There is a great resource explaining
git in terms of graph theory here, which I would highly recommend.
In terms of graph theory, your git history is essentially a graph
composed of commit 'nodes'. The commits at the HEAD of branches are
your 'leaf' nodes. Your current revision in this sense refers to the
series of changes (i.e. commit nodes) that are 'reachable'
(i.e. pointed to by HEAD, or pointed to by commits that are pointed
to by HEAD, and on and on).
- Merge bubble
- When you merge two branches, you will get a merge 'bubble' by creating
a new commit in the target branch that retains the integrity of both
branches. This is a special 'merge commit', and it's special because
it points to two different commits in the history - the tip of the
target (typically `master`) branch, and the tip of the topic
branch. You wouldn't create this commit by hand, it will happen
automatically depending on how you've set up your
`.gitconfig`. Typically, if you're working on a team and you haven't
configured git at all, or if you're using the github web interface to
merge branches, you will end up with lots of merge bubbles.
- Fast-forward
- This is what happens when you merge without creating a merge
bubble. Git will merge your changes in at the top of your target
branch as if you had just been committing to it all along. No merge
commit is created.
- Squashing
- A technique for combining commits that have already been made into
bigger, more consolidated ones.
Some committing anti-patterns
Here are some things that can generally go wrong:
- Putting everything into one big commit
- Writing an incomplete commit message
- Breaking something. Committing. Fixing it later.
- (More advanced) rebasing or committing in hunks without checking the
state of each commit
One thing I learned early on was that it is a good idea to commit
frequently. Unfortunately that’s not the whole story. Although it does
address anti-pattern #1, it will often mean trading it for #2 or
#3. Practicing TDD is actually conducive to making frequent, small
commits because you’re concentrating on either getting to green (a
requirement for a good commit) without getting distracted or writing
more code than is needed, or refactoring in small steps. Essentially,
it’s OK to do #2 or #3 as long as you’re working in a private branch
and you squash or rewrite your commits before merging by performing an
interactive rebase (more on this later).
Squashing everything isn’t necessarily a good idea either. The goal
should be to be left with a small number of commits that each mark a
distinct progression toward some goal (adding a new feature,
refactoring, etc.). As you become more savvy with rebasing
interactively you may fall prey to antipattern #4. In other words,
when you’re rewriting history it’s important to check the integrity of
each commit that you’re creating after the fact. If you really care
about your history, and not just your HEAD, you’ll want every commit
to be green and deployable.
There are actually a few reasons why you might want to take such care
of your history. The first that comes to mind is being able to use
git’s bisect feature with more confidence. bisect is a tool used for
examining a portion of your history, typically for locating a commit
that introduced some regression. It is a very powerful and useful
tool that I’ve personally seen rendered completely useless by
careless committing. More on bisect later.
Another reason might be being able to generate metrics for your
application across a range of commits.
Another is simply being able to read your history with relative
ease. This is more a comment on composing good commits with good
commit messages. (Occasionally, for inspiration, I’ll go spelunking
through the history of some open source software that I love, go right
back to the first commit and rediscover the steps of creating its
first complete feature.)
There are two rules I like to follow when composing a commit
message. The first is to use the present tense imperative in the first
line. The reason for this is that it is the tense/mood used in git’s
generated messages, such as on merge commits. A nice side effect is
that you will probably find your messages shorter and more
succinct. The second rule is never to use the -m flag. Trying to
fit your entire message onto the first line is just way too much
pressure! How formal you want to get with your message after that is
up to you. Generally it’s a good idea to have a short, descriptive
first line, followed by a longer description and a link to an issue
number or ticket if one exists. I add thoughtbot’s template to help
remind me:
# ~/.gitconfig
[commit]
    template = ~/.gitmessage
# ~/.gitmessage
# 50-character subject line
#
# 72-character wrapped longer description. This should answer:
#
# * Why was this change necessary?
# * How does it address the problem?
# * Are there any side effects?
#
# Include a link to the ticket, if any.
More on your gitconfig
There are a couple more things that you may want to consider adding or
tweaking in your gitconfig. Often you’ll see official advice telling
you to use the git command line interface to accomplish this, but I
prefer to edit my ~/.gitconfig by hand.
Here are a few things I recommend playing with:
[alias]
    a = add
    br = branch
    ci = commit
    co = checkout
    st = status
These are a few simple and common aliases that have become more or
less standard (see that kernel wiki article for others). I won’t
enumerate all the ones I use here, but feel free to check out my
dotfiles. Aliasing is essential to being productive if you’re
interacting with git at the command line. Feel free to create
aliases in your ~/.bashrc too: alias git to g, and more common
commands such as git status to gs, as in the sketch below. It might
seem trivial at first, but if you type git status about 200 times a
day as I do, you are going to be saving quite a few keystrokes by the
end of the week. And that’s time you could be spending thinking about
your design, or even going for a walk in the park.
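# ~/.bashrc (the names are just a suggestion)
alias g='git'
alias gs='git status'

# ~/.gitconfig
[merge]
    ff = only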
This is useful if you don’t want git to create a merge bubble unless
specifically asked to do so. If your branch can’t be fast-forwarded,
it won’t be merged either until you rebase, or you pass a flag
overriding the above.
[branch]
    autosetuprebase = always
Useful if you are using a rebase-style workflow (more below). With
this set, if you pull from an upstream on a branch where you have
revisions that have not yet been pushed, your unpushed revisions will
get shoved to the front, and no merge commit is made.
Rebasing
If you only learn one thing beyond the git 101 stage it should
probably be this. Never rebase a public branch! Now, I don’t like
making hard and fast rules with exclamatory remarks like that,
particularly because I think they contribute to the fear and
trepidation that surrounds rebasing, and the reluctance to use git’s
most powerful feature. Please don’t let that put you off. It really is
the only thing you need to remember. Everything else is easy to fix =)
Linus Torvalds has said that all of git can be understood in terms of
rebase. But I think there’s another command that helps illuminate
even further: the cherry-pick.
This is what a cherry-pick looks like:
$ git cherry-pick <commit>
What it does is apply the changes introduced by a given commit
anywhere else in your history to the tip of your current branch. You
can tell it to apply it somewhere else if you want, but that’s what it
does with no other args. If that sounds confusing, or if you’ve never
really thought about git in those terms, go back and read that a
couple of times.
cherry-pick is sort of the basic unit of a rebase. The difference is
that with rebase you’re saying: take this series of commits and
replay them all, starting at another point in history.
This is what a rebase looks like:
# rebase against local master
$ git rebase master
# rebase against remote master
$ git fetch origin
$ git rebase origin/master
With interactive rebasing you have even more control over how to
rewrite history. You can take commits out, shuffle them around, squash
commits into other commits, stop the replay right in the middle and
change something and continue where you left off. Powerful stuff.
This is what an interactive rebase looks like:
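# rebase interactively against local master
$ git rebase -i master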
There are (at least) two distinct benefits that you get from
rebasing. One is that you can introduce any upstream changes into your
code, address any breakages or refactoring that can be done, then
merge all your changes directly onto the tip of master, without a
merge ‘bubble’, as if you had just written them in some kind of coding
frenzy. The other is that you can commit however you want while you’re
developing, and then go back and recompose your commit history into a
string of coding pearls, squashing smaller changes, typos and errors,
and writing beautiful commit messages with love and care.
One thing you might notice is that if you were pushing your topic
branch before you rebased, the remote will refuse when you try to
push afterwards (and complain about it, too). This is normal and to
be expected. It just means that you have to ‘force’ push your branch.
The reason for this is that you changed history by rebasing. Now,
these words are often thrown around, but you might find that
explanation to be a little vague. And rightfully so.
Here’s what’s really going on: when you rebase a branch onto another
commit, you take that first commit you made when you first branched
off and point it to a different commit. Doing so actually creates a
new commit with a distinct SHA1 hash (what a commit points to is an
essential part of the ‘content’ of a commit), and points HEAD to
it. Your original commit is still there, it’s just not visible in your
log any more because it’s not reachable from HEAD.
The next commit in your project branch is now pointing at this ‘ghost’
commit. It needs to be updated to point to its new parent. The process
begins again. A new commit is created, HEAD is moved, and on and
on. As the rebase replays all your changes, it effectively changes
every commit hash in the branch. Your local branch and origin now have
two different copies of the same changes but none of the hashes is the
same. This is why git gives you the somewhat confusing indication to
pull your changes down before trying to push. What you need to do
instead is tell the remote to forget everything and just accept your
local branch in place of whatever it has. And that looks like this:
$ git push -f origin <branch>
Some useful things to know
Reflog
For the longest time I held the reflog at arm’s length. I knew it
existed and that it could be of help if you were in serious
trouble. Maybe there was some security in thinking that if I managed
never to use it then I could never have done anything that bad.
But I was wrong. The reflog is actually exciting, powerful and pretty
straightforward.
$ git reflog
$ git reflog show <branch>
This will show you something that looks like this:
e58096a HEAD@{0}: commit: Really committed now.
5a4acd2 HEAD@{1}: commit: Commitment issues.
6f10f0e HEAD@{2}: commit: Committing some more.
146778b HEAD@{3}: commit: The awkward second commit.
8838e8d HEAD@{4}: commit: Initial commit.
It’s possible that some of the commits the reflog will show you will
no longer be reachable on the graph (such as after a rebase). Want to
undo a rebase? Just point HEAD to where it was before you started by
using reset (more below):
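# one way, assuming the rebase is the most recent operation:
# ORIG_HEAD points at the branch tip as it was before the rebase
$ git reset --hard ORIG_HEAD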
Ranges
Ranges, which is to say the ..
and ...
syntax, can be pretty
confusing because they can mean different things in different
contexts. It’s important to know how to use them, though.
In the context of logs:
# git log
# commits that b has that a doesn't have
$ git log <commit a>..<commit b>
# commits in a and b but not both
$ git log <commit a>...<commit b>
# the last n commits
$ git log -<n>
In the context of diffs:
# git diff
# changes between commit a and commit b
$ git diff <commit a> <commit b>
# same
$ git diff <commit a>..<commit b>
# changes that occurred on a's branch since it branched off of b's
$ git diff <commit a>...<commit b>
In the context of checking out:
# git checkout
# checkout the merge base of a and b
$ git checkout <commit a>...<commit b>
Commit Parents
Sometimes it can be easier to refer to commits not by their SHA1 hash
but by their relationship with another commit. This is especially so
when dealing with recent history and your point of reference is
HEAD. There are a number of different ways of saying the same thing,
and you can combine them too:
# the current commit
$ HEAD
$ HEAD~0
# the 1st parent of the current commit
$ HEAD~
$ HEAD~1
# the 1st parent of the 1st parent of the current commit
$ HEAD~~
$ HEAD~2
$ HEAD~1~1
# the 2nd parent of the current commit
$ HEAD^2
Add
You already know how to do that. But have you tried adding in hunks?
It looks like this:
# stage changes in hunks
$ git add -p
This allows you to add interactively. Git will try to present you with
smaller ‘hunks’ of your code to stage one by one. If it’s not granular
enough for you, you can just tell git to get more granular by
splitting it. Here’s what it looks like:
Stage this hunk [y,n,q,a,d,/,j,J,g,s,e,?]?
The most useful options to remember are y for yes, n for no, and s
for split.
Bisect
This takes a divide-and-conquer approach to locating a commit in
your history that introduced some change (typically a regression). It
requires only that you can identify some point in your history that
you know was good, and another point that is bad. Working with bisect
will typically look like this:
# start it all off
$ git bisect start
# mark a known good commit
$ git bisect good <commit>
# mark a known bad commit
$ git bisect bad <commit>
# tell bisect the commit it checked out is good
$ git bisect good
# tell bisect the commit it checked out is bad
$ git bisect bad
You then repeat the last two steps until you’re down to one commit.
You can even automate the process:
# automate it
$ git bisect run rspec path/to/broken_spec.rb
Great stuff!
Blame
My FAVORITE tool. Mwahaha! In all seriousness though (ahem), this can
be useful in situations where you have some code you really don’t
understand despite your best efforts, and you need to have a chat with
its author. Alternatively, you may want to credit someone for a
revision that was really good. It looks like this:
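# annotate each line with the commit and author that last touched it
$ git blame <file>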
Revert
Creates a ‘mirror image’ of another commit that backs out the changes
it introduced:
# create a new commit reversing the changes
$ git revert <commit>
You can even revert a merge commit by passing the -m flag and the
parent that you want to keep. Typically this will just be 1,
indicating master in situations where you merged a topic branch into
it. The topic branch would be 2:
# revert a merge
$ git revert -m 1 <merge commit>
Reset
Something you may have used in desperation. Like rebase, reset is a
powerful tool and it’s worth knowing what a few of the options
do. Something all resets have in common is that they move HEAD to a
new, specified commit. Unless you’re resetting to a point way back in
history, it’s usually easier to provide a commit relative to HEAD.
Here are a few options you want in your tool-belt:
# leave changes not in target in staging area
$ git reset --soft HEAD~
# leave changes not in target in working tree (default)
$ git reset --mixed HEAD~
# destroy all changes not included in target
$ git reset --hard HEAD~
# reset to previous point in the reflog
$ git reset --hard <branch>@{<reflog entry>}
# reset to where you were last week (!!!)
$ git reset --hard <branch>@{one.week.ago}
Conclusion
That’s more or less everything I know about being a git. There are
some great resources, included below, that cover more advanced
topics if you’re interested in learning more. Being an intermediate
git only really requires some curiosity and practice using the tools
and techniques above. Once you get them, you’ll want to use most of
them every day, and you’ll have internalized everything. And being an
intermediate git won’t merely bring you up to scratch - it will
actually set you apart from the rest (most of the time).