<% Garbled

garbled = Blog.new(:author => 'Ben Hamill')

My Git Talk At Austin On Rails

At the last Austin on Rails meeting (Nov 17), I gave a talk entitled Practical Git Quickstart (Prezi link). The slides don’t have a lot of content and mostly underscored what I hoped to talk about. I blew through them in about ten minutes or less. The short of it is that I feel like a lot of git tutorials and introductions start off with the high-level stuff and that, especially for people new to git, that’s confusing. My goal was to give git newbies the most basic commands they’d need to be able to use git on a daily basis so that they could build their own abstractions before diving into the more heady stuff. I was aiming for an 80% solution to that, anyway.

After I finished the slides, I fired up a command line and an editor and just worked through some stuff. This post should sum up what I talked about, more or less. I started out covering the same stuff I covered in my previous git tutorial post, so maybe go check that out first. It should get you through setting up a new repository, adding files to the staging area, making a commit, checking your status and pushing to a remote repository.

So let’s pick up there, with remote repositories. The way you get code up to your repo is with git push origin master. Once it’s up there, other people can get at it. If you recall, you told git where your remote repo was going to be with git remote add origin git@github.com:<username>/<project>.git. Someone who wants their own local copy of your repo does so with the clone command like so: git clone git@github.com:<username>/<project>.git. That will create a directory wherever the command is issued, named <project> and pull down the current state of the remote repo. Then, that person will be able to push their own changes, etc. This is all, of course, assuming they’ve got permission to do so.
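If you want to try the push-then-clone round trip without touching GitHub, here is a minimal sketch. It assumes a bare repository on the local disk as a stand-in for the remote; the directory names (project.git, mine) and the placeholder identity are made up for illustration.

```shell
set -eu
# Placeholder identity so the commit works on a machine with no git config.
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)   # scratch directory for the whole demo
cd "$work"

# Stand-in for the GitHub repo: a bare repository on the local disk.
# The -c option pins the branch name to master on newer git versions.
git -c init.defaultBranch=master init -q --bare project.git

# "You": create a repo, commit, tell git where the remote is, push.
git -c init.defaultBranch=master init -q mine
cd mine
echo "hello" > notes.txt
git add notes.txt
git commit -q -m 'Initial commit.'
git remote add origin "$work/project.git"
git push -q origin master

# "Someone else": clone creates a directory named after the project.
cd "$work"
git clone -q "$work/project.git"
cloned=$(cat project/notes.txt)
echo "$cloned"
```

With a real GitHub repo, the only difference should be the address you hand to git remote add and git clone.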

So this new second person makes some changes and pushes them on up. How do you get them? Well, sensibly, the opposite of push is pull, so you issue git pull origin master. This is actually a two step process that’s just for convenience. I don’t want to get into the plumbing too much, but it basically grabs the state of the remote repo (git fetch) and then attempts to merge (git merge) it with your local stuff. So that’s the most basic case of working with someone else on a project, or working alone on one using different machines, if you like. I use that case all the time.
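To see the two steps separately, here is a rough sketch that runs git fetch and git merge by hand instead of git pull. As before, it assumes a local bare repository standing in for the remote, and the directory names and identity are placeholders.

```shell
set -eu
# Placeholder identity so the commits work on a machine with no git config.
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
cd "$work"
git -c init.defaultBranch=master init -q --bare project.git

# Your repo, pushed up to the stand-in remote.
git -c init.defaultBranch=master init -q mine
cd mine
echo "first" > notes.txt
git add notes.txt
git commit -q -m 'Initial commit.'
git remote add origin "$work/project.git"
git push -q origin master

# The second person clones, changes something and pushes.
cd "$work"
git clone -q project.git theirs
cd theirs
echo "second" >> notes.txt
git add notes.txt
git commit -q -m 'Second line.'
git push -q origin master

# Back in your repo: the two halves of "git pull origin master".
cd "$work/mine"
git fetch -q origin            # grab the state of the remote repo
git merge -q origin/master     # merge it into your local branch
last=$(tail -n 1 notes.txt)
echo "$last"
```

Since you made no local changes in the meantime, the merge here just moves your branch forward to match the remote.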

So what about conflicts? If you both make a change to the same file and they push it first, your push will be rejected, because the remote now has commits you don’t have and git won’t merge them for you on the remote side. Similarly, if you try to pull, it will do the fetch part, but be unable to merge and will tell you so. You can use git diff to see what the changes were and do the merge yourself. You can also use git difftool which is awesome, but takes some setup, so you should look into it later on (I skipped it in my presentation).

Once you handle the conflicts, you’ll add the conflicting files to the staging area and make a commit. With all merges, I should note, git makes a commit just for the merge, so when you have conflicts, it’ll have staged the things it can merge on its own and left the conflicts unstaged. As you fix them, you stage them and then you commit the merge commit. Git doesn’t know if you really fixed the conflicts, so you can git add whatever version of the file you want, even a broken, not-conflict-resolved one. Just be aware.
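Here is a sketch of the whole conflict dance, under the same assumptions as before (a local bare repo as the remote, made-up directory names, placeholder identity). One thing that postdates the talk: newer git versions may refuse a plain git pull when the histories have diverged and ask you to choose merge or rebase, so the sketch passes --no-rebase to get the merge behaviour described above.

```shell
set -eu
# Placeholder identity so the commits work on a machine with no git config.
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
cd "$work"
git -c init.defaultBranch=master init -q --bare project.git
git -c init.defaultBranch=master init -q mine
cd mine
echo "shared line" > notes.txt
git add notes.txt
git commit -q -m 'Initial commit.'
git remote add origin "$work/project.git"
git push -q origin master

# They change the line and push first.
cd "$work"
git clone -q project.git theirs
cd theirs
echo "their version" > notes.txt
git add notes.txt
git commit -q -m 'Their change.'
git push -q origin master

# You change the same line, then try to push and pull.
cd "$work/mine"
echo "my version" > notes.txt
git add notes.txt
git commit -q -m 'My change.'
if git push -q origin master 2>/dev/null; then pushed=yes; else pushed=no; fi
git pull -q --no-rebase origin master >/dev/null 2>&1 || true  # fetch works, merge stops
conflicted=$(git diff --name-only --diff-filter=U)             # files still in conflict
echo "push accepted: $pushed, conflicted: $conflicted"

# Fix the file (by hand, in real life), stage it, commit the merge, push.
echo "my version, reconciled with theirs" > notes.txt
git add notes.txt
git commit -q -m 'Merge their change into mine.'
git push -q origin master
```

Note that git happily accepted the echo-ed "resolution" without checking it, which is the "just be aware" part.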

That was more or less the end of my ordered presentation. There were some questions afterward and I’m going to attempt to sum up the discussion that followed, here:

First off, I wanted to mention how you ignore files in git. Unlike subversion, there is no git ignore. If you want git to ignore a file, you have to add it to a .gitignore file. This file is a list of patterns that git will ignore for the directory it’s in and all directories below it. So you might have one for a python project like this:

tmp/*
*.pyc

This will ignore all compiled python code (*.pyc) and everything in your tmp/ directory. I was baffled by this when I first came to git, but it’s not really that hard. Note that you generally commit your .gitignore so that others can share it. If there’s something you want to ignore on a per-machine basis, rather than a per-project basis, then you need to turn to my next topic.
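A quick way to convince yourself the patterns do what you expect is to make a scratch repo and ask git status what it can see. This is only a sketch; the file names (main.py, main.pyc, tmp/scratch.txt) are invented for the demo.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git -c init.defaultBranch=master init -q project
cd project

# One file we care about, plus two that should be ignored.
mkdir tmp
touch main.py main.pyc tmp/scratch.txt

# The same two patterns as in the example above.
printf 'tmp/*\n*.pyc\n' > .gitignore

# List every untracked file git is willing to consider.
visible=$(git status --porcelain --untracked-files=all)
echo "$visible"
```

Only .gitignore itself and main.py should show up as untracked; the compiled file and everything under tmp/ stay out of the list.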

Which is global git preferences. On Linux and Mac, git will look for a file in your home directory called .gitconfig and take global behaviors from it (it’s tricky on Windows, and I haven’t figured it out to my own satisfaction, sorry. If someone asks about it, I’ll try to sum up what I know in the comments). In my other git post, I had gone through setting up a repo on GitHub and said to follow the directions there. Two of those steps were these:

git config --global user.name "<your name>"
git config --global user.email <your_email>

Those created entries in your ~/.gitconfig telling git your name and email address. You can also declare a global ignore file there. I like to call mine .gitignore. This is shockingly original, I know. On the machine I’m typing on right now, my ~/.gitconfig looks like this:

[user]
  email = <blah blah blah>
  name = Ben Hamill
[core]
  excludesfile = /home/ben/.gitignore

I bet you can guess it, but just in case, you can either put your excludesfile in manually or do git config --global core.excludesfile /whatever/file/path/you/want. For reference, my ~/.gitignore looks like this:

*.kpf
*.swp

A .kpf file is a project file created by Komodo Edit, which I used to use for all my code editing needs, but not since I switched to vim, which is what creates *.swp files.
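If you want to try the global ignore setup without risking your real configuration, here is a sketch that points HOME at a throwaway directory first. The temporary HOME and the file names are assumptions made for the demo; on your own machine you would skip that part and run the git config line as is.

```shell
set -eu
# Throwaway HOME so this never touches your real ~/.gitconfig.
export HOME=$(mktemp -d)
unset XDG_CONFIG_HOME

# Declare the global ignore file and fill it with the two patterns.
git config --global core.excludesfile "$HOME/.gitignore"
printf '*.kpf\n*.swp\n' > "$HOME/.gitignore"
cat "$HOME/.gitconfig"

# Any repo on this "machine" now ignores those files, with no
# .gitignore of its own.
git -c init.defaultBranch=master init -q "$HOME/project"
cd "$HOME/project"
touch notes.txt notes.kpf .notes.txt.swp
untracked=$(git status --porcelain --untracked-files=all)
echo "$untracked"
```

Only notes.txt should be listed; the editor droppings are filtered out by the global file.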

Finally, someone had asked about git stash. It’s what I’d consider a more advanced command, but a lot of git fanboys sell it hard because it’s cool and svn doesn’t have it. However, as cool as it is, I think it can get you into a lot of trouble. Basically, you can be working on something and issue git stash and git will store whatever changes you’re in the middle of and hide them away, putting your repo back in the state it was right after the last commit. You can then work on something more pressing, make commits, merges, new branches, whatever and when you’re done, issue git stash pop and it applies your changes back (if it can).
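A small sketch of that stash round trip, in a scratch repo with a placeholder identity and invented file names:

```shell
set -eu
# Placeholder identity so the commits work on a machine with no git config.
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
cd "$work"
git -c init.defaultBranch=master init -q project
cd project
echo "committed" > notes.txt
git add notes.txt
git commit -q -m 'Initial commit.'

# You are in the middle of something...
echo "half-finished idea" >> notes.txt

# ...and hide it away. The working directory is clean again.
git stash -q
during=$(git status --porcelain)

# Do the more pressing work and commit it.
echo "urgent" > urgent.txt
git add urgent.txt
git commit -q -m 'Urgent fix.'

# Bring the half-finished change back.
git stash pop -q
after=$(tail -n 1 notes.txt)
echo "$after"
```

The pop works cleanly here because the urgent commit touched a different file; if it had touched notes.txt, you could be back in conflict territory.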

The really hairy bit is that you can name stashes and so have more than one stash going at once. While a super organized developer might find this really useful, I find that it’s easy to get stuff lost in there. You don’t want to have tons and tons of stuff stashed and not remember, anymore, what changes were in which stash, etc. I advise, as a basic rule of thumb, that if you’ve already got one thing stashed and find yourself wanting to stash something else, then you should be looking at branching, not stashing.

I think that about covers it. I think someone recorded audio of my talk or maybe video. If it ends up posted somewhere, I’ll come edit this post with a link to it. If you were at my talk and notice something I talked about then that I haven’t covered here, let me know and I’ll try to amend. Or, if you weren’t there and feel there’s a topic you have questions about, drop it in the comments and I’ll do what I can.

Git Tutorials Suck, A Sucky Git Tutorial

Context… Perhaps Too Much Of It

So I was reading this blog post about learning and explaining because @carl_youngblood tweeted about it. I think Carl’s right: I had a hard time learning git (by which I don’t mean to imply I’m some sort of expert now, but the learning is going easier now).

I think the main problem that I had was this: Having learned Subversion, with its central repository, git was a hard abstract thing to understand. And some (I feel many of the ones I read, anyway) of the tutorials out there try to start at the abstract. Little help that did me (see above-linked article. Really, it’s very good). And even ignoring those, I had to read a lot lot lot of the practical ones before things started sinking in.

So I’ve sort of come to understand that, actually, the tutorials don’t suck; learning abstract things just takes time and, at the time, that can be frustrating. So I’m going to offer my own little sucky tutorial, which will focus on the practical aspects and, if you read this and don’t get it, you can follow some links at the end to other articles I found helpful and maybe, after roughly a week, you’ll have your ‘ah-Ha!’ moment and think about how git is just like monads… whatever the heck those are.

A lot of tutorials for git newbies start out explaining the Staging Area with some kind of metaphor so that it seems friendly or, I suspect, out of some subconscious wish to actually obscure it from Subversion converts so that git seems more familiar, more like SVN, which it is not very much like at all. I’m not going to really talk about it much. When we get to the commands that affect it (shortly, here), I’ll explain what they do. You can make the abstraction yourself.

I’m intentionally writing this off the top of my head for two reasons: If I have to look up a command, then you might as well read whatever tutorial I looked it up on and if I have to look it up, then I clearly don’t use it all the time and thus, you don’t need to know it to get going on Git.

The Tutorial

I’ve got six sections to this thing with (I hope) at least vaguely descriptive names. They are:
  1. Setup
  2. Initial Commit
  3. SitRep
  4. Staging Area
  5. Remote Repo
  6. Conclusion/Links

Setup

You have a project you just started in a directory called ‘notes’. This isn’t even code, it’s just notes about something that you want to version control and back up. It’s a collection of text files and the directory structure is something like this.

$ pwd
~/notes/
$ ls
contact_info.txt  general.txt  outline.txt

After installing git as appropriate for your operating system, you start out by typing in the command line git init. This will create a directory called .git in notes/. There’s some stuff in there, but for the most part, you can ignore this for now. Suffice to say it’s where git does its book-keeping. What you’ve got now is a local git repository or, as the kids say, a “local repo”, but nothing’s in it.
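As a sketch, here is that setup step in a scratch directory, so you can poke at it without touching a real project. The temporary location is an assumption; the file names match the example above.

```shell
set -eu
work=$(mktemp -d)
mkdir "$work/notes"
cd "$work/notes"
touch contact_info.txt general.txt outline.txt

git init -q

# .git now sits next to the three text files...
ls -A

# ...but the repo itself is still empty: git is tracking nothing yet.
tracked=$(git ls-files)
echo "tracked files: ${tracked:-none}"
```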

Initial Commit

So you do a git add . (note the trailing period). This will toss everything (that’s what the period means) in notes/ into the staging area (including stuff that’s in directories that’re in directories that’re in notes/ etc.). The repo is still empty. To actually save stuff once it’s been staged, you do like this:

$ git commit -m 'Initial commit.'
[master (root-commit)]: created 7db8343: "Initial commit." 
 0 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 contact_info.txt
 create mode 100644 general.txt
 create mode 100644 outline.txt

The -m option says you’re going to specify your commit message right after. Sometimes, you’ll want to leave a longer message, in which case, you leave off the -m and git will automatically fire up a default text editor where you can put in longer stuff. Since a lot of that varies widely from OS to OS, I’m going to skip it and you can read more details on other tutorials (see below). Notice that you get a list of what’s changed (you created 3 new files in the repo) and you get your commit message back in the output. Splendid.

SitRep

Now you’ve made your initial commit, and your stuff is in version control. Go into contact_info.txt and add something (doesn’t matter what for these purposes). Imagine you’ve made that change and then walked away and forgotten about it. You can use git status to see what’s new, thusly:

$ git status
# On branch master
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#       modified:   contact_info.txt
#
no changes added to commit (use "git add" and/or "git commit -a")

Using git status is just like a reminder. It doesn’t tell you much, but it can jog your memory about what you’ve already staged or what you changed and didn’t stage or what files you added. To get the real scoop about how a file changed, you use git diff. When you run git diff contact_info.txt the output will vary depending on what you had initially and what you added, but the gist is this: It will show you the changes (all of them) with a + before the line for additions and a - before the line for deletions. Generally, it gives a few lines before and after a change for context.
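To see the + and - markers for yourself, here is a small sketch in a scratch repo. The contact details and the placeholder identity are invented for the demo.

```shell
set -eu
# Placeholder identity so the commit works on a machine with no git config.
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
cd "$work"
git -c init.defaultBranch=master init -q notes
cd notes
printf 'Alice: 555-0100\nBob: 555-0101\n' > contact_info.txt
git add contact_info.txt
git commit -q -m 'Initial commit.'

# Change Bob's number, then ask git what is different.
printf 'Alice: 555-0100\nBob: 555-0199\n' > contact_info.txt
changes=$(git diff contact_info.txt)
echo "$changes"
```

In the output, a changed line shows up as a deletion of the old version followed by an addition of the new one, with the untouched Alice line around it for context.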

So let’s add our new contact_info change to the staging area and commit it, yeah? Do git add contact_info.txt and then git commit -m 'Updated contact info' or similar. Whatever comment you write is fine. Note we could’ve used git add . but I wanted to show the single-file syntax.

Staging Area

Now let’s put in some stuff into the outline.txt. Whatever you want. Just some stuff. Save it. But wait! We should also add some stuff to the general notes, just a quick overview at least, so put some stuff in there. We’ll finish the outline changes in a second. This is so much more pressing. Obviously.

Now, it’s good repo etiquette to only commit stuff atomically, which is to say that all the changes have to do with each other. Some people will say that you should only commit stuff that works (code compiles or whatever), but with git that’s less of a concern. I’ll come back to this point. What I’m getting at now is that you started one change and realized another needed to be made before you finished the first one. Now you want to commit only the second one, right? Simple: git add general.txt then git commit -m 'Added overview'. Because you never staged the outline (with your half-way-made changes), it doesn’t get committed. Later, if you need to revert that commit or whatever, you won’t have to worry that something else is mixed in there. Now, go ahead and finish your outline changes, and commit them. You should be able to do it on your own now.
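Here is a sketch of that two-changes-one-commit situation, so you can check that the unstaged file really does stay out of the commit. The file contents and the placeholder identity are made up.

```shell
set -eu
# Placeholder identity so the commits work on a machine with no git config.
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
cd "$work"
git -c init.defaultBranch=master init -q notes
cd notes
echo "contacts" > contact_info.txt
echo "general" > general.txt
echo "outline" > outline.txt
git add .
git commit -q -m 'Initial commit.'

# Start the outline, get distracted, write the overview.
echo "half an outline" >> outline.txt
echo "quick overview" >> general.txt

# Stage and commit only the finished change.
git add general.txt
git commit -q -m 'Added overview'

committed=$(git diff --name-only HEAD~1 HEAD)   # what went into the commit
pending=$(git status --porcelain)               # what is still waiting
echo "committed: $committed"
echo "pending:  $pending"
```

The commit should contain only general.txt, and outline.txt should still be listed as modified but not staged.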

Remote Repo

So, then… we’re version controlling this stuff. What if you want to get at it from another computer or let someone else get at it or… something? Pop on over to GitHub which is my remote repo host of choice. There are others. Shop around, if you like. After you create an account, you can create a new remote repo called whatever you want. You’ll then be shown a page with some directions. Follow the ones under the heading “Existing Git Repo?”

The git remote add origin git@github.com:<username>/<project>.git command basically tells git where your remote repo is. You can have more than one if you like and, actually, do all sorts of crazy things with naming if you like, but I just want to handle the default, assumed case with this tutorial. One interesting thing: GitHub gives you two addresses for each repository (other hosts may do the same, I don’t know). The one that starts git@github.com is your read/write address and there’s one that starts git://github.com which is your read-only address. Since this is your own repo, you want to make sure to use the read/write address.
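If you are curious what git remote add actually records, this sketch adds a remote and reads it back. Nothing is contacted over the network at this point, and username/project is a placeholder rather than a real repository.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q notes
cd notes

# Tell git where the remote repo lives, under the name "origin".
git remote add origin git@github.com:username/project.git

# List the configured remotes with their addresses.
git remote -v

# The address is just an entry in the repo's .git/config.
url=$(git config --get remote.origin.url)
echo "$url"
```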

The git push origin master command is what actually moves your commits to the remote repo. This is where I recommend you adhere to the “only stuff that works” doctrine. If this is code, and you’re sharing the repo with your team or whatever, this is where they can get at it, so you don’t want to hand them broken stuff or half-finished ideas or whatever. So only push code that compiles/works. Pushing your code updates the remote repo with all the commits you’ve made since your last push.

The way you (or someone else) gets commits out of a repo is by using git pull. It takes the same arguments as git push. It will pull the commits down and then try to reconcile those changes with any that you’ve made since the last time your local repo was in the same state as the remote repo.

Conclusion/Links

I feel like this has gotten pretty long and I don’t want to put too much information all at once. That should be enough to get you started and, really, just try it out for a while and get comfortable with the basics. Don’t be afraid, if you get something out of whack and realize you’ve done something wrong, to kill your .git directory (which will delete the local repo) and start again from the top. I’ve intentionally left a lot of stuff out (the finer points of push/pull, branches and multiple remote repos can get kind of hairy), so here’s some documentation, blog posts and articles that I’ve found helpful. These are in no particular order and some are more advanced than others, so just start clicking and see what you like:

If you want to ask me about git or whatever, feel free to email me or leave something in the comments. Also, if you spot a mistake or something here doesn’t make sense, please let me know. Hope this is helpful to someone.