# Getting Started with Git

This document is a work in progress. Use Piazza to suggest further Git tips and tricks we might include.

The purpose of this handout is to give you an idea of how to effectively use Git. The emphasis here is on "effectively"---the goal is to get you to think of Git as a tool that is actually useful (both in 6.005 and maybe even in your other projects), and not just a cumbersome series of magical incantations.

This document is _not_ intended to be a stand-alone introduction to Git. We expect you to have played around with it a little bit and have some idea of what it looks and feels like.

# A bit of context

## The command-line

One of the things that makes learning Git hard for many students is that it's a command-line program. If you're not familiar with the command-line, this can be confusing, especially because it's hard to tell what's specific to Git and what's not.

A command-line is just an interface to your computer, totally analogous to Finder or Windows Explorer, except that it's text-based. As the name implies, you interact with it through "commands"---each line of input begins with a command and might have one or more arguments, all separated by spaces.

The command-line keeps track of what directory (folder) you're in, which is important to many of the commands you might run. Here are some common ones:

* `cd directory-name` (stands for "change directory") --- Switches to the directory `directory-name`.
* `pwd` (stands for "print working directory") --- Prints out the current directory.
* `ls` --- Lists the files in the current directory.

### Arguments

Most commands you type in are actually other programs. When one of those programs is launched, the command-line passes the arguments to it so that it can do something with them. In Java, this is what gets stored in the `String[] args` parameter passed to the `main()` method of your program.

## Working locally

Git is also just another program, so when you type in something like `git subcommand whatever...`, the `subcommand whatever` part gets sent to Git, the program.
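To see this argument passing for yourself, here is a small sketch. It writes a throwaway stand-in program that prints each argument it receives, then launches it the same way you would launch `git`. The `show_args` name and the temporary file are only for illustration and are not part of Git:

```shell
#!/bin/sh
# Sketch: a stand-in "program" that just reports the arguments it was given.
# The show_args file is hypothetical; it exists only for this illustration.
show_args=$(mktemp)
cat > "$show_args" <<'EOF'
i=1
for arg in "$@"; do
    printf '%d: %s\n' "$i" "$arg"
    i=$((i + 1))
done
EOF

# The command-line splits this line on spaces (the quotes keep "fix typo"
# together as one argument) and passes the pieces to the program, the same
# way it passes "commit -m ..." to git.
sh "$show_args" commit -m "fix typo"
```

Running it prints `1: commit`, `2: -m`, and `3: fix typo` on separate lines. A Java program launched with the same three arguments would receive them as `args[0]`, `args[1]`, and `args[2]`.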
Git, in turn, just manipulates a directory in your repository called `.git`. You can't normally see it when you run `ls` because it's "hidden"; if you run `ls -a` at the top level of your repository, it will appear.

### Configuration

Before you go on, it's a good idea to configure Git to be a little bit nicer.

#### Who are you?

Every Git commit has an author: the name and email address of the person who wrote the code. Especially when working on a team project, you should make sure your Git commits include your correct name and email:

    git config --global user.name "Your Name"
    git config --global user.email username@mit.edu

#### Commit Messages or, "I do git commit and then I can't type!"

When you run `git commit`, you will be presented with a text editor where you write the commit message. Unfortunately, Git may choose a default text editor that is unexpected and unintuitive. Before making your first commit, try running `nano` in the terminal. The result should be a simple editor with instructions at the bottom of the screen; quit with `ctrl-X`. If that worked, running

    git config --global core.editor nano

will configure Git to use the nano editor. The commands for the text editor (like copy, paste, quit, etc.) are shown at the bottom of the screen. The `^` symbol represents the `ctrl` key. For example, you can press `ctrl-O` to save (nano calls it "write out") and then `ctrl-X` to quit.

#### Adding some color

Out of the box, it can be hard to see and understand all the output that Git prints at you. One way to make it a little easier is to add some color. You can run the following commands to make your Git output colorful:

    git config --global color.branch auto
    git config --global color.diff auto
    git config --global color.interactive auto
    git config --global color.status auto
    git config --global color.grep auto

### Basic workflow

The basic building block of data in Git is called a "commit".
A commit represents some change to one or more files (or the creation of one or more files).

When you first create or change a file, Git doesn't record that data yet. To add it, run `git add file.txt` (where `file.txt` is the file you want to add). This "stages" the file. Once you've staged all your changes, run `git commit`. This will pop up an editor that gives you a chance to write a _commit message_. When you save and close the editor, the commit will be created.

### Getting the status of your repository

Git has some nice commands for seeing the status of your repository. The most basic of these is `git status`. You can run it at any point to see which modified files are still unstaged and which are staged (staged changes are the ones that will be included if you run `git commit`). Note that the same file might have both staged and unstaged changes, if you changed the file more after running `git add`.

When you have unstaged changes, you can see what they are by running `git diff`. This compares your files against what you have staged (which is the same as the last commit if nothing is staged), so it will _not_ include changes that were staged but not committed. You can see those if you run `git diff --staged`.

You can see what the last commit actually was with `git show`. This will show you the commit message as well as all the modifications (in the same format as `git diff`).

You can see the list of all the commits you made (along with their commit messages) with `git log`. If you run `git log -p`, it will show you the full commit history, including the changes each commit made. In other words, this is as if you ran `git show` on each commit in your history.

Note that `git show` and `git log` might place your command-line in a state where you can't type more commands. Instead, there will be a little colon (`:`) symbol at the bottom. This indicates that there is more output than fits on your screen and that you can scroll with the arrow keys.
You can leave this mode by pressing `q`.

### Commit IDs

Every Git commit has a unique ID, which is the long string of letters and numbers that you see when you run `git log` or `git show`. This is what's called a "hash" of the contents of your commit. One neat feature is that this ID is unique not just within your repository, but actually within the _universe_ of Git commits. In other words, if your commit ID is something like `ab1312313febc241...`, that commit is extremely likely to be the _only_ commit in the world with that name.

You can reference a commit by its ID (or frequently just by its first 8 characters). This is most useful with something like `git show`, where you can look at a particular commit rather than just the most recent one.

## Working remotely

So far, all the commands we've been running have only been operating _locally_; that is, they haven't gone past your computer. This is still pretty useful, but sometimes you want to go further.

### Remotes

Unlike other similar systems, Git doesn't have a built-in notion of a "central repository." Instead, any repository can push to any other repository by specifying it as a "remote." A "remote" is just a pair of a name (which can be anything) and a URI, which is a string indicating how to find the other repository. The URI might look something like this:

`ssh://username@athena.dialup.mit.edu/afs/athena.mit.edu/course/6/6.005/git/sp13/psets/ps0/username.git`

Breaking that down:

* `ssh://` --- this specifies the _protocol_ Git should use to transfer the data. SSH is a protocol that lets you send data securely, which is useful to us because we have to type in a password. But in principle this is totally analogous to, for example, the `http://` you see in web browsers (HTTP is a protocol commonly used for data on the Web).
* `username@athena.dialup.mit.edu` --- this actually has two parts. The `username` is the username you use to log in to the server.
  The `athena.dialup.mit.edu` part is the address of the server itself: an Athena server run by IS&T. It accepts Kerberos logins, so your `username` can just be your Kerberos name.
* `/afs/athena.mit.edu/course/6/6.005/git/sp13/psets/ps0/username.git` --- this is the path on the server where the repository is stored. The only noteworthy thing here is the `username.git` part at the end. In 6.005, for our convenience, we name your repository after your username. This doesn't have to be the case, though, and isn't always. For example, in your projects the name is actually the usernames of all three members of your group.

Perhaps most importantly, there's nothing in Git that says the username you log in with (the thing before the `@` sign) and the username at the end of the path have to match. In 6.005, we just set it up that way.

Now, even though Git doesn't require a central repository, having one is very useful in 6.005. So in 6.005, all of your repositories are actually created by _cloning_ a remote repository that we create (and that acts as the "central" repository). You've done this with the `git clone URI directory` command a bunch of times now. This actually does several things:

1. Creates an empty directory called `directory` (i.e. the last argument to `git clone`).
2. Initializes it as an empty Git repository.
3. Adds a remote with the URI you specified and the name `origin`.
4. Downloads the data from the remote.

So for those of you who were wondering, that's what `origin` means. It's just the default name of the remote repository that you cloned your repository from.

### Pushing

After you've made some commits, you might want to push them to a remote repository. Again, in 6.005 you really only have one remote repository to push to, called `origin`. To push to it, run the command `git push origin master`. The `origin` in the command specifies that you're pushing to the `origin` remote.
The `master` refers to the `master` branch. Branches are an advanced feature of Git that we're not going to be using in 6.005, but since Git has them, you do have to specify a branch. For now, just include this part when you push.

Once you run this, you will be prompted for your password and, hopefully, everything will push. You'll get a line like this:

`a67cc45..b4db9b0 master -> master`

Sometimes, though, things will go wrong. You might get output like this:

`! [rejected] master -> master (non-fast-forward)`

What's going on here is that Git won't let you push to a repository unless all your commits come after all the ones already in the remote repository. If you get an error message like that, it means there is a commit in your remote repository that you don't have in your local one (probably because a teammate pushed before you did). If you find yourself in this situation, you have to pull first and then push.

### Pulling

To perform a pull, run `git pull origin` (again, the `origin` tells Git that you're pulling from the `origin` remote). When you run this, Git actually does two things:

1. It downloads the changes and stores them in its internal state. At this point, your repository doesn't appear any different---Git just knows what the state of the remote repository is and what the state of your repository is.
2. It incorporates the changes from the remote repository into your repository via a process called _merging_.

#### Merging

If you made some changes to your repository and you're trying to incorporate the changes from another repository, you need to merge them together somehow. In terms of commits, what actually needs to happen is that you create a special _merge_ commit that combines both sets of changes.

How this process actually happens depends on the changes. If you're lucky, then the changes you made and the changes that you downloaded from the remote repository don't conflict.
For example, maybe you changed one file and your partner changed another. In this case, it's safe to just include both changes. Similarly, maybe you changed different functions of the same file. In these cases, Git can do the merge automatically. When you run `git pull`, it will pop up an editor as if you were making a commit---in fact, this is the commit message of the merge commit that Git automatically generated. Once you save and close this editor, the merge commit will be made and you will have incorporated the changes. At this point, you can try to `git push` again and hopefully it will work this time.

Sometimes, you're not so lucky. If the changes you made and the changes you pulled edit the same part of the same file, Git won't know how to resolve it. This is called a _merge conflict_. In this case, you will get output that says `CONFLICT` in big letters. If you run `git status`, it will show the conflicting files with the label `both modified`. You now have to edit these files and resolve the conflicts by hand.

First, open them up in your text editor (probably Eclipse for 6.005). The conflicting parts will be very obviously marked with obnoxious `<<<<<<<`, `=======`, and `>>>>>>>` lines. Everything between the `<<<<<<<` and `=======` lines is your version, from the commits you made. Everything between the `=======` and `>>>>>>>` lines is the version from the commits you pulled in. It's your job to figure out how to combine these. The answer will of course depend on the situation. Maybe one change logically supersedes the other, or maybe they can be combined somehow. Edit the file to your satisfaction and remove the `<<<<<<<`, `=======`, and `>>>>>>>` markers when you're done.

Once you have resolved all the conflicts (note that there can be several conflicting files, and also several conflicts per file), `git add` all the affected files and then `git commit`. You will have an opportunity to write the merge commit message (where you should describe how you did the merge).
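If you'd like to practice this before it happens for real, here is a sketch that manufactures a conflict in a throwaway directory and then resolves it. Everything in it is an assumption for illustration only:

* A local bare repository, `origin.git`, stands in for the 6.005 server.
* Two clones, `alice` and `bob`, stand in for you and a teammate.
* `notes.txt` is a made-up file.

The sketch also runs `git fetch` followed by `git merge origin/master` instead of `git pull`, so that the two steps of a pull appear separately.

```shell
#!/bin/sh
# Sketch: create, then resolve, a merge conflict in a temporary directory.
# origin.git, alice, bob, and notes.txt are hypothetical names for this demo.
set -e

# Commit identity via environment variables, so your real config is untouched.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"

# A bare repository plays the part of the "central" remote.
git init -q --bare origin.git
git --git-dir=origin.git symbolic-ref HEAD refs/heads/master

# You (alice) clone it and push a first commit on master.
git clone -q origin.git alice
cd alice
git symbolic-ref HEAD refs/heads/master
echo "first line" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
git push -q origin master
cd ..

# Your teammate (bob) clones, edits the same line, and pushes first.
git clone -q origin.git bob
cd bob
echo "bob's line" > notes.txt
git commit -q -am "Bob edits notes"
git push -q origin master
cd ../alice

# Meanwhile you edit that same line and commit locally.
echo "alice's line" > notes.txt
git commit -q -am "Alice edits notes"

# Your push is rejected: origin has a commit you don't have yet.
if git push -q origin master; then
    echo "unexpected: push was accepted" >&2
    exit 1
fi

# Pull, in two steps: download, then merge. The merge stops with a CONFLICT.
git fetch -q origin
if git merge origin/master; then
    echo "unexpected: merge had no conflict" >&2
    exit 1
fi
git status        # lists notes.txt under "Unmerged paths" as "both modified"
cat notes.txt     # shows the conflict markers around both versions

# Resolve by hand (here: keep both lines, markers removed), then add and commit.
printf '%s\n' "alice's line" "bob's line" > notes.txt
git add notes.txt
git commit -q -m "Merge Bob's notes, keeping both lines"
```

The script leaves you inside the `alice` clone with the merge commit made. Running `git log --graph --oneline` there shows the two lines of history joining. The temporary directory is not deleted, so you can explore it afterwards.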
Now you should be able to push.

### Big caveat: pulling without committing!

Be very careful to commit all your changes before running `git pull`. If you don't, Git will download the changes but then refuse to do a merge, because it's worried about overwriting your uncommitted changes. If you then make a commit and try `git pull` again, it might say `Already up to date` even though the changes haven't been incorporated. If you accidentally run into this situation, `git merge origin/master` will force the merging process to happen.

## Errors

### No repository

If `git clone ssh://...` reports that it "could not read from remote repository", check your repository URL for typos. If you are sure you have the correct URL, and especially if you registered late, contact the staff to make sure a repository has been created for you.

### Can't clone

Because your origin Git repository is stored on Athena and accessed with SSH, certain Athena customizations can interfere with Git's ability to clone and push. If running `git clone ssh://...` reports a "protocol error", or simply asks for your password but then hangs and does nothing, you should review any changes you made to your Athena dotfiles, especially `.bashrc.mine` and `.bash_environment`. Ask a TA for help if you are not familiar with Athena.

## Verifying that your code is on Athena

- Use `git status`. If you have commits you have not pushed to Athena, it reports that your branch is "ahead of 'origin/master'".
- Use `git log` to review the versions you have committed.
- If Didit ran a build for a version, that version is on Athena.
- In your clone, run `git log --decorate` and look for a version labeled `origin/master`. That's the version on Athena, as far as your clone is aware.
- This one is not a commonly used command, but in your clone, run `git ls-remote` to connect to Athena and output the current version there.

----

### Have fun in 6.005!