This Friday class is required for everyone.
#### Software in 6.005
| Safe from bugs | Easy to understand | Ready for change |
| --- | --- | --- |
| Correct today and correct in the unknown future. | Communicating clearly with future programmers, including future you. | Designed to accommodate change without rewriting. |
#### Objectives
+ In this reading, learn what version control is and why we use it
+ In class, practice understanding, creating, and using version history
## Introduction
[Version control systems](http://en.wikipedia.org/wiki/Revision_control) are essential tools of the software engineering world.
More or less every project --- serious or hobby, open source or proprietary --- uses version control.
Without version control, coordinating a team of programmers all editing the same project's code will reach pull-out-your-hair levels of aggravation.
### Version control systems you've already used
+ Dropbox
+ [Undo/redo buffer](http://en.wikipedia.org/wiki/Undo)
+ Keeping multiple copies of files with version numbers, for example:
    + Project Report
    + Project Report v2
    + Project Report v3
    + Project Report final
    + Project Report final-v2
    + Project Report final-v2-fix-part-5
## Inventing version control
Suppose [Alice](http://en.wikipedia.org/wiki/Alice_and_Bob) is working on a pset by herself.
She starts with one file `A.java` in her pset, which she works on for several days.
At the last minute before she needs to hand in her pset to be graded, she realizes she has made a change that breaks everything.
If only she could go back in time and retrieve a past version!
A simple discipline of saving backup files would get the job done.
Alice uses her judgment to decide when she has reached some milestone that justifies saving the code.
She saves the versions of `A.java` as `A.1.java`, `A.2.java`, and `A.java`.
She follows the convention that the most recent version is just `A.java` to avoid confusing Eclipse.
We will call the most recent version the *head*.
Now when Alice realizes that version 3 is fatally flawed, she can just copy version 2 back into the location for her current code.
Disaster averted!
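To make Alice's discipline concrete, here is a minimal in-memory sketch in Java. The class `BackupHistory` and its methods are hypothetical, invented for illustration: each saved version is a full copy of the text of `A.java`, and the most recently saved one is the head.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of Alice's backups: each saved version is a full copy of A.java's text.
class BackupHistory {
    // versions.get(0) is version 1; the last element is the head
    private final List<String> versions = new ArrayList<>();

    /** Saves a full copy of the contents and returns its 1-based version number. */
    int save(String contents) {
        versions.add(contents);
        return versions.size();
    }

    /** Returns the contents of a saved version (1-based). */
    String get(int version) {
        return versions.get(version - 1);
    }

    /** Returns the number of the head version. */
    int head() {
        return versions.size();
    }

    /** Reverts by saving a copy of an old version as the new head; the bad version stays in history. */
    String revertTo(int version) {
        String old = get(version);
        save(old);
        return old;
    }
}
```

One design choice differs from Alice's manual copying: `revertTo` saves the old contents as a *new* head instead of overwriting version 3, so the flawed version is still available to compare against later.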
But what if version 3 included some changes that were good and some that were bad?
Alice can compare the files manually to find the changes, and sort them into good and bad changes.
Then she can copy the good changes into version 2.
This is a lot of work, and it's easy for the human eye to miss changes.
Luckily, there are standard software tools for comparing text; in the UNIX world, one such tool is [`diff`](http://en.wikipedia.org/wiki/Diff).
A better version control system will make diffs easy to generate.
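The core of a line-based diff can be sketched in a few dozen lines of Java. This minimal version (the `LineDiff` class is hypothetical) aligns two versions using the textbook longest-common-subsequence (LCS) table; real tools such as UNIX `diff` and Git use faster algorithms (for example, Myers' O(ND) algorithm) and a richer output format.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal LCS-based line diff, for illustration only.
class LineDiff {
    /** Returns diff lines: "  x" unchanged, "- x" only in a, "+ x" only in b. */
    static List<String> diff(List<String> a, List<String> b) {
        int n = a.size(), m = b.size();
        // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = a.get(i).equals(b.get(j))
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        // Walk the table, emitting unchanged, removed, and added lines in order
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n && j < m) {
            if (a.get(i).equals(b.get(j))) {
                out.add("  " + a.get(i));
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.add("- " + a.get(i++));
            } else {
                out.add("+ " + b.get(j++));
            }
        }
        while (i < n) out.add("- " + a.get(i++));
        while (j < m) out.add("+ " + b.get(j++));
        return out;
    }
}
```

For example, diffing `["int x = 1;", "return x;"]` against `["int x = 2;", "return x;"]` yields `- int x = 1;`, then `+ int x = 2;`, then the unchanged `  return x;`.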
Alice also wants to be prepared in case her laptop gets run over by a bus, so she saves a backup of her work in the cloud, uploading her working directory whenever she's satisfied with its contents.
If her laptop is kicked into the Charles, Alice can retrieve the backup and resume work on the pset on a fresh machine, retaining the ability to time-travel back to old versions at will.
Furthermore, she can develop her pset on multiple machines, using the cloud provider as a common interchange point.
Alice makes some changes on her laptop and uploads them to the cloud.
Then she downloads them onto her desktop machine at home, does some more work, and uploads the improved code (complete with old file versions) back to the cloud.
If Alice isn't careful, though, she can run into trouble with this approach.
Imagine that she starts editing `A.java` to create "version 5" on her laptop.
Then she gets distracted and forgets about her changes.
Later, she starts working on a new "version 5" on her desktop machine, including *different* improvements.
We'll call these versions "5L" and "5D," for "laptop" and "desktop."
When it comes time to upload changes to the cloud, there is an opportunity for a mishap!
Alice might copy all her desktop files into the cloud, so that it contains only version 5D. Later, when Alice syncs from the cloud to her laptop, she may overwrite version 5L and lose its worthwhile changes.
What Alice really wants here is a *merge*: a new version that combines the changes in both version 5s.
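A merge compares both version 5s against the version they both started from, the *base*. Here is a minimal three-way merge sketch in Java; the class name and the `left`/`right` labels are hypothetical, and the conflict markers only imitate the style Git uses. It rests on a loud simplifying assumption: all three versions have the same number of lines, so only in-place line edits are handled. Real merge tools first align the versions with a diff.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified three-way merge. ASSUMPTION: base, left, and right all have the same
// number of lines (in-place edits only); real tools first align versions with a diff.
class ThreeWayMerge {
    static List<String> merge(List<String> base, List<String> left, List<String> right) {
        if (left.size() != base.size() || right.size() != base.size()) {
            throw new IllegalArgumentException("this sketch handles only in-place line edits");
        }
        List<String> result = new ArrayList<>();
        for (int i = 0; i < base.size(); i++) {
            String b = base.get(i), l = left.get(i), r = right.get(i);
            if (l.equals(r)) {
                result.add(l);          // both sides agree (possibly both unchanged)
            } else if (l.equals(b)) {
                result.add(r);          // only the right side changed this line
            } else if (r.equals(b)) {
                result.add(l);          // only the left side changed this line
            } else {
                // both sides changed the same line differently: a human must resolve it
                result.add("<<<<<<< left");
                result.add(l);
                result.add("=======");
                result.add(r);
                result.add(">>>>>>> right");
            }
        }
        return result;
    }
}
```

With base `a, b, c`, a laptop version `a, B, c`, and a desktop version `a, b, C`, the merge keeps both improvements: `a, B, C`. Only when both sides change the same line differently does the sketch fall back to conflict markers.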
OK, considering just the scenario of one programmer working alone, we already have a list of operations that should be supported by a version control scheme:
+ *reverting* to a past version
+ *comparing* two different versions
+ *pushing* full version history to another location
+ *pulling* history back from that location
+ *merging* versions that are offshoots of the same earlier version
### Multiple developers
Now let's add into the picture Bob, another developer.
The picture isn't too different from what we were just thinking about.
Alice and Bob here are like the two Alices working on different computers.
They no longer share a brain, which makes it even more important to follow a strict discipline in pushing to and pulling from the shared cloud server.
The two programmers must coordinate on a scheme for coming up with version numbers.
Ideally, the scheme allows us to assign clear names to *whole sets of files*, not just individual files.
(Files depend on other files, so thinking about them in isolation allows inconsistencies.)
Merely uploading new source files is not a very good way to communicate to others the high-level idea of a set of changes.
So let's add a log that records for each version *who* wrote it, *when* it was finalized, and *what* the changes were, in the form of a short human-authored message.
Pushing another version now gets a bit more complicated, as we need to merge the logs.
Merging logs is easier than merging Java files, since logs have a simpler structure, but without tool support, Alice and Bob will still need to do it manually!
We also want to enforce consistency between the logs and the actual sets of available files: for each log entry, it should be easy to extract the complete set of files that were current at the time the entry was made.
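A log like this can be sketched in Java as a list of entries, each recording who, when, and what, plus a snapshot of every file, so that `files()` on any entry gives the complete set of files current at that point. The `LogEntry` record and `ProjectLog.mergeLogs` method are hypothetical. The merge assumes that an entry present in both logs is exactly equal in both, and that timestamps give a sensible order; real systems such as Git also record each version's parent version(s) instead of relying only on timestamps.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical log entry: who wrote the version, when it was finalized, what changed
// (a short human-written message), and a snapshot of every file at that version.
record LogEntry(String author, Instant when, String message, Map<String, String> files) {}

class ProjectLog {
    /**
     * Merges two copies of a project log. ASSUMPTIONS: an entry present in both logs
     * is exactly equal in both, and timestamps give a sensible order.
     */
    static List<LogEntry> mergeLogs(List<LogEntry> mine, List<LogEntry> theirs) {
        Set<LogEntry> all = new LinkedHashSet<>(mine);      // drops entries both sides already share
        all.addAll(theirs);
        List<LogEntry> merged = new ArrayList<>(all);
        merged.sort(Comparator.comparing(LogEntry::when)); // oldest first
        return merged;
    }
}
```

Storing a full snapshot per entry is what makes the consistency requirement easy to meet; Git similarly records a snapshot of the whole project with each commit, though it stores unchanged files far more compactly than these full copies.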
But with logs, all sorts of useful operations are enabled.
We can look at the log for just a particular file: a view of the log restricted to those changes that modified that file.
We can also use the log to figure out which change contributed each line of code, or, even better, which person contributed each line, so we know who to complain to when the code doesn't work.
This sort of operation would be tedious to do manually; the automated operation in version control systems is called *annotate* or *blame*.
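Annotate/blame can be sketched with the same line-alignment idea used for diffs. In this minimal Java version (the `Revision` record and `Blame` class are hypothetical), history is a simple chain of revisions of one file. Each line of a new revision that lines up with a line of the previous revision (via an LCS alignment) inherits that line's author; every other line is credited to the new revision's author.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical revision of a single file: who wrote it and the file's lines at that point.
record Revision(String author, List<String> lines) {}

class Blame {
    /** For each line of the newest revision, the author of the revision that introduced it. */
    static List<String> blame(List<Revision> history) {
        List<String> authors = new ArrayList<>();
        List<String> previous = List.of();
        for (Revision rev : history) {
            int[] match = align(previous, rev.lines());
            List<String> next = new ArrayList<>();
            for (int j = 0; j < rev.lines().size(); j++) {
                // carried-over lines keep their old author; new lines belong to this revision
                next.add(match[j] >= 0 ? authors.get(match[j]) : rev.author());
            }
            authors = next;
            previous = rev.lines();
        }
        return authors;
    }

    /** LCS alignment: match[j] is the index in a of the line matching b.get(j), or -1. */
    private static int[] align(List<String> a, List<String> b) {
        int n = a.size(), m = b.size();
        int[][] lcs = new int[n + 1][m + 1];   // lcs[i][j] = LCS length of a[i..] and b[j..]
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = a.get(i).equals(b.get(j))
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        int[] match = new int[m];
        Arrays.fill(match, -1);
        int i = 0, j = 0;
        while (i < n && j < m) {
            if (a.get(i).equals(b.get(j))) {
                match[j] = i;
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                i++;                           // a.get(i) was deleted
            } else {
                j++;                           // b.get(j) was added
            }
        }
        return match;
    }
}
```

For instance, if Alice writes two lines, Bob inserts a line between them, and Carol then edits Alice's first line, blame credits the three current lines to Carol, Bob, and Alice.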
### Multiple branches
It sometimes makes sense for a subset of the developers to go off and work on a *branch*, a parallel code universe for, say, experimenting with a new feature.
The other developers don't want to pull in the new feature until it is done, even if several coordinated versions are created in the meantime.
Even a single developer can find it useful to create a branch, for the same reasons that Alice was originally using the cloud server despite working alone.
In general, it will be useful to have many shared places for exchanging project state.
There may be multiple branch locations at once, each shared by several programmers.
With the right set-up, any programmer can pull from or push to any location, creating serious flexibility in cooperation patterns.
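Branches become easier to reason about once history is a graph: each version records its parent version(s), and a branch is just a name for the newest version on one line of work. The minimal Java sketch below (the `Commit` record and `CommitGraph` class are hypothetical) also computes *merge bases*, the latest versions two branches have in common, which are the "same earlier version" a merge compares against. It uses a simple quadratic definition for illustration, not Git's own merge-base implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical commit: an id plus the ids of its parents (two parents for a merge).
record Commit(String id, List<String> parents) {}

class CommitGraph {
    private final Map<String, Commit> commits = new HashMap<>();
    // branch name -> id of the newest commit on that branch
    private final Map<String, String> branches = new HashMap<>();

    /** Adds a commit on top of the branch's current tip (a root commit if the branch is new). */
    void commit(String branch, String id) {
        String tip = branches.get(branch);
        commits.put(id, new Commit(id, tip == null ? List.of() : List.of(tip)));
        branches.put(branch, id);
    }

    /** Starts a new branch pointing at the same commit as an existing branch. */
    void branch(String newBranch, String fromBranch) {
        branches.put(newBranch, branches.get(fromBranch));
    }

    /** Records a merge commit on `into` whose parents are the tips of both branches. */
    void merge(String into, String from, String id) {
        commits.put(id, new Commit(id, List.of(branches.get(into), branches.get(from))));
        branches.put(into, id);
    }

    /** All commits reachable from id by following parent links, including id itself. */
    Set<String> ancestors(String id) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>(List.of(id));
        while (!todo.isEmpty()) {
            String c = todo.pop();
            if (seen.add(c)) todo.addAll(commits.get(c).parents());
        }
        return seen;
    }

    /** Common ancestors of both branch tips that are not ancestors of another common ancestor. */
    Set<String> mergeBases(String branchA, String branchB) {
        Set<String> common = ancestors(branches.get(branchA));
        common.retainAll(ancestors(branches.get(branchB)));
        Set<String> best = new HashSet<>(common);
        for (String c : common) {
            for (String d : common) {
                if (!c.equals(d) && ancestors(d).contains(c)) best.remove(c);
            }
        }
        return best;
    }
}
```

If `feature` branches off `main` after commit `c2`, the merge base of the two branches is `c2`; after `feature` is merged into `main`, the merge base moves forward to `feature`'s tip, so a later merge only has to consider changes made since then.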
### The shocking conclusion
Of course, it turns out we haven't invented anything here:
[Git] does all these things for you, and so do many other version control systems.
### Distributed vs. centralized
Traditional *centralized* version control systems like CVS and [Subversion](http://subversion.apache.org/) do a subset of the things we've imagined above.
They support a collaboration graph (who's sharing what changes with whom) with one master server and copies that communicate only with the master.
In a centralized system, everyone must share their work to and from the master repository, and a change is only *in version control* if it's *in the master repository*.
In contrast, *distributed* version control systems like [Git] and Mercurial allow all sorts of different collaboration graphs, where teams and subsets of teams can experiment easily with alternate versions of code & history, merging versions together as they are determined to be good ideas.
In a distributed system, all repositories are created equal, and it's up to users to assign them different roles.
Different users might share their work to and from different repos, and the team must decide what it means for a change to be *in version control*.
If it's in... any repo?
Or a certain special repo?
### Version control terminology
+ **Repository**: a local or remote store of the versions in our project
+ **Working copy**: a local, editable copy of our project that we can work on
+ **File**: a single file in our project
+ **Version** or **revision**: a record of the contents of our project at a point in time
+ **Change** or **diff**: the difference between two versions
+ **Head**: the current version
## Features of a version control system
+ **Reliable**: keep versions around for as long as we need them; allow backups
+ **Multiple files**: track versions of a project, not single files
+ **Meaningful versions**: what were the changes, why were they made?
+ **Revert**: restore old versions, in whole or in part
+ **Compare versions**
+ **Review history**: for the whole project or individual files
+ **Not just for code**: prose, images, ...
It should **allow multiple people to work together**:
+ **Merge**: combine versions that diverged from a common previous version
+ **Track responsibility**: who made that change, who touched that line of code?
+ **Work in parallel**: allow one programmer to work on their own for a while (without giving up version control)
+ **Work-in-progress**: allow multiple programmers to share unfinished work (without disrupting others, without giving up version control)
## Version control in 6.005
[Git]: http://git-scm.com
The version control system we'll use in 6.005 is [Git].
It's powerful and worth learning.
But Git's user interface can be terribly frustrating.
What is Git's user interface?
+ **In 6.005, we will use Git on the command line.**
The command line is a fact of life, ubiquitous because it is so powerful.
+ The command line can make it very difficult to see what is going on in your repositories.
You may find [SourceTree](http://www.sourcetreeapp.com) for Mac & Windows useful.
On any platform, [gitk](http://git-scm.com/docs/gitk) can give you a basic Git GUI.
Ask Google for other suggestions.
An important note about tools for Git:
+ Eclipse has built-in support for Git.
If you follow the [problem set instructions](http://web.mit.edu/6.005/www/fa14/psets/ps0/), Eclipse will know your project is in Git and will show you helpful icons.
But because Eclipse Git support is buggy, if you use the Eclipse Git UI to make changes, commit, etc., course staff may not be able to help you.
+ [GitHub](http://github.com/) makes desktop apps for Mac and Windows.
Because the GitHub app tries to change how some Git operations work, if you use the GitHub app, course staff may not be able to help you.
### Git
On the [Git] website, you can find two particularly useful resources:
+ [*Pro Git*](http://git-scm.com/book) documents everything you might need to know about Git.
+ The [Git command reference](http://git-scm.com/docs) can help with the syntax of Git commands.
You should already have read the **[PS0 instructions]** and the **[Getting Started with Git]** tutorial, which describe basic Git workflows to follow.
[PS0 instructions]: http://web.mit.edu/6.005/www/fa14/psets/ps0/#clone
[Getting Started with Git]: http://web.mit.edu/6.005/www/fa14/tutorial/git/
## Version control and the big three
Safe from bugs
: find when and where something broke
: look for other, similar mistakes
: gain confidence that code hasn't changed accidentally

Easy to understand
: why was a change made?
: what else was changed at the same time?
: who can I ask about this code?

Ready for change
: all about managing and organizing changes
: accept and integrate changes from other developers
: isolate speculative work on branches