Consider the following German sentences:
(1)	a.	Marie glaubt,  dass Hans den Mann sah.
		Marie believes that Hans the man  saw
		'Marie believes that Hans saw the man'

	b.	Hans sah den Mann.
		Hans saw the man
		'Hans saw the man'

As you can see, the verb sah 'saw' is in different positions in the two examples; it's at the end in (1a), but not in (1b). This is sometimes described by saying that German has different word order in main clauses and embedded clauses.

That description isn't quite right, though. Consider the following pair:

(2)	a.	Hans sagte, er sei glücklich.
		Hans said   he is  happy
		'Hans said he is happy'

	b.	Hans sagte, dass er glücklich ist.
		Hans said   that he happy     is
		'Hans said that he is happy'

The verbs in the embedded clauses in (2) differ in their placement: sei is non-final in (2a), while ist is final in (2b). Considering all the data in (1) and (2), it looks like we can't get away with a simple requirement that verbs must be final in embedded clauses and non-final in matrix clauses in German; that requirement doesn't capture the contrast in (2), where both clauses are embedded.

One general rule that does cover all the facts in (1-2) would go like this: the verb must be final in German when the word dass 'that' is present, and non-final if it is not. More generally, the verb must be final in clauses that contain complementizers:

(3)	a.	Hans fragt sich,  [ob      er glücklich ist].
		Hans asks  himself whether he happy     is
		'Hans wonders whether he is happy'

	b.	Hans singt, [weil    er glücklich ist]. 
		Hans sings   because he happy     is
		'Hans sings because he is happy'

Hans den Besten was the first to suggest that we ought to view these facts in terms of a movement operation; in particular, he suggested that the verb is moving into the position occupied by the complementizer, when the complementizer is not present. On this account, the German VP is head-final, but the verb raises into C when no C is pronounced:

This type of movement is a new one for us; it's called head-movement.

The tree in (4) won't get us quite the right word order for a sentence like (1b) above; to do that, we need to posit another movement, moving the DP Hans into the specifier of CP:

Plugging some lexical items into the abstract tree in (5) gives us a way of diagramming sentences like (1b):

We'll hopefully have a chance to discuss the movement of the subject in this example more fully later; for now, it's probably worth noting that German does appear to allow just about anything to go in that pre-verbal position:

(7)	a.	Hans sah den Mann gestern.
		Hans saw the man  yesterday
		'Hans saw the man yesterday'

	b.	Den Mann sah Hans gestern.
		the man  saw Hans yesterday
	c.	Gestern   sah Hans den Mann.
		yesterday saw Hans the man

The examples in (7) are all grammatical, and roughly synonymous. So there does seem to be a very general operation in German that moves phrases into the specifier of CP. This phenomenon, in which the verb is the second thing in the clause, is called verb-second, or V2 for short (and linguists often refer to languages like German, which exhibit the V2 phenomenon, as "V2 languages").

(1b), then, is a V2 clause. The embedded clause of (1a), by contrast, is not V2, and would have a tree something like this one:

In (8), neither the head-movement of the verb to C nor the movement of the subject into the specifier of CP takes place; the presence of the overt complementizer apparently blocks the head-movement (and presumably has something to do with the impossibility of movement to the specifier).
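
The pattern we've seen so far can be summarized procedurally. What follows is only a rough sketch in Python, not a piece of syntactic theory: the function name german_clause and the flat lists of phrases are our own illustrative assumptions, and the trees are flattened into word order. The sketch assumes a head-final base order, moves the verb to C only when no complementizer fills C, and then fronts one phrase (the subject by default) to the specifier of CP.

```python
def german_clause(phrases, verb, complementizer=None, topic=None):
    """Return the surface order of a German clause as a list of strings.

    phrases: the non-verbal phrases in base order, subject first.
    verb: the finite verb, which starts out clause-final.
    complementizer: an overt C such as "dass", or None.
    topic: the phrase to front in a V2 clause (defaults to the subject).
    """
    if complementizer is not None:
        # C is filled, so nothing moves: the verb stays in final position.
        return [complementizer] + list(phrases) + [verb]
    fronted = phrases[0] if topic is None else topic
    if fronted not in phrases:
        raise ValueError("the topic must be one of the phrases")
    # C is empty: the verb raises to C and one phrase moves to Spec,CP.
    rest = [p for p in phrases if p != fronted]
    return [fronted, verb] + rest


base = ["Hans", "den Mann", "gestern"]
print(german_clause(base, "sah"))                    # ['Hans', 'sah', 'den Mann', 'gestern']
print(german_clause(base, "sah", topic="den Mann"))  # ['den Mann', 'sah', 'Hans', 'gestern']
print(german_clause(base, "sah", topic="gestern"))   # ['gestern', 'sah', 'Hans', 'den Mann']
print(german_clause(["Hans", "den Mann"], "sah", complementizer="dass"))
# ['dass', 'Hans', 'den Mann', 'sah']
```

The first three calls correspond to the word orders in (7a-c), and the last one to the embedded clause of (1a).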

We can get some additional evidence for head-movement by considering the behavior of a special class of verbs in German, which have what traditional grammar calls separable prefixes. The examples in (9) involve the verb anmachen 'turn on', which has the separable prefix an:

(9)	a.	Wir machten das Licht an.
		we  made    the light on
		'We turned on the light'

	b.	Marie glaubt,  dass wir das Licht anmachten
		Marie believes that we  the light on-made
		'Marie believes that we turned on the light'

Part of this verb, machten, is in second position in (9a) and in final position in (9b), for reasons that we've already figured out. The interesting thing is the separable prefix an, which always remains in final position. On the theory we're developing here, this is a natural place for the verb to leave pieces of itself behind; in (9a), when the verb moves to C, the separable prefix remains in V, which means that it shows up at the end of the sentence (where the verb would show up if it didn't move). The argument for movement here, in other words, is basically the same as the argument for A-movement based on Japanese numeral quantifiers; if we assume that the verb in (9a) actually starts in final position, we can account straightforwardly for the positioning of the separable prefix.
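
The stranding of the separable prefix can be sketched in the same informal way. Again, this is just an illustration under our own assumptions (the function name and the split of the verb into a prefix and a stem are hypothetical conveniences): the whole verb starts in final position, and only the stem moves to C when C is empty.

```python
def particle_verb_clause(phrases, prefix, stem, complementizer=None):
    """Surface order for a clause whose verb has a separable prefix.

    phrases: the non-verbal phrases in base order, subject first.
    prefix, stem: the two parts of the verb, e.g. "an" and "machten".
    """
    if complementizer is not None:
        # No movement: prefix and stem are pronounced together, clause-finally.
        return [complementizer] + list(phrases) + [prefix + stem]
    # The stem moves to C; the prefix is left behind in V, at the end.
    return [phrases[0], stem] + list(phrases[1:]) + [prefix]


print(particle_verb_clause(["wir", "das Licht"], "an", "machten"))
# ['wir', 'machten', 'das Licht', 'an']
print(particle_verb_clause(["wir", "das Licht"], "an", "machten", complementizer="dass"))
# ['dass', 'wir', 'das Licht', 'anmachten']
```

These two calls give the orders in (9a) and in the embedded clause of (9b).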

Now, German is like English in that some of its tenses are formed using auxiliaries:

(10)	a.	Marie glaubt,  dass wir den Mann sehen werden.
		Marie believes that we  the man  see   will
		'Marie believes that we will see the man'

	b.	Marie glaubt,  dass wir den Mann gesehen haben.
		Marie believes that we  the man  seen    have
		'Marie believes that we have seen the man'

The embedded clauses in (10) exhibit a word order that follows from the trees we have already drawn, as long as we generate these auxiliaries in T (or any other final head which c-commands VP):

Now consider what happens in clauses with auxiliaries when no overt complementizer is present:

(12)	a.	Wir werden den Mann sehen
		we  will   the man  see
		'We will see the man'

	b.	Wir haben den Mann gesehen
		we  have  the man  seen
		'We have seen the man'

In (12) we are seeing the effects of movement to C; the auxiliary, instead of being final, is in second position, following the regular V2 pattern:

We should consider an alternative to the tree in (13), however. We have already seen that the verb is capable, in principle, of moving to C (as it does in trees like the one in (6)). Why should we not move the V to C in examples like (12), yielding the tree in (14)?

In fact, the tree in (14) is apparently ruled out; the word order it generates, in (15), is ill-formed:

(15)	   *	Wir sehen den Mann werden
		we  see   the man  will
		'We will see the man'

So we need some principle to rule out the movement in (14). Lisa Travis developed such a principle, in her MIT dissertation:

Head Movement Constraint (HMC)
A head X may move to a head Y only if there is no other head Z such that ZP
dominates X but not Y.

The HMC allows (13), but rules out (14); since TP dominates V but not C, T intervenes between V and C, so only T may move to C, and not V.
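
One way to check our understanding of the HMC is to state it as a small test over the heads of a clause. This is a minimal sketch under simplifying assumptions of our own: the clause is represented as a flat list of heads from highest to lowest (so ZP dominates every head listed after Z), the function name hmc_allows is hypothetical, and the intervening head Z is taken to be a head other than the moving head itself.

```python
def hmc_allows(spine, mover, target):
    """Return True if moving `mover` to `target` obeys the HMC.

    spine: the heads of one clause, highest first, e.g. ["C", "T", "V"].
    A head Z blocks the movement if ZP dominates the mover but not the
    target, which here means Z sits strictly between the two in the list.
    """
    high = spine.index(target)
    low = spine.index(mover)
    if high >= low:
        raise ValueError("the target must be higher than the moving head")
    interveners = spine[high + 1:low]
    return not interveners


print(hmc_allows(["C", "T", "V"], "T", "C"))  # True:  the movement in (13)
print(hmc_allows(["C", "T", "V"], "V", "C"))  # False: the movement in (14)
```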

This result seems to cause problems for our tree in (6) above, however. We have seen that V may apparently move to C. Why is this possible in (6) but impossible in (14)?

We can solve this problem by allowing head-movement to be successive-cyclic, giving us the tree in (16) as a replacement for the one in (6):

The tree in (16) obeys the HMC; rather than moving V directly to C, we are moving V to v, which moves to T, which moves to C. Each of these smaller movements obeys the HMC, and the result is that the verb is pronounced in C. In fact, these movements might also give us a way of accounting for the fact that this verb is inflected for tense; by moving V through T, the verb 'picks up' the tense morphology which is presumably generated in T.

Not all languages are V2, of course, so we're next going to move on to look at head-movement in a couple of non-V2 languages--French and English. Before we do that, however, we should consider the question of what exactly is happening when a head moves. The successive-cyclic movement in (16) is sort of odd, in our experience so far; we've seen successive-cyclic movement before, but up until now the movement has just been of a single phrase which relocates several times. In the example in (16), if the HMC is to be obeyed, it has to be the case that each head is actually only moving once. V moves to v, and somehow attaches itself to it. Then v (along with V) moves to T, and those heads get attached to each other. And finally T (along with v and V) moves to C.
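
The step-by-step character of this movement can be sketched as a loop. This is only an informal illustration with a hypothetical function name: each pass moves the whole complex head built so far up to the next head, so no step ever skips a head.

```python
def raise_head(spine):
    """Simulate successive-cyclic head-movement.

    spine: heads from lowest to highest, e.g. ["V", "v", "T", "C"].
    Returns the complex head formed at each step, written with the
    host first, as in the notation [v V], [T v V].
    """
    complex_head = [spine[0]]
    steps = []
    for host in spine[1:]:
        # The current complex head moves one head up and attaches to the host.
        complex_head = [host] + complex_head
        steps.append(tuple(complex_head))
    return steps


for step in raise_head(["V", "v", "T", "C"]):
    print(step)
# ('v', 'V')
# ('T', 'v', 'V')
# ('C', 'T', 'v', 'V')
```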

Our HMC, as stated, forces this way of looking at head-movement, because even after V moves to v, it shouldn't be able to move on to T; there is still a head (v, in this case) with a maximal projection (vP) which dominates V but not T. As a result, movement of V to T should be impossible. So how do we model this type of movement, in which movement results in some kind of glued-together object that moves as a unit?

Several answers to this question have been offered in the literature. For concreteness, we'll use one that was developed by Jonathan Bobaljik and Samuel Brown in a 1997 paper.

Optional Readings: Here's that 1997 paper:
Bobaljik, Jonathan, and Samuel Brown. 1997. Interarboreal operations: head movement and the extension requirement. Linguistic Inquiry 28.345-356.
Another recent approach to head-movement is:
Matushansky, Ora. 2006. Head movement in linguistic theory. Linguistic Inquiry 37.69-109.
And for a general overview of literature on head-movement, you could look at:
Roberts, Ian. 2001. Head-movement. In The Handbook of Contemporary Syntactic Theory, edited by Mark Baltin and Chris Collins. Oxford, Blackwell.

Let's consider the first instance of head-movement in the tree in (16), in which V moves to v. First we'll build the VP:

Next, we do several operations that should look very familiar, though we've never done them in quite this combination. We're going to draw a new lexical item from the lexicon, v. To give that lexical item something to Merge with, we're going to make a copy of a subpart of the tree we've already built; in particular, we're going to copy V, and Merge that with the v that we've just picked out of the lexicon. The head v will then project:

Now we can Merge these two trees that we've created together, projecting v again:

Next we'll Merge the subject in the specifier of vP:

And now we're ready to do the same trick that we did in (18-19). First, we'll draw an instance of T from the lexicon, and we'll make a copy of the complex head [v V], which we'll Merge to the new T, projecting T:

And again, we'll Merge the two trees we've just created, and project T again:

The DP Hans moves to the specifier of TP:

And, finally, we're ready for the last instance of head movement, moving T to C. First we'll select C from the lexicon, and make a copy of the complex head [T v V] to Merge to it, projecting C:

And now we Merge this complex head to the other tree, projecting C again (and putting C on the left of its complement, in this case):

Finally, movement of Hans to the specifier of CP completes the derivation:

As you might imagine, we won't bother to show this much detail about head-movement very often! But this is one way to make the operation fit into our understanding of how movement happens via Internal Merge.
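
For readers who like to see a derivation spelled out mechanically, here is a rough Python sketch of the steps just described. It is our own simplification, not Bobaljik and Brown's formalism: the names Node, merge, head_move and heads are hypothetical, DPs are treated as unanalyzed items, linear order is ignored, and nothing here decides which copy gets pronounced. Because the nodes are immutable, reusing a node stands in for making a copy of it.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    label: str                          # the category that projects
    children: Tuple["Node", ...] = ()   # empty for lexical items
    word: str = ""                      # pronounced form, if any


def merge(a, b, projector):
    """Combine a and b; the label comes from whichever one projects."""
    return Node(projector.label, (a, b))


def head_move(new_label, mover, tree):
    """Draw a new head, Merge a copy of `mover` with it, then Merge with `tree`."""
    new_head = Node(new_label)                       # drawn from the lexicon
    complex_head = merge(new_head, mover, new_head)  # e.g. [v V]
    return complex_head, merge(complex_head, tree, new_head)


def heads(node):
    """List the labels of the lexical items inside a (complex) head."""
    if not node.children:
        return [node.label]
    return [label for child in node.children for label in heads(child)]


V = Node("V", word="sah")
obj = Node("D", word="den Mann")
subj = Node("D", word="Hans")

VP = merge(obj, V, V)                        # build the VP
v_head, v_bar = head_move("v", V, VP)        # V moves to v
vP = merge(subj, v_bar, v_bar)               # subject in the specifier of vP
T_head, T_bar = head_move("T", v_head, vP)   # [v V] moves to T
TP = merge(subj, T_bar, T_bar)               # Hans moves to the specifier of TP
C_head, C_bar = head_move("C", T_head, TP)   # [T v V] moves to C
CP = merge(subj, C_bar, C_bar)               # Hans moves to the specifier of CP

print(heads(v_head))  # ['v', 'V']
print(heads(T_head))  # ['T', 'v', 'V']
print(heads(C_head))  # ['C', 'T', 'v', 'V']
```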

Okay, now let's turn to how head-movement works in French.

Consider the following French sentences:

(27)	a.	Marie parle  souvent français
		Marie speaks often   French
		'Marie often speaks French'

	b. *	Marie souvent parle  français
		Marie often   speaks French

As you can see, an adverb like souvent 'often' goes between the verb and its object in French (as in (27a)), and cannot go where it would go in English (as in (27b)). Similarly, French allows negation to separate the verb from its object:

(28)	a.	Marie parle  pas français
		Marie speaks not French

	b.*	Marie pas parle  français
		Marie not speaks French

Our current approach to selection cannot straightforwardly handle the word orders in (27a) and (28a); if parle 'speaks' selects for an object, then that object should be the sister of the verb, and shouldn't be separable from it by adjuncts (like adverbs).

Of course, we could deal with this in various ways. We could revise our rules of selection, just for French (though this would be a fairly radical kind of parameter to propose). Or we could posit some type of movement of the object that takes place in French but not in English.

In fact, though, we have reason to believe that it's the verb that's moving in French, rather than the object. French infinitival verbs, it turns out, behave like their English counterparts:

(29)	a. [	Ne pas parler   français] rend  la  vie  difficile.
		ne not to-speak French    makes the life difficult
		'Not to speak French makes life difficult'

	b.*[	Ne parler   pas français] rend  la  vie  difficile.
		ne to-speak not French    makes the life difficult

So a finite verb in French can be separated from its object, but an infinitival verb can't be. This suggests that it's the verb that's responsible for the facts about the distribution of adverbs; in particular, that the French verb is undergoing head-movement, in tensed clauses but not in infinitives.

French is not German, however. None of these facts change if there is an overt complementizer:

(30)	a.	Je pense que  Marie parle  pas français
		I  think that Marie speaks not French
		'I think Marie doesn't speak French'

	b. *	Je pense que  Marie pas parle  français
		I  think that Marie not speaks French

French is unlike German, then, in moving the verb to T rather than to C.
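
The difference between a verb that moves to T and one that stays in V can be sketched as follows. This is a toy illustration under assumptions of our own: the function name is hypothetical, and the clause is flattened into five slots, with the adverb or negation sitting between T and V.

```python
def finite_clause(subject, verb, modifier, obj, verb_moves_to_T):
    """Linearize subject, T, modifier, V, object, pronouncing the verb once.

    modifier: an adverb or negation located between T and V.
    verb_moves_to_T: True for French finite clauses, False for English.
    """
    slots = [subject, "T", modifier, "V", obj]
    pronounce_at = "T" if verb_moves_to_T else "V"
    words = []
    for slot in slots:
        if slot == pronounce_at:
            words.append(verb)       # the verb is pronounced in this head
        elif slot in ("T", "V"):
            continue                 # the other head position is silent
        else:
            words.append(slot)
    return " ".join(words)


print(finite_clause("Marie", "parle", "souvent", "français", True))
# Marie parle souvent français
print(finite_clause("Mary", "speaks", "often", "French", False))
# Mary often speaks French
```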

Finally, we get to English.

Let's start by considering how English forms yes-no questions. In particular, let's concentrate on how these questions are formed from statements with auxiliaries in them:

(31)	a.	Mary will eat
	b.	Will Mary eat?

(32)	a.	Mary is eating
	b.	Is Mary eating?

(33)	a.	Mary has eaten
	b.	Has Mary eaten?

(34)	a.	The chocolate was eaten
	b.	Was the chocolate eaten?

Data like those in (31-34) suggest a pattern; yes-no questions are formed by fronting of the auxiliary. This never happens when there's an overt complementizer:

(35)	a.	I wonder whether Mary will eat
	b. *	I wonder whether will Mary eat

This little corner of English, then, seems to resemble German; we see head-movement of T into C, just when C isn't occupied by a complementizer:

In English, this head-movement is confined to questions, while in German we find it quite generally.

Actually, there are a number of processes that single out whatever's in T:

(37)	a.	Mary will not eat the nattoo.	[negation]
	b.	Mary WILL eat the nattoo.	[verum focus]
	c.	Will Mary eat the nattoo?	[T-to-C]

English negation follows the auxiliary in T, as we see in (37a). And English puts extra stress on the auxiliary in T to mark what's known as verum focus, which is the type of focus that emphasizes the truth of what's being asserted (it's the type of sentence you'd use to contradict someone who's just asserted (37a), for example). And, as we've seen, T moves to C in English to mark questions, as long as there's nothing already in C.

Now, of course, there are English sentences without auxiliaries:

(38) Mary eats nattoo.

Here the verb has a suffix -s, which is in complementary distribution with auxiliaries like the ones in (31-34). As so often, it's tempting to analyze complementary distribution as signalling that the complementarily distributed things are actually generated in the same place; in this case, we would want to say that -s is generated in T, just like the auxiliaries.

One of Noam Chomsky's big insights in his 1957 book Syntactic Structures was that, as far as phenomena like the ones in (37) are concerned, -s acts like the other inhabitants of T:

(39)	a.	Mary does not eat nattoo.	[negation]
	b.	Mary DOES eat nattoo.		[verum focus]
	c.	Does Mary eat nattoo?		[T-to-C]

Just like will, -s is followed by negation (as in (39a)), and is stressed to indicate verum focus (as in (39b)), and moves to C in questions (as in (39c)).

If we just stopped the story there, then we would end up with trees for questions like (39c) like the one in (40), presumably pronounced as (41):

(41) -s Mary eat nattoo?

This is not quite right. In fact, being a bound morpheme, -s can't be pronounced all on its own, like this; it needs to attach to something. In (39c), we apparently insert a dummy verb do, which is there to give -s something to attach to. This process is known as do-support.

Do-support doesn't appear in every English sentence, as we already know from looking at (38) above. What's the tree for (38)?

(43) Mary -s eat nattoo.

Again, not exactly what we're after. And again, it seems reasonable to blame that fact on the affixal status of -s. In this example, the -s, which we have reason to believe is generated in T, is in fact showing up on V.

We can't handle this problem by positing movement of the verb to T. That was the whole point of the difference between English and French; the French verb moves to T, which is why various things can get between the French verb and its object. But the English verb and its object have to be adjacent, for reasons we think we understand--and we can handle that fact as long as we don't allow the verb to move out of the VP.

So apparently we need to posit a process that moves the affix -s down onto the verb. This process is known as Affix Hopping:

Affix Hopping
If an affix is linearly adjacent to the verb, attach the affix to the verb.

Affix Hopping is a very bizarre-looking syntactic rule; it involves movement downwards, and it makes direct reference to linear adjacency. But we seem to need it. One popular attitude towards it suggests that it's not in fact a syntactic rule at all; it's something that happens after the syntactic derivation is over. We already know that there are purely phonological or morphological processes that aren't handled by the syntax (for example, the Tagalog infix -um- presumably isn't put in place by the rules of syntax), and the proposal is that Affix Hopping is one of those processes.

Do-support, on this view, is an operation that saves affixes at the last minute if they cannot be saved by Affix Hopping. In examples like the ones in (39), Affix Hopping is blocked from applying, and do-support is invoked to save the day--again, perhaps not within the syntax at all.
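
The division of labor between Affix Hopping and do-support can be sketched as a procedure that applies to the string of terminals once the syntax is done. This is only an illustration under assumptions of our own: the names VERBS, IRREGULAR and spell_out are hypothetical, affixes are written with a leading hyphen, and the sketch models only linear adjacency, so it covers the negation and T-to-C cases but says nothing about verum focus.

```python
VERBS = {"eat"}                       # toy lexicon, for illustration only
IRREGULAR = {("do", "s"): "does"}     # irregular spellings of stem + affix


def attach(stem, affix):
    """Spell out a stem plus a suffix such as 's'."""
    return IRREGULAR.get((stem, affix), stem + affix)


def spell_out(terminals):
    """Apply Affix Hopping where possible, and do-support otherwise."""
    out = []
    i = 0
    while i < len(terminals):
        item = terminals[i]
        if item.startswith("-"):
            affix = item[1:]
            following = terminals[i + 1] if i + 1 < len(terminals) else None
            if following in VERBS:
                # Affix Hopping: the affix is adjacent to the verb.
                out.append(attach(following, affix))
                i += 2
                continue
            # Do-support: rescue the stranded affix with a dummy verb.
            out.append(attach("do", affix))
        else:
            out.append(item)
        i += 1
    return out


print(spell_out(["Mary", "-s", "eat", "nattoo"]))         # ['Mary', 'eats', 'nattoo']
print(spell_out(["Mary", "-s", "not", "eat", "nattoo"]))  # ['Mary', 'does', 'not', 'eat', 'nattoo']
print(spell_out(["-s", "Mary", "eat", "nattoo"]))         # ['does', 'Mary', 'eat', 'nattoo']
```

The three calls correspond to (38), (39a) and (39c).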

Now, remember the HMC? We saw that it constrains head-movement in German, preventing lower heads from moving past higher heads. The HMC is active in English, as well:

(44)	a.	Will Mary __ eat nattoo?
	b. *	Eat Mary will __ nattoo?

If you're going to move something into C, in other words, it had better be T (as in (44a)), not V (as in (44b)).