Introduction, part two: what's syntax?

The first thing we will need to do, in order to study syntax, is to try to pick out which aspects of language we are studying. This is probably impossible to do completely until we develop a final complete theory of syntax, and of the other domains that syntax interacts with, but at least we can make some preliminary distinctions.

One of the things we are trying to do, in syntax, is develop a set of rules that will predict which sentences are grammatical and which ones are ungrammatical in a given language. Having done that, we can start to try to understand what types of syntactic rules are possible ones in human languages. So what does "grammatical" mean?

Syntax vs. semantics

To begin with, "grammatical" does not mean "meaningful". This is the point of Chomsky's famous pair of examples:

Colorless green ideas sleep furiously.
Furiously sleep ideas green colorless.

Neither of these sentences is "meaningful", but the second has an additional problem that the first lacks; in the first sentence, although the sentence as a whole doesn't mean anything, the words at least seem to be in the right order. In other words, the first sentence, though meaningless, is "grammatical", in that the words are put together according to the rules for constructing English sentences. The second sentence, by contrast, is both ungrammatical and meaningless.

Prescriptive vs. descriptive

"Grammatical" also does not mean "would be accepted by your high school English teacher". We're not in the business of trying to figure out how people "ought" to speak; we want to know how they actually do speak. So, for example, your high school English teacher may have tried to persuade you not to say, or at least write, things like:

What are you talking about?

That is, you may have been told not to "end a sentence with a preposition". But in fact, your English teacher had to tell you this because English speakers routinely do end sentences with prepositions; if they didn't, there would be no point in trying to get you to stop doing it. Your English teacher didn't, for example, try to get you to stop saying "furiously sleep ideas green colorless", because that's actually an ungrammatical sentence, which no English speaker would say. The ban on ending sentences with prepositions is what we call a prescriptive rule, something that educators try to impose on speakers who aren't naturally inclined to obey it. This isn't what we'll be studying.

Competence vs. performance

When you're speaking, all kinds of things might happen to you that would affect what you say. If somebody interrupts me, or I inhale a fly, or forget what I was going to say, or whatever, I might find myself having said something like:

This is the

and then never finishing the sentence. In principle, we could declare that since I said it, "This is the" must be an English sentence that we want our grammar to account for (maybe with a footnote to the effect that it's only grammatical if you inhale a fly after saying it). But it seems more reasonable to say that our grammar should only produce complete sentences, and that this grammar interacts with facts about real life in a way that sometimes produces utterances like "this is the". This is the distinction between competence (what the grammar would produce in a perfect world) and performance (actual linguistic behavior, the result of the grammar interacting with interruptions, limited memory, fatigue, inhaled flies, etc.).

In some cases, it's easy to figure out which aspects of performance to ascribe to competence, and which ones are the result of other factors. In others, it's a little trickier. For instance, sentences like this one are particularly hard to understand:

The cat the woman the boy described owned died.

But many syntacticians agree that sentences like this ought to be generated by the grammar, and then ruled out by performance factors (in this case, conditions on how you actually build the structure of the sentence as you hear it, sometimes called parsing or processing; more about that later). This sentence gets slightly easier to understand if you break it into parts. First, there's the main clause, which is easy to understand:

The cat died.

Then, we're modifying the subject with a relative clause:

The cat (that) the woman owned died.

What makes the sentence hard to understand is that we then add another relative clause, this time modifying the subject of the relative clause that we just added:

The cat (that) the woman (that) the boy described owned died.

It'll be a while before we talk seriously about relative clauses, but the only point here is that it's hard to imagine a condition on their formation that would rule this sentence out. In principle, we ought to be able to add a relative clause to any NP, as far as the grammar is concerned. So we're going to chalk up the ill-formedness of this sentence to something else; in this case, a difficulty with processing (which we won't try to explain, at least not yet).
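To see how mechanical the construction of these sentences is, here is a minimal Python sketch. It is only an illustration, not a theory of relative clauses: the function name center_embed and the idea of pairing each noun with one verb are assumptions made for the example. Nothing in the procedure stops at two relative clauses, which is the point; the difficulty comes from somewhere else.

```python
def center_embed(nouns, verbs):
    """Build a center-embedded sentence (hypothetical helper).

    nouns[0] is the main subject; each later noun is the subject of a
    relative clause modifying the noun before it. verbs[i] is the verb
    whose subject is nouns[i].
    """
    if len(nouns) != len(verbs):
        raise ValueError("need exactly one verb per noun")
    # Subjects appear left to right; their verbs appear in reverse order,
    # so the verb of the innermost clause comes first.
    words = [f"the {n}" for n in nouns] + list(reversed(verbs))
    sentence = " ".join(words) + "."
    return sentence[0].upper() + sentence[1:]


print(center_embed(["cat"], ["died"]))
print(center_embed(["cat", "woman"], ["died", "owned"]))
print(center_embed(["cat", "woman", "boy"], ["died", "owned", "described"]))
```

The three calls print the three example sentences above (without the optional "that"). Adding a fourth noun and verb works just as well as far as the procedure is concerned, even though the result is nearly impossible to understand.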

Another domain where we can see the effects of performance factors has to do with sentence length. It's pretty clear that there is no arbitrary upper bound on the length of a sentence, as far as the grammar is concerned. To see that, take whatever you think might be the longest sentence in the world, and add "She said that" to the beginning of it; now you have a new 'longest sentence'. So there is in fact no real 'longest sentence'; any sentence can always be made a little longer. To pick another example, if we're talking about genealogy and describing relatives of various kinds (my father's sister, his brother's wife's daughter's daughter, etc.), there is no upper bound on the length of the terms we can use; again, pick your favorite candidate for the longest term you could use to describe a relative, and add, say, "mother's" to get a longer term (my mother's father's sister, his mother's brother's wife's daughter's daughter, etc.). So as far as the grammar is concerned, your utterances are potentially infinite in length. You're just prevented from actually uttering infinite sentences by performance factors (you need to eat, people will stop paying attention to you, you will eventually die, etc.). But these performance factors are facts about life, not about syntax.
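The 'no longest sentence' argument can be sketched in a few lines of Python. The function names embed and kin_term are made up for this illustration, and the sketch assumes the clause is given in lowercase without final punctuation. Whatever depth you pick, depth + 1 gives a longer result.

```python
def embed(clause, depth):
    """Embed `clause` under `depth` copies of 'she said that'."""
    sentence = "she said that " * depth + clause
    # Capitalize the first letter and add a final period.
    return sentence[0].upper() + sentence[1:] + "."


def kin_term(base, extra):
    """Prefix `extra` copies of "mother's" to a kinship term."""
    return "my " + "mother's " * extra + base


print(embed("the cat died", 0))
print(embed("the cat died", 2))
print(kin_term("father's sister", 1))

# Each extra level of embedding adds exactly three words,
# so no candidate for 'longest sentence' survives.
for depth in range(5):
    shorter = embed("the cat died", depth)
    longer = embed("the cat died", depth + 1)
    assert len(longer.split()) == len(shorter.split()) + 3
```

The only thing that stops the loop from going on forever is that we chose to stop it; in the terms used above, that limit is a fact about performance, not about the rule itself.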

Syntax vs. processing

There's one last distinction we need to make, and then we can start doing syntax. Let's imagine that we were studying, for example, vision. If we were doing that, you can imagine two kinds of (interrelated) areas of study. In one of them, you would be studying the content of the visual representation in a particular organism (human, fly, dog, whatever). So you'd be trying to find out what wavelengths of light register on that organism's visual system, how far it can see, how its vision is affected by movement of the things being seen, etc. In another area of study, you would be trying to find out how visual information is acquired and manipulated by the brain. So you'd be looking at the anatomy of the eye, using something like eye tracking to see how the eye gathers data, learning about how the information is fed to the brain and which parts of the brain handle which parts of the visual representation, and so forth. Of course, those fields of study are related, but you could imagine studying one of them while largely ignoring the other. For example, you could learn about how visual information is fed to the visual cortex without worrying too much about the exact content of the information; you'd just be figuring out, say, which neurons are firing, and in what order.

We can draw a similar distinction in linguistics. One question, the one we'll be concerned with in this class, is about the nature of language itself; what are the linguistic objects that we're creating and interpreting? And another question, which we won't be concerned with much, is about how these linguistic objects are created in real time; which parts of the brain are involved, and what are the strategies you use to create linguistic representations as you hear and utter sentences? This second question is the domain of processing. Just as in the case of vision, syntax and processing are raising interrelated questions, and the two fields have a lot to teach each other. But in principle, we can proceed by concentrating just on one field, and that's what we're going to do in this class.

The distinction between syntax and processing can be illustrated in another way. Suppose we're developing a theory of processing; so we want to know how sentences are constructed. Okay, what are 'sentences'? Well, that turns out to be a difficult question. A sentence isn't just any string of words, and we've also seen that 'sentence' can't just refer to meaningful strings of words, or to strings of words that are easy to understand. So what are they? That's the subject matter of syntax.

The distinction between syntax and processing is an important one to bear in mind, because it's something people often get confused about. It's easy to get confused about it, because we clearly need to say that syntax is generative; that is, syntax needs to contain general rules that govern the construction of sentences. Our theory of syntax can't, for example, be a list of all the sentences of a language, because (as we've just seen) there's no upper bound on the length of these sentences, so the list would have to be potentially infinite in length. So what we need is a finite list of rules that interact to produce these sentences of potentially infinite length. Figuring out what these rules are will be one of our main tasks in this course.
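As a rough illustration of what 'a finite list of rules' might look like, here is a toy sketch in Python. The three rules in RULES are invented for the example and are not meant as a serious grammar of English; the names expand and sentences are likewise hypothetical. The point is only that a rule that mentions S inside the expansion of S lets a handful of rules describe more and more sentences as we allow deeper nesting.

```python
from itertools import product

# Toy rewrite rules (an assumption for illustration only).
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["she"], ["the", "cat"]],
    "VP": [["slept"], ["said", "that", "S"]],
}


def expand(symbol, depth):
    """Yield every word list derivable from `symbol`, allowing at most
    `depth` nested sentences (S inside S)."""
    if symbol not in RULES:  # an ordinary word, not a category
        yield [symbol]
        return
    if symbol == "S":
        if depth == 0:  # nesting budget used up
            return
        depth -= 1
    for rhs in RULES[symbol]:
        # Combine every expansion of each symbol on the right-hand side.
        parts = [list(expand(s, depth)) for s in rhs]
        for combo in product(*parts):
            yield [word for part in combo for word in part]


def sentences(depth):
    return [" ".join(words) for words in expand("S", depth)]


for depth in range(1, 5):
    print(depth, len(sentences(depth)))
print(sentences(2))
```

With these three rules, allowing one level of nesting gives 2 sentences, two levels give 6, three give 14, and four give 30; the rule list never changes, only the depth we allow. Note that this program is a description of which strings the rules license, not a claim about how speakers build sentences in real time.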

Now, of course, it could turn out that we'll be lucky, and these rules that we need to describe the form of language are in fact the rules that are used by the brain to create and interpret the sentences in real time. But that's not logically necessary. So we're not going to commit ourselves, in this class, to the task of creating rules that are useful both for syntax and for processing; we'll content ourselves, for now, with handling syntax (which is difficult enough to deal with).