One of the things we notice, as we study languages, is that they often appear to be too complicated to learn. In particular, there seem to be cases in which the evidence available to a child learning a language wouldn't be enough to give them the right answers to certain kinds of questions; there ought to be no way for them to figure out the right way to say certain kinds of things.

In some cases, the consequence of this poverty of information is that learners just make a guess. And as you might expect, different learners make different guesses, with the result that different learners end up with slightly different ways of speaking. One instance of this kind of situation comes from Tagalog facts that were discussed by Kie Zuraw in her Master's thesis (among other places).

Tagalog forms past tenses of certain verbs by infixing -um-:

verb stem   past-tense form   meaning
lipad       lumipad           'fly'
kanta       kumanta           'sing'
sayaw       sumayaw           'dance'
punta       pumunta           'go'

As we said in class, there are (at least) two theories of the placement of the infix -um- that are consistent with all of the data in this table:

Put -um- after the first consonant of the stem.
Put -um- before the first vowel of the stem.

Either of these theories will account for all the Tagalog facts we've seen so far. And in fact, both theories are compatible with almost all the Tagalog data. One place where they differ is in words starting with more than one consonant. As it happens, there aren't any words like this in the native Tagalog vocabulary; Tagalog originally didn't allow words to begin with multiple consonants. But through contact with various other languages, mainly Spanish and English, Tagalog has acquired a few words beginning with consonant clusters, and some of these can have -um- infixed in them:

verb stem   past-tense form        meaning
gradwet     gumradwet, grumadwet   'graduate'
plahiyo     pumlahiyo, plumahiyo   'plagiarize'

As you can see from the table, Tagalog speakers who are confronted with this type of verb simply guess; both of the imaginable patterns of um-infixation are attested.
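As a rough illustration, the two imaginable placement rules can be sketched in Python. One rule puts -um- after the first consonant of the stem; the other puts it before the first vowel. The function names are made up for this sketch, and it assumes that each letter stands for one sound, which holds for the stems in the tables above but not for Tagalog spelling in general.

```python
VOWELS = set("aeiou")  # assumption: one letter per sound, as in the stems above


def um_after_first_consonant(stem: str) -> str:
    """Hypothetical rule A: insert -um- right after the first consonant."""
    for i, ch in enumerate(stem):
        if ch not in VOWELS:
            return stem[: i + 1] + "um" + stem[i + 1 :]
    raise ValueError(f"no consonant in {stem!r}")


def um_before_first_vowel(stem: str) -> str:
    """Hypothetical rule B: insert -um- right before the first vowel."""
    for i, ch in enumerate(stem):
        if ch in VOWELS:
            return stem[:i] + "um" + stem[i:]
    raise ValueError(f"no vowel in {stem!r}")


# Compare the two rules on the native stems and on the borrowed ones.
for stem in ["lipad", "kanta", "sayaw", "punta", "gradwet", "plahiyo"]:
    a = um_after_first_consonant(stem)
    b = um_before_first_vowel(stem)
    verdict = "same" if a == b else "different"
    print(f"{stem}: {a} / {b} ({verdict})")
```

The two functions give identical results for every stem that begins with a single consonant, and come apart only on the borrowed stems that begin with a cluster.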

In this Tagalog example, then, most of the available data (all the native vocabulary) doesn't help the speakers figure out which of the two imaginable rules for infixation is the right one. And what we seem to find is that, in fact, speakers settle on one or another of the rules more or less at random. That's what you'd expect, really; if there's no way to choose, speakers just have to guess, so they can get on with their lives.

But we also seem to find cases that are very different from the Tagalog one we've just looked at. These are cases in which, again, speakers don't have enough information to determine which of several possible grammars is the right one, and yet, they don't guess--they all pick the same one. One example of this type comes from Bulgarian.

Bulgarian and English have slightly different ways of forming questions. Let's concentrate on English first. English has a type of question, called 'wh-questions', which we'll be talking about in depth later in the semester. These are questions to which the answer is something other than 'yes' or 'no':

What did John give to Susan?
Who did John give a book to?
Why did John give Susan a book?

In English, these 'wh-questions' always start with a word, called a 'wh-word', that indicates what the question is about; if I'm asking you these questions, the wh-words are like the blanks that I'm asking you to fill in for me. Now, it's possible for a question to have more than one wh-word in it:

What did John give to whom?
Who gave what to whom?

And as you can see, in English, a wh-question with more than one wh-word in it (sometimes called a multiple wh-question) must begin with one of the wh-words.

In Bulgarian multiple wh-questions, by contrast, all of the wh-words must be at the beginning of the sentence. Those last two English sentences, for example, would have these translations in Bulgarian:

Kakvo na kogo e dal Ivan?
what to whom gave Ivan
'What did Ivan give to whom?'

Koj kakvo na kogo e dal?
who what to whom gave
'Who gave what to whom?'

Now, imagine that you're a child learning Bulgarian as your first language, and you hear the first of these Bulgarian sentences. What theory of Bulgarian wh-question formation would you come up with?

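As it turns out, there are infinitely many possible theories. Here are a few:

Put all the wh-words at the beginning of the sentence.
Put as many wh-words at the beginning of the sentence as you can, up to a maximum of 2.
Put as many wh-words at the beginning of the sentence as you can, up to a maximum of 3.
...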

If you're a Bulgarian child, there is some number n such that n is the greatest number of wh-words you ever heard anybody put in a question, as you were growing up and trying to learn the language. And whatever n is, there are infinitely many theories that are compatible with all the data you've heard; the right theory could be "Put as many wh-words at the beginning of the sentence as you can, up to a maximum of n", or "...up to a maximum of n+1", etc.

In other words, just as in the Tagalog case, the Bulgarian data underdetermine the analysis. So we might expect the Bulgarian speakers to act just like the Tagalog speakers; we might find that different Bulgarians have different theories of how wh-questions are constructed in their language. We'd test that by asking Bulgarians about how to ask wh-questions with more and more wh-words in them, and we'd discover Bulgarians with the "up to 2" grammar, Bulgarians with the "up to 3" grammar, Bulgarians with the "front all of them" grammar, etc.
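The underdetermination can be made concrete with a small Python sketch. It is only an illustration: the function `front_wh`, the `limit` parameter, and the placeholder tokens (`subj`, `verb`, `wh1`, ...) are invented here and are not real Bulgarian words or word orders. A grammar with `limit=None` fronts every wh-word; a grammar with `limit=n` fronts at most n of them.

```python
from typing import List, Optional


def front_wh(tokens: List[str], limit: Optional[int] = None) -> List[str]:
    """Move wh-tokens to the front, keeping their relative order.

    limit=None models the "front all of them" grammar;
    limit=n models the "up to a maximum of n" grammar.
    Tokens starting with "wh" are treated as wh-words (a placeholder convention).
    """
    wh_positions = [i for i, tok in enumerate(tokens) if tok.startswith("wh")]
    if limit is not None:
        wh_positions = wh_positions[:limit]
    moved = set(wh_positions)
    fronted = [tokens[i] for i in wh_positions]
    rest = [tok for i, tok in enumerate(tokens) if i not in moved]
    return fronted + rest


two_wh = ["subj", "verb", "wh1", "wh2"]
three_wh = ["subj", "verb", "wh1", "wh2", "wh3"]

# With at most two wh-words in the input, these grammars are indistinguishable.
print(front_wh(two_wh, limit=2) == front_wh(two_wh, limit=3) == front_wh(two_wh))

# Only a question with three wh-words tells them apart.
print(front_wh(three_wh, limit=2))
print(front_wh(three_wh))
```

A learner who has never heard more than two wh-words in one question has seen nothing that separates `limit=2` from `limit=3`, `limit=4`, or no limit at all.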

But in fact, that's not what we find. Bulgarian speakers, unlike Tagalog speakers, all converge on the same grammar; they all put all their wh-words at the front of the sentence.

Now, this is presumably not really a difference between Bulgarian speakers and Tagalog speakers; it's just that they're facing different linguistic puzzles. Here's the hypothesis that many syntacticians are working to test:

Human minds are structured in such a way that they can only learn some logically possible grammars, and not others.

By this hypothesis, the Bulgarian child who has to choose among the theories in the list above has a certain advantage; she doesn't actually have to consider all of these grammars. Only the first of these grammars is a possible grammar of a human language; her brain is designed to learn that type of grammar. In other words, as soon as a Bulgarian child hears a wh-question with multiple fronted wh-words, she knows that the rule for her language is "put all the wh-words at the beginning of the sentence"; she doesn't even have to consider any of the other logical possibilities.

To put it yet another way, we are born knowing certain things about how languages are put together, and this (tacit) knowledge limits the grammars that we have to consider. This hypothesis is sometimes called innateness, and the content of this linguistic knowledge with which all normal human beings are born is called Universal Grammar (or UG for short). The argument just given from Bulgarian is one type of argument from the poverty of the stimulus; this refers to arguments that the information available to children is insufficient to allow them to all converge on the grammar they are attempting to learn.

Optional Reading: for a critical look at poverty of the stimulus arguments, here's a paper by Pullum and Scholz; for a response, here's another paper, by Legate and Yang.

Here are a few more poverty-of-the-stimulus arguments, all based on English. These have a slightly different form from the Bulgarian one; they are arguments of the form "here is something all English speakers know, without having been told, and the data that would have pointed them to the right conclusion would probably have been vanishingly rare in the input they were exposed to as children." For versions of this argument with actual numbers, see the optional papers above.

Consider the formation of yes-no questions; here we will confine ourselves to clauses containing the word 'is'. Yes-no questions seem to be formed by moving 'is' to the beginning of the sentence:

The man is sick.
Is the man sick?

Thus, we might formulate the following rule for forming yes-no questions in English (at least when 'is' is present):

Question Formation (version one): Move 'is' to the beginning of the sentence.

However, this rule isn't specific enough. Consider a statement like the one below; here there are two instances of 'is', but we can't just freely choose which one to move to the beginning:

The man is claiming that Syntax is lucrative.
Is the man claiming that Syntax is lucrative?
*Is the man is claiming that Syntax lucrative?

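(The * on the last sentence indicates that it is ungrammatical.) So apparently we need to make our rule for question formation more specific, so that we can correctly form only the first of the two questions above. Possible rules that would get the right result include:

Question Formation (version two): Move the first 'is' to the beginning of the sentence.
Question Formation (version three): Move the 'is' of the main clause to the beginning of the sentence.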

Now, in fact, only the last of these theories (version three) will get the right result for all the English data; consider data like those below, for example:

The man who is talking is bearded.
Is the man who is talking bearded?
*Is the man who talking is bearded?

Now, there are two points to make about data like these. First, they're vanishingly rare in the speech that's directed at children (see the optional paper by Legate and Yang, in particular, for some good numbers about this). So you might have expected some people to accidentally converge on the "version two" theory instead, or at least for some children to go through a stage in which they entertain that theory. But that never happens.

Moreover, if we consider Version Two and Version Three, Version Two is clearly the simpler theory. It would be a lot easier to program a computer, for example, to obey Version Two than to obey Version Three; all you need to be able to do is recognize 'is' when you hear it, and keep track of the order of the words. And if you consider the type of facts about utterances that you know when you're first learning a language, what you know is just the kind of thing you'd need to implement Version Two; you know what order the words are in. At the beginning, at least, you have no idea where the clause boundaries are. And yet, everybody ignores the information that's easy to get, and goes straight for a rule that crucially depends on having figured out the structure of the sentences. This is weird and perverse, and yet, everyone does it. So, here's the hypothesis: everyone is born knowing that rules like the Version Two rule are non-starters, and that only the Version Three rule could possibly be a rule for a language. In other words, part of Universal Grammar is the knowledge that rules like these are structure-dependent; they make reference to notions like "main clause", not to notions like "first" or "second".
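To see the contrast in miniature, here is a Python sketch of the two rules. It reads Version Two as "move the first 'is'" and Version Three as "move the 'is' of the main clause"; those readings, the function names, and the representation of a sentence as a nested list (with each embedded clause as a sub-list) are all assumptions of this sketch, not a claim about how anyone actually parses English.

```python
def flatten(tree):
    """Return the words of a nested-list sentence in left-to-right order."""
    words = []
    for node in tree:
        if isinstance(node, list):
            words.extend(flatten(node))
        else:
            words.append(node)
    return words


def question_v2(tree):
    """Version Two (linear): front the first 'is' in the word string."""
    words = flatten(tree)
    i = words.index("is")
    return " ".join(["is"] + words[:i] + words[i + 1 :])


def question_v3(tree):
    """Version Three (structural): front the 'is' of the main clause.

    Only the top level of the list is searched, so any 'is' inside an
    embedded clause (a sub-list) is ignored.
    """
    i = tree.index("is")
    return " ".join(["is"] + flatten(tree[:i] + tree[i + 1 :]))


claim = ["the", "man", "is", "claiming", ["that", "Syntax", "is", "lucrative"]]
relative = ["the", "man", ["who", "is", "talking"], "is", "bearded"]

for sentence in (claim, relative):
    print("v2:", question_v2(sentence))
    print("v3:", question_v3(sentence))
```

On the first sentence the two rules agree. On the second, the linear rule produces the starred string "is the man who talking is bearded", while the structural rule produces "is the man who is talking bearded". Notice that `question_v3` looks simple only because the bracketing is handed to it; `question_v2` needs nothing but the order of the words, which is exactly the information a beginning learner has.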

Here are a few more subtle facts about English that all speakers know, without knowing that they know them and without ever having been explicitly told:

We all agreed that the following sentence is ambiguous:

Mary kept the car in the garage.

In particular, this sentence can mean either of the following:

Mary retained ownership of the car which was in the garage.
Mary used the garage to store the car.

We also agreed that the ambiguity vanishes in questions:

Which car did Mary keep in the garage?

Here we can only get the "store" reading, not the "retain ownership" reading.

One more subtle fact of the same kind:

We all agreed that in the following sentence, 'them' cannot refer to the men:

Will the men expect to see them?

On the other hand, if we add 'who' to the beginning of the sentence, it becomes possible for 'them' to refer to the men:

Who will the men expect to see them?

Again, if you're a native English speaker, no one's ever taught you this, but you know it anyway.

Just to wrap this part up, then: we've seen that in a number of areas, despite the lack of relevant data, everybody converges on the same answers to questions about how the grammar ought to work, even though there are many imaginable answers that nobody chooses. The hypothesis that many syntacticians pursue is that we are structured to only consider some of the logically possible hypotheses about how our languages work. This innate knowledge is what's known as Universal Grammar (which, it should be clear by now, is sort of an unfortunate term; it doesn't refer to the grammar of a particular language, but rather to the conditions on possible languages that all languages obey).

So, the question is, what's the content of Universal Grammar? This is what we'll spend the semester studying.

Introduction, part 2: Distinguishing between Syntax and That Which Is Not Syntax