Paradigm leveling has long been considered a sporadic and irregular type of language
change that can be described by general tendencies (Kuryłowicz 1947, Mańczak 1958,
Hock 1991), but cannot be predicted or quantified. Explicit constraints on Paradigm
Uniformity or Uniform Exponence in OT (Kenstowicz 1995, Steriade 1996, Kenstowicz
1999) have brought phonological theory one step closer to a formal description of
paradigm leveling, but do little to attack the basic problem of predicting when leveling
will occur, which alternations will be leveled, and in which direction. Consider,
for example, the leveling in pre-classical Latin of s~r alternations
in noun paradigms:
(1) Latin hono:s > honor 'honor'
    case    pre-leveling    post-leveling
    nom.    [hono:s]        [honor]
    gen.    [hono:ris]      [hono:ris]
    dat.    [hono:ri:]      [hono:ri:]
    acc.    [hono:rem]      [hono:rem]
    abl.    [hono:re]       [hono:re]
Kenstowicz (1995) proposes that a Uniform Exponence constraint for nouns, together
with a markedness constraint against intervocalic [s], pushed Latin towards uniform
paradigms with [r] throughout. The change was somewhat more complicated than this,
however, and a uniform exponence analysis leaves many questions unanswered. First,
leveling was restricted primarily to masculine and feminine polysyllabic nouns (Hock
1991; Kiparsky 1997); however, it is not obvious why uniform exponence should have
been limited in this way. In addition, the older s~r alternation was
replaced by a vowel length alternation ([o] ~ [o:]) (Kiparsky 1997; Hale, Kissock
and Reiss 1998; Baldi 1999), and morphophonemic alternations in many other words
were left intact (e.g., [ars] ~ [artis] 'art', [urps] ~ [urbis]
'city'). Thus, if [hono:s] changed to [honor] to satisfy a uniform exponence
constraint, this constraint must have cared specifically about uniformity of [s]'s
and [r]'s, and not vowel length, stop voicing, or the presence or absence of
[t]. Finally, and most importantly, the uniform exponence analysis does not explain
why the change took place in the way it did. Why did a uniform exponence constraint
suddenly get promoted to outrank markedness and IO-faithfulness constraints? What
allowed the nominative to get rebuilt in Latin, contrary to the usual direction of
leveling? And why do languages sometimes move in the opposite direction, extending
alternations and making paradigms less uniform?
In this paper, I suggest that paradigm leveling is motivated by more than simply
a Humboldtian preference for non-alternation within the phonology. I propose instead
that leveling has its roots in the way that paradigms are projected morphologically,
and results when non-alternation is the strongly dominant ("default") morphological
pattern in the language. I present a computationally implemented model of morphological
acquisition that chooses base forms and learns to project paradigms by means of stochastic
morphological rules. The base selection mechanism allows us to make predictions about
which forms in the paradigm should serve as the pivot for analogical changes. The
model also provides estimates of the strength of different patterns in the language,
allowing us to make quantitative predictions about which words should come under
pressure to change, and which morphological patterns should be extended.
The morphological acquisition model employed here proceeds in two stages: first,
a base is chosen by evaluating each slot in the paradigm for how "effective" it is in predicting the remainder of the paradigm. A form is considered an effective
base when it allows the remaining forms to be derived confidently and accurately,
and when the resulting grammar does not generate large numbers of competing possibilities.
Conversely, a form is a bad base when it leaves open many possibilities for derived
forms, or when its predictions are wrong or uncertain. For example, in the hypothetical
language in (2), it is difficult to predict the genitive form given the nominative:
(2) Neutralization in the nominative
    nom.            gen.
    [gluptus]   ~   [glupti:]
    [nokus]     ~   [noki:]
    [reptus]    ~   [reptoris]
    [kortus]    ~   [kortoris]
In this language, the novel word [tulpus] has a 50% chance of having the genitive
form [tulpi:], and a 50% chance of having [tulporis]. On the other hand, it is easy to predict
the nominative given the genitive: new genitives [tulpi:] and [pulkoris] could only
have nominatives [tulpus] and [pulkus], respectively, with 100% certainty. In general,
slots in the paradigm that suffer from neutralizations make bad bases, because one
must guess about unpredictable information in order to project the remainder of the
paradigm. In the case of Latin, the nominative suffers from more neutralizations
than any other form in the paradigm, and is the worst choice of base according to
several different criteria (avg. score = 1 out of 6). Among the oblique forms, the
ablative is overall the best choice of base (avg. score = 5.4), followed by the
genitive (avg. score = 4.6).
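The asymmetry in (2) can be made concrete with a short sketch. It is only an illustration under stated assumptions: the suffix inventory, the function names, and the single "best-guess accuracy" measure are hypothetical, and they stand in for the several criteria the model actually uses to score candidate bases.

```python
from collections import Counter, defaultdict

# Toy paradigms from example (2): (nominative, genitive).
PARADIGMS = [
    ("gluptus", "glupti:"),
    ("nokus", "noki:"),
    ("reptus", "reptoris"),
    ("kortus", "kortoris"),
]
SLOTS = ("nom", "gen")

# Assumed suffix inventory for the toy language (not given in the text).
SUFFIXES = {"nom": ["us"], "gen": ["oris", "i:"]}


def suffix_of(form, slot):
    """Return the longest known suffix of `slot` that ends `form`."""
    for suffix in sorted(SUFFIXES[slot], key=len, reverse=True):
        if form.endswith(suffix):
            return suffix
    raise ValueError(f"no known {slot} suffix on {form!r}")


def suffix_counts(base_slot, target_slot):
    """Count which target suffixes co-occur with each base suffix."""
    b, t = SLOTS.index(base_slot), SLOTS.index(target_slot)
    counts = defaultdict(Counter)
    for paradigm in PARADIGMS:
        base_suffix = suffix_of(paradigm[b], base_slot)
        target_suffix = suffix_of(paradigm[t], target_slot)
        counts[base_suffix][target_suffix] += 1
    return counts


def outcome_probabilities(base_slot, target_slot):
    """Relative frequency of each target suffix, given the base suffix."""
    return {
        base: {tgt: n / sum(outcomes.values()) for tgt, n in outcomes.items()}
        for base, outcomes in suffix_counts(base_slot, target_slot).items()
    }


def best_guess_accuracy(base_slot, target_slot):
    """Share of paradigms projected correctly when the learner always
    picks the most frequent target suffix for each base suffix."""
    counts = suffix_counts(base_slot, target_slot)
    correct = sum(max(outcomes.values()) for outcomes in counts.values())
    return correct / len(PARADIGMS)


print(outcome_probabilities("nom", "gen"))
print(best_guess_accuracy("nom", "gen"), best_guess_accuracy("gen", "nom"))
```

On this toy data the nominative leaves the genitive suffix at chance (0.5 for each outcome), while the genitive predicts the nominative in every paradigm, so the genitive is the better base by this measure.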
Once a base has been identified, the model constructs a grammar of stochastic rules
to produce the rest of the paradigm, using the minimal generalization algorithm developed
by Albright and Hayes (1998). This algorithm explores the lexicon, keeping track
of which morphological patterns are statistically most reliable in different phonological
environments. The model was trained on a database of 1,700 Latin nouns in their pre-leveling
forms, and was then used to produce nominative forms for all nouns with [-o(:)r]
in the genitive. Among the outputs, three qualitatively distinct patterns could be
distinguished: for agentives (all non-neuter, polysyllabic, with stem-final [t] or
[s]), -or nominatives were correctly favored over -o:s nominatives (.98
vs. .23). For other non-neuter polysyllables, such as [hono:ris], -or was somewhat
favored over -o:s (.75 vs. .60), despite the presence of -o:s forms
in the training set. In other words, the model predicts leveling for this particular
class of words. Finally, -o:s remained favored to a moderate extent for polysyllabic
neuters (.66 vs. .39), and strongly for monosyllables (.71 vs. .132).
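The idea of tracking how reliable a pattern is in different phonological environments can be sketched as follows. This is a deliberately simplified illustration and not the minimal generalization algorithm itself: the seven-noun lexicon is a tiny hand-picked sample in rough transcription (not the 1,700-noun training set), the contexts are hand-written rather than induced, and scoring a rule by its raw hits/scope ratio is an assumption. The numbers it produces are therefore not comparable to the scores reported above.

```python
# Tiny illustrative sample of (genitive, pre-leveling nominative) pairs.
# NOT the training data used in the paper.
LEXICON = [
    ("o:ra:to:ris", "o:ra:tor"),  # agentive
    ("wikto:ris", "wiktor"),      # agentive
    ("dokto:ris", "doktor"),      # agentive
    ("hono:ris", "hono:s"),
    ("labo:ris", "labo:s"),
    ("flo:ris", "flo:s"),         # monosyllabic nominative
    ("mo:ris", "mo:s"),           # monosyllabic nominative
]

GEN_ENDING = "o:ris"
VOWELS = set("aeiou")


def has_vowel(stem):
    """True if the material before the ending contains a vowel,
    i.e. the nominative is polysyllabic."""
    return any(ch in VOWELS for ch in stem)


# Hand-written contexts (hypothetical; the real algorithm induces them).
CONTEXTS = {
    "any stem": lambda stem: True,
    "polysyllabic": has_vowel,
    "monosyllabic": lambda stem: not has_vowel(stem),
    "stem-final t": lambda stem: stem.endswith("t"),
}


def reliability(nom_ending, context):
    """hits / scope for the rule  o:ris -> nom_ending  in one context.

    scope = genitives in the context that the rule could apply to;
    hits  = those whose attested nominative matches the rule's output.
    """
    scope = hits = 0
    for gen, nom in LEXICON:
        if not gen.endswith(GEN_ENDING):
            continue
        stem = gen[: -len(GEN_ENDING)]
        if not context(stem):
            continue
        scope += 1
        hits += nom == stem + nom_ending
    return hits / scope if scope else 0.0


for name, context in CONTEXTS.items():
    print(name, reliability("or", context), reliability("o:s", context))
```

Even in this toy sample, -or outscores -o:s among polysyllables because the agentives dominate that context, while -o:s is exceptionless among monosyllables. This is the kind of lexical asymmetry that would put a word like [hono:s] under pressure to change.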
A surprising aspect of this model is that it predicts paradigm leveling without any
explicit notion of paradigm uniformity to motivate the change. The preference for
[r] forms comes solely from the fact that non-alternation was the dominant pattern
in the Latin lexicon, and words like [hono:s] were brought into line with the rest
of the vocabulary. An important question for this model, then, is why there should
be a universal tendency for paradigms to become more uniform, if such changes are
driven only by lexical statistics. I suggest that the reason for this is that morphophonemic
alternations generally affect only a small subset of the phonemes in a language;
therefore, even before any leveling takes place, paradigmatic alternations tend to
affect only a minority of lexical items. A prediction of this model, however, is
that if an alternation does happen to affect the majority of lexical items, then
the alternation should be extended just as easily as uniformity is extended in other
cases. In fact, this prediction does seem to be true; as an example, I offer evidence
from an analogical change that is currently underway in Korean, in which a [t˺]
~ [s] alternation is being generalized to words with less radical alternations (such
as [t˺] ~ [t], [t˺] ~ [tʰ] and [t˺] ~ [tʃʰ]).