Functional Categories and Features in a Chomskian Framework
Elly van Gelderen
Arizona State University
World Congress on Mulla Sadra, Tehran, 23-27 May 1999
In this paper, I provide some background to Chomskian Linguistics and especially to the notion of Universal Grammar (UG). I also quickly outline the historical development of phrase structure rules and transformations and show how the latest theory, Minimalism, fits with earlier work. In discussing Minimalism, I focus on the features, some of which trigger overt movement. In the last section, I argue that there is cross-linguistic variation both with respect to features and with respect to functional categories.
1 Philosophical background
Chomsky's work can be seen in terms of two problems he examines in e.g. Knowledge of Language: (a) how do we know so much on the basis of so little evidence, and (b) how do we know so little given that we have so much evidence? These are referred to as Plato's Problem and Orwell's Problem respectively. The first problem concerns what we know about language and how we acquire this knowledge. It will be dealt with in some detail in 1.1. The second problem concerns the use of language and the mechanism of indoctrination. It will just be mentioned in 1.2, not discussed.
1.1 Plato's problem
Plato's problem is that of the `poverty of the stimulus'. As speakers of a language we know so many rules without ever having been explicitly taught them. We can produce sentences that we have never heard before. The reason we know this much is that we have acquired a grammar not on the basis of imitation but by using an innate Universal Grammar. This Universal Grammar (hence UG) helps to interpret the language we hear around us and to build up our unique grammar. This process is schematized somewhat simplistically in (1):
1. L1 + UG = G1 --> L2
A child hears a language (L1 in (1)), and principles and rules of UG enable him or her to build up a grammar (G1 in (1)). The output of this grammar is a language (L2) not necessarily the same as L1. In principle, each speaker can have a slightly different grammar from other people speaking the `same' language.
An example of a rule that is often given in this context is that of `structure preservation', i.e. languages have rules that take the (hierarchical) structure into account. For instance, the rule for forming a Yes/No question is to invert the auxiliary and the subject, as between (2) and (3). Speakers will not take just any auxiliary but will take the structure into account. Thus, they will produce (3) but not (4):
2. The painting which was assumed to be by Vermeer was sold to The Getty.
3. Was the painting which was assumed to be by Vermeer sold to The Getty?
4. *Was the painting which assumed to be by Vermeer was sold to The Getty?
Below are some other examples. Sentences (5), (7), and (9) are well-formed in English; (6), (8), and (10) are not:
5. The student of English from the former Soviet Union is a nice person.
6. *The student from Iceland of English is a nice person.
7. The student from Iceland is poor. The one from Greenland is rich.
8. The student of English is poor. *The one of chemistry is rich.
9. The student of English and of Russian is called Peter.
10. *The student of English and from New York will be leaving soon.
How does an English speaker acquire knowledge about the order of the two Prepositional Phrases in (5) and (6), or about when to use one as in (7) and (8)? Suppose UG provides certain kinds of building blocks: phrases such as Noun Phrases and categories such as Nouns, as well as smaller units inside phrases, namely intermediate phrases, here referred to as N's. This would mean that the structure for the NP in (5) would be as in (11) (cf. Baker 1978 and Hornstein & Lightfoot 1981). By having these categories, a learner could hypothesize the following rule: an NP is pronominalized by pronouns such as he, she or it, and an N' is pronominalized by one. This would account for (7) and (8):
11. [NP [Det the] [N' [N' [N student] [PP of English]] [PP from Russia]]]
An NP has only one head N, but may have more than one N'. The complement of English is a sister to the head, and the modifier or adjunct from Russia is the sister to the N'. This accounts for the facts in (5) and (6). Since coordination is of `like'-elements such as adjuncts or complements (but not an adjunct and a complement), the structure in (11) predicts (9) and (10).
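As an illustrative aside (not part of the standard presentation), the prediction of the one-pronominalization rule can be checked with a small computational sketch; the tuple encoding of (11) and the function names are my own:

```python
# The NP in (11): the complement "of English" is a sister to N,
# the adjunct "from Russia" a sister to N'. Encoding is hypothetical.
np = ("NP", ("Det", "the"),
            ("N'", ("N'", ("N", "student"), ("PP", "of English")),
                   ("PP", "from Russia")))

def words(t):
    """Left-to-right terminal string of a labelled tree."""
    if isinstance(t[1], str):
        return [t[1]]
    return [w for child in t[1:] for w in words(child)]

def one_forms(t):
    """All surface strings obtained by replacing exactly one N' node
    with the pro-form 'one' (the rule hypothesized in the text)."""
    if isinstance(t[1], str):
        return []
    forms = [["one"]] if t[0] == "N'" else []
    for i, child in enumerate(t[1:], 1):
        for sub in one_forms(child):
            forms.append([w for j, c in enumerate(t[1:], 1)
                          for w in (sub if j == i else words(c))])
    return forms

print(sorted(" ".join(f) for f in one_forms(np)))
# Only "the one" and "the one from Russia" are generated;
# "the one of English" in (8) is correctly excluded, since
# "of English" is not a sister to any N'.
```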
The VP consists of the same `building blocks', namely V, V' and VP, as in (12). This structure accounts for the data in (13) to (16) since complements are sisters to V and do so replaces a V' as in (15) but not a V as in (16):
12. [VP [V' [V' [V study] [NP English]] [PP in Russia]]]
13. I study English in Russia.
14. *I study in Russia English.
15. He did so in Russia.
16. *He did so English.
Thus, assuming that UG makes available certain categories and projections enables a learner to acquire a grammar with structures such as (11) and (12). These structures generate the grammatical sentences but not the ungrammatical ones.
1.2 Orwell's Problem
Orwell's problem is central to Chomsky's political work, i.e. how do we know so little given that we have so much evidence? How is it that we do not question certain immoral acts by our governments? The answer is that in a `democratic' society, consent has to be `manufactured'. Chomsky obviously thinks it is possible to resist this manufacture and indoctrination. The reason for this is presumably that we are not completely determined by our experience; we are not `blank slates'. Our task is "to understand the mechanisms and practices of indoctrination" (1986a: 286). Orwell's problem is relevant to linguistics, especially where linguistic relativity is concerned, but this discussion lies beyond the scope of this article.
Having given one example of what constitutes linguistic knowledge (structures such as (11) and (12)) and how it is acquired (through UG), I now turn to a sketch of the current syntactic theory.
2 Minimalism: PF, LF and features
Before outlining Chomsky (1995), i.e. Minimalism, I briefly review older frameworks, so as to show that Minimalism is a natural development in generative theory.
2.1 Development of the Transformational Generative Framework
In the 1950s and 1960s, structures were produced through very language specific Phrase Structure Rules such as (17), adapted from Chomsky (1965: 106-7), and transformations such as (18):
17. S --> NP Predicate Phrase
Predicate Phrase --> AUX VP
AUX --> Tense (M) (Aspect)
NP --> (Det) N (S')
VP --> V NP* (PP)
18. Subject-AUX Inversion
NP - AUX - X
1 2 3
==> 2 1 3 (Optional)
Using (17) and (18), an English sentence such as (19) can be formed:
19. Will those people read that book?
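Purely as an illustration, the way (17) and (18) conspire to derive (19) can be sketched as a toy program. The tuple-encoded deep structure, the lexicon, and the word-counting version of the inversion are simplifying assumptions of mine; the real transformation is structure-dependent, as (2) to (4) showed:

```python
# Deep structure generated by the PS rules in (17), simplified:
# S -> NP PredP, PredP -> AUX VP, AUX -> M, NP -> Det N, VP -> V NP.
# The lexicon is mine, chosen to yield sentence (19).
deep = ("S",
        ("NP", ("Det", "those"), ("N", "people")),
        ("PredP",
         ("AUX", ("M", "will")),
         ("VP", ("V", "read"),
                ("NP", ("Det", "that"), ("N", "book")))))

def terminals(t):
    """Left-to-right terminal string of a labelled tree."""
    if isinstance(t[1], str):
        return [t[1]]
    return [w for child in t[1:] for w in terminals(child)]

def subject_aux_inversion(ws):
    """Transformation (18): NP - AUX - X ==> AUX - NP - X.
    Simplification: the subject NP is the first two words and AUX
    the third; the real rule refers to structure, not word counts."""
    return ws[2:3] + ws[:2] + ws[3:]

base = terminals(deep)                     # declarative order
print(" ".join(base))                      # those people will read that book
print(" ".join(subject_aux_inversion(base)).capitalize() + "?")
# Will those people read that book?
```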
Rules such as (17) and (18) are awkward and specific to English. Hence, much of the effort in making the formalism less language-specific and more universal was aimed at generalizing PS-rules (through X'-theory in the 1970s and 1980s) and reducing transformations to one (move-alpha in the 1970s and 1980s). Trees for (19) came to look like (20). As can be seen, the C(omplementizer) and I(nflection) are treated just like the NP or VP. All languages would have structures such as (20) but the headedness could vary. Thus, Farsi, Urdu and Japanese would be head-final and look like (21). Next, movement was simplified to `move anything anywhere'. Rather than specific structures such as (18), there were universal constraints on not moving `too far':
20. [CP [C' [C will] [IP [NP they] [I' [I t] [VP Spec [V' [V read] [NP det N]]]]]]]
Thus, "[f]rom the origins of generative grammar, the fundamental operations were taken to be formation of the lexicon and recursive operations of two kinds that make use of lexical items: phrase structure and transformational rules" (Chomsky 1998b: 123). The Minimalist framework continues this line. Phrase structure rules become `bare', i.e. no intermediate levels appear, and trees are built from `bottom-to-top'. Lexical items are combined by `merge' and moved if required. The framework also tackles the fundamental question of why elements move.
Some of the distinguishing characteristics of the Minimalism of the 1990s are (a) checking and features, (b) the role of the two interface levels, and (c) the mechanism of deriving a structure. I discuss each of these.
Regarding (a), in Chomsky (1992; 1995), lexical items are selected from the lexicon fully inflected. The head of a Functional Category such as I or C contains categorial and Case features, and the NP and V `check' these features. If the categorial D-features are strong, the NP moves (or is attracted) into the Specifier position of the Functional Projection and the verb adjoins to the Head position. If the features are weak, movement occurs at LF. The features causing movement are abstract: strong does not mean that the element is overtly marked morphologically. Thus, movement occurs because features must be `picked up'. If only strong features trigger overt movement, there is a possibility that Non-Interpretable features are not checked by LF. In Chomsky (1998a; b), however, this is no longer a possibility: through feature attraction, features can be attracted even if the lexical element does not itself move. Feature-attraction is more economical, involves only head-movement of the features (Chomsky 1995: 271), and is formulated to "have the following property: an uninterpretable formal feature UFF in the extended lexical item ELI seeks the closest matching feature F in its c-command domain and attaches it to ELI, UFF then erasing if the match is successful" (1998b: 124). Thus, the modification from Chomsky's (1995) analysis is that it is not only strong features that must be checked before LF is reached, but all Non-Interpretable features, since only Interpretable features are visible at LF. Hence, the strong/weak distinction can be eliminated. The evidence for this is (22). In (22), the expletive there does not check the Case features, since otherwise the Case features of the postverbal five javelinas would not be attracted. As a result, the Non-Interpretable Case features of the NP would remain unchecked and the sentence would not be well-formed:
22. There are five javelinas in our backyard.
As Chomsky (1995) notes, if the expletive were present to check the phi-features, the Interpretable plural phi-features of the noun would not be attracted to I(nflection) and again, (22) would not converge. Since (22) is grammatical, there is inserted only to check the Non-Interpretable categorial features. The problem now is to explain why the subject position in (22) must be lexically filled and why attracted D-features do not suffice in (23). Some stipulation for D-features must be made:
23. *__ are five javelinas in our backyard.
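The checking mechanism just outlined can be made concrete with a toy model. The class names, the feature labels, and the list-based notion of `closest' are my own simplifications, not Chomsky's formalism:

```python
# Toy model of feature attraction (cf. Chomsky 1998b: 124): an
# uninterpretable feature on a head seeks the closest matching
# interpretable feature in its c-command domain and erases on a
# match; the derivation converges only if no Non-Interpretable
# features survive to LF. All names here are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Item:
    label: str
    interpretable: set = field(default_factory=set)
    uninterpretable: set = field(default_factory=set)

def attract(head, domain):
    """Erase head's uninterpretable features against the closest
    matching feature in its (list-ordered) c-command domain."""
    for uff in list(head.uninterpretable):
        for goal in domain:
            if uff in goal.interpretable:
                head.uninterpretable.discard(uff)
                break

def converges(items):
    """Legibility at LF: no Non-Interpretable features may remain."""
    return all(not it.uninterpretable for it in items)

# (22): I(nflection) carries uninterpretable phi-features; the
# Interpretable phi-features of the postverbal NP are attracted.
infl = Item("I", uninterpretable={"phi:3pl"})
np = Item("NP: five javelinas", interpretable={"phi:3pl"})
print(converges([infl, np]))   # False: phi not yet checked
attract(infl, [np])
print(converges([infl, np]))   # True: the derivation converges
```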
Regarding (b), for each linguistic expression, a grammar makes available two kinds of information, phonetic and semantic, or a Phonetic Form (PF) and an LF, using older terminology. The PF representation gives information to the Articulatory-Perceptual system and the LF one to the Conceptual-Intentional system. `Legibility' must be ensured at these `interface' levels (Chomsky 1998b: 119). Features are therefore divided as to whether they are phonetic, i.e. not allowed at LF, or semantic, i.e. not allowed at PF. Thus, a derivation splits into two parts. There are, however, features in language that are neither phonetic nor semantic, thereby violating legibility. These features are `Non-Interpretable' and do not "enter into interpretation at LF" (Chomsky 1995: 277). The reason Non-Interpretable features exist is "to force movement, sometimes overtly" (p. 278) to a higher Functional Category.
In the generative framework, movement has always been seen as problematic. As Chomsky (1998a: 42) puts it, "[w]hy language should have this [movement] property is an interesting question, which has been discussed for almost 40 years without resolution". Verbal agreement and Case are other problems since they are not relevant to the interpretation in Modern English. Chomsky (1998a: 42-8) proposes to connect both: the `offending' Non-Interpretable Case and agreement are eliminated through movement.
Thus, Non-Interpretable features trigger movement but Interpretable ones do not. Interpretable features are relevant at LF and do not erase or delete but can be `used over'. Non-Interpretable features take care of several phenomena earlier treated as separate, for instance, (a) an NP has one and only one Non-Interpretable Case feature, as (24) shows, and (b) they justify the inclusion of functional categories in the numeration and the ensuing movement into the heads and specifiers of these projections. In (24), Zoya cannot check the Case in both subject positions:
24. *Zoya seemed t was annoyed with Amir.
According to Chomsky (1995: 283), the person and number, i.e. phi-, features of Nouns are Interpretable because they can be reused. The example given by Chomsky is (25) where John moves to subject of the IP to check its Case, checking phi-features along the way. AGR is a functional projection in which checking takes place:
25. John is [t AGR [t intelligent]].
Regarding (c), I provide a sample derivation here. As mentioned above, trees such as (20) are no longer assumed in Chomsky (1995). Lexical items are taken out of the lexicon and merged as in (26). Through merge and move, the I(nflection) would be added as in (27) and, because of the Non-Interpretable Case features of I, the NP would move, as indicated in (27) as well.
Section 2 sketched some of the properties of the generative framework, both pre-Minimalist and Minimalist. I now address some controversies.
3 UG and cross-linguistic variation: some controversies
Having outlined some basic aspects of Minimalism, I now turn to the question of how languages differ. In 3.1, I argue that the interpretability of features varies across languages (cf. van Gelderen 1999) and that, as a result, the occurrence of Functional Categories (hence FC) also varies (cf. van Gelderen 1993). The latter is shown in 3.2.
3.1 Variation in the Interpretability of Features
As mentioned, according to Chomsky (1995: 283), the phi-features of Nouns are Interpretable because they can be reused. The example given by Chomsky is (25) above, repeated here as (28), where John moves to subject of the IP to check its Case, checking phi-features along the way:
28. John is [t AGR [t intelligent]].
In (28), there is no agreement between intelligent and John, and hence AGR may not have been activated. Alternatively, the movement to Spec AGRP may have to do with D-features in AGR that must be checked. There is no empirical evidence that the phi-features are Interpretable. In languages other than English, there is such evidence, since the number and gender features appear twice, both on the verb and on the adjective or past participle. An instance is French, where the number features in (29) appear on both the finite verb sont `are-3P' and the past participle parties `left-FP'. The person features are only marked on the finite verb and the gender features only on the past participle. Hence, person may be Non-Interpretable:
29. Les femmes sont parties
The women are-3P left-FP
`The women have left'.
It is interesting that in languages that have (29), it is always number and gender, never person, that appear on the past participle. In addition to (29), in Spanish, the passive participle as in (30), inflects for number and gender, but not person; and in Swedish, number is marked on the past participle in (31) (there is no gender in Swedish and finite verbs show no inflection). The data in (29) to (31) might indicate that person is not Interpretable and cannot be checked twice:
30. Las casas son vendidas
the houses are-3P sold-FP
31. Tre bilder blev målade
three pictures were painted-P
With object agreement, as in (32) from Tohono O'odham, person features do occur on the participle. Here, person appears as well as number, and so, there is nothing against person marking on participles. It just does not seem to be the case that person is `re-used', i.e. Interpretable:
32. Ceoj 'o 'añi: ñ-ceggia,
boy is/was me 1S-fighting
`The boy is/was fighting me'. (Zepeda 1983)
I therefore argue, contra Chomsky (1995), that person features in a number of languages (including Modern English) are Non-Interpretable and are checked only once. Number and gender can be `re-used' as in (29) to (31) above.
There is some dialectal evidence, namely from Belfast English, that the features of pronouns are checked differently from those of full NPs. Henry (1995: 16) describes Hiberno English constructions as in (33) and (34), where the number features of the full noun in (33) need not be checked but those of the pronoun in (34) must be:
33. The eggs are/is cracked
34. *They is cracked.
In standard English, the phi-features of both pronouns and full nouns must be checked before LF, again an indication that person might be Non-Interpretable.
In section 2, the Case discussed is grammatical or structural Case, dependent on the nominal's position in the sentence. There is another kind of Case, namely inherent Case, dependent on the thematic structure. In Chomsky (1986a: 193), this is defined as: "[w]e distinguish the `structural Cases' objective and nominative, assigned in terms of S-structure position, from the `inherent Cases' assigned at D-structure. ... Inherent Case is associated with [theta]-marking, while structural Case is not". Inherent Case is relevant at LF. For structural Case, there is a one-to-one relationship between Cases and nominal elements (reflecting its Non-Interpretable status). Belletti (1988) and Mahajan (1990) assume that inherent Case is optionally assigned/checked. The nominal, when it does not have inherent Case, may check its structural Case if this is available. Thus, in many languages, nominals have either structural Case or inherent Case. The structural Case features are Non-Interpretable but the inherent ones are not. The former make it necessary for a lexical element to move to a FC; the latter do not.
In Old English, there is evidence for more inherent Case than there is in Modern English (cf. van Gelderen 1996). This would mean Case features are Interpretable and relevant at LF. The evidence is that Case (genitive, dative, or accusative) depends on the thematic structure of the verb, and that Case marked NPs do not occur in structurally Case marked positions. Thus, Old English did not have a passive as in Modern English where the `subject' received nominative Case, but had constructions such as (35) with inherent him rather than he:
35. Beowulf 1269
þær him aglæca ætgræpe wearð
`there he was grabbed by the monster'.
There is an interesting person split: first and second person pronouns lost the inherent Case before third person pronouns. This is obvious from the morphology but also from person splits in the passive and ergative: constructions where the third person pronoun has inherent Case, such as him in (36), outnumber those where a first or second person does: for instance, in the first 6000 lines of Layamon (a thirteenth-century text), there are 137 instances of impersonal him, as in (36) out of a total of 534 occurrences of him (=26%), whereas there are 26 instances of impersonal me out of a total of 194 (=13.4%):
36. Caligula 4
sel þar him þuhte (same in Otho)
splendid there he thought
`Splendid it seemed to him'.
The third person features, however, became Non-Interpretable before the first and second person ones (cf. van Gelderen 1999).
In conclusion to 3.1, I assume that linguistic expressions have a phonetic and a semantic component. In the `ideal case', all features would be relevant at either LF or PF. This is, however, not true, since there are features that force movement and are neither semantic nor phonetic. These are the Non-Interpretable Case and agreement features. They force movement but are not relevant to the interpretation. Above, I argue that languages and historical stages differ as to which features are Interpretable. In Modern English, Case features and the person and number features of verbs are Non-Interpretable but, I argue, there is no direct evidence (cf. (28) versus (29)) that all nominal phi-features are Interpretable. In other languages, number and gender features on nominals are Interpretable, but not person. Structural Case features are Non-Interpretable in Modern English but not in Old English, as argued above. Thus, the status of features ultimately accounts for differences in word order, Case and agreement across languages, and for whether a language is synthetic or analytic.
3.2 Variation in Functional Categories
If, as I argue in 3.1, features vary as to Interpretability, it may be the case that the FCs (where the checking of Non-Interpretable features also occurs) that are activated in a particular language vary as well. In van Gelderen (1993), it is argued that in older English the IP node need not be activated, since modals are main verbs and infinitival to is part of the VP. In `older' languages, aspect is more important than tense, and aspect is typically part of the VP (cf. ...) as opposed to tense, which is part of the IP.
Thus, in Old English, to was part of the VP but, due to the increase in other auxiliaries such as shall, may and do, the infinitival marker is reanalyzed (by the language learner) as being in I. The Old English tree is given in (37) and the late Middle English one is in (38):
Evidence for (38) is the occurrence, at the end of the 14th century, of split infinitives as in (39) and (40), pro-infinitives as in (41), ACI and `dummy' do as in (42):
39. Wyclif, Matthew 5, 34
Y say to 3ou, to nat swere on al manere,
`I say to you to not curse in all ways'.
40. Apology for the Lollards 57
Poul seiþ, þu þat prechist to not steyl, stelist,
`Paul says, you that preach to not steal steals'.
41. Handlyng Synne 8023-4
But wyle 3e alle foure do
A þyng þat y preye 3ow to,
`But will all four of you do a thing that I ask you to'.
42. Chaucer, The Monk's Tale 441-2
His yonge sone, that three yeer was of age
Un-to him seyde, fader, why do ye wepe?
Once I is introduced, there is evidence that the verb moves to it to check its features, whereas in Old English it did so in C.
In short, in section 3, I argue that variation among languages occurs in the Interpretability of features and the activation of Functional Categories.
Having given some background to UG and Minimalism, I have argued that both Functional Categories and features vary cross-linguistically. In Old English, not all Functional Categories are activated and many features are Interpretable. This changes: in Modern English, there are more Functional Categories and many features are Non-Interpretable.
Baker, C.L. 1978. Introduction to Generative-Transformational Syntax. Prentice Hall.
Belletti, Adriana 1988. "The Case of Unaccusatives". Linguistic Inquiry 19.1: 1-34.
Chomsky, Noam 1965. Aspects of the Theory of Syntax. Cambridge: MIT Press [1976, 11th printing]
- 1986a. Knowledge of Language. New York: Praeger.
- 1986b. Barriers. Cambridge: MIT Press.
- 1992. "A Minimalist Program for Linguistic Theory". MIT Occasional Papers in Linguistics 1. Also appeared as chapter 3 in Chomsky (1995).
- 1995. "Categories and Transformations", chapter 4 in The Minimalist Program, Cambridge: MIT Press.
- 1998a. Nuestro Conocimiento del Lenguaje Humano, Edicion bilingue. Santiago de Chile: Impresos Universitaria.
- 1998b. "Some Observations on Economy in Generative Grammar". Is the Best Good Enough? ed. by Pilar Barbosa, Danny Fox et al., 115-127. Cambridge: MIT Press.
Gelderen, Elly van 1993. The Rise of Functional Categories. Amsterdam: John Benjamins.
- 1996. "Case to the Object in the History of English", Linguistic Analysis 26: 117-133.
- 1999. A History of Reflexive Pronouns: where Case meets Agreement and Pro. MS.
Henry, Alison 1995. Belfast English and Standard English. Oxford: OUP.
Hornstein, Norbert & David Lightfoot 1981. "Introduction", in Explanation in Linguistics, edited by Norbert Hornstein & David Lightfoot. Longman.
Mahajan, Anoop 1990. The A/A Bar Distinction and Movement Theory. MIT PhD dissertation.
Zepeda, Ofelia 1983. A Papago Grammar. Tucson: University of Arizona Press.