Developmental Changes in Early Child Lexicon in Mandarin Chinese

1. Introduction

Uncovering universal principles versus language-specific patterns has been an important goal in language acquisition research. In the last several decades, researchers have been concerned with whether children learning different languages have the same kinds of first words in their early productive vocabulary. A controversial issue in this regard is whether children display the same developmental profile across different languages or show different developmental patterns shaped by the specific properties of their target language. In this study, we address this issue by undertaking a large-scale study of Mandarin Chinese-speaking children, specifically to answer the following questions: (i) Does lexical development in early child Mandarin follow a universal pattern? To address this question we will examine the age of acquisition of lexical categories, with particular focus on the ‘noun bias’ versus ‘verb bias’ debate in the literature. (ii) What underlying variables, including conceptual and linguistic properties of words, might impact the age at which lexical items are acquired in early child Mandarin? To address this question we will examine factors, such as the conceptual properties of words, word frequency, and word length, that may modulate the observed empirical patterns in early lexical development.

1.1 Universal vs. language-specific pattern in the development of lexical categories

One well-studied domain with regard to a universal vs. language-specific pattern in the development of lexical categories is the relative order of acquisition of nouns and verbs. A number of studies have indicated that the noun bias, a predominance of nominal items, including common nouns and proper nouns, characterizes children’s early productive speech in many languages, and that predicate items such as verbs come in quantity only afterwards (Gentner, 1982; Gentner & Boroditsky, 2001; Gillette, Gleitman, Gleitman & Lederer, 1999; Gleitman & Gleitman, 1994; Golinkoff, Mervis & Hirsh-Pasek, 1994; Markman, 1987; Waxman & Booth, 2003). The predominance of nouns is a universal pattern, according to these researchers, because nouns are conceptually and perceptually more accessible than other lexical categories. Nouns referring to people and objects are also more stable across languages, while words representing actions or relations exhibit more variability and language-specific characteristics in the way in which actions and relations are encoded.

More recent studies, however, have challenged this view with cross-linguistic data, particularly from Asian languages. For example, Choi and Gopnik (1995) and Tardif (1996) have argued that the noun bias is weak or non-existent in languages such as Mandarin Chinese and Korean. Choi and Gopnik (1995) compared Korean-speaking and English-speaking children’s productive vocabulary from a mean age of 1;2 to 1;10, and found that Korean children produced equal proportions of nouns and verbs when their vocabulary reached about 50 words, while English-speaking children produced more nouns than verbs. Furthermore, Tardif (1996) and Tardif, Shatz and Naigles (1997) have shown that Mandarin-speaking children produced more verbs than nouns, as compared with English-speaking children. These researchers argued against the universal pattern of noun bias and pointed out that there are language-specific differences not only in the encoding of nouns and verbs across languages, but also in the way early lexical items appear in child-directed parental speech. They proposed that some languages, such as Chinese, may be more verb-friendly, in that ellipsis of subjects or objects or both is allowed, making verbs more salient in the input language. Additionally, verbs in these languages may have higher imageability than nouns (Ma, Golinkoff, Hirsh-Pasek, McDonough & Tardif, 2009), making them more accessible to children.

One line of evidence against a universal noun bias has also come from studies on the very earliest stages of lexical development, when children have acquired fewer than 50 words. Bornstein and Cote (2004) compared lexical development in children from seven different languages and found that children showed a noun advantage across vocabulary levels only after 50 words. Bates and colleagues (1994) had similar findings. They showed that routines or social words, which are not necessarily always nouns (e.g. bye-bye), appear earlier than common nouns in children’s first words (Bloom, Tinker & Margulis, 1993; Caselli et al., 1995). For example, Caselli et al. (1995) analyzed the vocabulary composition of English- and Italian-speaking children aged from 0;8 to 1;4 using the MacArthur-Bates Communicative Developmental Inventory (CDI) (Fenson et al., 1993). The results showed that the proportion of social words combined with words for people, onomatopoeic words, and routines was higher (over 60%) than that of common nouns (roughly 29%) within the first 50 words, regardless of language. Kauschke and Hofmeister (2002) also found that although nouns were among the earliest vocabulary items in German-speaking children, the percentage of nouns was rather small (5.83%) and limited to persons and basic-level objects. Finally, Tardif, Fletcher, Liang, Zhang, Kaciroti and Marchman (2008) compared the vocabulary composition of children speaking English, Mandarin Chinese, and Cantonese, using the CDI and its Mandarin and Cantonese versions. Their results showed striking similarities in children’s first 10 words, which were predominantly words for people in all three languages. Sound-effect words were the second largest category children could produce, though it was quite small in Mandarin (8.7%) compared to person terms (77.7%). The major differences existed in the relative percentage of common nouns and verbs: English-speaking children produced more common nouns (19.4%) than verbs (0.7%), while Mandarin-speaking children produced more verbs (7%) than common nouns (3.2%), and Cantonese-speaking children produced roughly equal numbers of verbs (4.8%) and nouns (5.7%).

Although the above studies all reported that the earliest acquired words were not necessarily object names, the specific categories reported as the earliest vocabulary were different. In Kauschke and Hofmeister’s (2002) study, relational words (e.g. up, again) and personal-social words (e.g. hi, no) were the dominant categories (over 80%) within the earliest sampling points (age 1;1 and 1;3). In the studies of Caselli et al. (1995) and Tardif et al. (2008), words referring to people and sound effects were the two largest categories for English-speaking children, and in Tardif et al. (2008) words for people were the largest word category for Mandarin-speaking children. Beyond these discrepancies, different criteria were used to count the word categories in different studies; for example, nouns included not only concrete nouns and abstract nouns, but also words for people, in Kauschke and Hofmeister (2002), and words referring to outside places were excluded from common nouns in Caselli et al. (1995), but were included in Tardif et al. (2008).

Additional mixed findings were also obtained from several studies that compared Asian languages with Western languages. For example, although Choi and Gopnik (1995) found equal proportions of nouns and verbs in the early child Korean lexicon, Kim, McGregor and Thompson (2000) failed to replicate this finding using a similar data-collection method, a combination of maternal diaries and checklists. Kim et al. found that both English-speaking and Korean-speaking children produced more nouns than verbs when their vocabulary was close to the 50-word mark, and children from both languages acquired roughly the same number of nouns, except that the Korean-speaking children produced significantly more verbs than their English-speaking peers. Au, Dapretto and Song (1994) also reported that nouns outnumbered verbs in Korean-speaking children’s early vocabulary, and there was no significant difference between the Korean-speaking and the English-speaking children in their study. Tardif and colleagues reported that Chinese-speaking children produced more verbs than nouns when measured by spontaneous speech in their earlier studies (Tardif, 1996; Tardif et al., 1997), but in their later studies, nouns and verbs were found to be more evenly distributed in Chinese-speaking children’s earliest vocabulary (Tardif, Gelman & Xu, 1999). A clear noun bias also occurred after children acquired more than 20 words when measured by an adapted Mandarin CDI, although this bias was not as strong as that seen with English-speaking children at a comparable age or vocabulary level (Tardif, 2006; Tardif et al., 1999). Finally, Liu, Zhao and Li (2008) conducted a corpus-based study comparing English, Mandarin, and Cantonese corpora in the CHILDES database (MacWhinney, 2000), and their analyses showed an even distribution of nouns and verbs in Mandarin and Cantonese, but with increasing diversity and complexity as a function of age in all three languages examined.

In sum, while a significant amount of research has been devoted to the study of early lexical development in different languages, inconsistencies and controversies with regard to detailed developmental profiles remain, especially concerning what the predominant lexical categories are in children’s early vocabulary, how vocabulary composition changes over time, and how differences might occur as a function of language-specific or culture-specific properties. In this study, we aim to provide some insights into these questions by examining Mandarin-speaking children’s productive vocabulary across ages 1;0 to 2;6, with a larger sample size (928 children), and on a more fine-grained level using our revised Early Vocabulary Inventory for Mandarin Chinese (Hao, Shu, Xing & Li, 2008). This parental report method was modeled closely after the original CDI, which has been shown to be a valid and powerful tool for assessing children’s early vocabulary (Dale, 1991; Dale, Bates, Reznick & Morisset, 1989). Compared with what is known about English-speaking children, a more comprehensive understanding of Mandarin-speaking children’s vocabulary acquisition is still needed. To that end, our study emulates the study of Bates et al. (1994) in an attempt to chart a detailed developmental profile of vocabulary growth for different lexical categories at varying vocabulary levels. Given our literature review above, to address the first issue posed at the beginning regarding whether Chinese-speaking children follow a universal pattern in lexical development, we have to further ask three specific questions: (1) What lexical category or categories will first appear in Mandarin-speaking children’s early vocabulary? (2) Do nouns dominate the vocabulary, and if so at which specific vocabulary stage of development? (3) How do other lexical categories in Chinese-speaking children’s productive vocabulary compare with those in the productive vocabulary of English-speaking children at different developmental stages?

1.2 Variables underlying early lexical development

Another goal of this study, posed as the second question at the beginning, is to identify the underlying variables that modulate the age of acquisition (AoA) of early vocabulary, in addition to our efforts to describe the developmental trajectories. In the extant literature, two major types of factors have been suggested as playing significant roles: conceptual variables, such as the imageability of object-referring nouns as compared with action-referring verbs, and linguistic or input variables, such as the frequency and length of words in the child’s early vocabulary.

Early studies advocating the noun bias have explored the conceptual variables in particular, and suggested that nouns are acquired early because they are usually conceptually and perceptually more accessible than verbs (Gentner, 1982; Gentner & Boroditsky, 2001; Gillette et al., 1999; Gleitman & Gleitman, 1994). More recent studies have further attempted to demonstrate the role of specific conceptual variables underlying vocabulary development. In particular, the SICI theory (Maguire, Hirsh-Pasek & Golinkoff, 2006) highlights the important roles of four variables of objects in the child’s learning environment: Shape, Individuation, Concreteness and Imageability (SICI). ‘Shape’ refers to a persistent, tangible contour for an object and the overall configuration of an action. ‘Individuation’ refers to the ease with which the referent can be distinguished from its surroundings in the environment. For example, the referent of the noun cat can be easily observed in the world, while the referent of the noun idea cannot be observed. ‘Concreteness’ refers to the degree to which the object encoded by the word is manipulable (e.g. the learner can see, hear, and touch the object), while ‘imageability’ refers to the ease with which a word can evoke a mental image.

According to the SICI theory, the above four factors jointly determine the ease with which a novel word is learned, regardless of grammatical category. These factors are distributed on a continuum spanning word classes, and they also allow for overlap between different grammatical classes. Common nouns label objects that are located at the more concrete, easily individuated end of the continuum, with higher imageability and consistent shapes, whereas verbs fall at the less concrete, difficult-to-individuate end, with lower imageability and variable shape. Such differences between common nouns and verbs may explain why nouns tend to be learned earlier than verbs in general. Thus, rather than just looking at the relative order of acquisition of nouns and verbs as a whole, the SICI theory examines the differences between nominal and verbal categories along a continuum.

It should be noted that although the SICI theory highlights four different variables, as the acronym suggests, the proponents of the theory did not intend the variables to be taken at face value (see Maguire et al., 2006, p. 375). Rather, the authors suggest that SICI is intended to include many factors that scale the difficulty of learning a particular word. In addition, the literature so far is unclear about the relative or unique role that each of the four SICI variables plays in determining the AoA of early vocabulary. This is perhaps partly due to the fact that these four variables are highly correlated with each other, and it is not easy to isolate the unique contribution of each. To make matters worse, the literature has also used some of the terms loosely or interchangeably (e.g. ‘concreteness’ has often been used interchangeably with ‘imageability’). Given this situation, we have not designed the current study as a test of the SICI theory, but rather use the SICI theory as a starting point for thinking about the role of conceptual properties of words, as opposed to linguistic properties (e.g. frequency and length, discussed below). Nevertheless, we highlight and test one specific variable in the SICI theory, imageability, on the basis of the empirical evidence that this variable is a reliable predictor of AoA of lexical categories.

A number of studies have reported that imageability significantly impacts the acquisition and processing of nouns and verbs not only in adult ratings (Bird, Franklin & Howard, 2001), but also in children, based on AoA scores obtained from parental reports such as the CDI (Ma et al., 2009; McDonough, Song, Hirsh-Pasek, Golinkoff & Lannon, 2011). For example, McDonough et al. (2011) selected 120 words (76 nouns and 44 verbs) that had both published AoA data and rated imageability data. Their hierarchical multiple regression analysis with AoA as the dependent variable showed that both word grammatical category (noun or verb) and imageability had unique contributions, but that imageability was more powerful. Ma et al. (2009) asked adults to rate the imageability of nouns and verbs that appear in the Chinese and English CDIs. The results showed that the imageability of Chinese verbs was higher than that of English verbs. Compared with English verbs, Chinese verbs that are acquired earlier by children had the properties of semantic specificity and higher imageability. Thus, imageability has been found to be related to the verb advantage for Chinese children.

In addition to the conceptual properties of words such as imageability, the current study also examines non-conceptual, linguistic factors that are relevant to early lexical development, such as word length and word frequency. Zhao and Li’s (2008) computational simulations of vocabulary development in English and Chinese showed that the mean length of words (in phonemes) gradually increased as a function of vocabulary size in both languages, and that the average phonemic length of Chinese verbs was not only shorter than that of Chinese nouns, but also shorter than that of English verbs. This word length advantage may make Chinese verbs easier to acquire than nouns for Chinese-learning children. Based on the findings of Zhao and Li (2008), we would expect word length to also play a unique role in the AoA of words.

Input frequency may also contribute to the noun and verb disparity. Goodman, Dale and Li (2008) investigated the relationship between input frequency, as measured in child-directed speech, and AoA of the CDI vocabulary in English. They found that within each grammatical category (nouns, verbs, adjectives, and closed-class words), the more frequent the word is in parental speech, the earlier the child can produce the word. Interestingly, the correlation between input frequency and AoA disappeared when words of all grammatical categories were calculated together (i.e. it plays a less important role across categories). Furthermore, McDonough et al. (2011) found a unique role of input frequency in the AoA of the English CDI vocabulary when common nouns and verbs were included, and Ma et al. (2009) similarly identified the impact of input frequency on AoA for both Chinese and English CDI words. Finally, in computational models Li, Zhao, and MacWhinney (2007) showed that both input frequency and word length significantly modulate lexical growth patterns for early English vocabulary, especially with regard to the shape of the vocabulary spurt. To what extent input frequency plays a unique role in the AoA of the early child Mandarin lexicon remains an open issue, one that we will examine in the current study alongside the effects of other conceptual and linguistic variables.

In what follows, we first discuss the methodology of the research and analyze the developmental trajectory of the early child Mandarin lexicon, in order to address the first question regarding universal vs. language-specific patterns raised at the beginning. We then test the relative influences of the aforementioned conceptual and linguistic variables on the AoA of early child Mandarin vocabulary, in order to address the second question regarding the underlying variables that govern developmental patterns.