Beyond Concordance Lines: Using Concordance to Investigate Language Development
The article states that language corpora have commonly been used as the basis of dictionaries and teaching materials, a use enhanced by the availability of concordance software such as Wordsmith, MonoConc Pro and Microconcord. Teachers, researchers and even language learners typically examine concordance lines to discover how words and grammatical constructions are used. Newer software such as RANGE, developed by practitioners in linguistics, has provided an additional perspective on corpus studies.
One of the more recent efforts is the English of Malaysian School Students (EMAS) corpus, compiled by researchers from UPM. The EMAS corpus is an untagged and unedited learner corpus that contains written data in the form of three essays written by 800 students. All students who contributed to the corpus were considered above average in English language proficiency. The major criterion in selecting the essay topics was the amount of language the topics could elicit. Additionally, other corpus initiatives such as the National Science Foundation project also use a similar pictorial or visual prompt.
Numerous language acquisition studies focus on specific target structures and examine the acquisition of these structures over a period of time. In this study, the developmental patterns are instead examined through the students' language productivity and vocabulary use, comparing the performance of the three age groups on both measures.
Productivity in this article is indicated by the number of sentences per essay and the number of words per sentence. A chi-square analysis reveals a significant increase in the number of sentences per essay across the respondent groups (χ² = 10.03, p < 0.05).
The diversity of the vocabulary used in a corpus is often determined by calculating the type-to-token ratio:
Type-to-token ratio = (number of separate words (types) / number of words in the text (tokens)) × 100
A higher type-to-token ratio suggests that the learners are using many uncommon words, whereas a lower ratio indicates an over-reliance on a limited set of words. The sophistication of the vocabulary can be determined by using specialized software such as RANGE, a vocabulary analysis program which analyses a text by comparing it to several base lists of frequently used words.
In conclusion, the article attempts to show the relevance of corpus data in investigating language development without having to analyze concordance lines. The values in this study can therefore be regarded as benchmarks against which to compare future groups of students, as well as to assess the development of the language program in Malaysia in general.
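As a rough illustration of the productivity and type-to-token measures summarized above, the sketch below computes both for a single text. It is not the procedure used in the article or by RANGE: the function names, the regular-expression tokenizer and the rule for splitting sentences at ".", "!" and "?" are all assumptions made for this example.

```python
import re


def tokenize(text):
    """Lowercase the text and return its word tokens.

    Assumption: a token is any run of letters or apostrophes. Real corpus
    tools use more careful tokenization rules.
    """
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text):
    """Return (number of types / number of tokens) x 100, or 0.0 for empty text."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens) * 100


def productivity(text):
    """Return (number of sentences, mean words per sentence).

    Assumption: sentences end at '.', '!' or '?'.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_sentences = len(sentences)
    n_words = len(tokenize(text))
    words_per_sentence = n_words / n_sentences if n_sentences else 0.0
    return n_sentences, words_per_sentence


# Hypothetical sample text, not taken from the EMAS corpus.
sample = "The cat sat on the mat. The dog sat on the log."
ttr = type_token_ratio(sample)        # 7 types out of 12 tokens
n_sent, wps = productivity(sample)    # 2 sentences, 6 words each
```

Because the type-to-token ratio tends to fall as a text gets longer, values computed this way are best compared only between texts of similar length.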
