by Tony McEnery and Andrew Hardie; published by Cambridge University Press, 2012

English Corpus Linguistics at University College London

The Survey of English Usage (SEU), started by Randolph Quirk at UCL in 1959, was the first attempt to provide an ongoing collection of present-day English that would, over time, facilitate the diachronic study of British English. It was a precursor of later corpora such as the British National Corpus: it recorded both written and spoken English, sampling them across a range of genres and contexts. The data was also grammatically annotated. The SEU broke new ground in corpus linguistics. Initially, the corpus was not stored on a computer at all: it was kept on file cards and only later converted into computerised form, the spoken part of which is available as the London-Lund Corpus (Svartvik 1990). Once computerised, the corpus contained one million words of grammatically analysed modern English.

Later on, the UCL team, led by Sidney Greenbaum, spearheaded the development of a corpus for the comparative study of varieties of English: the International Corpus of English (ICE; see Greenbaum 1996). This corpus covers a wide range of Englishes from around the world, such as Australian, British, Hong Kong, Indian and Irish English (more are added regularly). The goal is to compile a series of comparable one-million-word corpora for these varieties, representing both the written and the spoken language since 1989. ICE is an unequalled resource for the synchronic comparative study of world varieties of English.

The work on the SEU led to the publication of A Grammar of Contemporary English (Quirk et al. 1972) and A Comprehensive Grammar of the English Language (Quirk et al. 1985). The 1985 grammar was the first widely distributed modern corpus-informed grammar, which makes its publication something of a milestone in the development of corpus linguistics.

Besides its pioneering work in data collection and in grammatical annotation and analysis, UCL made a further contribution whose importance is difficult to overestimate: it produced a steady stream of grammarians trained in the corpus approach to linguistics. These grammarians went on to develop English corpus linguistics (ECL) and to put its methodological basis on a much firmer footing. Notable corpus linguists who gained experience working on the SEU include Geoffrey Leech and Jan Svartvik, who went on to develop corpus linguistics further at Lancaster University (UK) and the University of Lund (Sweden) respectively.


This page was last modified on Monday 16 April 2012 at 5:10 pm.

Tony McEnery and Andrew Hardie

Department of Linguistics and English Language, Lancaster University, United Kingdom