Name of the corpus | Language | Date | Mode | Text types | Size (tokens) | Link to data |
--- | --- | --- | --- | --- | --- | --- |
A Corpus of English Dialogues | British English | 1560-1760 | written and spoken | drama, fiction, trial proceedings, witness depositions | 1,183,690 | link |
American National Corpus | American English | 1998-2015 | written and spoken | newspapers, fiction, academic writing, informal speech | 22,000,000 | link |
Anthology Reference Corpus | British English | 1960s-2007 | written | conference and journal papers in natural language processing and computational linguistics | 62,196,334 | link |
Anthology Reference Corpus RD-TEC 2.0 | British English | 1978-2006 | written | 300 abstracts from articles in the ACL Anthology | 33,216 | link |
Australian Corpus of English | Australian English | 1960s | written | newspapers, magazines, fiction, academic texts, government documents (design matches Brown and LOB) | 1 million | link |
Brexit corpus | English | 2016 | written | web, blogs, newspapers, forums, Twitter posts | 108,452,923 | link |
British Academic Spoken English corpus | British English | 1999-2005 | spoken | university seminars and lectures | 1.6 million | link |
British Academic Written English Corpus | British English | 2004-2007 | written | assessed student writing | 6,506,995 | link |
British English 2006 | British English | 2003–2008 | written | press, fiction, academic | 1,147,097 | link |
British Law Report Corpus | British English | 2008–2010 | written | legal texts | 8.5 million | link |
British National Corpus 1994 | British English | 1990s (some texts go back to 1960s) | written and spoken | newspapers, fiction, academic writing, informal speech | 100 million | link |
British National Corpus 1994 Baby edition | British English | 1990s (some texts go back to 1960s) | written and spoken | newspapers, fiction, academic writing, informal speech | 4 million | link |
Brown University Standard Corpus | American English | 1960s | written | press, fiction, academic | 1,007,299 | link |
Cambridge Academic English Corpus | US and UK English | NA | written and spoken | lectures, seminars, essays, textbooks, presentations | 3,163,648 | link |
CHILDES English | US and UK English | NA | spoken | child language | NA | link |
Corpus of Early English Correspondence Sampler | British English | 1418–1680 | written | letters | 0.45 million | link |
Corpus of Late Modern English Prose | English | 1837-1926 | written | letters | 100,000 | link |
Corpus of the English Web | English | NA | written | webpages | 3,268,798,627 | link |
DGT-Translation Memory | 23 languages | 2007-present | written | law documents | 2,104,147,314 | link |
EcoLexicon English Corpus | English | NA | written | environmental texts | 23.1 million | link |
Freiburg-Brown Corpus of American English | American English | 1991-1992 | written | press, fiction, academic | approx. 1 million | link |
Freiburg–LOB Corpus of British English | British English | 1991–1996 | written | press, fiction, academic | approx. 1 million | link |
Hansard Corpus | British English | 1803-2005 | spoken | parliamentary speeches with semantic annotations | 1.6 billion | link |
Helsinki Corpus of English Texts | English (Old, Middle, Early Modern) | 730-1710 | written | literary and religious texts | 1.5 million | link |
London-Lund Corpus of Spoken English | British English | 1980 | spoken | broadcast news and scripted speech | 500,000 | link |
Melbourne-Surrey Corpus of Australian English | Australian English | 1980-1981 | ? | ? | ? | link |
Open American National Corpus | American English | 1998-2015 | written and spoken | newspapers, fiction, academic writing, informal speech | 15 million | link |
Parsed Corpus of Early English Correspondence | British English | 1410-1681 | written | letters | 2,159,132 | link |
Speech, Thought and Writing Presentation Corpus | British English | 1960-1990 | written and spoken | conversations, oral narratives | NA | link |
The Edinburgh DOST Corpus of Older Scottish Texts | Older Scots | 1450-1600 | written | prose and poetry | NA | link |
The Lancaster Newsbooks Corpus | British English | 1654-1655 | written | pamphlets, books and newspapers | 800,000 | link |
The Lancaster/Oslo-Bergen Corpus | British English | 1960s | written | press, fiction, academic | 1 million | link |
The Toronto Dictionary of Old English Corpus | Old English | 600-1150 | written | Old English texts | NA | link |
VU Amsterdam Metaphor Corpus | British English | 1991-1994 | written and spoken | a subset of BNC Baby, annotated for metaphor | NA | link |