| Name of the corpus | Language | Date | Mode | Text types | Size (tokens) | Link to data |
|---|---|---|---|---|---|---|
| A Corpus of English Dialogues | British English | 1560-1760 | written and spoken | drama, fiction, trial proceedings, witness depositions | 1,183,690 | link |
| American National Corpus | American English | 1998-2015 | written and spoken | newspapers, fiction, academic writing, informal speech | 22,000,000 | link |
| Anthology Reference Corpus | British English | 1960s-2007 | written | conference and journal papers in natural language processing and computational linguistics | 62,196,334 | link |
| Anthology Reference Corpus RD-TEC 2.0 | British English | 1978-2006 | written | 300 abstracts from articles in the ACL | 33,216 | link |
| Australian Corpus of English | Australian English | 1960s | written | newspapers, magazines, fiction, academic texts, government documents (matches Brown and LOB) | 1 million | link |
| Brexit corpus | English | 2016 | written | web, blogs, newspapers, forums, Twitter posts | 108,452,923 | link |
| British Academic Spoken English corpus | British English | 1999-2005 | spoken | university seminars and lectures | 1.6 million | link |
| British Academic Written English Corpus | British English | 2004-2007 | written | assessed student writing | 6,506,995 | link |
| British English 2006 | British English | 2003-2008 | written | press, fiction, academic | 1,147,097 | link |
| British Law Report Corpus | British English | 2008-2010 | written | legal texts | 8.5 million | link |
| British National Corpus 1994 | British English | 1990s (some texts go back to 1960s) | written and spoken | newspapers, fiction, academic writing, informal speech | 100 million | link |
| British National Corpus 1994 Baby edition | British English | 1990s (some texts go back to 1960s) | written and spoken | newspapers, fiction, academic writing, informal speech | 100 million | link |
| Brown University Standard Corpus | American English | 1960s | written | press, fiction, academic | 1,007,299 | link |
| Cambridge Academic English Corpus | US and UK English | NA | written and spoken | lectures, seminars, essays, textbooks, presentations | 3,163,648 | link |
| CHILDES English | US and UK English | NA | spoken | child language | NA | link |
| Corpus of Early English Correspondence Sampler | British English | 1418-1680 | written | letters | 0.45 million | link |
| Corpus of Late Modern English prose | English | 1837-1926 | written | letters | 100,000 | link |
| Corpus of the English Web | English | NA | written | webpages | 3,268,798,627 | link |
| DGT-Translation Memory | 23 languages | 2007-present | written | law documents | 2,104,147,314 | link |
| EcoLexicon English Corpus | English | NA | written | environmental texts | 23.1 million | link |
| Freiburg-Brown corpus of American English | American English | 1991-1992 | written | press, fiction, academic | approx. 1 million | link |
| Freiburg–LOB Corpus of British English | British English | 1991-1996 | written | press, fiction, academic | approx. 1 million | link |
| Hansard Corpus | British English | 1803-2005 | spoken | parliamentary speeches, with semantic annotations | 1.6 billion | link |
| Helsinki corpus of English texts | English (Old, Middle, Early Modern) | 730-1710 | written | literary and religious texts | 1.5 million | link |
| London-Lund corpus of spoken English | British English | 1980 | spoken | broadcast news and scripted speech | 500,000 | link |
| Melbourne survey corpus of Australian English | Australian English | 1980-1981 | ? | ? | ? | link |
| Open American National Corpus | American English | 1998-2015 | written and spoken | newspapers, fiction, academic writing, informal speech | 15 million | link |
| Parsed Corpus of Early English Correspondence | British English | 1410-1681 | written | letters | 2,159,132 | link |
| Speech, Thought and Writing Presentation Corpus | British English | 1960-1990 | written and spoken | conversations, oral narratives | | link |
| The Edinburgh DOST corpus of Older Scottish texts | Older Scottish | 1450-1600 | written | prose and poetry | NA | link |
| The Lancaster Newsbooks Corpus | British English | 1654-1655 | written | pamphlets, books and newspapers | 800,000 | link |
| The Lancaster/Oslo-Bergen Corpus | British English | 1960s | written | press, fiction, academic | 1 million | link |
| The Toronto dictionary of Old English corpus | Old English | 600-1150 | written | Old English texts | NA | link |
| VU Amsterdam Metaphor Corpus | British English | 1991-1994 | written and spoken | a subset of BNC Baby, annotated for metaphor | | link |