Frequently Asked Questions
General BNC2014 FAQs
Who built the BNC2014? Who funded the project?
The BNC2014 is being compiled by a partnership of linguists at the ESRC Centre for Corpus Approaches to Social Science (CASS) at Lancaster University and ELT experts at Cambridge University Press (CUP). Robbie Love is lead researcher for the Spoken BNC2014 and Abi Hawtin is lead researcher for the Written BNC2014. The team also includes Tony McEnery, Vaclav Brezina, Andrew Hardie and Claire Dembry (CUP).
The construction of the Spoken BNC2014 was jointly funded by CASS and CUP. The construction of the Written BNC2014 is being funded by CASS.
Why is it called the BNC2014?
We used the year 2014 in the name of the corpus for three reasons:
How should I distinguish the BNC corpora when writing about them?
We recommend the following conventions for writing about the BNC corpora:
Spoken BNC2014 FAQs
How do I cite the Spoken BNC2014 in my work?
The primary publication for the Spoken BNC2014, which all research using the corpus should cite, is:
Why was the corpus initially accessible only through CQPweb?
For the first 12 months of its release, the Spoken BNC2014 was available exclusively through Lancaster University’s CQPweb server. This allowed us to monitor uptake of the resource.
Have the full text files of the corpus been released?
Yes. As of Autumn 2018, the full corpus has been publicly available for download as XML files, along with the associated metadata. The release includes tagged (POS, lemma, semantic tag) and untagged versions of the XML files. They can be downloaded via the same licence-signup interface that is used for CQPweb access.
What about ‘context-governed’ data?
A key decision we made early in the creation of the Spoken BNC2014 was to collect data which occurred only in informal contexts – i.e. data which would be broadly comparable to the ‘demographically-sampled’ component of the Spoken BNC1994. The rationale for gathering recordings from this single type of situational context is simply that there is greater use of, and demand for, conversational data. Researchers who want to look at British English in specific contexts, especially relatively public contexts, tend to collect their own, specialized corpora. Moreover, some such specialized corpora have been released publicly by their creators and are available to researchers with an interest in the defined context in question. These include:
So, researchers with an interest in context-governed English speech already have options open to them. However, a general corpus of informal speech in private contexts is harder to collect, because of the size and demographic spread required and the difficulty of gaining access to such contexts. It is therefore much more in demand in the research community.
Why aren’t you making the audio recordings available too?
We understand that there is great research potential in the audio files from which the Spoken BNC2014 transcripts were derived. However, the goal of the first phase of the Spoken BNC2014 project was to produce the transcripts and make them available, as a corpus, as quickly as possible. Preparing the audio files for release will require substantial additional work – the main challenge being to de-identify the 1,000 hours of audio, i.e. to ensure that details such as names and addresses are ‘bleeped out’. We do plan to do this in the future, but it was not possible to take on this task on top of the work required to prepare the corpus itself.
Written BNC2014 FAQs
When will the Written BNC2014 be made available to the public?
Updates on our progress compiling this component of the BNC2014, and ultimately an announcement regarding its release date, will be published on the Written BNC2014 page on the CASS website.
This page was last modified on Monday 12 November 2018 at 9:22 am.