Frequently Asked Questions
General BNC2014 FAQs
Who built the BNC2014? Who funded the project?
The BNC2014 is being compiled by a partnership of linguists at the ESRC Centre for Corpus Approaches to Social Science (CASS) at Lancaster University and ELT experts at Cambridge University Press (CUP). Robbie Love is lead researcher for the Spoken BNC2014 and Abi Hawtin is lead researcher for the Written BNC2014. The team also includes Tony McEnery, Andrew Hardie and Vaclav Brezina (Lancaster) and Claire Dembry (CUP).
The construction of the Spoken BNC2014 was jointly funded by CASS and CUP. The construction of the Written BNC2014 is being funded by CASS.
Why is it called the BNC2014?
We used the year 2014 in the name of the corpus for three reasons:
How should I distinguish the BNC corpora when writing about them?
We recommend the following conventions for writing about the BNC corpora:
Spoken BNC2014 FAQs
How do I cite the Spoken BNC2014 in my work?
The primary publication for the Spoken BNC2014, which all research using the corpus should cite, is:
Why do I need to access the corpus through CQPweb?
For the first 12 months of its release, the Spoken BNC2014 is available exclusively through Lancaster University’s CQPweb server. This allows us to monitor uptake of the resource.
Will the full text files of the corpus be released? When?
Yes. The full corpus, along with the associated metadata, will be made available for public download as XML files in September 2018. We will release both tagged (POS, lemma and semantic tag) and untagged versions of the XML files.
What about ‘context-governed’ data?
A key decision we made early in the creation of the Spoken BNC2014 was to collect data which occurred only in informal contexts – i.e. data which would be broadly comparable to the ‘demographically-sampled’ component of the Spoken BNC1994. The rationale for gathering recordings from this single type of situational context is simply that there is greater use of, and demand for, conversational data. Researchers who want to look at British English in specific contexts, especially relatively public contexts, tend to collect their own, specialized corpora. Moreover, some such specialized corpora have been released publicly by their creators and are available to researchers with an interest in the defined context in question. These include:
So, researchers with an interest in context-governed English speech already have options open to them. By contrast, a general corpus of informal speech in private contexts is harder to collect, because of the size and demographic spread it requires and the difficulty of gaining access to such private contexts. This is why such a corpus is much more in demand in the research community.
Why aren’t you making the audio recordings available too?
We understand that there is great research potential in the audio files from which the Spoken BNC2014 transcripts were derived. However, the goal of the first phase of the Spoken BNC2014 project was to produce the transcripts, as a corpus, and make them available as quickly as possible. Preparing the audio files for release will require a great deal of work, the main challenge being to de-identify the 1,000 hours’ worth of audio so that information such as names and addresses is ‘bleeped out’. We do plan to do this in the future, but it was not possible to take on this task in addition to the work required to prepare the corpus itself.
Written BNC2014 FAQs
When will the Written BNC2014 be made available to the public?
The Written BNC2014 will be made available from autumn 2018. Updates on this component of the BNC2014 project will be published as this date approaches.
This page was last modified on Monday 25 September 2017 at 2:53 am.