by Tony McEnery and Andrew Hardie; published by Cambridge University Press, 2012

Answers to exercises: Chapter Three discussion questions

Q3-1) Should a corpus be censored?

Generally we believe a corpus should not be censored. To censor the data is to distort the representation of the language. Given that a primary benefit of the corpus approach is to allow us to explore language as it is actually spoken and written, censorship seems a negation of that goal.

However, there are some types of highly sensitive data where restriction does need to be considered – for example the writings of political radicals who are trying to influence people to commit acts of violence, or the writings of extreme political groups espousing racial hatred. But even in cases like this, we would not advocate censorship of the texts themselves. What does need additional, and careful, consideration is whether such corpus data should be indiscriminately distributed. Critical issues to consider here include: (1) is the material in question otherwise (widely) available? (2) would distributing the corpus constitute, or appear to constitute, endorsement of the contents? (3) would distributing the corpus propagate the materials within it more widely than they would otherwise have been propagated? All of this is, it is worth underlining, a matter on which reasonable people may differ. (One of the present authors believes that there exists an ethical obligation on corpus distributors to ensure that this kind of data is distributed only to those with a legitimate interest in the scientific study of such material. The other author does not believe that any such ethical obligation exists, although in some jurisdictions a legal obligation might.)

Q3-2) When we analyse a discourse, we are also contributing to that discourse. What ethical obligations does this place on an investigator?

There are sensitivities to consider here. For example, you may come from a powerful group within a society and you may be writing about a group with less power. Your presentation of your findings may be used against the people concerned or, where your results support or appear to support complaints they have made, you may be cited as an authority in support of their complaints. Your work may be misrepresented by groups who do not like the conclusions you have drawn.

The key to these dilemmas is to avoid advocacy, and to adhere instead to description: that is, to say only what the data itself allows you to say. That is not to say that there is no place for advocacy – rather that empirical, descriptive research is not that place! The corpus approach is arguably of help here; if you abide by the principle of total accountability it will be more difficult for a claim of partiality to be levelled against you. People may claim your corpus is biased, but they should not be able to say that your analysis of it was.

Q3-3) How can we minimise the observer effect that will be produced by steps taken to comply with ethical standards?

An observer effect is inevitable in non-surreptitious recording; any steps taken for reasons of research ethics can only amplify this effect. So, when we ask how to minimise this effect, it is actually a much broader issue than one related solely to research ethics.

Actions we might take to ameliorate the observer effect include the following:

Finally, you might consider the older post-hoc-consent approach to spoken corpus collection – gather the data first, without consent, and then ask for consent afterwards, destroying the recording if the consent is not forthcoming. We anticipate, however, that ethics committees will probably have more difficulty with this approach!

Tony McEnery and Andrew Hardie

Department of Linguistics and English Language, Lancaster University, United Kingdom