by Tony McEnery and Andrew Hardie; published by Cambridge University Press, 2012
 

Answers to exercises: Chapter Two discussion questions

Q2-1) How would you go about manually adding pragmatic annotation to a spoken corpus?

There are many different bodies of work in the history of pragmatics which have laid out analytic schemata that could, potentially, be operationalised via corpus annotation. Note that any effort in this direction essentially has to be manual, as pragmatic analysis requires an understanding of language at a higher level than computers are capable of, currently and for the foreseeable future.

For example, in the study of speech acts, we classify utterances (or parts of utterances) in terms of what action each accomplishes (or attempts to accomplish). This could be analysed on a large scale by annotating all utterances in a corpus according to what speech act or acts each has – either using the technical terminology of scholars such as Austin or Searle, or by using more straightforward category labels such as statement, command, request, question and so on.
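To make this concrete, here is a minimal sketch of what such annotation might look like in XML, together with a Python check that the markup is well-formed; the element and attribute names are invented for illustration, not taken from any published scheme.

    # Illustrative only: a fragment of speech-act annotation using an
    # invented <u> (utterance) element with a "speechact" attribute.
    import xml.etree.ElementTree as ET

    sample = """
    <dialogue>
      <u who="A" speechact="question">are you coming tonight</u>
      <u who="B" speechact="statement">I have to work late</u>
      <u who="A" speechact="request">let me know if that changes</u>
    </dialogue>
    """

    # Parsing the fragment confirms that the markup is well-formed XML.
    root = ET.fromstring(sample)
    for utt in root.findall("u"):
        print(utt.get("who"), utt.get("speechact"), "->", utt.text)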

Another classic approach to pragmatics is Grice's Co-operative Principle, which attempts to explain how meaning is communicated by conversational participants through adherence to – or violation of – certain maxims, namely “be truthful; do not provide too little/too much information; be relevant; be clear”. Again, this could be operationalised into a set of labels (one for each of the four maxims, perhaps, together with tags to indicate whether or not the maxim is being obeyed) that could be applied to (parts of) utterances in a corpus.
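As a sketch, such maxim labels could be represented as a small closed tagset; the label and status names below are our own invention for illustration.

    # Illustrative tagset for Gricean maxim annotation: one label per
    # maxim, paired with a flag recording adherence or violation.
    MAXIMS = {"quality", "quantity", "relation", "manner"}
    STATUS = {"obeyed", "violated"}

    def valid_maxim_tag(tag: str) -> bool:
        """Accept tags of the form 'maxim:status', e.g. 'quantity:violated'."""
        maxim, _, status = tag.partition(":")
        return maxim in MAXIMS and status in STATUS

    assert valid_maxim_tag("relation:obeyed")
    assert not valid_maxim_tag("quantity:flouted")  # not in this toy scheme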

There are multiple approaches to the study of politeness; for instance, Geoffrey Leech has proposed a Politeness Principle made up of maxims in parallel to Grice's maxims for co-operative communication. This is especially useful for analysing cases where the Gricean maxims are disobeyed for reasons of politeness. Another analysis, somewhat more widely utilised, looks at politeness strategies in terms of their effect on positive and negative face; this is rooted in the work of Brown and Levinson. Again, in principle any approach to politeness could be operationalised as a tagset that can be applied to the corpus.

Finally, there are many aspects of speaker meaning evident in spoken discourse that can be annotated that do not have direct bearing on any particular theory of pragmatics, for example discourse particles, bad language, hedges, back-channel devices, and similar features.

When developing an annotation scheme for pragmatic analysis, deciding on the list of phenomena or categories to annotate is likely to be the easy part. The difficult part will be deciding on clear and explicit rules for applying the tags. This is partly because pragmatic categories are less likely to be mutually exclusive than are, say, grammatical tags. At the part-of-speech level, when we analyse something as a verb, we are also stating that it is not an adjective or a noun. However, at the pragmatic level, this kind of clear mutual exclusivity is much less likely to hold; many utterances can be multifunctional, and a good pragmatic annotation scheme will incorporate some means of representing this.
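One simple way of representing multifunctionality, sketched below with invented attribute and label names, is to allow the annotation attribute to hold more than one function label, which analysis software can then split apart.

    # Illustrative handling of a multifunctional utterance: the invented
    # "pragfn" attribute holds a space-separated list of function labels,
    # so a single utterance can be both a question and a request.
    import xml.etree.ElementTree as ET

    utt = ET.fromstring('<u pragfn="question request">could you pass the salt</u>')
    functions = utt.get("pragfn").split()

    # A search for either function should retrieve this utterance.
    assert "question" in functions and "request" in functions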

Another difficulty in pragmatic annotation is deciding how to assign tags to specific regions of the text. It is clear that pragmatic functions extend across many word-tokens, and often across many clauses or sentences. Yet equally, within a single utterance there may be many shifts of function – and there may well be no easily identifiable single-token point at which the shift occurs. Coming up with rules for determining and annotating the start- and end-points of the “zones” to be labelled is liable to be a much more difficult task than coming up with functional categories in the first place.
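One practical option here – a sketch of standoff annotation, with invented field names – is to record each zone separately as a start and end position in the token stream, rather than inserting boundary tags into the text itself; this also sidesteps the problem of zones that overlap.

    # Illustrative standoff annotation: each zone points into the token
    # stream by start/end index instead of being embedded in the text.
    tokens = ["well", "I", "suppose", "you", "could", "be", "right"]

    zones = [
        {"start": 0, "end": 1, "function": "discourse-particle"},  # "well"
        {"start": 1, "end": 3, "function": "hedge"},               # "I suppose"
    ]

    for z in zones:
        print(z["function"], "->", " ".join(tokens[z["start"]:z["end"]]))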

One final difficulty in pragmatic annotation is deciding where to stop – that is, what aspects of pragmatic meaning are not going to be addressed by the analysis. The natural temptation is to include everything that might be of relevance, but as the survey above indicates, there are so many areas of pragmatics that might be of interest that attempting to annotate them all at once will lead to an overcomplicated, difficult-to-apply tagging schema.

Actually encoding the labels into the corpus can be done in many ways. The most standard way is with XML. For pragmatics, the outstanding feature of XML is that it can be used to indicate both points within a text and zones or regions of any size. There also exist many off-the-shelf software tools for checking for errors in XML annotation. The drawback of XML is that the tags are rather verbose and thus require a major typing effort. This is not an insuperable problem. One way round it is to use a specialised XML editor which allows you to insert predefined tags without having to type them in full each time. Another is to use abbreviated codes when actually typing in the tags, and then create a simple computer program (for instance, using word-processor macros) to globally search-and-replace the abbreviations into the full XML tags.
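As a sketch of that second option: suppose the annotator types a short code such as <q> for a question and <st> for a statement (both invented abbreviations); a few lines of Python could then expand these into full XML tags.

    # Illustrative expansion of typing-friendly abbreviations into full
    # XML tags; the abbreviations and target tags are invented examples.
    ABBREVIATIONS = {
        "<q>":   '<u speechact="question">',
        "</q>":  "</u>",
        "<st>":  '<u speechact="statement">',
        "</st>": "</u>",
    }

    def expand(text: str) -> str:
        for abbrev, full in ABBREVIATIONS.items():
            text = text.replace(abbrev, full)
        return text

    print(expand("<q>are you coming tonight</q>"))
    # -> <u speechact="question">are you coming tonight</u>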

Q2-2) In what kind of situations might the error rate of part-of-speech tagging be a problem?

The errors in part-of-speech tagging might cause problems in a number of ways. One key point to understand is that the errors are unlikely to be distributed evenly across categories or even across different words. Some words or types of words are inherently more ambiguous than others, which leads to higher rates of error. So the errors are not just “random noise” – they are biased towards certain areas of the corpus.

For example, an English POS tagger will probably always identify articles accurately, but may well have an error rate higher than 5% when disambiguating, for instance, words which might be either nouns or verbs depending on context. So if you are interested in a word or category which has an error rate higher than 5%, you may discover that the level of accuracy is unhelpfully low.
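A back-of-the-envelope calculation (all the figures below are invented for illustration) shows why this matters: if the words relevant to your query are concentrated in a highly ambiguous class, the effective error rate on your results can be far higher than the headline rate.

    # Invented figures, for illustration only: a respectable overall
    # accuracy can hide much worse accuracy on noun/verb-ambiguous words.
    corpus_size = 1_000_000        # tokens in the corpus
    overall_error_rate = 0.03      # 3% of all tokens mistagged
    ambiguous_tokens = 50_000      # tokens that could be noun or verb
    ambiguous_error_rate = 0.12    # error rate on just those tokens

    print(f"Errors corpus-wide:      {corpus_size * overall_error_rate:,.0f}")
    print(f"Errors on ambiguous set: {ambiguous_tokens * ambiguous_error_rate:,.0f}")
    # A query targeting the ambiguous words faces a 12% error rate,
    # four times the headline figure.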

One classic example of a situation where this causes problems is if you want to use the POS tagging as the basis for higher levels of analysis, for example of syntax. There is a garbage-in, garbage-out effect here, where a single-word error in the POS analysis can throw off the analysis of a whole clause or sentence. While this is an important issue for parser designers, it does not affect most corpus users. For them, the major problem is that POS tagging errors will result in corpus searches either missing things that they ought to find, or finding things that ought not to have been found. Either of these situations – or a mix of both, as is more usual – can impede the quality of the analysis that results from the corpus query.
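These two failure modes correspond to the standard measures of precision (how many retrieved hits are genuine) and recall (how many genuine instances were retrieved). Here is a sketch, with invented counts, of estimating both from a manually checked sample of query results:

    # Invented counts, for illustration: after manually checking a sample
    # of query results, and a sample of the corpus for missed instances,
    # estimate the precision and recall of a POS-based search.
    true_hits = 470    # retrieved results that really are what we wanted
    false_hits = 30    # retrieved results caused by tagging errors
    missed = 55        # genuine instances the query failed to retrieve

    precision = true_hits / (true_hits + false_hits)
    recall = true_hits / (true_hits + missed)

    print(f"precision = {precision:.2%}")  # proportion of hits that are genuine
    print(f"recall    = {recall:.2%}")     # proportion of genuine cases found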

There are two key ways to mitigate the problems caused by the error rate of automatic analyses (of part-of-speech, or, indeed, any other type of tagging). These are, firstly, to weed out as many errors as possible; and secondly, to moderate the claims you make on the basis of the searches – that is, when writing up your results, to acknowledge that the automatic annotation contains errors, and to avoid presenting your search results or frequency counts as if they were exhaustive or perfectly precise.

Even with a 5% error rate, however, you will discover that the benefit of scale brought by the automated analysis of thousands or millions of words of data far outweighs the potential disadvantages associated with the errors, especially if the precautionary steps outlined above are taken.

Q2-3) How would possible differences in syntactic structure analysis affect the placement of phrase-boundary tags in constituency parsing?

We suggested two examples of syntactic structures with multiple possible analyses, each of which would lead to a difference in the placement of the parsing brackets.

You can probably think of many more. Here is one more example: a sentence such as The vase is broken can be bracketed either with broken as the head of an Adjective Phrase following the copula, as in [S [NP The vase] [VP is [AP broken]]], or as a passive construction in which broken is a participle inside the Verb Phrase, as in [S [NP The vase] [VP is broken]].

Obviously any of the shifts above will affect the results of corpus searches or frequency counts for the constituents involved. If we search for Adjective Phrases, for instance, the number (and types) of results we get is absolutely dependent on which analysis of sentences like The vase is broken has been adopted.

To consider the issue more generally: a consistent analysis making a set of decisions in favour of one interpretation of an ambiguity (analysis A) will undoubtedly produce different results from an equally consistent analysis in favour of a competing interpretation (analysis B). The key to dealing with this is to understand clearly what the alternative interpretations were and to explain why you chose analysis A. Then it is perfectly reasonable and legitimate to proceed on the basis of assuming A to be correct.

An alternative – albeit very time consuming – approach would be to try to work with all possible analyses. So, for instance, if one parser applies analysis A and another applies analysis B, you could repeat all your searches and frequency counts on two versions of your corpus – one processed by the first parser and one by the second. If your results hold up despite the differences of analysis, you can be confident that the precise analysis scheme is not relevant for your purposes. But this is a rare approach to take; you would probably only adopt it if you were actually working on the ambiguous syntactic structure in question. By and large, it is more typical to take the pragmatic approach of choosing one analysis over another, and then sticking consistently with the chosen analysis.
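As a sketch of what that comparison might look like in practice – assuming each parser's output is available as labelled bracketing, and using invented miniature parses of the example above:

    # Illustrative comparison of frequency counts from two parses of the
    # same sentence: analysis A treats "broken" as heading an Adjective
    # Phrase, analysis B as a participle inside the Verb Phrase.
    import re

    analysis_a = "(S (NP The vase) (VP is (AP broken)))"
    analysis_b = "(S (NP The vase) (VP is broken))"

    def count_label(parse: str, label: str) -> int:
        """Count constituents bearing the given label in a bracketed parse."""
        return len(re.findall(r"\(" + re.escape(label) + r"\b", parse))

    print("APs under analysis A:", count_label(analysis_a, "AP"))  # 1
    print("APs under analysis B:", count_label(analysis_b, "AP"))  # 0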

 
Tony McEnery and Andrew Hardie

Department of Linguistics and English Language, Lancaster University, United Kingdom