by Tony McEnery and Andrew Hardie; published by Cambridge University Press, 2012

Answers to exercises: Chapter Six practical activities

A6-1) Analyse the collocational behaviour of a selected noun using only a sorted concordance.

While we do not know what noun you have looked at, it is very likely that you have discovered a range of collocates for your noun and that there are some semantic preferences evident in the co-texts around your noun. There will doubtless also be important colligates associated with your noun, especially function words which can introduce a noun phrase, such as articles, or which are closely associated with common patterns of subordination around nouns, most notably prepositions.

You have also probably concluded that analysing collocational behaviour this way is much more laborious and painstaking than using statistics – and you are quite right! Critically, however, some aspects of collocational behaviour emerge much more readily from the sorted concordance than from tables of statistics – most especially, links between particular semantic preferences and/or different senses of the node on the one hand, and the colligational patterns in which the word occurs on the other.

To exemplify this, we analysed the word reception in the written part of the BNC. There are 2,249 instances, and the frequency is 25.58 per million. Since the BNC is such a large corpus, we had more examples than are easily analysable by hand, so we thinned the concordance to a random sample of 200 lines. We then looked at the sample under different sort orders.
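The arithmetic and the thinning step can be sketched in Python as follows. This is a minimal illustration, not the procedure actually used: the function names are hypothetical, and the corpus size of roughly 87.92 million words is not stated above but is simply back-calculated from the two figures quoted (2,249 instances at 25.58 per million).

```python
import random

def per_million(count, corpus_size):
    """Normalised frequency: occurrences per million tokens."""
    return count / corpus_size * 1_000_000

def thin(lines, n, seed=None):
    """Return a random sample of n concordance lines (all of them if there are n or fewer)."""
    if len(lines) <= n:
        return list(lines)
    return random.Random(seed).sample(lines, n)

# ASSUMPTION: corpus size back-calculated from the figures in the text.
ASSUMED_WRITTEN_BNC_SIZE = 87_920_000

freq = per_million(2249, ASSUMED_WRITTEN_BNC_SIZE)

# Integers stand in for the 2,249 concordance lines.
sample = thin(list(range(2249)), 200, seed=1)
```

Fixing the seed makes the thinned sample reproducible, which is useful if the same 200 lines need to be re-sorted several times.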

With an R1 sort, the following collocations are evident:

From a neo-Firthian perspective, it is interesting to note that some of these collocates are clearly specific to one sense, and not others, of the noun. Centre, desk, and office clearly select the sense “space at some institution devoted to dealing with newly-arrived outsiders”. A wholly separate meaning, of a reception as a kind of social gathering, is selected by party and room (however, hall seems to be compatible with either “arrival space” or “party”). On the other hand, class selects a very specific UK-only meaning of reception, namely “children's first year of school” – equivalent to US kindergarten.

The L1 collocates display greater variation, but even some of these show potential to select a particular sense of reception (for example, an invitation-only reception is clearly a party and not any other kind of reception). Some form semantic preferences, for instance there is a clear category of evaluative adjectives: hostile, positive, jaundiced, rapturous. The semantic preferences can also select a particular sense of the word – in this case, the evaluative adjectives co-occur with reception in the sense of “an attitude taken towards something new or some new development”.
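The R1 and L1 sorts described above can be sketched in a few lines of Python. This is only an illustration of the idea under simple assumptions (a pre-tokenised text, a fixed window, hypothetical function names, and an invented toy sentence); real concordancers offer many more options.

```python
def concordance(tokens, node, window=4):
    """Return (left context, node, right context) for each hit of the node."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            lines.append((left, tok, right))
    return lines

def sort_key(line, position):
    """Word at the given position: +1 is R1, -1 is L1, +2 is R2, and so on."""
    left, _, right = line
    if position > 0:
        context, index = right, position - 1
    else:
        context, index = left, len(left) + position
    if 0 <= index < len(context):
        return context[index].lower()
    return ""  # no word at that position (edge of the text)

def sorted_concordance(lines, position):
    """Sort concordance lines alphabetically by the word at one position."""
    return sorted(lines, key=lambda line: sort_key(line, position))

# Invented toy text, not corpus data.
text = ("the reception desk was near the wedding reception party "
        "and a hostile reception").split()
lines = concordance(text, "reception")
r1_sorted = sorted_concordance(lines, +1)
l1_sorted = sorted_concordance(lines, -1)
```

Sorting on R1 brings identical right-hand neighbours such as desk or party together, which is what makes repeated collocates visible to the eye.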

Going back to R1, we can also identify colligates and colligational patterns. In some cases, these can be seen extending much further rightwards. For instance:

While most of these link particular meanings of reception to particular subcategorisation patterns, and reception of seems to be multifunctional, the final pattern is simply an observation that reception seems often to occur in clause-final position. Particularly in Michael Hoey's approach, this too is an important colligation.

This hardly completes the analysis, but does begin to give an indication of how the pervasive coincidence of particular contextual meanings with particular grammatical patterns has given rise to neo-Firthian theoretical constructs such as the Idiom Principle, Pattern Grammar and Lexical Priming.

A6-2) Does statistical analysis of your selected noun's collocates lead to the same results?

This depends on how big your corpus is relative to the frequency of your noun. With a relatively small corpus, the statistics may not be powerful enough to detect everything you saw in the sorted concordance as a significant collocate. On the other hand, in a large corpus, you will almost certainly find new collocates by this method, in addition to those you spotted via manual analysis, because the statistics will be powerful enough to spot links that are too rare to notice manually.

Some of the semantic preferences and colligations you spotted before should be evident in the statistics. However, in some cases, it might well be that semantic preferences aren't detected by the statistics, because the words that instantiate the semantic category are individually too rare to register in a statistical collocation analysis. The same may apply to colligations. Co-occurrence with a specific function word will probably be spotted; co-occurrence with a grammatical category that is realised by many wordforms (for instance, the past tense) may not be.
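A small Python sketch may make this point concrete. The co-occurrence counts below are invented purely for illustration (they are not BNC figures), and the function names and the cut-off of 5 are assumptions; the adjectives are the evaluative ones noted in the concordance analysis.

```python
MIN_COOCCURRENCE = 5  # the kind of cut-off a collocation tool might apply

# HYPOTHETICAL co-occurrence counts with the node; not real corpus figures.
cooccurrence = {"hostile": 3, "positive": 2, "jaundiced": 1,
                "rapturous": 2, "desk": 9}
evaluative = {"hostile", "positive", "jaundiced", "rapturous"}

def passing_words(counts, threshold):
    """Words whose individual co-occurrence count reaches the threshold."""
    return {word for word, count in counts.items() if count >= threshold}

def category_total(counts, category):
    """Combined co-occurrence count of all words in a semantic category."""
    return sum(count for word, count in counts.items() if word in category)

individually_visible = passing_words(cooccurrence, MIN_COOCCURRENCE) & evaluative
total = category_total(cooccurrence, evaluative)
```

With these invented counts, no evaluative adjective reaches the cut-off on its own, yet the category as a whole co-occurs with the node more often than the cut-off requires. A word-by-word statistic never sees that combined total.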

Again, let's look at reception as a single concrete case study. Here are its mutual information collocates for the written BNC (L4 to R4, minimum corpus frequency of collocate = 5, minimum frequency of co-occurrence = 5, only collocates with MI >= 5 shown here):

Collocate    MI score    Collocate    MI score    Collocate    MI score

Note that some of the very strong collocates overlap with what was spotted in the sample concordance, but not all of them do! For instance, champagne and wedding clearly select the “party” meaning, but were not on our list of such collocates from the concordance-based analysis. Also, another meaning to do with television/radio signals emerges in the strong collocate transmission. Clearly, there is overlap between the two analyses – equally clearly, the overlap is well short of 100%.
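For readers who want to see how settings of this kind fit together, here is a rough Python sketch of a window-based MI calculation. It uses one common formulation, MI = log2(observed / expected), with the expected frequency estimated from the node frequency, the collocate frequency, the window size and the corpus size. Collocation tools differ in exactly how they estimate the expected frequency, so this should not be read as the calculation behind the scores reported here; the function name and parameters are hypothetical.

```python
import math
from collections import Counter

def mi_collocates(tokens, node, span=4, min_freq=5, min_cooc=5, min_mi=5.0):
    """Rank collocates of node by mutual information within a +/- span window."""
    n = len(tokens)
    freq = Counter(tokens)
    cooc = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # Count every token within span positions to the left and right.
        for j in range(max(0, i - span), min(n, i + span + 1)):
            if j != i:
                cooc[tokens[j]] += 1
    results = []
    for word, observed in cooc.items():
        if freq[word] < min_freq or observed < min_cooc:
            continue  # apply the two minimum-frequency filters
        # Expected co-occurrences if the word were spread evenly over the corpus.
        expected = freq[node] * freq[word] * (2 * span) / n
        mi = math.log2(observed / expected)
        if mi >= min_mi:
            results.append((word, mi))
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

The defaults mirror the settings described above: a window of L4 to R4, a minimum collocate frequency of 5, a minimum co-occurrence frequency of 5, and an MI cut-off of 5.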

Finally, note that none of the colligations, or interactions between colligation and semantic preference, that were observed above are evident here. This is because MI disfavours high-frequency words such as function words. However, if log-likelihood is used instead, then prepositions and articles rise much higher in the list. Even then, however, the links between colligational patterns and specific meanings are not immediately evident. An analyst would need to delve into the concordances for each of the statistical collocates to identify them.
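The contrast between the two statistics can be illustrated with a short Python sketch. It computes MI and the log-likelihood statistic (G2) from the same 2x2 contingency table, built in one common window-based way. All the frequencies are hypothetical, chosen only to contrast a rare content word with a very frequent function word, and the function names are our own.

```python
import math

def contingency(cooc, f_node, f_coll, n, span=8):
    """Observed 2x2 table: rows = inside/outside the node's windows,
    columns = the collocate/any other word."""
    window = f_node * span  # token slots inside all windows (L4 to R4 = 8)
    return [[cooc, window - cooc],
            [f_coll - cooc, n - window - f_coll + cooc]]

def expected_cell(table, i, j):
    """Expected count for cell (i, j) under independence."""
    row_total = sum(table[i])
    col_total = table[0][j] + table[1][j]
    grand_total = sum(table[0]) + sum(table[1])
    return row_total * col_total / grand_total

def mutual_information(table):
    return math.log2(table[0][0] / expected_cell(table, 0, 0))

def log_likelihood(table):
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            if observed > 0:  # empty cells contribute nothing
                g2 += observed * math.log(observed / expected_cell(table, i, j))
    return 2 * g2

# HYPOTHETICAL figures: a 1,000,000-word corpus, node frequency 100.
N, F_NODE = 1_000_000, 100
rare_word = contingency(cooc=5, f_node=F_NODE, f_coll=20, n=N)
function_word = contingency(cooc=150, f_node=F_NODE, f_coll=60_000, n=N)
```

With these invented figures, the rare word has the higher MI score but the function word has the higher log-likelihood score, which is the pattern described above.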

A6-3) Investigate the disambiguation-in-context of a highly polysemous word.

We asked you to begin by thinking of as many senses as possible of the word you are going to look at, before looking at the list of senses in a dictionary. You probably did miss some senses of the word – if you did not, congratulations! Using intuition alone to recall all the senses of polysemous words is far from a trivial task.

It is not always easy to decide on one single sense for every instance of a polysemous word, especially when several of the senses are interrelated with one another; but with some effort, and possibly some arbitrary “hard edges” being assigned to what are really fuzzy boundaries between senses, it can be done.

As for contextual disambiguation, this depends to some extent on which word you are looking at, as well as on the luck of the draw in terms of which examples are in your sample. But by and large, your analysis will probably confirm the Sinclairian view that ambiguous words are in practice always or nearly always disambiguated in context, either by collocates or by their semantic preferences/colligational patterns. Sometimes, you might have to look a long way ahead or behind in the corpus to disambiguate, but this will probably be rare; in most cases, the immediate co-text will be enough to disambiguate the polysemy.
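As a toy illustration of disambiguation by collocates, the Python sketch below maps the collocates of reception discussed under A6-1 and A6-2 to the senses they select. The sense labels, the function name and the simple lookup are our own simplification: a manual analysis takes far more of the co-text into account than a word list can.

```python
# Cue words are the collocates of "reception" discussed earlier; "hall" is
# listed twice because it is compatible with two senses.
SENSE_CUES = {
    "arrival space": {"centre", "desk", "office", "hall"},
    "social gathering": {"party", "room", "hall", "champagne", "wedding",
                         "invitation-only"},
    "first year of school": {"class"},
    "attitude towards something new": {"hostile", "positive", "jaundiced",
                                       "rapturous"},
    "broadcast signal": {"transmission"},
}

def candidate_senses(context_words):
    """Return the senses whose cue words appear in the co-text."""
    words = {word.lower() for word in context_words}
    return sorted(sense for sense, cues in SENSE_CUES.items() if words & cues)
```

For a co-text containing rapturous, the function returns a single sense. For one containing only hall it returns two candidates, and a wider stretch of co-text would be needed to decide between them.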

Finally: one additional point that this exercise illustrates is just how helpful corpora and collocation are to lexicographers. Trying to approach polysemy and its disambiguation using only intuition and a random collection of examples would leave a dictionary prone to partiality. The assistance provided by corpus data allows for a much more comprehensive description of word meaning. While it is unlikely that you discovered any word senses not included in a reasonably comprehensive modern dictionary, if you did, it is probably because you have found examples of a new sense coming into use, or because you looked at a genre which is not well represented in the dictionary maker's corpus.

Tony McEnery and Andrew Hardie

Department of Linguistics and English Language, Lancaster University, United Kingdom