Answers to exercises: Chapter One practical activities
As noted in the introduction to these activities in the book, the purpose of these exercises is to provide a set of very general tasks that should help you find your way around your concordancer if you are not entirely familiar with it.
That being the case, Chapter One's activities do not really have “answers”, since the details of finding your way around a concordancer differ quite a lot depending on which precise program you are using – and, sometimes, which version of the program.
For this reason, the hints below are rather general in nature. We have also assumed, in writing these hints, that you are using a simple desktop concordancer such as WordSmith or AntConc, rather than a complex server-based system such as CQP (see here for details on these concordancers).
A1-1) First, investigate the basic setup procedures of your software.
- How do you load a corpus into your concordance tool?
- There will usually be some kind of “open file” window or dialogue box, just as in a word processor or any other piece of desktop software, and this is normally accessed from the File menu of the program.
- How do you change to a different corpus?
- Normally, you must open the list of currently-loaded corpus files and find the Clear option to empty the list. Then, you can start the process of loading in a new corpus from scratch.
- Does the entire corpus have to be in a single text file, or can your concordancer handle a corpus consisting of many files?
- Nearly all concordancers allow a corpus to be loaded from multiple files at once – even very early corpus tools had this capability. Often, you can select multiple files by holding Control and clicking with the mouse in the “open file” window. Some programs, such as AntConc, have a special “Open Directory” option for opening all the files in a specified folder, with no need to select each file one by one.
- Does your concordancer need the texts to be in a particular format, or is simple plain-text OK?
- Very few basic concordancers require a specific format; it is very likely that plain-text will be fine. However, the more complex concordancers which perform indexing of the data (such as Xaira and Corpus Workbench) do demand specific formats. Data for indexing in Xaira must be in XML format; data for indexing in CWB must be in a special column-based format, with one word-token per line. The special requirements these programs have for input files are one reason why they are mostly considered more advanced, specialised tools than programs like WordSmith or AntConc.
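To make the two ideas above more concrete (loading a corpus from a whole folder of files, and the one-token-per-line layout), here is a minimal Python sketch. It is an illustration only, not how any particular concordancer works internally: the function names `load_corpus` and `to_vertical` are hypothetical, the files are assumed to be UTF-8 plain text with a `.txt` extension, and the tokenisation is deliberately crude. Real input for CWB typically carries further columns and markup, so the output here only approximates the general shape of such a format.

```python
from pathlib import Path
import re


def load_corpus(folder):
    """Read every .txt file in `folder` (assumed UTF-8) into a dict
    mapping file name to file contents."""
    return {
        path.name: path.read_text(encoding="utf-8")
        for path in sorted(Path(folder).glob("*.txt"))
    }


def to_vertical(text):
    """Return `text` with one token per line.

    Tokenisation is an assumption made for this sketch: runs of word
    characters are tokens, and each punctuation mark is its own token.
    """
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return "\n".join(tokens)
```

For example, `to_vertical("The cat sat.")` puts `The`, `cat`, `sat` and the full stop on four separate lines.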
A1-2) Next, look at how the concordancing function works.
- How do you search for a particular word?
- A basic search interface typically consists of (1) a text box for you to enter the query term and (2) a button to activate the search. This text-box-plus-button may be part of the main interface screen (as in AntConc) or it may have its own special dialogue (as in WordSmith).
- Can you search for annotations such as part-of-speech tags, lemmata, or semantic tags – assuming, of course, that they are present in your corpus?
- Almost certainly yes – however, it may not be easy!
- If the tags are present in your corpus, then you can search for them just as you would search for words – the concordancer does not need to be aware of the difference between words and tags. For instance, one common format is to have words and tags joined by an underscore, thus: thing_N. If you have a corpus in this format, you can search for a particular tag just by putting the appropriate wildcard before the underscore in your search term, thus: *_N or similar.
- However, this simple approach is not usually sufficient for corpus files with more complex annotation. In this case, you need the concordancer itself to be aware of the annotation, and to give you special options for searching tags rather than words. But most simple concordancers cannot actively parse apart the annotation from the text. This is where more complex concordancers, such as CQP (and BNCweb / CQPweb which are based on CQP), come into their own.
- So there is a choice to be made between more complicated programs which are annotation-aware, versus simpler programs which are not.
- Are searches case-sensitive (treat <A> and <a> differently) or case-insensitive (treat them the same)? Can you change this behaviour?
- Most concordance programs allow you to switch between case-sensitive and case-insensitive searching. In some languages, such as Chinese or Arabic, this distinction is irrelevant, of course. However, in English it may be a useful way of distinguishing between proper nouns and other categories of word which would otherwise be homographs, e.g. Brown versus brown and Barking versus barking (although this will not work all the time – for instance, consider what will happen if a normally-lowercase word happens to occur at the start of a sentence). In this context, it is interesting to consider the German language, where all nouns, not just proper nouns, are given an initial capital letter. For this reason, distinguishing case is more important still in the analysis of German.
- Can you thin concordances, i.e. reduce the number of results that are displayed?
- This is usually possible, and allows you to focus in on a few examples, should you wish to do so.
- How do you save or export a concordance for later reference?
- This is an important feature for a concordancer to have, as it is very difficult to do careful research if we cannot store the results of corpus searches and come back to them later. Nearly all concordancers allow this, and some let you do it in several different ways, depending on what format you want to store the data in. However, exactly how you do this varies a lot between different concordance programs. In most cases it will be available as a menu option, but it may be called different things: Save query, Export concordance, Save output and so on. In web-based concordancers it may be called Download query or similar.
- You often have a choice between saving the concordance in the program's own special file format – which allows it to be opened later in the same program, but not in any other program – or exporting it into plain text – which is more convenient for moving the data into another program such as a spreadsheet, word processor or database, but which cannot then be re-opened in the concordancer.
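The wildcard tag search, the case-sensitivity switch and thinning discussed above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the implementation of any real concordancer: the corpus is assumed to be a list of word_TAG tokens, `*` is the only wildcard supported, the function names `concordance` and `thin` are hypothetical, and thinning is done here by random sampling (other tools may thin in other ways).

```python
import random
import re


def concordance(tokens, pattern, case_sensitive=False, width=3):
    """Return (left context, hit, right context) for each token that
    matches `pattern`, where `*` stands for any run of characters."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # Escape the pattern, then turn the escaped `*` back into a wildcard.
    regex = re.compile(re.escape(pattern).replace(r"\*", ".*"), flags)
    lines = []
    for i, token in enumerate(tokens):
        if regex.fullmatch(token):
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


def thin(lines, n, seed=0):
    """Keep a random sample of `n` concordance lines, in original order."""
    if n >= len(lines):
        return list(lines)
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(lines)), n))
    return [lines[i] for i in keep]
```

With the toy corpus `"The_D brown_J dog_N met_V Brown_N in_P Barking_N".split()`, the query `*_N` finds the three noun-tagged tokens. The query `brown_*` finds both brown_J and Brown_N when the search is case-insensitive, but only brown_J when it is case-sensitive. Each result is a tuple of three strings, so writing the lines out as tab-separated plain text for a spreadsheet is straightforward.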
A1-3) Finally, work out what the statistical capabilities of your concordancer are.
- How can you get a frequency list (of words or tags) in your concordancer?
- The Word List function is usually a separate interface to the concordance search interface. In WordSmith, for instance, it is a separate window; in AntConc it is a separate “tab” on the main window; in BNCweb/CQPweb it is a separate option on the menu on the main search page.
- Can you get basic corpus summary statistics – such as total number of words (tokens), type-token ratio, and so on?
- Almost certainly yes, as this is a very basic function. However, different concordancers put these statistics in very different places. In WordSmith, it is part of the WordList window; in CQPweb, there is a separate View Corpus Metadata menu option which contains this information.
- Can you produce tables of collocation statistics from a concordance?
- The concordance display will usually include either a button or a menu option that runs the collocation calculation. But the statistics produced in the collocation table vary quite a lot. There will usually be a frequency count for the combination of node and collocate, and there will often be some kind of statistical measure of the strength or significance of the collocation as well. Other than that, different tools include different sets of additional information.
- Can you get a frequency list of n-grams (also known as clusters or multi-word units)?
- Most, but not all, concordancers can do this. The n-grams function will usually be found in the same place as the word list (as it is just another kind of frequency list). But the exact procedure can vary quite a lot across different programs.
- How do you save or export these statistical results?
- This is usually similar to how you save or export a concordance – so look for a similarly-named menu option, for instance.
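If it helps to see what these statistical functions compute, the following Python sketch produces a frequency list, a type-token ratio, an n-gram list and raw collocate counts from a list of tokens. It is a bare-bones illustration: the function names are hypothetical, the tokens are assumed to be already tokenised and lowercased, and the collocation function returns only co-occurrence frequencies within a window, without any of the statistical measures of strength or significance that real tools usually add.

```python
from collections import Counter


def frequency_list(tokens):
    """Return (word, frequency) pairs, most frequent first."""
    return Counter(tokens).most_common()


def type_token_ratio(tokens):
    """Number of distinct word types divided by number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def ngrams(tokens, n):
    """Count every sequence of `n` adjacent tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def collocates(tokens, node, span=2):
    """Count the words occurring within `span` tokens of each `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            counts.update(tokens[max(0, i - span):i])
            counts.update(tokens[i + 1:i + 1 + span])
    return counts
```

For the toy corpus `"the cat sat on the mat and the dog sat on the rug".split()`, the most frequent word is "the" with 4 occurrences, the type-token ratio is 8/13, and the bigram ("sat", "on") occurs twice.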