LancsLex: Lancaster Vocab Analysis Tool

English vocabulary interactive resource

LancsLex is a lexical tool that analyses the lexical coverage of texts and compares it with the New General Service List, which identifies the 2,490 most frequent words in the English language. The tool provides insights into the lexical diversity and complexity of texts and can be used for research as well as for pedagogically oriented explorations. With the tool you can:

Identify the 2,500 most frequent English words in your text
Identify specialised vocabulary
Check the lexical appropriateness of teaching materials
Measure the lexical complexity of texts for research purposes

There are two main ways to use the website:

1. Explore the New General Service List. You can explore the general and most frequent words in the English language by browsing the New General Service List (new-GSL). You can also download the whole list ordered according to frequency or alphabetically. Use the ‘Browse’ tab in the top panel to learn more about the new-GSL.

2. Analyse a text. You can explore the lexical composition of texts of your own choice by pasting the text into the online interface. The results will tell you how many different words the text contains and which of these words come from the new-GSL. You can thus distinguish between general and specialised vocabulary in texts. The website also reports the percentage of the text covered by words from different frequency bands. Use the ‘Analyse’ tab in the top panel to access this functionality.
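As a rough illustration of the idea behind the second option, the sketch below counts the different word forms in a text and splits them into those found on a word list and those that are not. It is not the LancsLex implementation: the function name split_vocabulary, the regular-expression tokeniser and the six-word GENERAL_WORDS set are all assumptions made for this example, and the set is only a stand-in for the new-GSL.

```python
import re

# Hypothetical stand-in for the new-GSL: a handful of very common words.
GENERAL_WORDS = {"the", "is", "in", "of", "and", "a"}


def split_vocabulary(text, general_words):
    """Return (general, specialised) sets of distinct word forms in text."""
    # Naive tokenisation: lower-case alphabetic strings, optional apostrophe part.
    types = set(re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()))
    general = types & general_words
    return general, types - general


general, specialised = split_vocabulary(
    "The enzyme is in the membrane.", GENERAL_WORDS
)
print(len(general | specialised), "different words")
print("general:", sorted(general))
print("specialised:", sorted(specialised))
```

The sketch matches raw word forms only. The tool itself works with lemmas and word classes, as described further down the page.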

More about the New General Service List

The new-GSL is a list of ~2,500 common English vocabulary items based on four language corpora with a total size of over 12 billion running words. It can be used for both teaching and research purposes. The list differs from other lists of core English vocabulary in three key respects:

It makes innovative use of corpus methods to identify the words in the English language that are truly general, that is, words that will be encountered frequently across different genres and topics.
It is based on four corpora, whereas the other lists are based on one corpus.
It uses the lemma as its organising principle. A lemma includes a word and all its inflected forms (e.g. go, goes, going, went) and distinguishes between word classes (‘go’ as a verb and ‘go’ as a noun).
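The lemma principle can be pictured as a lookup keyed on both the word form and its word class. The sketch below is only an illustration under that assumption: the table FORM_TO_LEMMA, its entries and the helper lemma_of are hypothetical and do not come from the new-GSL data files.

```python
# Hypothetical lookup: (word form, word class) -> lemma headword.
FORM_TO_LEMMA = {
    ("go", "verb"): "go",
    ("goes", "verb"): "go",
    ("going", "verb"): "go",
    ("went", "verb"): "go",
    ("go", "noun"): "go",
}


def lemma_of(form, word_class):
    """Return the (headword, word class) lemma for a tagged form, or None."""
    headword = FORM_TO_LEMMA.get((form.lower(), word_class))
    return (headword, word_class) if headword else None


# All verb forms collapse into one lemma; the noun 'go' is a separate lemma.
print(lemma_of("went", "verb"))
print(lemma_of("go", "noun"))
```

Because the word class is part of the key, ‘go’ as a verb and ‘go’ as a noun are counted as two different list entries even though they share a headword.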

To read more about the development of the New General Service List and about general principles of wordlist creation see the following two papers:

Brezina, V. & Gablasova, D. (2015). Is there a core general vocabulary? Introducing the New General Service List. Applied Linguistics, 36 (1), 1–22.
Brezina, V. & Gablasova, D. (2017). How to Produce Vocabulary Lists? Issues of Definition, Selection and Pedagogical Aims. A Response to Gabriele Stein. Applied Linguistics, 38 (5), 764–767.

Watch a video about the development of the new-GSL

More about the online text analysis tool

The tool was designed to allow exploration of the lexical composition of English texts. It differs from other lexical analysis tools in four major respects.

1. Identification of word classes. The tool analyses texts grammatically, which enables it to disambiguate between words that have the same form but belong to different word classes (e.g. ‘walk’ as a noun and ‘walk’ as a verb).

2. Identification of lemmas. The grammatical analysis makes it possible to identify lemmas in the text. In other words, the tool finds all instances of the same word regardless of their morphological suffixes and groups them together. For example, both singular and plural occurrences of a noun will be recognised as belonging to the same word.

3. Variety of English. The tool allows you to take into consideration whether the text is written in the American or the British variety of English. The British variety is the default; to activate the comparison with the most frequent words in American English, tick the ‘American supplement’ box.

4. Inclusion of proper nouns and numbers. The tool allows you to decide whether proper nouns and numbers are included in the calculation of text coverage. The majority of lexical coverage tools include these words by default; however, it has been argued that the inclusion of proper nouns can lead to over-estimation of the lexical diversity and sophistication of a text and is not preferred in some instances.
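To illustrate why the fourth option matters, the sketch below computes text coverage twice on the same tagged tokens: once counting proper nouns and numbers, and once leaving them out. It assumes the text has already been lemmatised and tagged. The tag labels "propn" and "num", the function coverage and the tiny word list are hypothetical choices for this example, not the tool's internals.

```python
def coverage(tagged_tokens, wordlist, include_names_and_numbers=True):
    """Percentage of counted tokens whose (lemma, word class) is in wordlist."""
    skipped = {"propn", "num"}  # assumed tags for proper nouns and numbers
    counted = [
        token for token in tagged_tokens
        if include_names_and_numbers or token[1] not in skipped
    ]
    if not counted:
        return 0.0
    covered = sum(1 for token in counted if token in wordlist)
    return 100.0 * covered / len(counted)


# "London is a big city", already lemmatised and tagged by hand.
tokens = [("london", "propn"), ("be", "verb"), ("a", "det"),
          ("big", "adj"), ("city", "noun")]
wordlist = {("be", "verb"), ("a", "det"), ("big", "adj"), ("city", "noun")}

print(coverage(tokens, wordlist))                                   # 80.0
print(coverage(tokens, wordlist, include_names_and_numbers=False))  # 100.0
```

With the proper noun counted, one token in five falls outside the list, so the sentence appears lexically more demanding than it is.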

The tool reports the number of words in the text and the percentage of the text covered by the new-GSL. This allows users to distinguish between general and specialised vocabulary in the text. In addition, it offers a further breakdown of the lexical items found in the new-GSL into more specific frequency bands (e.g. 0-500, 501-1000, etc.). It presents this information both as a table and in graphic form, where words from different frequency bands are highlighted in the text. The tool also provides the frequency breakdown and text coverage according to word classes (e.g. 20 per cent of the text consists of nouns).
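A frequency-band breakdown of this kind can be sketched as follows. The example assumes each list entry has a 1-based frequency rank and that bands are 500 ranks wide, so the first band is labelled 1-500 here. The ranks in the example are invented for illustration, and band_label and band_coverage are hypothetical helper names.

```python
from collections import Counter

BAND_SIZE = 500  # assumed band width


def band_label(rank, band_size=BAND_SIZE):
    """Map a 1-based frequency rank to a label such as '1-500' or '501-1000'."""
    start = (rank - 1) // band_size * band_size + 1
    return f"{start}-{start + band_size - 1}"


def band_coverage(lemmas, ranks):
    """Return {band label: percentage of tokens}; unknown lemmas are 'off-list'."""
    if not lemmas:
        return {}
    counts = Counter(
        band_label(ranks[lemma]) if lemma in ranks else "off-list"
        for lemma in lemmas
    )
    return {band: 100.0 * n / len(lemmas) for band, n in counts.items()}


# Invented ranks, for illustration only.
ranks = {"the": 1, "be": 2, "city": 600}
result = band_coverage(["the", "city", "be", "medieval"], ranks)
for band, share in result.items():
    print(f"{band}: {share:.1f}%")
```

The same counting approach would give the word-class breakdown if tokens were grouped by their word class instead of by band.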