#LancsBox: Lancaster University corpus toolbox
Language-specific localisation
#LancsBox supports many languages. There are, however, different levels of support.
- Level 1: Basic support without grammatical annotation. Practically any language has this basic type of support, regardless of whether it is written left-to-right (e.g. English, Chinese) or right-to-left (e.g. Arabic, Hebrew).
- Level 2: POS annotation. Currently about 20 languages are supported at this level. These are the languages for which a TreeTagger parameter file is available.
- Level 3: Full support including POS, POS categories (used for lemmatisation), smart searches, abbreviation recognition and clitic recognition. Currently only a handful of languages are fully supported. We therefore invite collaborators to get in touch to help us build this type of support.
Here is a PDF document explaining how to provide additional support for a language that is not yet fully supported. The following Excel spreadsheet offers an English example showing the individual pieces of information required and the format in which they need to be presented.