Workshop convenor: Vaclav Brezina, ESRC Centre for Corpus Approaches to Social Science, Lancaster University
This workshop will discuss different statistical procedures available for the analysis of sociolinguistic data in large language corpora. I will demonstrate that the traditional approach, which applies the log-likelihood statistic to aggregated data, is in principle unreliable. Instead, the workshop will offer alternative methodologies and statistical procedures that take into account within-group differences and therefore produce more meaningful results.

As part of the workshop, a new research tool, BNC64 Search & Compare, will be introduced. BNC64 Search & Compare can carry out detailed analyses based on BNC64, a socially balanced spoken corpus of 1.5 million words. BNC64 represents the speech of 64 speakers (32 men and 32 women) extracted from the British National Corpus (BNC). BNC64 Search & Compare is a web-based environment that creates simple visualisations, calculates statistics and produces concordances. The website was created to allow easy visualisation of complex corpus data and easy testing of a number of different sociolinguistic hypotheses.

The workshop will be structured around a series of practical exercises guiding the participants through different types of analysis of corpus data and statistical procedures. The following areas will be covered:
The workshop does not require any prior knowledge of statistics. It will be of interest to anyone who wants to explore sociolinguistic data using language corpora.
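To make the point about aggregated data concrete, here is a minimal Python sketch with invented counts for four hypothetical speakers per group. The `log_likelihood` function is the widely used two-corpus log-likelihood formula (expected frequencies derived from the pooled relative frequency), written here for illustration only; it is not taken from BNC64 Search & Compare, and the speaker data are made up. One speaker with unusually high usage produces a large aggregated log-likelihood, even though the median per-speaker rates of the two groups are identical.

```python
import math
import statistics

def log_likelihood(freq1, freq2, size1, size2):
    """Log-likelihood (G2) for one word compared across two (sub)corpora."""
    expected1 = size1 * (freq1 + freq2) / (size1 + size2)
    expected2 = size2 * (freq1 + freq2) / (size1 + size2)
    ll = 0.0
    # A zero observed frequency contributes nothing (0 * ln 0 is taken as 0).
    if freq1 > 0:
        ll += freq1 * math.log(freq1 / expected1)
    if freq2 > 0:
        ll += freq2 * math.log(freq2 / expected2)
    return 2 * ll

# Hypothetical data: (word frequency, words spoken) for each speaker.
men = [(10, 10_000), (12, 10_000), (9, 10_000), (11, 10_000)]
women = [(10, 10_000), (11, 10_000), (9, 10_000), (90, 10_000)]

# Traditional approach: pool all speakers in each group, then compare the totals.
ll = log_likelihood(sum(f for f, _ in men), sum(f for f, _ in women),
                    sum(n for _, n in men), sum(n for _, n in women))

# Speaker-level view: relative frequency per 10,000 words for each speaker.
men_rates = [f * 10_000 / n for f, n in men]
women_rates = [f * 10_000 / n for f, n in women]

print(f"Aggregated LL: {ll:.2f}")  # → Aggregated LL: 39.16 (critical value 3.84 at p < 0.05, 1 df)
print("Men:", men_rates, "median", statistics.median(men_rates))
print("Women:", women_rates, "median", statistics.median(women_rates))
```

The aggregated test treats every word as an independent observation, so a single atypical speaker can make a whole group look different; inspecting per-speaker rates exposes this.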
Workshop materials
BNC64 Search & Compare
Stats tools
Calculator: manual calculations
Mean, trimmed mean & robust mean difference: simple comparisons
Robust Cohen's d: effect size
Mann-Whitney U test: non-parametric test
Log-likelihood: general corpus comparison
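Several of the stats tools listed above work on per-speaker values rather than pooled counts. Below is a minimal plain-Python sketch of a 20% trimmed mean, a robust Cohen's d and the Mann-Whitney U test. The robust d follows the formulation of Algina, Keselman and Penfield (2005): the difference in trimmed means over a pooled winsorized standard deviation, scaled by 0.642. Whether the stats tools use exactly this variant is an assumption. The function names and the speaker data are hypothetical, and the Mann-Whitney p-value uses a simple normal approximation without tie correction.

```python
import math
import statistics

def trimmed_mean(values, prop=0.2):
    """Mean after cutting the lowest and highest `prop` share of the values."""
    xs = sorted(values)
    g = int(prop * len(xs))  # number of values removed from each tail
    return statistics.mean(xs[g:len(xs) - g])

def winsorized_variance(values, prop=0.2):
    """Sample variance after replacing each tail with the nearest retained value."""
    xs = sorted(values)
    n = len(xs)
    g = int(prop * n)
    w = [xs[g]] * g + xs[g:n - g] + [xs[n - g - 1]] * g
    return statistics.variance(w)

def robust_cohens_d(x, y, prop=0.2):
    """Trimmed-mean difference over the pooled winsorized SD, scaled by 0.642.
    The 0.642 constant (which makes the value comparable to Cohen's d for
    normal data) is only appropriate for 20% trimming."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * winsorized_variance(x, prop)
                  + (n2 - 1) * winsorized_variance(y, prop)) / (n1 + n2 - 2)
    diff = trimmed_mean(x, prop) - trimmed_mean(y, prop)
    return 0.642 * diff / math.sqrt(pooled_var)

def mann_whitney_u(x, y):
    """U statistic for sample x (average ranks for ties) and a two-sided p-value
    from the normal approximation, without tie or continuity correction."""
    pooled = sorted(x + y)
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    n1, n2 = len(x), len(y)
    u1 = sum(rank_of[v] for v in x) - n1 * (n1 + 1) / 2
    z = (u1 - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u1, p

# Hypothetical per-speaker rates (uses per 10,000 words) for two groups.
men = [10, 12, 9, 11, 10, 13, 8, 11]
women = [10, 11, 9, 90, 12, 10, 9, 11]

print("Means:", statistics.mean(men), statistics.mean(women))
print("20% trimmed means:", trimmed_mean(men), trimmed_mean(women))
print("Robust Cohen's d:", robust_cohens_d(men, women))
print("Mann-Whitney U, p:", mann_whitney_u(men, women))
```

With these invented values, the ordinary means differ (10.5 vs 20.25) because of one speaker. The 20% trimmed means coincide at 10.5, so the robust d is 0.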