Lancaster Stats Tools online


Statistics meets corpus linguistics

Type in a mathematical expression to be calculated.

Paste tab delimited data including header row and id column.

Select parameters.

One linguistic variable Multiple linguistic variables (relationship)

Description Inference

R code #histogram
hist(x, breaks="Sturges", col="gray", xlab="linguistic variable", main="Histogram")

#boxplot with points and mean overlay
boxplot(myData, ylab = "linguistic variable", xlab = "(sub)corpora", outline = FALSE, ylim = c(0, max(myData, na.rm = TRUE)*1.05))
for (i in seq_len(ncol(myData))) {
  #overlay the individual data points, jittered horizontally
  for (v in myData[,i]) { points(jitter(i, 3/i), v, col = "blue", pch = 1, cex = 1) }
  #mark the mean of each (sub)corpus with a red bar
  points(i, mean(myData[,i], trim = 0, na.rm = TRUE), col = "red", pch = "_", cex = 4)
}

#scatter plot with regression line
plot(myData); fitline <- lm(myData[,2] ~ myData[,1]); abline(fitline,col="red")

#error bars
error.bars(myData,stats=NULL, ylab = "linguistic variable",xlab="(sub)corpora", main=NULL,eyes=FALSE, ylim = NULL, xlim=NULL,alpha=.05,sd=FALSE, labels = NULL, pos = NULL, arrow.len = 0.05,arrow.col="red", add = FALSE,bars=FALSE,within=FALSE, col="red")

Select what you want to randomize.

Paste data in the text area and choose what you want to randomize.

Lines Words Sentences
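
The three randomization options can be sketched in base R with `sample()`. This is a minimal illustration, not the tool's own code: the input text is made up, and the sentence splitter is a naive assumption (a break after `.`, `!` or `?` followed by white space).

```r
# Hypothetical input text; any pasted text would do.
txt <- "The cat sat. The dog barked! Did the bird sing?\nA second line here."

set.seed(1)  # fixed seed so the shuffle is reproducible

# Lines: split on newlines, then shuffle
text_lines <- strsplit(txt, "\n", fixed = TRUE)[[1]]
shuffled_lines <- sample(text_lines)

# Words: split on any run of white space, then shuffle
words <- strsplit(txt, "[[:space:]]+")[[1]]
shuffled_words <- sample(words)

# Sentences: naive split on white space that follows ., ! or ?
sentences <- strsplit(txt, "(?<=[.!?])[[:space:]]+", perl = TRUE)[[1]]
shuffled_sentences <- sample(sentences)
```

`sample()` without further arguments returns a random permutation, so every unit appears exactly once in the shuffled output.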

Stats calculator

Graph tool

Vocabulary Frequency and dispersion

Paste the text you want to analyse into the text box below.

Choose language.

Change parameters or leave default options.

a) Case sensitive types

b) TTR normalization basis

c) Word delimiters (in addition to white space)
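
How options a) and c) could affect a type-token ratio (TTR) can be sketched in base R. The helper `ttr()` and its arguments are hypothetical names, and treating each extra delimiter as a word boundary is an assumption about how the tool applies option c); the normalization basis in option b) is not covered here.

```r
# Hypothetical helper: type-token ratio of a text.
# `delims` lists extra word delimiters besides white space (assumed
# reading of option c); `case_sensitive` mirrors option a.
ttr <- function(text, delims = ".,;:!?", case_sensitive = FALSE) {
  if (!case_sensitive) text <- tolower(text)
  # Replace each extra delimiter character with a space
  for (ch in strsplit(delims, "")[[1]]) {
    text <- gsub(ch, " ", text, fixed = TRUE)
  }
  tokens <- strsplit(text, "[[:space:]]+")[[1]]
  tokens <- tokens[nzchar(tokens)]        # drop empty strings
  length(unique(tokens)) / length(tokens) # types / tokens
}

ttr("The cat saw the other cat.")                         # 4 types / 6 tokens
ttr("The cat saw the other cat.", case_sensitive = TRUE)  # 5 types / 6 tokens
```

With case-sensitive types, "The" and "the" count as two different types, which raises the ratio.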

Paste the text you want to analyse into the text box below.

Choose language.

3. Define a word.

case sensitive (types)

4. Choose the basis for normalisation.

Word calculator

Semantics and discourse Collocations, keywords and lockwords

Enter parameters for collocate calculation.

A) Tokens in the corpus
B) Frequency of
C) Frequency of
D) Frequency of the collocation (node + collocate)
E) Window size L R
F) Correction for window size

#LancsBox is a free multi-platform tool for the analysis of language. #LancsBox, among other things, identifies collocations and keywords. You need to download #LancsBox to your computer.

Paste tab delimited data including header row and id column.

Select the type of judgement variable.

Nominal variable (categories) Ordinal variable (ranks) Interval/ratio variable (scale)

R code # R functions: #nominal
myData1<-table(myData); kappa2.table(myData1)
#nominal 3 + raters
gwet.ac1.raw(myData, weights="ordinal")
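
The listing above relies on externally defined functions. To show what an agreement coefficient for two raters and a nominal variable measures, here is a minimal base R sketch of Cohen's kappa computed by hand. The ratings are made up, and this is not necessarily how `kappa2.table` is implemented.

```r
# Made-up ratings from two raters on ten items, for illustration only
rater1 <- c("a", "a", "b", "b", "a", "c", "c", "b", "a", "c")
rater2 <- c("a", "b", "b", "b", "a", "c", "a", "b", "a", "c")

# Use a shared set of levels so the agreement table is square
lv <- sort(unique(c(rater1, rater2)))
tab <- table(factor(rater1, levels = lv), factor(rater2, levels = lv))

p_obs <- sum(diag(tab)) / sum(tab)                      # observed agreement
p_exp <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2  # agreement expected by chance
kappa_hat <- (p_obs - p_exp) / (1 - p_exp)              # Cohen's kappa
```

Here the raters agree on 8 of 10 items, chance agreement is 0.34, and kappa is therefore (0.8 - 0.34) / (1 - 0.34).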


Lexico-grammar From simple counts to complex models

Copy and paste your data into the box below.

Paste tab delimited data including header row and id column.

Select options.

Input format of the data: Cross-tab Data set

Test: Chi-squared Chi-squared (Yates's correction) Log likelihood Fisher exact test

Visualize relationship

R code source("");
data<- table(data)
#statistical tests
chisq.test(data, correct = FALSE);
chisq.test(data, correct = TRUE);
g.test(data, correct = "none");
#effect sizes
CramerV(data, conf.level = 0.95);
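
`g.test` and `CramerV` are not base R functions, and the script loaded by `source()` is not shown on the page. As a hedged sketch of what these statistics are, the log-likelihood statistic and Cramér's V can be derived from a base R `chisq.test` result. The cross-tab is made up, and the G formula as written assumes no empty cells.

```r
# Made-up 2 x 3 cross-tab, for illustration only
tab <- matrix(c(30, 20, 10,
                10, 20, 30), nrow = 2, byrow = TRUE)

res <- chisq.test(tab, correct = FALSE)
chi2 <- unname(res$statistic)

# Cramer's V = sqrt(chi2 / (N * (min(rows, cols) - 1)))
cramers_v <- sqrt(chi2 / (sum(tab) * (min(dim(tab)) - 1)))

# Log-likelihood G = 2 * sum(O * log(O / E)); assumes no zero cells
g_stat <- 2 * sum(tab * log(tab / res$expected))
p_g <- pchisq(g_stat, df = res$parameter, lower.tail = FALSE)
```

For this table every expected count is 20, the chi-squared statistic is 20, and V is sqrt(20 / 120).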

Select what you want to do.

Paste data in the text area.

Type in the exact name of the outcome variable.

Type in the exact name(s) of the predictor(s) [use ; as separator].

Decide if you want to include predictor interactions.

Yes, include all Yes, include some No
Type in the exact names of the predictors with interactions [use ; as separator].

Register variation Correlation, clusters and factors

Paste tab delimited data including header row and id column.

Select options.

Parametric Non-parametric

Visualize correlation

R code library(Hmisc); library(corrplot); library(stats); #libraries used
cor.test(mydata1, mydata2, method="pearson") #Pearson's correlation
cor.test(mydata1, mydata2, method="spearman") #Spearman's correlation
rcorr(mydata, type="pearson") #correlation matrix
plot(mydata, col ="blue"); fitline <- lm(mydata2 ~ mydata1); abline(fitline,col="red") #scatter plot with regression line (y ~ x, assuming mydata1 is on the x axis)
corrplot(m, method ="color", type = "full", diag = TRUE, addCoef.col="black", addCoefasPercent=FALSE, addgrid.col="grey", tl.pos = NULL, tl.cex = 1, tl.srt = 45, tl.col = "black") #correlation matrix

Paste tab delimited data including header row and id column.

Select parameters.

Transform data to z-scores

3. Select highlight.

R code mydata <- scale(mydata) # optional z-score transformation
d <- dist(mydata, method = "manhattan") # distance matrix
fit <- hclust(d, method="ward.D") #Cluster analysis
plot(fit, xlab="", ylab="Height", main="")#plot dendrogram
rect.hclust(fit, k=5, border="red") #draw cluster groups
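
Beyond drawing the rectangles, the cluster membership itself can be extracted with `cutree()`. The sketch below repeats the steps of the listing on the built-in `USArrests` data set, which stands in for pasted data purely for illustration.

```r
# Built-in data set used only as a stand-in for pasted data
mydata <- scale(USArrests)               # z-score transformation
d <- dist(mydata, method = "manhattan")  # distance matrix
fit <- hclust(d, method = "ward.D")      # same settings as the listing

groups <- cutree(fit, k = 5)             # cluster label (1 to 5) for each row
table(groups)                            # cluster sizes
split(rownames(mydata), groups)          # members of each cluster
```

The value of `k` passed to `cutree()` should match the `k` used in `rect.hclust()` so that the labels correspond to the rectangles on the dendrogram.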

Paste tab delimited data including header row and id column.

Select the type of analysis you want to carry out.

Full MD Comparison with Biber's (1988) dimensions

R code cortest.bartlett(mydata); det(cor(mydata))# Bartlett's test and multicollinearity test
fa.parallel(mydata, fa="fa", main = "Scree Plot", show.legend=FALSE) #screeplot
factanal(mydata, number, rotation="promax") #factor analysis

Multidimensional analysis [data]

Sociolinguistics Individual and social variation

Paste tab delimited data including header row and id column.

Select data options.

Different groups Same group different conditions

Select type of test.

Parametric test Non-parametric test

R code #t-test
t.test(data[ ,1], data[ ,2], paired=FALSE)
#t-test: repeated measures
t.test(data[ ,1], data[ ,2], paired=TRUE)
#Mann-Whitney-Wilcoxon rank sum test
wilcox.test(data[ ,1], data[ ,2], paired=FALSE)
#Wilcoxon signed-rank test: repeated measures
wilcox.test(data[ ,1], data[ ,2], paired=TRUE)
#One-way ANOVA
aov(measurement ~ group, data = data)
#Kruskal-Wallis test
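
No call follows the Kruskal-Wallis heading on the page. A minimal sketch, assuming base R's `kruskal.test()` with the same formula layout as the ANOVA line; the data frame below is made up for illustration.

```r
# Made-up long-format data: one measurement column, one grouping column
data <- data.frame(
  measurement = c(12, 15, 11, 14, 22, 25, 21, 24, 31, 35, 30, 33),
  group = factor(rep(c("A", "B", "C"), each = 4))
)

kw <- kruskal.test(measurement ~ group, data = data)
kw$statistic  # Kruskal-Wallis H
kw$parameter  # degrees of freedom (number of groups - 1)
kw$p.value
```

The test works on ranks rather than raw values, which is why it serves as the non-parametric counterpart of the one-way ANOVA.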

Paste tab delimited data including header row and id column.

R code library(languageR);
x = corres.fnc(data);
plot(x, ccex = 0.6, rcex = 0.6);

Paste data in the text area.

Type in the exact name of the outcome variable.

Type in the exact name(s) of the fixed effect predictor(s) [use ; as separator].

Type in the exact name(s) of the random effect predictor.

Decide if you want to include predictor interactions.

Yes, include all Yes, include some No
Type in the exact names of the predictors with interactions [use ; as separator].

R code library(lme4);
glmer(outcome~predictor+(1|randeffect), family = binomial, data = mydata);

Change over time Working with diachronic data

Paste tab delimited data including header row and id column.

Select parameters.

Difference between two corpora (two-tailed)
Increase between corpus 1 and corpus 2 (one-tailed)
Decrease between corpus 1 and corpus 2 (one-tailed)

R code library(boot);
bootstraptest(period1, period2,samples,'p2');
boot(data=b, statistic=percid, R=samples);

Paste tab delimited data including header row and id column.

Select parameters.

R code source("");

Paste tab delimited data including header row and id column.

Select parameters.

No transformation Log transformation

R code library(ggplot2)
p<-ggplot(data, aes(x = data[,1], y = data[,2])) + geom_point() + xlab("Time") + ylab("Linguistic variable"); p + stat_smooth(method = "gam", formula = y ~ s(x, bs = "cr", fx=FALSE, k =15), size = 1, fill="#707070", level = 0.95 )+ stat_smooth(method = "gam", formula = y ~ s(x, bs = "cr", fx=FALSE, k =15), size = 1, fill="#FFFF00", level = 0.99);

Indicate historical period.

Upload a zip file with collocation files.

Provide info about data.

Regex for identifying collocates:

Column delimiter:

Define a collocate.

sampling points

% sampling points

Decide if you want to run the analysis with frequency cut-off point.

Yes, absolute cut-off Yes, relative cut-off No

Provide additional info.

Regex for identifying node frequency in header (relative cut-off):

R code #Calculate Gwet's AC1; b...input data frame
i = 1; v <- c(); while(i+1 < ncol(b)) {n=(gwet.ac1.raw(b[,i:(i+2)])[3]);v<- c(v, n); i= i+1; }
#Prepare data frame
h<-seq(from, to, by = 1); g<-data.frame(h,v)
#Produce graph
p<-ggplot(g, aes(x = g[,1], y =g[,2])) + xlim(from, to)+ scale_x_continuous(breaks = seq(from, to, by = 10)) + geom_point() + xlab("Time") + ylab("AC1"); p + stat_smooth(method = "gam", formula = y ~ s(x, bs = "cr", fx=FALSE, k =10), size = 1, fill="#707070", level = 0.95 )+ stat_smooth(method = "gam", formula = y ~ s(x, bs = "cr", fx=FALSE, k =10), size = 1, fill="#FFFF00", level = 0.99)

Bringing everything together Ten principles of statistical thinking, meta-analysis and effect sizes

Choose input type for the calculation of effect size.

Insert required value or values. Separate multiple values by a semi-colon (;).

R code library(
pes(p,n1,n2) #based on p-value
mes(m1,m2,sd1,sd2,n1,n2) #based on means
tes(t,n1,n2) #based on t-value (t-test)
fes(F,n1,n2) #based on F (ANOVA)
res(r,NULL,n) #based on r (e.g. correlation)
des(d,n1,n2) #based on Cohen's d
lores(lor,var,n1,n2) #based on Log Odds Ratio
d=(2*r)/sqrt(1-(r*r)) #based on r only
r=d/sqrt((d*d)+4) #based on Cohen's d only
d=(2*sqrt(e))/sqrt(1-e) #based on eta2
d=(lor*sqrt(3))/pi #based on Log Odds Ratio only
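
The four closed-form conversions above can be wrapped as small functions and checked against each other. The function names are hypothetical; the formulas are copied from the listing.

```r
# Conversions to and from Cohen's d, as given in the listing
d_from_r    <- function(r)   (2 * r) / sqrt(1 - r^2)
r_from_d    <- function(d)   d / sqrt(d^2 + 4)
d_from_eta2 <- function(e)   (2 * sqrt(e)) / sqrt(1 - e)
d_from_lor  <- function(lor) (lor * sqrt(3)) / pi

d_from_r(0.6)      # 1.2 / 0.8 = 1.5
r_from_d(1.5)      # 1.5 / 2.5 = 0.6
d_from_eta2(0.36)  # same as d_from_r(0.6), since eta2 = r^2 here
```

The first two formulas are inverses of each other, so converting r to d and back returns the starting value.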

Paste a list of studies and their standardised results (d, n1, n2).

R code library(meta)
#Calculate Variance ES
es.d.v <-(((n1+n2)/(n1*n2))+(es.d^2/(2*(n1+n2))))
#Calculate Standard Errors
ES<-sqrt(es.d.v)
forest(meta1, studlab=c("Study1","Study2","Study3","Study4","Study5"), xlab="Cohen’s d", col.square="black",xlim=c(-3,3), col.diamond="black", fontsize=14, squaresize=0.5, leftcols=c("studlab"), rightcols=c("effect", "ci"), hetstat=FALSE, comb.fixed=FALSE, text.random="Overall ES", print.tau2=FALSE,print.I2=FALSE,TE.random=FALSE, seTE.random=FALSE)
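
The object `meta1` is not defined in the listing. To show how per-study variances feed into an overall effect size, here is a base R sketch of fixed-effect inverse-variance pooling. This is a deliberate simplification and not the random-effects estimate that the forest plot labels refer to; the study values are made up.

```r
# Made-up study results (d, n1, n2), for illustration only
es.d <- c(0.5, 0.8, 0.2)
n1 <- c(20, 30, 25)
n2 <- c(20, 30, 25)

# Variance of each d, using the formula from the listing
es.d.v <- ((n1 + n2) / (n1 * n2)) + (es.d^2 / (2 * (n1 + n2)))

w <- 1 / es.d.v                      # inverse-variance weights
pooled <- sum(w * es.d) / sum(w)     # weighted mean effect size
pooled_se <- sqrt(1 / sum(w))        # standard error of the pooled estimate
ci <- pooled + c(-1, 1) * qnorm(0.975) * pooled_se  # 95% confidence interval
```

Larger studies have smaller variances and therefore pull the pooled estimate more strongly toward their own effect size.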

