library(tidyverse)
library(lme4)
The following is some boilerplate R code I use to test whether, in my forced-choice (binomial) data, participants are selecting one option at a rate different from chance (50%).
# Date: July 21, 2016;
# updated January 30, 2018;
# updated May 3, 2018;
# updated August 6, 2021
data <- read.csv("data/binomial-data.csv", header = TRUE) # data file
# "selectCode" is the responses which consist of 1s (hits) and 0s (misses)
head(data)
A 1-sample proportions test is excellent for determining whether a single condition or vector of hits and misses (successes and failures) differs from some pre-determined level. This level could be chance (e.g., 50% with two options, 33.3% with three, etc.) or it could be another relevant value (e.g., some threshold for accuracy).
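For instance, a toy call with made-up counts (60 selections of the target out of 90 trials in a three-alternative task) would test against chance of 1/3:
prop.test(x = 60,   # hypothetical number of successes
          n = 90,   # hypothetical number of trials
          p = 1/3)  # chance level with three options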
First, we create a summary dataset from our raw data.
data.pct <- data %>% # save following operation into new dataset
group_by(experiment, condition) %>% # choose which columns define "100%" of each subset
summarise(total = n()) %>% # summarize to find the total number of each subset
full_join(data, ., by = c("experiment", "condition")) %>% # join it back to your data
group_by(experiment, condition, selectCode, total) %>% # choose columns to summarize
summarise(count = n()) %>% # count how many correct and incorrect responses there were
mutate(percent = count/total) # calculate the percent for each subset
`summarise()` has grouped output by 'experiment'. You can override using the `.groups` argument.
`summarise()` has grouped output by 'experiment', 'condition', 'selectCode'. You can override using the `.groups` argument.
data.pct
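The two `summarise()` messages above are informational, not errors: they simply report how the resulting table is grouped, and they can be silenced with the `.groups` argument they mention. For example, here is a more compact sketch that produces the same numbers:
data.pct2 <- data %>%
  group_by(experiment, condition, selectCode) %>%
  summarise(count = n(), .groups = "drop_last") %>% # still grouped by experiment and condition
  mutate(total = sum(count),      # total trials per experiment × condition
         percent = count/total) %>%
  ungroup()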
Next, check if the Baseline condition in Experiment 1 is different from chance levels of performance.
# prop.test( x = successTrials ,
# n = totalTrials ,
# p = probabilityOfSuccess )
data.pct %>% # taking our summary table
filter(experiment == "first", # keep only Experiment 1
condition == "Baseline", # keep only Baseline condition
selectCode == 1) %>% # keep only correct selections
pull(count) -> baseline.success.exp1 # pull out the value(s) from the count column
data.pct %>% # taking our summary table
filter(experiment == "first", # keep only Experiment 1
condition == "Baseline", # keep only Baseline condition
selectCode == 1) %>% # keep only correct selections
pull(total) -> baseline.total.exp1 # pull out the value(s) from the total column
prop.test(x = baseline.success.exp1,
n = baseline.total.exp1,
p = 0.5) # check against "chance" levels of 50%
1-sample proportions test with continuity correction
data: baseline.success.exp1 out of baseline.total.exp1, null probability 0.5
X-squared = 1.5042, df = 1, p-value = 0.22
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.4763963 0.6055812
sample estimates:
p
0.5416667
No, it is not detectably different from chance (p̂=0.54, χ²(1)=1.50, p=0.22).
Now, do the same test with the Treatment condition in Experiment 1.
# prop.test( x = successTrials ,
# n = totalTrials ,
# p = probabilityOfSuccess )
data.pct %>% # taking our summary table
filter(experiment == "first", # keep only Experiment 1
condition == "Treatment", # keep only Baseline condition
selectCode == 1) %>% # keep only correct selections
pull(count) -> treatment.success.exp1 # pull out the value(s) from the count column
data.pct %>% # taking our summary table
filter(experiment == "first", # keep only Experiment 1
condition == "Treatment", # keep only Baseline condition
selectCode == 1) %>% # keep only correct selections
pull(total) -> treatment.total.exp1 # pull out the value(s) from the total column
prop.test(x = treatment.success.exp1,
n = treatment.total.exp1,
p = 0.5) # check against "chance" levels of 50%
1-sample proportions test with continuity correction
data: treatment.success.exp1 out of treatment.total.exp1, null probability 0.5
X-squared = 61.004, df = 1, p-value = 5.695e-15
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.6937691 0.8062530
sample estimates:
p
0.7541667
Yes, it is detectably different from chance (p̂=0.75, χ²(1)=61.0, p<0.0001).
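If you prefer an exact test to the χ²-based approximation, `binom.test()` takes the same arguments; a quick cross-check using the values already pulled out above:
binom.test(x = treatment.success.exp1, # same success count as above
           n = treatment.total.exp1,   # same total count as above
           p = 0.5)                    # same chance level of 50%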
Because we are using binomial data, we must use a logistic regression. Logistic regression is nearly identical to linear regression, but rather than fitting a straight line of best fit through the raw data, it fits a straight line on the log-odds scale, which corresponds to an S-shaped curve in probability space when the predictor is continuous.
Let’s look at Experiment 1 only.
model.glm <- glm(selectCode ~ 1 +
condition,
family = "binomial",
data = data %>% filter(experiment == "first"))
summary(model.glm)
Call:
glm(formula = selectCode ~ 1 + condition, family = "binomial",
data = data %>% filter(experiment == "first"))
Deviance Residuals:
Min 1Q Median 3Q Max
-1.6752 -1.2491 0.7512 1.1073 1.1073
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.1671 0.1296 1.289 0.197
conditionTreatment 0.9539 0.1981 4.814 1.48e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 622.78 on 479 degrees of freedom
Residual deviance: 598.74 on 478 degrees of freedom
AIC: 602.74
Number of Fisher Scoring iterations: 4
Here we can see that our model provides an estimate (β) for the Treatment condition of 0.9539. The test statistic, in this case a z-value, is the estimate divided by its standard error (0.9539/0.1981 = 4.814). This is well beyond the ±1.96 threshold needed for significance at α = 0.05, so the p-value is less than 0.05. In fact, the p-value written in scientific notation is 1.48e-6 (“one point four eight times ten to the negative sixth power”), which means you would need to move the decimal point six places to the left (p = 0.00000148).
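Because these estimates are on the log-odds scale, you can back-transform them to proportions with `plogis()` (the inverse logit in base R). As a sanity check, plugging in the coefficients above recovers the raw proportions we tested earlier:
plogis(0.1671)          # intercept alone -> Baseline: ~0.542
plogis(0.1671 + 0.9539) # intercept + Treatment effect -> ~0.754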
If you have a more complex model, e.g., a mixed-effects logistic regression (fit with `glmer()`) that also takes into account the other experiments included in this dataset, you may wish to do some model comparison.
model.max <- glmer(selectCode ~ 1 + # outcome measure as a function of the null hypothesis
condition + # and condition
experiment + # and experiment
condition:experiment + # and the interaction of condition and experiment
(1 | subject) + # with random intercepts for subject
(1 | item), # and item
family = "binomial", # using a binomial linking function
data = data) # and our dataset
boundary (singular) fit: see ?isSingular
# * the intercept is included by default in all models and
# can be explicitly written out as 1, which I have done to allow
# all the other terms to be on their own lines
summary(model.max)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: selectCode ~ 1 + condition + experiment + condition:experiment + (1 | subject) + (1 | item)
Data: data
AIC BIC logLik deviance df.resid
1662.1 1704.3 -823.0 1646.1 1432
Scaled residuals:
Min 1Q Median 3Q Max
-3.1688 -0.7435 0.3197 0.8767 1.9048
Random effects:
Groups Name Variance Std.Dev.
item (Intercept) 1.483e-02 1.218e-01
subject (Intercept) 7.777e-10 2.789e-05
Number of obs: 1440, groups: item, 24; subject, 20
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.1677 0.1345 1.247 0.2124
conditionTreatment 0.9568 0.2046 4.676 2.93e-06 ***
experimentsecond -0.4030 0.1839 -2.192 0.0284 *
experimentthird -0.7891 0.1878 -4.202 2.64e-05 ***
conditionTreatment:experimentsecond 1.5285 0.3233 4.728 2.27e-06 ***
conditionTreatment:experimentthird -1.5761 0.2860 -5.511 3.56e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) cndtnT exprmnts exprmntt cndtnTrtmnt:xprmnts
cndtnTrtmnt -0.657
exprmntscnd -0.681 0.447
exprmntthrd -0.667 0.438 0.488
cndtnTrtmnt:xprmnts 0.388 -0.595 -0.569 -0.278
cndtnTrtmnt:xprmntt 0.438 -0.674 -0.320 -0.655 0.425
optimizer (Nelder_Mead) convergence code: 0 (OK)
boundary (singular) fit: see ?isSingular
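The `boundary (singular) fit` message is a direct consequence of the random-effects table above: the subject intercept variance is essentially zero (7.777e-10). A minimal check using lme4's own helpers:
isSingular(model.max) # TRUE when some variance component has collapsed to (near) zero
VarCorr(model.max)    # shows that it is the subject intercept that collapsed
A common response is to simplify the random-effects structure (e.g., dropping `(1 | subject)`), but for this walkthrough we keep the model as specified.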
While we can see that the Treatment condition is still significantly different from the Baseline condition (β=0.96, SE=0.20, z=4.68, p<0.0001), we can also see that the experiment terms, and therefore the interaction terms too, are reported only as pairwise comparisons against the reference level. In order to report a main effect and an interaction effect rather than pairwise comparisons, we should do model comparison.
# remove the interaction term to compare this nested depleted model to the maximal one
model.int <- glmer(selectCode ~ 1 +
condition +
experiment +
# condition:experiment +
(1 | subject) +
(1 | item),
family = "binomial",
data = data)
boundary (singular) fit: see ?isSingular
# this model is identical to our maximal model, except we have commented out
# the term of interest
anova(model.max, model.int)
Data: data
Models:
model.int: selectCode ~ 1 + condition + experiment + (1 | subject) + (1 | item)
model.max: selectCode ~ 1 + condition + experiment + condition:experiment + (1 | subject) + (1 | item)
npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
model.int 6 1761.2 1792.8 -874.57 1749.2
model.max 8 1662.1 1704.3 -823.04 1646.1 103.06 2 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Here, the model comparison shows us that the overall contribution of the interaction term is significant to the fit of our model (χ²(2)=103.06, p<0.0001). This means we can say there is a significant interaction between condition and experiment.
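As an aside, `drop1()` can run the same likelihood-ratio test without building the depleted model by hand, by refitting the model once for each term that can be dropped (here, only the interaction); a sketch using the fitted model.max:
drop1(model.max, test = "Chisq") # refits without condition:experiment and compares via LRT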
We should also check if there is a main effect of experiment:
# remove all terms containing the term of interest to compare this nested depleted model
model.exp <- glmer(selectCode ~ 1 +
condition +
# experiment +
# condition:experiment +
(1 | subject) +
(1 | item),
family = "binomial",
data = data)
boundary (singular) fit: see ?isSingular
# this model is identical to our maximal model, except we have commented out
# the term of interest and all other terms containing it
# since we commented out two terms, we should compare this model to the next larger one (model.int), not the maximal one
anova(model.int, model.exp)
Data: data
Models:
model.exp: selectCode ~ 1 + condition + (1 | subject) + (1 | item)
model.int: selectCode ~ 1 + condition + experiment + (1 | subject) + (1 | item)
npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
model.exp 4 1947.8 1968.9 -969.90 1939.8
model.int 6 1761.2 1792.8 -874.57 1749.2 190.66 2 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
From this, we can see there is a main effect of experiment over and above the pairwise comparisons (χ²(2)=190.66, p<0.0001).
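For completeness, the same procedure would test for a main effect of condition; a sketch (output not shown here):
model.con <- glmer(selectCode ~ 1 +
                     # condition +
                     experiment +
                     (1 | subject) +
                     (1 | item),
                   family = "binomial",
                   data = data)
anova(model.int, model.con)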
Now, to understand what this means, we must plot our data. (In fact, we should have done this first…)
This is tidyverse-style code for producing a visualization of the data analyzed above. First, we will create a new dataset that allows us to plot calculated estimates and confidence intervals. It is not necessary to save this as a separate dataset – it could be piped directly into the plot – but this way you get a chance to see the content and structure of the dataset first.
data.plot <- data.pct %>% # start with our SUMMARY data, not raw data and save a new copy
# create a column that labels the select codes
mutate(observed = case_when(selectCode == 1 ~ "correct", # 1 = correct
selectCode == 0 ~ "incorrect"), # 0 = incorrect
observed = factor(observed, # make it a factor
levels = c("incorrect", "correct"))) %>% # order the levels
rowwise() %>% # the next operation will be conducted separately for each row
# using rowwise() is essential for applying certain base R functions such as prop.test
# create columns for estimate, upper CI value, and lower CI value:
mutate(estimate = prop.test(x = count, n = total, p = 0.5)$estimate, # extract estimate
confint.lower = prop.test(x = count, n = total, p = 0.5)$conf.int[1], # extract lower CI
confint.upper = prop.test(x = count, n = total, p = 0.5)$conf.int[2]) # extract upper CI
data.plot
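If you have the `broom` package installed, the three separate `prop.test()` calls can be collapsed into one: `broom::tidy()` returns the estimate and both confidence bounds together. A sketch of this alternative:
library(broom) # assumes broom is installed
data.plot2 <- data.pct %>%
  rowwise() %>%
  mutate(test = list(tidy(prop.test(x = count, n = total, p = 0.5)))) %>%
  ungroup() %>%
  unnest(test) %>%
  select(experiment, condition, selectCode, total, count, percent,
         estimate, conf.low, conf.high)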
Now, let’s plot it:
data.plot %>%
ggplot(aes(x = condition, # x-axis is by condition
y = estimate, # y-axis is by calculated estimate
fill = observed)) + # this will allow us to hide extra error bars later on*
theme_bw() + # make the plot clean and pretty
geom_col() + # plot the bars as stacked columns
geom_hline(aes(yintercept=.5), # plot a line at 0.5 (50%)
colour = "black", # make it black
alpha=.5, # make it half-transparent
linetype = "dashed") + # make it a dashed line
geom_errorbar(aes(ymin = confint.lower, # define lower CI limit
ymax = confint.upper, # define upper CI limit
alpha = observed), # define transparency for error bars*
width = .1) + # keep the crossbars narrow
scale_alpha_manual(values = c(0,1), guide = "none") + # *hide extra error bars for 'incorrect' values
scale_y_continuous(labels = scales::percent_format()) + # y-axis converted from decimal to percent
ylab("percent selected") + # relabel the y-axis
facet_grid(~experiment) + # put each experiment in its own facet
NULL # this line allows you to comment out the previous line(s) without breaking the plot
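To save the figure to disk, `ggsave()` writes out the most recently drawn ggplot; the file name and dimensions below are placeholders:
ggsave("figures/forced-choice-plot.pdf", # hypothetical path
       width = 7, height = 4)            # dimensions in inches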
From this plot, you can see that the Baseline condition overlaps with chance levels in Experiments 1 and 2 but not in Experiment 3, where it is far below chance. Furthermore, the Treatment condition is above chance in Experiments 1 and 2, but not in Experiment 3. This pattern is likely what is driving the interaction effect we observed. Now the statistical tests can be discussed with more clarity, and conclusions can be drawn about how outcomes in the two conditions and three experiments compare (and what this means).
This code builds directly on the previous code to produce the graph shown at the bottom of this section. It uses only the base graphics library and looks at Experiment 1 only.
# Author: Lauren M Ackerman
# Available: https://lmackerman.com/r-snippets/
# Date: January 30, 2018
# store the prop.test() results for use in the graph
err.cond.1 <- prop.test(baseline.success.exp1, baseline.total.exp1, p = 0.5)
err.cond.2 <- prop.test(treatment.success.exp1, treatment.total.exp1, p = 0.5)
# format the table as percentages rounded to 2 decimal places
numOfConds <- length(levels(as.factor(data$condition[data$experiment=="first"])))
# NB: this older snippet uses a 'selection' column holding the response labels;
# adjust the name if your data instead uses 'selectCode' as above
numerator <- table(data$selection[data$experiment=="first"],
                   data$condition[data$experiment=="first"])
denominator <- length(data$selection[data$experiment=="first"])/numOfConds # assumes an equal number of trials per condition
dfTable <- (numerator/denominator)*100
counts <- round(dfTable,2)
barplot(counts,
# legend = rownames(counts),
main=paste0("Choice of ",rownames(counts)[1]," and ",rownames(counts)[2]," in a forced-choice task"),
ylab="Percent (%)", #xlab="condition",
cex.names=1, names=c(colnames(counts)[1], colnames(counts)[2]),
col=c("lightseagreen","salmon"), border=c("black","black")
)
# base graphics draw onto the current device, so re-draw the bars before layering on the labels
barplot(counts,
main=paste0("Choice of ",rownames(counts)[1]," and ",rownames(counts)[2]," in a forced-choice task"),
ylab="Percent (%)",
cex.names=1, names=c(colnames(counts)[1], colnames(counts)[2]),
col=c("lightseagreen","salmon"), border=c("black","black")
)
# label the stacked sections of the bar plot with the estimated percents
mtext(paste0(rownames(counts)[1]," = ", counts[1],"%"), side=1,line=-1.5,at=0.7,col="black",cex=1)
mtext(paste0(rownames(counts)[1]," = ", counts[3],"%"), side=1,line=-1.5,at=1.9,col="black",cex=1)
mtext(paste0(rownames(counts)[2]," = ", counts[2],"%"), side=3,line=-1.5,at=0.7,col="black",cex=1)
mtext(paste0(rownames(counts)[2]," = ", counts[4],"%"), side=3,line=-1.5,at=1.9,col="black",cex=1)
# re-draw the bars once more before adding both the labels and the error bars
barplot(counts,
# legend = rownames(counts),
main=paste0("Choice of ",rownames(counts)[1]," and ",rownames(counts)[2]," in a forced-choice task"),
ylab="Percent (%)", #xlab="condition",
cex.names=1, names=c(colnames(counts)[1], colnames(counts)[2]),
col=c("lightseagreen","salmon"), border=c("black","black")
)
# label the stacked sections of the bar plot with the estimated percents
mtext(paste0(rownames(counts)[1]," = ", counts[1],"%"), side=1,line=-1.5,at=0.7,col="black",cex=1)
mtext(paste0(rownames(counts)[1]," = ", counts[3],"%"), side=1,line=-1.5,at=1.9,col="black",cex=1)
mtext(paste0(rownames(counts)[2]," = ", counts[2],"%"), side=3,line=-1.5,at=0.7,col="black",cex=1)
mtext(paste0(rownames(counts)[2]," = ", counts[4],"%"), side=3,line=-1.5,at=1.9,col="black",cex=1)
# add the confidence intervals based on the prop.test() results
arrows(0.7,counts[1],0.7,(unlist(err.cond.1[6])[1])*100, angle=90,col="black",lwd=1)
arrows(0.7,counts[1],0.7,(unlist(err.cond.1[6])[2])*100, angle=90,col="black",lwd=1)
arrows(1.9,counts[3],1.9,(unlist(err.cond.2[6])[1])*100, angle=90,col="black",lwd=1)
arrows(1.9,counts[3],1.9,(unlist(err.cond.2[6])[2])*100, angle=90,col="black",lwd=1)
NOTE: Sometimes `1 - unlist()` should be used because the count of “1” answers is on the top of this graph, alphabetically. If your error bars are misaligned, you may need to add `1 -` to these four lines of code.
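Concretely, the adjusted lines would look like this (use them only if your error bars come out misaligned):
arrows(0.7,counts[1],0.7,(1-unlist(err.cond.1[6])[1])*100, angle=90,col="black",lwd=1)
arrows(0.7,counts[1],0.7,(1-unlist(err.cond.1[6])[2])*100, angle=90,col="black",lwd=1)
arrows(1.9,counts[3],1.9,(1-unlist(err.cond.2[6])[1])*100, angle=90,col="black",lwd=1)
arrows(1.9,counts[3],1.9,(1-unlist(err.cond.2[6])[2])*100, angle=90,col="black",lwd=1)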
Text size (`cex`) and line width of the error bars (`lwd`) are both set to 1 in the above code, but should be increased for use on posters. The colors selected here are a close match to the default aesthetics of ggplot2, but are easily changed.