---
title: "Forced choice data"
author: "Lauren M Ackerman"
date: "Last updated: 06 August 2021"
output: 
  html_notebook:
    toc: true
    toc_float: true
    includes: 
      in_header: google_analytics.html
---

[🔙 Home](https://verbingnouns.github.io/notebooks/)

```{r setup, message=FALSE}
library(tidyverse)
library(lme4)
```

# Using prop.test()

The following is some boilerplate R code I use to test whether, in my forced-choice (binomial) data, participants are selecting one option at a rate different from chance (50%).

```{r}
# Date: July 21, 2016; 
# updated January 30, 2018; 
# updated May 3, 2018;
# updated August 6, 2021

data = read.csv("data/binomial-data.csv",header=TRUE) # data file
# "selectCode" is the column of responses, consisting of 1s (hits) and 0s (misses)
head(data)
```

## 1-sample proportions tests

A 1-sample proportions test is excellent for determining whether a single condition or vector of hits and misses (successes and failures) differs from some pre-determined level. This level could be chance (e.g., 50% with two options, 33.3% with three, etc.) or it could be another relevant value (e.g., some threshold for accuracy).
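For example, with a three-alternative task the same call would be checked against a chance level of 1/3. A minimal sketch with made-up counts (these numbers are hypothetical, not from the dataset below):

```{r}
# hypothetical three-alternative task: 58 "hits" out of 120 trials,
# tested against a chance level of one third
prop.test(x = 58, n = 120, p = 1/3)
```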

First, we create a summary dataset from our raw data.

```{r, message=FALSE, warning=FALSE}
data.pct <- data %>% # save following operation into new dataset
   group_by(experiment, condition) %>% # choose which columns define "100%" of each subset
   summarise(total = n()) %>% # summarize to find the total number of each subset
   full_join(data, ., by = c("experiment", "condition")) %>% # join it back to your data
   group_by(experiment, condition, selectCode, total) %>% # choose columns to summarize
   summarise(count = n()) %>% # count how many correct and incorrect responses there were
   mutate(percent = count/total) # calculate the percent for each subset

data.pct
```
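As an aside, the same summary can be built a little more compactly with `count()`. This is just an alternative sketch (assuming the same column names as above), not part of the original pipeline:

```{r}
# an equivalent, more compact route to the same summary table
data.pct.alt <- data %>%
   count(experiment, condition, selectCode, name = "count") %>% # one row per cell, with its count
   group_by(experiment, condition) %>% # define "100%" of each subset
   mutate(total = sum(count), # total trials per experiment x condition
          percent = count / total) %>% # proportion of each response code
   ungroup()
```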

Next, check if the Baseline condition in Experiment 1 is different from chance levels of performance.

```{r}
# prop.test( x = successTrials ,
#            n = totalTrials ,
#            p = probabilityOfSuccess )

data.pct %>% # taking our summary table
   filter(experiment == "first", # keep only Experiment 1
          condition == "Baseline", # keep only Baseline condition
          selectCode == 1) %>% # keep only correct selections
   pull(count) -> baseline.success.exp1 # pull out the value(s) from the count column

data.pct %>% # taking our summary table
   filter(experiment == "first", # keep only Experiment 1
          condition == "Baseline", # keep only Baseline condition
          selectCode == 1) %>% # keep only correct selections
   pull(total) -> baseline.total.exp1 # pull out the value(s) from the total column


prop.test(x = baseline.success.exp1,
          n = baseline.total.exp1,
          p = 0.5) # check against "chance" levels of 50%
```

```
    1-sample proportions test with continuity correction

data:  baseline.success.exp1 out of baseline.total.exp1, null probability 0.5
X-squared = 1.5042, df = 1, p-value = 0.22
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.4763963 0.6055812
sample estimates:
        p 
0.5416667 
```

No, it is not detectably different from chance (estimate=0.541, χ²(1)=1.50, p=0.22).
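If you prefer an exact test to the chi-squared approximation (for instance with small samples), `binom.test()` takes the same ingredients. A sketch using the values pulled above; the conclusion should be the same here:

```{r}
# exact binomial test as an alternative to the chi-squared approximation
binom.test(x = baseline.success.exp1,
           n = baseline.total.exp1,
           p = 0.5)
```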

Now, do the same test with the Treatment condition in Experiment 1.

```{r}
# prop.test( x = successTrials ,
#            n = totalTrials ,
#            p = probabilityOfSuccess )

data.pct %>% # taking our summary table
   filter(experiment == "first", # keep only Experiment 1
          condition == "Treatment", # keep only Baseline condition
          selectCode == 1) %>% # keep only correct selections
   pull(count) -> treatment.success.exp1 # pull out the value(s) from the count column

data.pct %>% # taking our summary table
   filter(experiment == "first", # keep only Experiment 1
          condition == "Treatment", # keep only Baseline condition
          selectCode == 1) %>% # keep only correct selections
   pull(total) -> treatment.total.exp1 # pull out the value(s) from the total column


prop.test(x = treatment.success.exp1,
          n = treatment.total.exp1,
          p = 0.5) # check against "chance" levels of 50%
```

```
    1-sample proportions test with continuity correction

data:  treatment.success.exp1 out of treatment.total.exp1, null probability 0.5
X-squared = 61.004, df = 1, p-value = 5.695e-15
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.6937691 0.8062530
sample estimates:
        p 
0.7541667 
```

Yes, it is detectably different from chance (estimate=0.754, χ²(1)=61.0, p<0.0001).

## GLM to compare across conditions

Because we are using binomial data, we must use a logistic regression. Logistic regression is nearly identical to linear regression, but rather than fitting a straight line to the raw responses, it fits a straight line on the log-odds scale, which [looks like an S-curve](https://verbingnouns.github.io/notebooks/reading/notebooks/20210615-linearmodels.html#23_Binomial_data) on the probability scale when the predictor is continuous.
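If you want to see that S-curve for yourself, here is a quick sketch (the axis labels are just placeholders):

```{r}
# the inverse-logit (logistic) S-curve that maps log-odds onto probabilities
curve(plogis(x), from = -6, to = 6,
      xlab = "predictor (log-odds scale)", ylab = "probability of success")
```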

Let’s look at Experiment 1 only.

```{r}
model.glm <- glm(selectCode ~ 1 +
                              condition,
                 family = "binomial",
                 data = data %>% filter(experiment == "first"))
summary(model.glm)
```

```
Call:
glm(formula = selectCode ~ 1 + condition, family = "binomial", 
    data = data %>% filter(experiment == "first"))

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.6752  -1.2491   0.7512   1.1073   1.1073  

Coefficients:
                   Estimate Std. Error z value Pr(>|z|)    
(Intercept)          0.1671     0.1296   1.289    0.197    
conditionTreatment   0.9539     0.1981   4.814 1.48e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 622.78  on 479  degrees of freedom
Residual deviance: 598.74  on 478  degrees of freedom
AIC: 602.74

Number of Fisher Scoring iterations: 4
```

Here we can see that the model provides an estimate (β) for the Treatment condition of 0.9539. This is significant because the estimate is large relative to its standard error (SE, or Std. Error, here 0.1981): dividing the estimate by the SE gives the test statistic, in this case a z-value of 4.814. This is well beyond the roughly ±1.96 threshold needed for significance at the 0.05 level, so the p-value is less than 0.05. In fact, the p-value written in scientific notation is 1.48e-06 (“one point four eight times ten to the negative sixth power”), which means you would need to move the decimal point six places to the left (p = 0.00000148).
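Because these estimates are on the log-odds (logit) scale, they can be converted back into proportions with `plogis()`. This is a small sanity check I find useful; it should recover the proportions reported by `prop.test()` above (roughly 0.54 for Baseline and 0.75 for Treatment):

```{r}
# back-transform the log-odds estimates into proportions
plogis(coef(model.glm)[1]) # intercept only: the Baseline proportion
plogis(sum(coef(model.glm))) # intercept + Treatment estimate: the Treatment proportion
```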

If you have a more complex model, e.g., a GLMER that also takes into account the other experiments included in this dataset, you may wish to do some model comparison.

```{r}
model.max <- glmer(selectCode ~ 1 + # outcome measure as a function of the null hypothesis
                      condition + # and condition
                      experiment + # and experiment
                      condition:experiment + # and the interaction of condition and experiment
                      (1 | subject) + # with random intercepts for subject
                      (1 | item), # and item
                   family = "binomial", # using a binomial linking function
                   data = data) # and our dataset
# * the null hypothesis is by default included in all models and 
#   can be explicitly written out as 1, which I have done to allow
#   all the other terms to be on their own lines
summary(model.max)
```

```
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
 Family: binomial  ( logit )
Formula: selectCode ~ 1 + condition + experiment + condition:experiment +      (1 | subject) + (1 | item)
   Data: data

     AIC      BIC   logLik deviance df.resid 
  1662.1   1704.3   -823.0   1646.1     1432 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.1688 -0.7435  0.3197  0.8767  1.9048 

Random effects:
 Groups  Name        Variance  Std.Dev. 
 item    (Intercept) 1.483e-02 1.218e-01
 subject (Intercept) 7.777e-10 2.789e-05
Number of obs: 1440, groups:  item, 24; subject, 20

Fixed effects:
                                    Estimate Std. Error z value Pr(>|z|)    
(Intercept)                           0.1677     0.1345   1.247   0.2124    
conditionTreatment                    0.9568     0.2046   4.676 2.93e-06 ***
experimentsecond                     -0.4030     0.1839  -2.192   0.0284 *  
experimentthird                      -0.7891     0.1878  -4.202 2.64e-05 ***
conditionTreatment:experimentsecond   1.5285     0.3233   4.728 2.27e-06 ***
conditionTreatment:experimentthird   -1.5761     0.2860  -5.511 3.56e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
                    (Intr) cndtnT exprmnts exprmntt cndtnTrtmnt:xprmnts
cndtnTrtmnt         -0.657                                             
exprmntscnd         -0.681  0.447                                      
exprmntthrd         -0.667  0.438  0.488                               
cndtnTrtmnt:xprmnts  0.388 -0.595 -0.569   -0.278                      
cndtnTrtmnt:xprmntt  0.438 -0.674 -0.320   -0.655    0.425             
optimizer (Nelder_Mead) convergence code: 0 (OK)
boundary (singular) fit: see ?isSingular
```

While we can see that the Treatment condition is still significantly different from the Baseline condition (β=0.96, SE=0.20, z=4.68, p<0.0001), we can also see that the experiments are only reported as pairwise comparisons against the reference level (the first experiment), and so are the interactions. In order to report a main effect and an interaction effect rather than pairwise comparisons, we should do model comparison.

```{r}
# remove the interaction term to compare this nested depleted model to the maximal one
model.int <- glmer(selectCode ~ 1 + 
                      condition + 
                      experiment + 
                      # condition:experiment + 
                      (1 | subject) + 
                      (1 | item), 
                   family = "binomial", 
                   data = data) 
# this model is identical to our maximal model, except we have commented out
# the term of interest

anova(model.max, model.int)
```

```
Data: data
Models:
model.int: selectCode ~ 1 + condition + experiment + (1 | subject) + (1 | item)
model.max: selectCode ~ 1 + condition + experiment + condition:experiment + (1 | subject) + (1 | item)
          npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)    
model.int    6 1761.2 1792.8 -874.57   1749.2                         
model.max    8 1662.1 1704.3 -823.04   1646.1 103.06  2  < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

Here, the model comparison shows us that the interaction term significantly improves the fit of our model (χ²(2)=103.06, p<0.0001). This means we can say there is a significant interaction between condition and experiment.

We should also check if there is a main effect of experiment:

```{r}
# remove all terms containing the term of interest to compare this nested depleted model
model.exp <- glmer(selectCode ~ 1 + 
                      condition + 
                      # experiment + 
                      # condition:experiment + 
                      (1 | subject) + 
                      (1 | item), 
                   family = "binomial", 
                   data = data) 
# this model is identical to our maximal model, except we have commented out
# the term of interest and all other terms containing it

# since we commented out two terms, we should compare this model to the next largest nested model (model.int), not the maximal one
anova(model.int, model.exp)
```

```
Data: data
Models:
model.exp: selectCode ~ 1 + condition + (1 | subject) + (1 | item)
model.int: selectCode ~ 1 + condition + experiment + (1 | subject) + (1 | item)
          npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)    
model.exp    4 1947.8 1968.9 -969.90   1939.8                         
model.int    6 1761.2 1792.8 -874.57   1749.2 190.66  2  < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

From this, we can see there is a main effect of experiment, over and above any individual pairwise comparisons (χ²(2)=190.66, p<0.0001).
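As an aside, `drop1()` can automate this kind of single-term deletion; a sketch with the models above, whose likelihood-ratio χ² values should match the `anova()` comparisons we just ran (note that it refits each reduced model, so it can be slow):

```{r}
# likelihood-ratio tests for each droppable term, respecting marginality
drop1(model.max, test = "Chisq") # tests the condition:experiment interaction
drop1(model.int, test = "Chisq") # tests the main effects of condition and experiment
```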

Now, to understand what this means, we must plot our data. (In fact, we should have done this first…)

# Plotting results with error bars

## Tidyverse

This is tidyverse-style code for producing a visualization of the data analyzed above. First, we will create a new dataset that will allow us to plot calculated estimates and confidence intervals. It is not necessary to save this as a separate dataset (it could be piped directly into the plot), but this way you get a chance to see the content and structure of the dataset first.

```{r}
data.plot <- data.pct %>% # start with our SUMMARY data (not the raw data) and save a new copy
   # create a column that labels the select codes
   mutate(observed = case_when(selectCode == 1 ~ "correct", # 1 = correct
                               selectCode == 0 ~ "incorrect"), # 0 = incorrect
          observed = factor(observed, # make it a factor
                            levels = c("incorrect", "correct"))) %>% # order the levels
   rowwise() %>% # the next operation will be conducted for each row
   # using rowwise() is essential for applying certain base R functions such as prop.test
   # create columns for estimate, upper CI value, and lower CI value:
   mutate(estimate = prop.test(x = count, n = total, p = 0.5)$estimate, # extract estimate
          confint.lower = prop.test(x = count, n = total, p = 0.5)$conf.int[1], # extract lower CI
          confint.upper = prop.test(x = count, n = total, p = 0.5)$conf.int[2]) # extract upper CI

data.plot
```
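For comparison, a rougher way to get interval estimates is the normal (Wald) approximation, p ± 1.96 × sqrt(p(1 − p)/n). A sketch below; note that it will not exactly match the continuity-corrected intervals `prop.test()` returns:

```{r}
# rough normal-approximation (Wald) confidence intervals, for comparison only
data.pct %>%
   mutate(se = sqrt(percent * (1 - percent) / total), # standard error of a proportion
          wald.lower = percent - 1.96 * se, # lower limit
          wald.upper = percent + 1.96 * se) # upper limit
```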

Now, let’s plot it:

```{r}
data.plot %>% 
   ggplot(aes(x = condition, # x-axis is by condition
              y = estimate, # y-axis is by calculated estimate
              fill = observed)) + # this will allow us to hide extra error bars later on*
   theme_bw() + # make the plot clean and pretty
   geom_col() + # plot the bars as stacked columns
   geom_hline(aes(yintercept=.5), # plot a line at 0.5 (50%)
              colour = "black", # make it black
              alpha=.5, # make it half-transparent
              linetype = "dashed") + # make it a dashed line
   geom_errorbar(aes(ymin = confint.lower, # define lower CI limit
                     ymax = confint.upper, # define upper CI limit
                     alpha = observed), # define transparency for error bars*
                 width = .1) + # keep the crossbars narrow
   scale_alpha_manual(values = c(0,1), guide = "none") + # *hide extra error bars for 'incorrect' values
   scale_y_continuous(labels = scales::percent_format()) + # y-axis converted from decimal to percent
   ylab("percent selected") + # relabel the y-axis
   facet_grid(~experiment) + # put each experiment in its own facet
   NULL # this line allows you to comment out the previous one(s) without breaking the plot
```

From this plot, you can see that the Baseline condition overlaps with chance levels in Experiments 1 and 2 but not in Experiment 3, where it is far below chance. Furthermore, you can see that the Treatment condition is above chance in Experiments 1 and 2, but not in Experiment 3. This is likely what is driving the interaction effect we observed. Now the statistical tests can be discussed with more clarity, and conclusions can be drawn about how outcomes in the two conditions and three experiments compare (and what this means).

## Base R graphics

This code builds directly on the previous code to produce the graph shown at the bottom of this section. It uses only the base graphics library and looks at Experiment 1 only.

### Step 1

# Author: Lauren M Ackerman
# Available: https://lmackerman.com/r-snippets/
# Date: January 30, 2018

# store the prop.test() results for use in the graph 
err.cond.1 <- prop.test(baseline.success.exp1, baseline.total.exp1, p = 0.5)
err.cond.2 <- prop.test(treatment.success.exp1, treatment.total.exp1, p = 0.5)

# format the table as percentages rounded to 2 decimal places
numOfConds <- length(levels(as.factor(data$condition[data$experiment=="first"])))
numerator <- (table(data$selection[data$experiment=="first"], 
                    data$condition[data$experiment=="first"]))
denominator <- (length(data$selection[data$experiment=="first"])/numOfConds)
dfTable <- (numerator/denominator)*100

counts <- round(dfTable,2)
barplot(counts,
#  legend = rownames(counts),
   main=paste0("Choice of ",rownames(counts)[1]," and ",rownames(counts)[2]," in a forced-choice task"),
   ylab="Percent (%)", #xlab="condition", 
   cex.names=1, names=c(colnames(counts)[1], colnames(counts)[2]),
   col=c("lightseagreen","salmon"), border=c("black","black")
   )
```

### Step 2

```{r}
barplot(counts,
   main=paste0("Choice of ",rownames(counts)[1]," and ",rownames(counts)[2]," in a forced-choice task"),
   ylab="Percent (%)",  
   cex.names=1, names=c(colnames(counts)[1], colnames(counts)[2]),
   col=c("lightseagreen","salmon"), border=c("black","black")
   )
# label the stacked sections of the bar plot with the estimated percents
mtext(paste0(rownames(counts)[1]," = ", counts[1],"%"), side=1,line=-1.5,at=0.7,col="black",cex=1)
mtext(paste0(rownames(counts)[1]," = ", counts[3],"%"), side=1,line=-1.5,at=1.9,col="black",cex=1)
mtext(paste0(rownames(counts)[2]," = ", counts[2],"%"), side=3,line=-1.5,at=0.7,col="black",cex=1)
mtext(paste0(rownames(counts)[2]," = ", counts[4],"%"), side=3,line=-1.5,at=1.9,col="black",cex=1)
```

### Step 3

```{r}
barplot(counts,
#  legend = rownames(counts),
   main=paste0("Choice of ",rownames(counts)[1]," and ",rownames(counts)[2]," in a forced-choice task"),
   ylab="Percent (%)", #xlab="condition", 
   cex.names=1, names=c(colnames(counts)[1], colnames(counts)[2]),
   col=c("lightseagreen","salmon"), border=c("black","black")
   )
# label the stacked sections of the bar plot with the estimated percents
mtext(paste0(rownames(counts)[1]," = ", counts[1],"%"), side=1,line=-1.5,at=0.7,col="black",cex=1)
mtext(paste0(rownames(counts)[1]," = ", counts[3],"%"), side=1,line=-1.5,at=1.9,col="black",cex=1)
mtext(paste0(rownames(counts)[2]," = ", counts[2],"%"), side=3,line=-1.5,at=0.7,col="black",cex=1)
mtext(paste0(rownames(counts)[2]," = ", counts[4],"%"), side=3,line=-1.5,at=1.9,col="black",cex=1)

# add the confidence intervals based on the prop.test() results
arrows(0.7,counts[1],0.7,(unlist(err.cond.1[6])[1])*100, angle=90,col="black",lwd=1)
arrows(0.7,counts[1],0.7,(unlist(err.cond.1[6])[2])*100, angle=90,col="black",lwd=1)
arrows(1.9,counts[3],1.9,(unlist(err.cond.2[6])[1])*100, angle=90,col="black",lwd=1)
arrows(1.9,counts[3],1.9,(unlist(err.cond.2[6])[2])*100, angle=90,col="black",lwd=1)
```

**NOTE**: Sometimes `1-unlist()` should be used instead, because the count of “1” answers is stacked on top in this graph (the rows are ordered alphabetically). If your error bars are misaligned, you may need to add `1-` to these four lines of code.
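Incidentally, the confidence limits can also be extracted by name rather than by position, which may be easier to read; a small sketch that is equivalent to the `unlist(err.cond.1[6])` calls above:

```{r}
# the same limits, pulled out by name from the stored prop.test() results
err.cond.1$conf.int[1] * 100 # lower limit for the Baseline condition, as a percent
err.cond.1$conf.int[2] * 100 # upper limit for the Baseline condition, as a percent
```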

Text size (`cex`) and line width of the error bars (`lwd`) are both set to size 1 in the above code, but should be increased for use on posters. The colors selected here are a close match to the default aesthetics of ggplot2, but are easily changed.
