Simulation example for Bayesian power analysis

(The below is taken from The 20% Statistician, Daniël Lakens' blog. Note that the R code below may take a while to run, e.g. 20 minutes.)

[SNIP]

There are no Bayesian power calculators (yet, as far as I know). Bayes Factors express the relative evidence in your data, and you don’t need a threshold to interpret a Bayes Factor. In principle, any BF10 < 1 provides stronger support for the null hypothesis than for the alternative hypothesis. But no one will be convinced if you argue for the null hypothesis based on a BF10 = 0.98, and it’s practical to have some sort of agreed-upon threshold where we all agree data can be interpreted as ‘support’ – hence tables with Bayes Factor thresholds such as the one reported in Wagenmakers et al., 2011 (see above). So what are the chances of actually observing relative support for the null hypothesis, compared to an alternative hypothesis of d = 0.707, with 50 participants in each condition, when you want a BF10 < 1/3?
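To make this concrete, here is a minimal sketch (not part of the original post) of what a single default Bayes factor looks like, using the ttestBF function from the BayesFactor package; the seed and simulated data are purely illustrative:

#A minimal illustration (not from the original script): one default Bayesian t-test
library(BayesFactor)

set.seed(123) #illustrative seed
x <- rnorm(50, mean = 0, sd = 1) #group 1, true effect size d = 0
y <- rnorm(50, mean = 0, sd = 1) #group 2

tt <- ttestBF(x = x, y = y, rscale = sqrt(2)/2) #default Cauchy prior scale
extractBF(tt)$bf #BF10; values below 1/3 would count as 'support' for the null here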

Using the R code below, you can easily calculate this probability yourself. It also provides the probability of finding support for the alternative hypothesis. This means you can run the simulations twice, once setting the true effect size to 0, and once setting the true effect size to 0.707, to see whether your sample size is large enough to get what you want from the data you plan to collect. (Note that you need quite a few simulations to be accurate, and this takes minutes.)
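As a sketch of that two-run workflow, the simulation code below can be wrapped in a reusable function. The name bf_power() and its arguments are my own invention, not part of the original post, and the default nSim is kept small for speed (increase it for accurate estimates):

#Hypothetical helper: probability of crossing a Bayes Factor threshold
#for a given true effect size D and per-group sample size n
library(BayesFactor)

bf_power <- function(D, n, threshold, nSim = 1000, rscaleBF = sqrt(2)/2) {
  bf <- numeric(nSim)
  for (i in 1:nSim) {
    x <- rnorm(n = n, mean = 0, sd = 1)
    y <- rnorm(n = n, mean = D, sd = 1)
    bf[i] <- exp(ttestBF(x, y, rscale = rscaleBF)@bayesFactor$bf) #BF10
  }
  c(supportH0 = mean(bf < 1/threshold),
    supportH1 = mean(bf > threshold),
    inconclusive = mean(bf >= 1/threshold & bf <= threshold))
}

bf_power(D = 0, n = 50, threshold = 3) #true effect size 0
bf_power(D = 0.707, n = 50, threshold = 3) #true effect size 0.707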

Running these simulations, we see there’s about a 69% probability of finding support for the null hypothesis with a BF10 < 1/3 if the true effect size is 0. In the histogram of Bayes Factors below, this is the distribution to the left of the left vertical red line. Approximately 1.6% of the Bayes Factors provide misleading evidence (a BF > 3, to the right of the right vertical red line), and around 30% of the Bayes Factors provide inconclusive evidence.

The second set of simulations reveals there’s approximately an 85% chance of finding a BF10 > 3 when the true effect size is 0.707. Based on these data, and your explicit interest in providing support for the null-hypothesis, you might want to collect some more participants.

Let’s look at some other scenarios. Imagine that in the previous situation, the true effect size was d = 0.43. It turns out that 13% of our experiments will find relative support for the null-hypothesis, 38% of the tests will find relative support for the alternative hypothesis, and 48% of the tests will be inconclusive, with a Bayes Factor between 1/3 and 3. Luckily, with Bayesian statistics you can continue collecting data, but since no one has unlimited resources, it’s important to know whether it’s feasible to collect sufficient data to answer your question in the first place. For example, in the default Bayesian t-test, don’t bother trying to find ‘strong evidence’ (BF < 1/10) for the null if you are planning to collect fewer than 200 participants. When the null is really true, and you plan to collect data from 100 to 200 participants in each condition, you’ll never find the evidence you are looking for. With 300 participants in each condition, you’ll start to get around a 33% probability of finding what you want, indicating that a threshold of 10 might be unreasonable for single studies, and more suitable for meta-analyses.
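Using the same hypothetical bf_power() helper sketched earlier, these scenarios can be checked directly (again, a larger nSim gives more stable estimates):

bf_power(D = 0.43, n = 50, threshold = 3, nSim = 10000) #mixed-evidence scenario
bf_power(D = 0, n = 200, threshold = 10, nSim = 10000) #'strong evidence' for a true null
bf_power(D = 0, n = 300, threshold = 10, nSim = 10000)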

The simulations also show that the probability of an incorrect conclusion (observing support for H0 when H1 is true, or support for H1 when H0 is true) is very low. Indeed, it’s so low that we might want to reconsider using a Bayes Factor of 3 as a threshold.

For example, if we set a threshold as low as 1.2, instead of 3, the simulations show that, when the null-hypothesis is true, approximately 5% of the Bayes Factors we observe will provide relative support for the alternative hypothesis, and over 90% of the Bayes Factors will provide relative support for the null hypothesis, with 50 participants in each condition (the remaining 5% fall in the small inconclusive evidence area). This may not be ‘substantial evidence’, but you won’t be wrong very often, in the long run, when you decide to accept the null hypothesis over the alternative hypothesis of d = 0.707 when you collect 50 participants in each condition.
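With the hypothetical bf_power() helper from earlier, this lower threshold is a one-line check:

bf_power(D = 0, n = 50, threshold = 1.2, nSim = 10000) #lenient threshold, true null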

You may not like the use of a cut-off value to indicate ‘relative support’, as proposed by some Bayesians (Dienes, 2016; Wagenmakers et al., 2011). I think these cut-offs are most important when planning an experiment. It makes sense to aim for support most people will find convincing. When the data are in hand, you can interpret them any way you like. If you agree, then an important question is which cut-off values are acceptable under which circumstances. And can we develop formulas for Bayesian power analysis instead of having to rely on simulations?

I learned something by playing around with these simulations, and I expect others moving to Bayes Factors will have similar questions to the ones I tried to examine here. Power analysis is a Frequentist concept, and it might not be close to the heart of Bayesians, but I expect it meets a future need of researchers switching to Bayesian statistics, especially when they want to design studies that provide support for the null.

#Bayesian Power Analysis 
  
if(!require(BayesFactor)){install.packages('BayesFactor')} 
library(BayesFactor)  
  
D<-0.0 #Set the true effect size 
n<-50 #Set sample size of your study (number in each group) 
nSim<-100000 #Set number of simulations (it takes a while, be patient) 
rscaleBF<-sqrt(2)/2 #Set scale of the Cauchy prior on the effect size under H1 (default = sqrt(2)/2, or 0.707) 
threshold<-3 #Set threshold for 'support' - e.g., 3, 10, or 30 
  
bf<-numeric(nSim) 
 
# create progress bar because it might take a while
# (txtProgressBar works on all platforms; winProgressBar is Windows-only)
pb <- txtProgressBar(min = 0, max = nSim, style = 3)

for(i in 1:nSim){ #for each simulated experiment
  setTxtProgressBar(pb, i)
  x<-rnorm(n = n, mean = 0, sd = 1)
  y<-rnorm(n = n, mean = D, sd = 1)
  #the @bayesFactor slot stores log(BF10), so exponentiate to get BF10
  bf[i] <- exp(ttestBF(x, y, rscale = rscaleBF)@bayesFactor$bf)
}
close(pb) #close progress bar
 
supportH0 <- sum(bf<(1/threshold))/nSim 
supportH1 <- sum(bf>threshold)/nSim 
 
cat("The probability of observing support for the null hypothesis is ",supportH0) 
cat("The probability of observing support for the alternative hypothesis is ",supportH1) 
 
 
hist(log(bf), breaks = 20) #histogram of the natural log of the Bayes Factors
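
#(addition, not in the original script) mark the 'support' thresholds from the
#text, log(1/threshold) and log(threshold), as vertical red lines
abline(v = log(1/threshold), col = "red")
abline(v = log(threshold), col = "red")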