
Credible Intervals versus Confidence Intervals

July 16, 2024

Suppose you have data $X_i | \mu, \sigma^2 \sim N(\mu, \sigma^2)$, for $i = 1, ..., n$, i.e., $n$ normally distributed data points with mean $\mu$ and variance $\sigma^2$ (or standard deviation $\sigma$). Assume you don't know $\sigma^2$, which is usually the case anyway. You could estimate $\mu \approx \bar{X}$ and call it a day, but you probably want some measure of uncertainty, i.e., an interval estimate.

Note: Throughout this article, I use the convention that uppercase variables are random variables and lowercase variables are fixed (observed) quantities.

Confidence Intervals

This is the typical approach taught in any statistics class. However, its interpretation is nuanced. We want to estimate $\mu$ and find some interval in which it resides. That is, we want to find $P(L \le \mu \le U) = 1 - \alpha$, where $\alpha$ is usually some small quantity (e.g., 0.05). Here, $L$ and $U$ are functions of the sample, which itself can be thought of as a random vector $\boldsymbol{X} = (X_1, X_2, ..., X_n)$. Then, taking $\alpha = 0.05$, we want to calculate $P(L \le \mu \le U) = 0.95$. This is a 95% confidence interval for $\mu$. It tells us that the probability that the interval $(L, U)$ will cover the true population mean $\mu$ is 95%. That is, if we were to take 100 different samples and (somehow) calculate 100 intervals $(L, U)$, we would expect about 95 of those intervals to contain the population mean and about 5 of them not to.

To make this more concrete, let's calculate $L$ and $U$ for the sample we proposed in the beginning. We can use the fact that $\frac{\bar{X} - \mu}{S / \sqrt{n}}$ follows a $t$ distribution with $n - 1$ degrees of freedom (which we denote $t_{n-1}$). Since the $t$ distribution is unimodal and symmetric, the 97.5% and 2.5% quantiles are negatives of each other, i.e., $t_{n-1, \alpha/2} = -t_{n-1, 1-\alpha/2}$, and:

$$P\left(-t_{n-1, 1-\alpha/2} \le \frac{\bar{X} - \mu}{S / \sqrt{n}} \le t_{n-1, 1-\alpha/2}\right) = 0.95$$

Hence,

$$P\left(\bar{X} - t_{n-1, 1-\alpha/2} \frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{n-1, 1-\alpha/2} \frac{S}{\sqrt{n}}\right) = 0.95$$

And so the confidence interval is given by $L = \bar{X} - t_{n-1, 1-\alpha/2} \frac{S}{\sqrt{n}}$ and $U = \bar{X} + t_{n-1, 1-\alpha/2} \frac{S}{\sqrt{n}}$.

Notice that $L$ and $U$ are functions of $\bar{X}$ and $S$, the sample mean and sample standard deviation, so they are random before the data are observed. Once we observe a particular sample and compute the interval, however, we cannot know whether that specific interval actually contains the true population mean; the 95% describes the long-run behavior of the procedure, not any single computed interval.
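To see this repeated-sampling interpretation in action, here's a minimal simulation sketch in R. The population values, sample size, number of replications, and seed below are arbitrary choices of mine (not from the example later in this post): each replication draws a fresh sample, computes the $t$-based interval above, and records whether it covers the true mean. The observed coverage should come out close to 95%.

# Coverage check: what fraction of t intervals contain the true mean?
set.seed(1)        # arbitrary seed, for reproducibility
true_mean <- 5.5   # hypothetical population mean
true_sd <- 3       # hypothetical population standard deviation
n_obs <- 30        # observations per sample
n_rep <- 10000     # number of repeated samples
covered <- replicate(n_rep, {
  x <- rnorm(n_obs, mean = true_mean, sd = true_sd)
  half_width <- qt(0.975, df = n_obs - 1) * sd(x) / sqrt(n_obs)
  (mean(x) - half_width <= true_mean) && (true_mean <= mean(x) + half_width)
})
mean(covered)      # proportion of intervals covering true_mean; close to 0.95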

Credible Intervals

This is a Bayesian approach. We first set up a joint prior distribution for $\mu$ and $\sigma^2$. One choice is a vague, noninformative prior: $p(\mu, \sigma^2) \propto \sigma^{-2}$. The justification for this is hard to explicate succinctly here; see Gelman, particularly Section 2.5. There are many other reasonable choices, but I'll use this one since I've given no particular information about $\mu$ or $\sigma$. Also, though the prior is unnormalized (it doesn't integrate to 1), it can still be used since the resulting posterior distribution is a proper probability density. Then,

\begin{align} p(\mu, \sigma^2 | \boldsymbol{X}) &\propto p(\boldsymbol{X} | \mu, \sigma^2)\, p(\mu, \sigma^2) \\ &\propto \prod_{i=1}^n p(X_i | \mu, \sigma^2)\, p(\mu, \sigma^2) \\ &\propto \prod_{i=1}^n N(X_i | \mu, \sigma^2)\, \sigma^{-2} \end{align}
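Before stating the result of the next step, here is a rough sketch of the one slightly involved part, integrating $\sigma^2$ out of this joint posterior (this is the standard normal-model calculation, included just for completeness). Using the identity $\sum_{i=1}^n (X_i - \mu)^2 = (n-1)S^2 + n(\bar{X} - \mu)^2$, the joint posterior can be written as

$$p(\mu, \sigma^2 | \boldsymbol{X}) \propto (\sigma^2)^{-\frac{n}{2} - 1} \exp\left(-\frac{(n-1)S^2 + n(\bar{X} - \mu)^2}{2\sigma^2}\right)$$

and integrating over $\sigma^2$ (a gamma-type integral in $1/\sigma^2$) leaves

$$p(\mu | \boldsymbol{X}) \propto \left[(n-1)S^2 + n(\bar{X} - \mu)^2\right]^{-\frac{n}{2}} \propto \left[1 + \frac{n(\bar{X} - \mu)^2}{(n-1)S^2}\right]^{-\frac{n}{2}},$$

which is the kernel of a $t_{n-1}$ density with location $\bar{X}$ and scale $\frac{S}{\sqrt{n}}$.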

So, integrating the last product with respect to $\sigma^2$ gives the marginal distribution of $\mu$: $\mu | \boldsymbol{X} \sim t_{n-1}(\bar{X}, \frac{S^2}{n})$, a location-scale $t$ distribution with location $\bar{X}$ and scale $\frac{S}{\sqrt{n}}$. This is called the posterior distribution of $\mu$. Now, to compute an interval estimate for $\mu$, we will draw some large number of points, say $m$, from $\mu | \boldsymbol{X}$, and take the 2.5% and 97.5% sample quantiles as our interval. Generally, we can express this as follows:

$$P(l \le \mu \le u | \boldsymbol{X} = \boldsymbol{x}) = \int_{l(\boldsymbol{x})}^{u(\boldsymbol{x})} p(\mu | \boldsymbol{X} = \boldsymbol{x}) \, d\mu = 1 - \alpha$$

This looks similar to the confidence interval, but the difference is that we are calculating the probability of $\mu$ being covered by $(l, u)$ given an observed sample $\boldsymbol{x}$. In other words, given the data, the probability that $\mu$ is contained in $(l, u)$ is 95%. This is less confusing than the confidence interval. Notice that there are two parts to this:

  1. Take our original $n$ normally distributed sample points and compute the posterior distribution $\mu | \boldsymbol{X} \sim t_{n-1}(\bar{X}, \frac{S^2}{n})$.
  2. Draw a random sample of $m$ points from the posterior $\mu | \boldsymbol{X}$, sort them, and use the 2.5% and 97.5% points to compute a posterior probability interval (credible interval). In other words, the original sample is used to specify the posterior distribution of $\mu$, and a new sample is then simulated from that posterior to compute the credible interval. This is why we can say there's a 95% probability that $\mu$ lies in $(l, u)$: the posterior is a probability distribution for $\mu$ itself, and $l$ and $u$, estimated from the simulation, bracket 95% of the $m$ draws from it.

While the interpretation of credible intervals is easier to understand, there's no guarantee they will produce the same results as a confidence interval; that depends on the prior and the model. With the noninformative prior used here, though, the central 95% credible interval is built from the same $t_{n-1}$ quantiles as the confidence interval, so the two should agree up to simulation error. For our purposes, they are almost the same:

pop_mean <- 5.5
pop_sd <- 3
n <- 1000
set.seed(14151)
sample <- rnorm(n, mean = pop_mean, sd = pop_sd)

# Confidence interval

alpha <- 0.05
sample_mean <- mean(sample)
sample_sd <- sd(sample)
# Upper-tail critical value t_{n-1, 1 - alpha/2} (a positive number)
crit <- qt(1 - alpha / 2, df = n - 1)
lower <- sample_mean - crit * sample_sd / sqrt(n)
upper <- sample_mean + crit * sample_sd / sqrt(n)
print(paste0(lower, ", ", upper))

# Credible interval

set.seed(14560)
v <- n - 1
# Simulate from the posterior t_{n-1}(sample_mean, sample_sd^2 / n):
# standard t_{n-1} draws, scaled by S / sqrt(n) and shifted by the sample mean
simul <- rt(1000, df = v)
simul <- simul * sample_sd / sqrt(n)
simul <- simul + sample_mean
# 2.5% and 97.5% points of the 1000 ordered posterior draws
interval <- sort(simul)[c(25, 976)]
print(interval)

For the confidence interval, we get (5.229, 5.588). For the credible interval, we get (5.232, 5.585). Also, in this artificial example, both intervals contain the true population mean.
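As a quick check (this snippet is my own addition, reusing the variables defined above), the posterior quantiles can also be computed exactly with qt instead of by simulation, and they reproduce the confidence interval endpoints:

# Exact 2.5% and 97.5% quantiles of the posterior t_{n-1}(sample_mean, sample_sd^2 / n);
# these coincide with the t confidence interval computed earlier
exact <- sample_mean + qt(c(0.025, 0.975), df = n - 1) * sample_sd / sqrt(n)
print(exact)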

Conclusion

Ultimately, the distinction between the two is quite important. It's easy to think a 90% confidence interval has a 90% probability of containing the mean, but this is incorrect:

  1. A $p$% confidence interval for a parameter means that if we were to generate 100 (or 1000 or 10000) samples and compute an interval from each using a procedure like the one described above, then about $p$ (or $10p$ or $100p$) of those intervals would cover the true population parameter.
  2. A $p$% credible interval for a parameter means that, given the observed data, there is a $p$% posterior probability that the interval computed from the posterior distribution of the parameter contains the true population parameter.

References