From dccca2a020fe7a21e1740a4a6178bcec94259d2c Mon Sep 17 00:00:00 2001
From: John Stachurski <john.stachurski@gmail.com>
Date: Fri, 19 Jun 2026 07:31:21 +1000
Subject: [PATCH 1/2] Make the posterior-concentration argument visible and
 concrete
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two fixes to the "Why the posterior concentrates" section of prob_meaning:

- The opening referred to "the patterns above", but those plots lived
  inside the pm_ex3 solution dropdown (collapsed by default). Move the
  mean/std figure out of the solution into this always-visible section
  and reword the opening so it stands on its own.

- The section gave the generic beta mean/variance in terms of (alpha,
  beta) — which are the *prior* parameters, so it actually displayed the
  prior's moments, not the posterior's. Substitute the posterior
  parameters (alpha+k, beta+n-k) and take the limit explicitly: the mean
  -> 0.4 since k/n -> 0.4, and the variance ~ theta(1-theta)/n -> 0. Now
  the stated convergence is shown, not merely asserted.

Verified with jupytext export + headless execution.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
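Reviewer note (below the `---`, so not part of the commit message): a minimal sketch for checking the two limits claimed above by evaluating the closed-form posterior moments directly. The prior alpha = beta = 1 and the choice k = 0.4 * n (the expected head count rather than a simulated one) are assumptions made here to keep the check deterministic; neither is taken from the lecture, and `posterior_moments` is a hypothetical helper name.

```python
# Sketch only: closed-form moments of the Beta(alpha + k, beta + n - k) posterior.
# Assumed for illustration: alpha = beta = 1 and k = 0.4 * n (no sampling noise).

def posterior_moments(alpha, beta, n, k):
    """Return (mean, variance) of Beta(alpha + k, beta + n - k)."""
    a, b = alpha + k, beta + n - k
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

theta = 0.4
alpha, beta = 1.0, 1.0  # hypothetical prior parameters

rows = []
for n in (10, 100, 1_000, 10_000, 100_000):
    k = theta * n                          # expected number of heads
    mean, var = posterior_moments(alpha, beta, n, k)
    approx = theta * (1 - theta) / n       # claimed large-n variance
    rows.append((n, mean, var, var / approx))
    print(f"n={n:>6}  mean={mean:.5f}  var={var:.3e}  var/approx={var / approx:.4f}")
```

If the substitution is right, the mean column should approach 0.4 and the last column should approach 1 as n grows.
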
 lectures/prob_meaning.md | 66 +++++++++++++++++++++++++---------------
 1 file changed, 41 insertions(+), 25 deletions(-)

diff --git a/lectures/prob_meaning.md b/lectures/prob_meaning.md
index 0914ad45b..82a4054ff 100644
--- a/lectures/prob_meaning.md
+++ b/lectures/prob_meaning.md
@@ -551,9 +551,48 @@ plt.show()
 
 As $n$ increases, we can see that the probability density functions _concentrate_ on $0.4$, the true value of $\theta$.
 
-Here the  posterior mean  converges to $0.4$ while the posterior standard deviation converges to $0$ from above.
+```{solution-end}
+```
+
+### Why the posterior concentrates
+
+Why does the posterior pile up ever more tightly around the true value $\theta = 0.4$ as the sample grows?
+
+The answer is encoded in the posterior we derived.
+
+Recall that after observing $k$ heads in $n$ flips, the posterior is $\textrm{Beta}(\alpha + k, \, \beta + n - k)$.
+
+A beta distribution with parameters $a$ and $b$ has
+
+* mean $\dfrac{a}{a + b}$,
+
+* variance $\dfrac{a\, b}{(a + b)^2\, (a + b + 1)}$.
+
+Substituting the *posterior* parameters $a = \alpha + k$ and $b = \beta + n - k$, so that $a + b = \alpha + \beta + n$, gives
+
+$$
+\mathbb{E}[\theta \mid k] = \frac{\alpha + k}{\alpha + \beta + n},
+\qquad
+\operatorname{Var}[\theta \mid k] = \frac{(\alpha + k)(\beta + n - k)}{(\alpha + \beta + n)^2\, (\alpha + \beta + n + 1)} .
+$$
+
+As $n$ grows, the fixed prior counts $\alpha$ and $\beta$ become negligible beside the data.
+
+Since the data are generated with $\theta = 0.4$, the Law of Large Numbers gives $k/n \to 0.4$ (see {ref}`pm_ex1`), so the posterior mean
+
+$$
+\frac{\alpha + k}{\alpha + \beta + n} \;\approx\; \frac{k}{n} \;\to\; 0.4 .
+$$
 
-To show this, we compute the mean and standard deviation of the posterior distributions.
+In the variance, the numerator $(\alpha + k)(\beta + n - k) \approx n^2 \, \tfrac{k}{n} \bigl(1 - \tfrac{k}{n}\bigr)$ grows like $n^2$ while the denominator grows like $n^3$, so
+
+$$
+\operatorname{Var}[\theta \mid k] \;\approx\; \frac{\theta(1 - \theta)}{n} \;\longrightarrow\; 0 .
+$$
+
+The posterior mean therefore homes in on the truth, while the posterior variance vanishes at rate $1/n$ and the standard deviation at rate $1/\sqrt{n}$.
+
+The next figure confirms both claims: the posterior mean settles on $0.4$ and the standard deviation decays toward zero.
 
 ```{code-cell} ipython3
 mean_list = [post.mean() for post in posterior_list]
@@ -578,29 +617,6 @@ ax[1].set_xlabel('number of observations', fontsize=11)
 plt.show()
 ```
 
-```{solution-end}
-```
-
-### Why the posterior concentrates
-
-How shall we interpret the patterns above?
-
-The answer is encoded in the Bayesian updating formula derived above.
-
-Recall that after observing $k$ heads in $n$ flips, the posterior is $\textrm{Beta}(\alpha + k, \, \beta + n - k)$.
-
-A beta distribution with parameters $\alpha$ and $\beta$ has
-
-* mean $\frac{\alpha}{\alpha + \beta}$
-
-* variance $\frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$
-
-Here $\alpha + k$ can be viewed as the number of successes (prior pseudo-count plus observed heads) and $\beta + n - k$ as the number of failures.
-
-Since the data are generated with $\theta = 0.4$, the Law of Large Numbers tells us that, as $n$ grows, $k/n \to 0.4$ (see {ref}`pm_ex1`).
-
-Consequently, the posterior mean converges to $0.4$ and the posterior variance shrinks to zero.
-
 ```{code-cell} ipython3
 upper_bound = [post.ppf(0.95) for post in posterior_list]
 lower_bound = [post.ppf(0.05) for post in posterior_list]

From 3b1ce3aa9cce3e59062251367751cadf098e301c Mon Sep 17 00:00:00 2001
From: John Stachurski <john.stachurski@gmail.com>
Date: Fri, 19 Jun 2026 08:16:46 +1000
Subject: [PATCH 2/2] Add pointer, ground the analysis section, box-and-whisker
 coverage plot

- Add a one-line pointer from the pm_ex3 part-f solution to the
  "Why the posterior concentrates" section.
- Open that section with a back-reference to the solution of Exercise 3
  ("In the solution to pm_ex3 we watched ..."), since the concentration
  it discusses is shown in a solution box that is collapsed by default.
- Introduce the coverage-interval plot with a lead-in sentence (it
  previously followed the mean/std figure with none).
- Replace the 5th/95th-quantile scatter with a box-and-whisker plot
  (median, IQR, 5-95% whiskers) built from the analytical posterior
  quantiles, with the true theta marked.

Verified with jupytext export + headless execution.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 lectures/prob_meaning.md | 32 +++++++++++++++++++-------------
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/lectures/prob_meaning.md b/lectures/prob_meaning.md
index 82a4054ff..0e4930046 100644
--- a/lectures/prob_meaning.md
+++ b/lectures/prob_meaning.md
@@ -551,12 +551,14 @@ plt.show()
 
 As $n$ increases, we can see that the probability density functions _concentrate_ on $0.4$, the true value of $\theta$.
 
+The next section explains *why* this concentration occurs and how fast it happens.
+
 ```{solution-end}
 ```
 
 ### Why the posterior concentrates
 
-Why does the posterior pile up ever more tightly around the true value $\theta = 0.4$ as the sample grows?
+In the solution to {ref}`pm_ex3` we watched the posterior distribution concentrate ever more tightly around the true value $\theta = 0.4$ as the sample grew. Why does this happen?
 
 The answer is encoded in the posterior we derived.
 
@@ -617,22 +619,26 @@ ax[1].set_xlabel('number of observations', fontsize=11)
 plt.show()
 ```
 
+We can also display the Bayesian coverage intervals directly.
+
+The box-and-whisker plot below summarizes each posterior by its median (central line), interquartile range (box), and $5$th–$95$th percentile range (whiskers), with the true value $\theta = 0.4$ marked.
+
 ```{code-cell} ipython3
-upper_bound = [post.ppf(0.95) for post in posterior_list]
-lower_bound = [post.ppf(0.05) for post in posterior_list]
+quantiles = [0.05, 0.25, 0.5, 0.75, 0.95]  # whiskers, box edges, median
+box_stats = []
+for post in posterior_list:
+    lo, q1, med, q3, hi = post.ppf(quantiles)  # analytical posterior quantiles
+    box_stats.append({'med': med, 'q1': q1, 'q3': q3,
+                      'whislo': lo, 'whishi': hi, 'fliers': []})
 
 fig, ax = plt.subplots(figsize=(10, 6))
-ax.scatter(np.arange(len(upper_bound)),
-           upper_bound, label='95th quantile')
-ax.scatter(np.arange(len(lower_bound)),
-           lower_bound, label='5th quantile')
-
-ax.set_xticks(np.arange(0, len(upper_bound), 2))
-ax.set_xticklabels(n_obs_list[::2])
+ax.bxp(box_stats, positions=np.arange(len(box_stats)), showfliers=False)
+ax.axhline(0.4, color='C1', linestyle='--', label=r'true $\theta = 0.4$')
+ax.set_xticks(np.arange(len(box_stats)))
+ax.set_xticklabels(n_obs_list, rotation=45)
 ax.set_xlabel('number of observations', fontsize=12)
-ax.set_title('Bayesian coverage intervals of '
-             'posterior distributions', fontsize=15)
-
+ax.set_ylabel(r'$\theta$', fontsize=12)
+ax.set_title('posterior coverage intervals as $n$ grows', fontsize=15)
 ax.legend(fontsize=11)
 plt.show()
 ```