From dccca2a020fe7a21e1740a4a6178bcec94259d2c Mon Sep 17 00:00:00 2001 From: John Stachurski Date: Fri, 19 Jun 2026 07:31:21 +1000 Subject: [PATCH 1/2] Make the posterior-concentration argument visible and concrete MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Two fixes to the "Why the posterior concentrates" section of prob_meaning: - The opening referred to "the patterns above", but those plots lived inside the pm_ex3 solution dropdown (collapsed by default). Move the mean/std figure out of the solution into this always-visible section and reword the opening so it stands on its own. - The section gave the generic beta mean/variance in terms of (alpha, beta) — which are the *prior* parameters, so it actually displayed the prior's moments, not the posterior's. Substitute the posterior parameters (alpha+k, beta+n-k) and take the limit explicitly: the mean -> 0.4 since k/n -> 0.4, and the variance ~ theta(1-theta)/n -> 0. Now the stated convergence is shown, not merely asserted. Verified with jupytext export + headless execution. Co-Authored-By: Claude Opus 4.8 (1M context) --- lectures/prob_meaning.md | 66 +++++++++++++++++++++++++--------------- 1 file changed, 41 insertions(+), 25 deletions(-) diff --git a/lectures/prob_meaning.md b/lectures/prob_meaning.md index 0914ad45b..82a4054ff 100644 --- a/lectures/prob_meaning.md +++ b/lectures/prob_meaning.md @@ -551,9 +551,48 @@ plt.show() As $n$ increases, we can see that the probability density functions _concentrate_ on $0.4$, the true value of $\theta$. -Here the posterior mean converges to $0.4$ while the posterior standard deviation converges to $0$ from above. +```{solution-end} +``` + +### Why the posterior concentrates + +Why does the posterior pile up ever more tightly around the true value $\theta = 0.4$ as the sample grows? + +The answer is encoded in the posterior we derived. 
+ +Recall that after observing $k$ heads in $n$ flips, the posterior is $\textrm{Beta}(\alpha + k, \, \beta + n - k)$. + +A beta distribution with parameters $a$ and $b$ has + +* mean $\dfrac{a}{a + b}$, + +* variance $\dfrac{a\, b}{(a + b)^2\, (a + b + 1)}$. + +Substituting the *posterior* parameters $a = \alpha + k$ and $b = \beta + n - k$, so that $a + b = \alpha + \beta + n$, gives + +$$ +\mathbb{E}[\theta \mid k] = \frac{\alpha + k}{\alpha + \beta + n}, +\qquad +\operatorname{Var}[\theta \mid k] = \frac{(\alpha + k)(\beta + n - k)}{(\alpha + \beta + n)^2\, (\alpha + \beta + n + 1)} . +$$ + +As $n$ grows, the fixed prior counts $\alpha$ and $\beta$ become negligible beside the data. + +Since the data are generated with $\theta = 0.4$, the Law of Large Numbers gives $k/n \to 0.4$ (see {ref}`pm_ex1`), so the posterior mean satisfies + +$$ +\frac{\alpha + k}{\alpha + \beta + n} \;\approx\; \frac{k}{n} \;\to\; 0.4 . +$$ -To show this, we compute the mean and standard deviation of the posterior distributions. +In the variance, the numerator grows like $n^2$ while the denominator grows like $n^3$, so + +$$ +\operatorname{Var}[\theta \mid k] \;\approx\; \frac{\theta(1 - \theta)}{n} \;\longrightarrow\; 0 . +$$ + +The posterior mean therefore homes in on the truth while the posterior variance vanishes at rate $1/n$, so the standard deviation shrinks like $1/\sqrt{n}$. + +The next figure confirms both claims: the posterior mean settles on $0.4$ and the standard deviation decays toward zero. ```{code-cell} ipython3 mean_list = [post.mean() for post in posterior_list] @@ -578,29 +617,6 @@ ax[1].set_xlabel('number of observations', fontsize=11) plt.show() ``` -```{solution-end} -``` - -### Why the posterior concentrates - -How shall we interpret the patterns above? - -The answer is encoded in the Bayesian updating formula derived above. - -Recall that after observing $k$ heads in $n$ flips, the posterior is $\textrm{Beta}(\alpha + k, \, \beta + n - k)$. 
- -A beta distribution with parameters $\alpha$ and $\beta$ has - -* mean $\frac{\alpha}{\alpha + \beta}$ - -* variance $\frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$ - -Here $\alpha + k$ can be viewed as the number of successes (prior pseudo-count plus observed heads) and $\beta + n - k$ as the number of failures. - -Since the data are generated with $\theta = 0.4$, the Law of Large Numbers tells us that, as $n$ grows, $k/n \to 0.4$ (see {ref}`pm_ex1`). - -Consequently, the posterior mean converges to $0.4$ and the posterior variance shrinks to zero. - ```{code-cell} ipython3 upper_bound = [post.ppf(0.95) for post in posterior_list] lower_bound = [post.ppf(0.05) for post in posterior_list] From 3b1ce3aa9cce3e59062251367751cadf098e301c Mon Sep 17 00:00:00 2001 From: John Stachurski Date: Fri, 19 Jun 2026 08:16:46 +1000 Subject: [PATCH 2/2] Add pointer, ground the analysis section, box-and-whisker coverage plot - Add a one-line pointer from the pm_ex3 part-f solution to the "Why the posterior concentrates" section. - Open that section with a back-reference to the solution of Exercise 3 ("In the solution to pm_ex3 we watched ..."), since the concentration it discusses is shown in a solution box that is collapsed by default. - Introduce the coverage-interval plot with a lead-in sentence (it previously followed the mean/std figure with none). - Replace the 5th/95th-quantile scatter with a box-and-whisker plot (median, IQR, 5-95% whiskers) built from the analytical posterior quantiles, with the true theta marked. Verified with jupytext export + headless execution. 
Co-Authored-By: Claude Opus 4.8 (1M context) --- lectures/prob_meaning.md | 32 +++++++++++++++++++------------- 1 file changed, 19 insertions(+), 13 deletions(-) diff --git a/lectures/prob_meaning.md b/lectures/prob_meaning.md index 82a4054ff..0e4930046 100644 --- a/lectures/prob_meaning.md +++ b/lectures/prob_meaning.md @@ -551,12 +551,14 @@ plt.show() As $n$ increases, we can see that the probability density functions _concentrate_ on $0.4$, the true value of $\theta$. +The next section explains *why* this concentration occurs and how fast it happens. + ```{solution-end} ``` ### Why the posterior concentrates -Why does the posterior pile up ever more tightly around the true value $\theta = 0.4$ as the sample grows? +In the solution to {ref}`pm_ex3` we watched the posterior distribution concentrate ever more tightly around the true value $\theta = 0.4$ as the sample grew. Why does this happen? The answer is encoded in the posterior we derived. @@ -617,22 +619,26 @@ ax[1].set_xlabel('number of observations', fontsize=11) plt.show() ``` +We can also display the Bayesian coverage intervals directly. + +The box-and-whisker plot below summarizes each posterior by its median (central line), interquartile range (box), and $5$th–$95$th percentile range (whiskers), with the true value $\theta = 0.4$ marked. 
+ ```{code-cell} ipython3 -upper_bound = [post.ppf(0.95) for post in posterior_list] -lower_bound = [post.ppf(0.05) for post in posterior_list] +quantiles = [0.05, 0.25, 0.5, 0.75, 0.95] +box_stats = [] +for post in posterior_list: + lo, q1, med, q3, hi = post.ppf(quantiles) + box_stats.append({'med': med, 'q1': q1, 'q3': q3, + 'whislo': lo, 'whishi': hi, 'fliers': []}) fig, ax = plt.subplots(figsize=(10, 6)) -ax.scatter(np.arange(len(upper_bound)), - upper_bound, label='95th quantile') -ax.scatter(np.arange(len(lower_bound)), - lower_bound, label='5th quantile') - -ax.set_xticks(np.arange(0, len(upper_bound), 2)) -ax.set_xticklabels(n_obs_list[::2]) +ax.bxp(box_stats, positions=np.arange(len(box_stats)), showfliers=False) +ax.axhline(0.4, color='C1', linestyle='--', label=r'true $\theta = 0.4$') +ax.set_xticks(np.arange(len(box_stats))) +ax.set_xticklabels(n_obs_list, rotation=45) ax.set_xlabel('number of observations', fontsize=12) -ax.set_title('Bayesian coverage intervals of ' - 'posterior distributions', fontsize=15) - +ax.set_ylabel(r'$\theta$', fontsize=12) +ax.set_title('posterior coverage intervals as $n$ grows', fontsize=15) ax.legend(fontsize=11) plt.show() ```
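
Review note: the limits claimed in the new "Why the posterior concentrates" text can be spot-checked numerically without running the lecture. The sketch below is not part of the patch. It assumes a uniform $\textrm{Beta}(1, 1)$ prior (the patch does not state the prior), uses a hypothetical helper `posterior_moments`, and draws coin flips with the standard library instead of the lecture's `posterior_list`.

```python
import random


def posterior_moments(alpha, beta, k, n):
    """Mean and variance of Beta(alpha + k, beta + n - k).

    Hypothetical helper for this check; it applies the closed-form
    beta moments to the posterior parameters a and b.
    """
    a = alpha + k
    b = beta + n - k
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var


theta_true = 0.4        # true value used in the lecture
alpha, beta = 1.0, 1.0  # ASSUMPTION: uniform prior, chosen only for illustration
rng = random.Random(0)  # fixed seed so the check is reproducible

results = []
for n in [10, 100, 1_000, 10_000, 100_000]:
    # Simulate n flips and count heads
    k = sum(rng.random() < theta_true for _ in range(n))
    mean, var = posterior_moments(alpha, beta, k, n)
    # Large-n approximation stated in the patch: theta (1 - theta) / n
    approx = theta_true * (1 - theta_true) / n
    results.append((n, mean, var, approx))
    print(f"n={n:>6}  mean={mean:.4f}  var={var:.3e}  approx={approx:.3e}")
```

For large $n$ the printed mean should sit close to $0.4$ and the exact variance should track $\theta(1 - \theta)/n$, which is the behavior the rewritten section derives.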