May 14, 2026
I wanted to write down some notes on the Bellman optimality equations in detail, since some of the sources I have read leave out the intermediate steps.
For a policy \(\pi\), we will determine:
The subsequent derivations rely on iterated expectations. In particular, given random variables \(X\), \(Y\), \(Z\), we need to prove:
\begin{align*} \mathbb{E}[X|Z] = \mathbb{E}[\mathbb{E}[X|Y, Z]|Z] \end{align*}
We will assume these variables are continuous (in the discrete case replace integrals with sums):
\begin{align*} \mathbb{E}[X|Z] &= \int x p(x|z) dx \\ &= \int x \int p(x, y | z) dy dx & \text{Sum rule} \\ &= \int x \int p(x|y, z) p(y|z) dy dx & \text{Product rule} \\ &= \int x \int p(x|y, z) p(y|z) dx dy & \text{Fubini's Theorem} \\ &= \int p(y|z) \int x p(x|y, z) dx dy \\ &= \int p(y|z) \mathbb{E}[X|Y=y, Z=z] dy & \text{Definition of conditional expectation} \\ &= \mathbb{E}[\mathbb{E}[X|Y, Z]|Z] \end{align*}
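As a sanity check, the identity can be verified numerically in the discrete case, where the integrals become sums. The sketch below is only an illustration: the supports `x_vals`, `y_vals`, `z_vals`, the random joint pmf, and the helper names `direct`, `inner`, and `iterated` are all assumptions made up for this example and are not part of the derivation.

```python
import itertools
import random

random.seed(0)

# Hypothetical finite supports for X, Y, Z (chosen arbitrarily).
x_vals = [0.0, 1.0, 2.0, 3.0]
y_vals = [0, 1, 2]
z_vals = [0, 1]

# Random strictly positive joint pmf p(x, y, z), normalised to sum to 1.
weights = {
    key: random.random() + 0.1
    for key in itertools.product(x_vals, y_vals, z_vals)
}
total = sum(weights.values())
joint = {key: w / total for key, w in weights.items()}


def p_z(z):
    """Marginal p(z), summing the joint over x and y."""
    return sum(joint[(x, y, z)] for x in x_vals for y in y_vals)


def p_yz(y, z):
    """Marginal p(y, z), summing the joint over x."""
    return sum(joint[(x, y, z)] for x in x_vals)


def direct(z):
    """Left-hand side: E[X | Z = z] = sum_x x p(x | z)."""
    return sum(x * joint[(x, y, z)] for x in x_vals for y in y_vals) / p_z(z)


def inner(y, z):
    """Inner expectation: E[X | Y = y, Z = z] = sum_x x p(x | y, z)."""
    return sum(x * joint[(x, y, z)] for x in x_vals) / p_yz(y, z)


def iterated(z):
    """Right-hand side: sum_y p(y | z) E[X | Y = y, Z = z]."""
    return sum(p_yz(y, z) / p_z(z) * inner(y, z) for y in y_vals)


for z in z_vals:
    # The two columns should agree up to floating-point error.
    print(z, direct(z), iterated(z))
```

For each value of `z`, the two printed numbers should agree up to floating-point rounding, which mirrors the last line of the derivation.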