Jabir Hussain
Week 7
- Probability on function spaces
- Mean elements, covariance and cross-covariance operators
- Basis expansions with random coefficients
- Generalised polynomial chaos (gPC) expansions
- Handout: Gaussian and other stochastic processes
Random elements of Hilbert spaces
Given a Hilbert space $H$ (e.g. a Hilbert space of real-valued functions on a set $X$), an $H$-valued random variable is just a (measurable) function $u:\Omega\to H$ defined on a probability space $(\Omega,\mathcal{F},\mathbb{P})$.
Some obvious questions:
- What are the right notions of mean and variance for $u$?
- How do we build/sample such $H$-valued random variables?
Mean elements
Observe that, for an $\mathbb{R}^n$-valued random variable $u:\Omega\to\mathbb{R}^n$, the mean element $m_u\in\mathbb{R}^n$ can be defined as the vector of means of the $n$ components, and also equivalently
\[\mathbb{E}[u]=m_u \iff \mathbb{E}[u-m_u]=0 \iff \text{for all }\ell\in\mathbb{R}^n,\ \mathbb{E}[\langle \ell,\,u-m_u\rangle]=0,\]where $\langle \ell,\,u-m_u\rangle$ is a real r.v.
Inspired by this, we say that $m_u\in H$ is the (weak) expected value / mean of $u:\Omega\to H$, and write $\mathbb{E}[u]=m_u$, if
\[\text{for all }\ell\in H,\quad \mathbb{E}[\langle \ell,\,u-m_u\rangle_H]=0.\](Note) cf. the strong/Bochner integral, in which we really do perform vector-valued integration.
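As a quick numerical illustration of the weak-mean condition, here is a minimal NumPy sketch: elements of $H$ are represented by coefficient vectors in a fixed (truncated) CONB, the mean element is estimated by a sample average, and the condition $\mathbb{E}[\langle \ell,\,u-m_u\rangle]=0$ is checked against a few test directions $\ell$. The truncation level, sample size and coefficient model are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N_BASIS, N_SAMPLES = 20, 50_000          # hypothetical truncation level and sample size

# Samples of u, represented by coefficient vectors in a fixed (truncated) CONB:
# the n-th coefficient is 1/n plus (1/n)-scaled Gaussian noise, so m_u = (1/n)_n.
n = np.arange(1, N_BASIS + 1)
samples = 1.0 / n + (1.0 / n) * rng.standard_normal((N_SAMPLES, N_BASIS))

m_true = 1.0 / n                          # the true mean element in this toy model
m_hat = samples.mean(axis=0)              # empirical estimate of the mean element
print(np.max(np.abs(m_hat - m_true)))     # small

# Weak-mean check: E[<l, u - m_u>] ~ 0 for a few test directions l (up to Monte Carlo error).
for _ in range(3):
    ell = rng.standard_normal(N_BASIS)
    print(np.mean((samples - m_true) @ ell))
```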
Covariance operators
The mean is well-defined if $\mathbb{E}[\|u\|_H]<\infty$.
If $\mathbb{E}[\|u\|_H^2]<\infty$, then we can go further and define the covariance operator $\mathrm{Cov}[u]$, or $C_u$, of $u:\Omega\to H$ as the linear operator from $H$ to itself given by
\[\langle h,\,C_u k\rangle_H := \mathbb{E}\big[\langle h,\,u-m_u\rangle_H\,\langle u-m_u,\,k\rangle_H\big] \qquad\text{for all }h,k\in H.\]
Note, though, that
\[\langle h,\,u-m_u\rangle_H\ \langle u-m_u,\,k\rangle_H = \langle h,\ \langle u-m_u,\,k\rangle_H (u-m_u)\rangle_H,\]and so (weakly)
\[C_u=\mathbb{E}\big[(u-m_u)\otimes(u-m_u)\big].\]Similarly, we can define the cross-covariance operator of two random variables. Let $u:\Omega\to H_1$, $v:\Omega\to H_2$ be random variables, i.e. $(u,v)$ takes values in $H_1\oplus H_2$. The cross-covariance operator of $u$ and $v$, denoted $\mathrm{Cov}[u,v]$ or $C_{uv}$, is an operator from $H_2$ into $H_1$ given by
\[\langle h,\,C_{uv} k\rangle_{H_1} := \mathbb{E}\big[\langle h,\,u-m_u\rangle_{H_1}\ \langle v-m_v,\,k\rangle_{H_2}\big],\]for $k\in H_2$, $h\in H_1$.
Note that $C_u$ above is just $C_{uu}$. Just as before,
\[C_{uv}=\mathbb{E}\big[(v-m_v)\otimes(u-m_u)\big].\]
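In a finite truncation the (cross-)covariance operators become ordinary matrices of coefficient covariances, which suggests the following minimal NumPy sketch: we estimate $C_u$ and $C_{uv}$ from samples and check the properties proved in the next theorem. The dimensions, the sample size and the coupling matrix `A` are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(1)
N1, N2, N_SAMPLES = 10, 5, 100_000      # hypothetical dimensions and sample size

# Coefficient representations of u (values in H1) and a correlated v (values in H2).
u = rng.standard_normal((N_SAMPLES, N1))
A = rng.standard_normal((N2, N1))        # hypothetical coupling, so that v depends on u
v = u @ A.T + 0.1 * rng.standard_normal((N_SAMPLES, N2))

u_c = u - u.mean(axis=0)                 # centre at the (empirical) mean elements
v_c = v - v.mean(axis=0)

C_u  = u_c.T @ u_c / N_SAMPLES           # covariance operator C_u : H1 -> H1
C_uv = u_c.T @ v_c / N_SAMPLES           # cross-covariance operator C_uv : H2 -> H1

print(C_u.shape, C_uv.shape)                                   # (N1, N1), (N1, N2)
print(np.allclose(C_u, C_u.T), np.linalg.eigvalsh(C_u).min() >= -1e-10)
print(np.max(np.abs(C_uv - A.T)))        # here the true C_uv is A^T, up to Monte Carlo error
```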
Sazonov’s theorem (structure of covariance operators)
Theorem (Sazonov). Let $u:\Omega\to H$ take values in a separable Hilbert space (i.e. $\dim H$ is finite or countably infinite), have finite second moment $\mathbb{E}[\|u\|_H^2]<\infty$, mean $m_u\in H$ and covariance operator $C_u:H\to H$. Then $C_u$ is SPSD (self-adjoint and positive semi-definite) and trace class, and
\[\mathrm{tr}\,C_u = \mathbb{E}\big[\|u-m_u\|_H^2\big]<\infty.\]
Proof (sketch pieces). To see that $C_u$ is self-adjoint, consider any $h,k\in H$:
\[\langle h,\,C_u k\rangle = \mathbb{E}\big[\langle h,\,u-m_u\rangle\ \langle u-m_u,\,k\rangle\big] = \mathbb{E}\big[\langle k,\,u-m_u\rangle\ \langle u-m_u,\,h\rangle\big] = \langle k,\,C_u h\rangle,\]i.e. $C_u=C_u^*$.
(Exercise) Show that $C_{uv}=C_{vu}^*$.
For positivity, let $h\in H$. Then
\[\langle h,\,C_u h\rangle = \mathbb{E}\big[\langle h,\,u-m_u\rangle\ \langle u-m_u,\,h\rangle\big] = \mathbb{E}\big[|\langle h,\,u-m_u\rangle|^2\big]\ge 0.\](Exercise) $C_u$ is SPD iff there is no proper closed subspace $S$ of $H$ with $u-m_u\in S$ almost surely.
To see that $C_u$ has finite trace, let $(\psi_n)_{n\in\mathbb{N}}$ be any CONB of $H$. Then
\[\mathrm{tr}\,C_u = \sum_{n\in\mathbb{N}} \langle \psi_n,\,C_u\psi_n\rangle = \sum_{n\in\mathbb{N}}\mathbb{E}\big[|\langle \psi_n,\,u-m_u\rangle|^2\big] = \mathbb{E}\Big[\sum_{n\in\mathbb{N}}|\langle \psi_n,\,u-m_u\rangle|^2\Big]\](Fubini)
\[= \mathbb{E}\big[\|u-m_u\|_H^2\big]\](Parseval)
\[\le 2\|m_u\|_H^2 + 2\mathbb{E}\big[\|u\|_H^2\big]<\infty.\]$\square$
An important consequence of Sazonov’s theorem is that a finite-variance random element $u$ of an infinite-dimensional $H$ can never have identity covariance, since the identity operator on such a space is not trace class.
Randomised basis expansions
Fix a CONB $(\psi_n)_{n\in\mathbb{N}}$ of a (real) Hilbert space $H$, e.g. a space of functions on some set $X$. We propose to define a random element $u$ of $H$ via
\[u := \sum_{n\in\mathbb{N}} \xi_n \psi_n,\]where the $\xi_n$, $n\in\mathbb{N}$, are $\mathbb{R}$-valued random variables.
Question: What conditions need to be imposed on the $\xi_n$ to ensure that $u\in H$ a.s.?
The key tool here is Kolmogorov’s two-series theorem from last week’s handout.
Theorem. Let $(\psi_n)_n$ be a CONB of $H$. Fix a deterministic sequence $(\sigma_n)_n$ with $\sigma_n\in\mathbb{R}$, and set
\[u := \sum_{n\in\mathbb{N}} \sigma_n \xi_n \psi_n, \quad\text{with}\quad \xi_n\sim\mathcal{N}(0,1)\ \text{i.i.d.}\]If $\sum_n \sigma_n^2<\infty$, then $u\in H$ a.s. Moreover $\mathbb{E}[u]=m_u=0$, and $\mathrm{Cov}[u]=C_u=\Sigma^2$, where $\Sigma^2$ is the diagonal operator $\psi_n\mapsto|\sigma_n|^2\psi_n$.
Proof. Observe that
\[u\in H \iff \|u\|_H^2<\infty \iff \sum_{n\in\mathbb{N}}|\sigma_n\xi_n|^2<\infty\](by Parseval).
Now observe that
\[\sum_n \mathbb{E}[|\sigma_n\xi_n|^2] = \sum_n |\sigma_n|^2\mathbb{E}[|\xi_n|^2] = \sum_n |\sigma_n|^2<\infty,\]and
\[\sum_n \mathbb{E}[|\sigma_n\xi_n|^4] = \sum_n |\sigma_n|^4\mathbb{E}[|\xi_n|^4] = 3\sum_n |\sigma_n|^4<\infty.\]So Kolmogorov’s two-series theorem ensures that $\mathbb{P}\big[\sum_n |\sigma_n\xi_n|^2<\infty\big]=1$, i.e. $\mathbb{P}[u\in H]=1$.
(Exercises) Check that $\mathbb{E}[u]=0$ (easy!). Check that $\mathrm{Cov}[u]=\Sigma^2$.
$\square$
Example. On $H:=L^2_{\mathrm{per}}([0,2\pi];\mathbb{C})$ with Fourier CONB $\psi_n(t)=\exp(int)/\sqrt{2\pi}$ for $n\in\mathbb{Z}$, we can build a random element $u$ of $H$ by setting e.g.
\[u=\sum_{n\in\mathbb{Z}}\sigma_n\xi_n\psi_n, \qquad \sigma_0:=0, \qquad \sigma_{\pm n}:=\frac{1}{n}\ \text{for }n\ne 0.\]Then
\[\sum_{n\in\mathbb{Z}}\sigma_n^2 = 2\sum_{n\in\mathbb{N}}\frac{1}{n^2} = \frac{\pi^2}{3}<\infty,\]and so $u\in L^2$ a.s.
Indeed, one can similarly check that $u\in H^s$ for all $s<\tfrac{1}{2}$. The mean is the zero function, and the covariance operator acts as $\psi_n\mapsto n^{-2}\psi_n$ for $n\ne 0$ (and $\psi_0\mapsto 0$), i.e. $\mathrm{Cov}[u]=(-\Delta)^{-1}$, i.e. it is the integral operator whose kernel is the Green’s function (fundamental solution) of Laplace’s equation.
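A minimal NumPy sketch of this construction, truncating the sum at $|n|\le N$ and evaluating one sample path on a grid; the truncation level and the grid are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(2)
N_MODES = 200                                   # hypothetical truncation level
t = np.linspace(0.0, 2 * np.pi, 512)            # evaluation grid

# u = sum_{n != 0} sigma_n xi_n psi_n with sigma_{+-n} = 1/n and psi_n(t) = e^{int}/sqrt(2 pi).
ns = np.concatenate([np.arange(-N_MODES, 0), np.arange(1, N_MODES + 1)])
sigma = 1.0 / np.abs(ns)
xi = rng.standard_normal(ns.size)               # xi_n ~ N(0,1) i.i.d., as in the theorem
psi = np.exp(1j * np.outer(ns, t)) / np.sqrt(2 * np.pi)   # shape (2*N_MODES, len(t))

u = (sigma * xi) @ psi                          # one (complex-valued) sample path on the grid
print(np.sum((sigma * xi) ** 2))                # squared H-norm of the truncation (Parseval)
```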
Gaussian white noise
A naive definition of Gaussian white noise on $H$ is
\[w := \sum_{n\in\mathbb{N}} \xi_n \psi_n, \qquad \xi_n\sim\mathcal{N}(0,1)\ \text{i.i.d.}\]Unfortunately, it is easy to show that $\mathbb{P}[\|w\|_H=\infty]=1$, so $w$ is almost surely too rough to land in $H$. Its mean is zero and its covariance operator is the identity; it has infinite second moment. Similar proof techniques to the above use Kolmogorov’s theorem to show that $w\in H^s$ a.s. for all $s<-\tfrac{1}{2}$.
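The divergence is easy to see numerically: the partial sums $\sum_{n\le N}|\xi_n|^2$ grow roughly like $N$, whereas with the decaying weights $\sigma_n=1/n$ from the previous example they stabilise. A minimal sketch (the truncation level is a hypothetical choice):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 100_000                                     # hypothetical truncation level
xi = rng.standard_normal(N)
n = np.arange(1, N + 1)

partial_white = np.cumsum(xi ** 2)              # ||w_N||^2 = sum_{n<=N} xi_n^2, grows ~ N
partial_decay = np.cumsum((xi / n) ** 2)        # the sigma_n = 1/n series, converges

for k in (10, 1_000, 100_000):
    print(k, partial_white[k - 1], partial_decay[k - 1])
```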
Generalised polynomial chaos (gPC) expansions
The idea here is to express a “complicated” r.v. $u$ as a sum of “simple” deterministic functions of a “simple” r.v. $\xi$, called the stochastic germ. To keep things concrete, fix a r.v. $\xi:\Omega\to X=\mathbb{R}$ with law $\mu_\xi$. Under mild assumptions (we need $\mathbb{E}[e^{a|\xi|}]<\infty$ for some $a>0$) there is a system $\{q_k\}_{k\in\mathbb{N}_0}$ of orthogonal polynomials for $\xi$ (i.e. for its law $\mu_\xi$):
- each $q_k:\mathbb{R}\to\mathbb{R}$ is a polynomial of degree $k$;
- they satisfy the orthogonality relation
\[\mathbb{E}[q_j(\xi)\,q_k(\xi)] = \int_X q_j(x)\,q_k(x)\,\mu_\xi(\mathrm{d}x) = \gamma_k\,\delta_{jk}\]for some normalisation constants $\gamma_k>0$;
- the functions $q_k$ form a CONB for $L^2(X,\mu_\xi;\mathbb{R})$.
E.g.:
- For $\xi\sim\mathcal{N}(0,1)$, we obtain the (probabilists’) Hermite polynomials
\[\mathrm{He}_k(x) = (-1)^k e^{x^2/2}\,\frac{\mathrm{d}^k}{\mathrm{d}x^k} e^{-x^2/2}, \qquad \mathrm{He}_0(x)=1,\ \mathrm{He}_1(x)=x,\ \mathrm{He}_2(x)=x^2-1,\ \ldots\]The normalisation constant is $\gamma_k=k!$.
- For $\xi\sim U([-1,1])$, we obtain the Legendre polynomials.
The idea now is simply to expand $u$ (thought of as a function on $X$) in the CONB $\{q_k\}_k$:
\[u = \sum_{k\in\mathbb{N}_0} u_k\,q_k(\xi),\]with
\[u_k=\gamma_k^{-1}\mathbb{E}[u\,q_k(\xi)]\in\mathbb{R}.\]In practice, the expected value on the RHS will have to be approximated somehow, e.g. via Monte Carlo sampling:
\[u_k \approx \hat u_k^{(N)} := \frac{1}{N}\sum_{n=1}^N u(\xi^{(n)})\,q_k(\xi^{(n)}), \qquad \xi^{(n)}\sim \mu_\xi\ \text{i.i.d.}\]Think of this as a cheap surrogate model for the potentially expensive/slow/complicated r.v. $u$.
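A minimal NumPy sketch of this recipe for a single Gaussian germ, using the probabilists’ Hermite polynomials ($\gamma_k=k!$) and the hypothetical “expensive” model $u(\xi)=e^{\xi}$; the truncation order and sample size are illustrative choices.

```python
import math
import numpy as np

rng = np.random.default_rng(4)
K, N = 8, 200_000                               # hypothetical truncation order and sample size

def hermite_eval(x, K):
    """Probabilists' Hermite polynomials He_0, ..., He_K via the three-term recurrence."""
    H = np.empty((K + 1,) + np.shape(x))
    H[0] = 1.0
    if K >= 1:
        H[1] = x
    for k in range(1, K):
        H[k + 1] = x * H[k] - k * H[k - 1]
    return H

def u(xi):
    return np.exp(xi)                           # stand-in for an expensive/complicated model

xi = rng.standard_normal(N)                     # i.i.d. samples of the germ
Hk = hermite_eval(xi, K)                        # shape (K+1, N)
gamma = np.array([math.factorial(k) for k in range(K + 1)], dtype=float)
u_hat = (Hk * u(xi)).mean(axis=1) / gamma       # Monte Carlo estimates of the gPC coefficients

# Evaluate the cheap surrogate sum_k u_hat_k He_k(x) and compare with the true model.
x_test = np.linspace(-2.0, 2.0, 5)
surrogate = u_hat @ hermite_eval(x_test, K)
print(np.c_[u(x_test), surrogate])
```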
The same procedure can be followed for any system of functions $q_k$ that are orthogonal w.r.t. $\mu_\xi$, not just polynomials, e.g. wavelets, Fourier basis functions, …
We can do the same thing for gPC expansion of a random vector $u:\Omega\to\mathbb{R}^d$. Again we expand as
\[u = \sum_k u_k\,q_k(\xi), \qquad u_k=\gamma_k^{-1}\mathbb{E}[u\,q_k(\xi)]\in\mathbb{R}^d.\]We can do the same thing for gPC expansion of a random function $u:\Omega\to\{\text{functions on }T\}$, i.e. stochastic processes/functions:
\[u(t,\xi) = \sum_k u_k(t)\,q_k(\xi),\]with $u_k:T\to\mathbb{R}$ given by
\[u_k(t)=\gamma_k^{-1}\mathbb{E}[u(t,\xi)\,q_k(\xi)].\](the “stochastic modes” or gPC modes of $u$)
We also often expand using multiple stochastic germs $\xi_1,\ldots,\xi_M$. We then need a system of functions $\mathbb{R}^M\to\mathbb{R}$ that form a CONB with respect to the joint law $\mu_\xi$ of $\xi=(\xi_1,\ldots,\xi_M)$. If the components of $\xi$ are independent, then joint basis functions are just products of univariate ones — but beware the curse of dimension: learning $K^M$ coefficients can rapidly exhaust your computational resources!
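A small sketch of the bookkeeping for several independent germs: full tensor-product multi-indices versus a total-degree truncation, illustrating how quickly the tensor-product counts grow. The orders $K$ and $M$ below are hypothetical.

```python
import itertools

K, M = 5, 4      # hypothetical per-dimension polynomial order and number of germs

# Full tensor-product multi-indices alpha in {0, ..., K}^M: (K+1)^M basis functions.
tensor = list(itertools.product(range(K + 1), repeat=M))

# Total-degree truncation |alpha| <= K keeps far fewer: binomial(K + M, M) of them.
total_degree = [alpha for alpha in tensor if sum(alpha) <= K]

print(len(tensor), len(total_degree))   # 1296 vs 126 for K = 5, M = 4

# For independent germs, each multivariate basis function is a product of univariate ones:
# q_alpha(xi) = q_{alpha_1}(xi_1) * ... * q_{alpha_M}(xi_M).
```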
Stochastic processes
A stochastic process on a set $X$ is just a (measurable) function
\[u:X\times\Omega\to\mathbb{R},\]where $(\Omega,\mathcal{F},\mathbb{P})$ is some probability space, i.e. it is just a family of $\mathbb{R}$-valued random variables $u(x)$, one for each $x\in X$.
Mean and covariance functions
The mean function $m_u:X\to\mathbb{R}$ of $u$ is given by
\[m_u(x):=\mathbb{E}[u(x)] = \int_\Omega u(x,\omega)\,\mathbb{P}(\mathrm{d}\omega),\]defined as long as $\mathbb{E}[|u(x)|]$ is finite for all $x\in X$.
The covariance function / cov. kernel $k_u:X\times X\to\mathbb{R}$ is given by
\[k_u(x,y):=\mathbb{E}\big[(u(x)-m_u(x))(u(y)-m_u(y))\big],\]and is defined as long as $\mathbb{E}[|u(x)|^2]<\infty$ for all $x\in X$.
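A minimal sketch estimating $m_u$ and $k_u$ from finitely many sample paths observed on a grid; the toy process $u(x)=a\sin(2\pi x)+b$ with independent standard normal $a,b$ is a hypothetical example whose covariance function is known in closed form.

```python
import numpy as np

rng = np.random.default_rng(5)
x = np.linspace(0.0, 1.0, 100)                  # observation grid
N_PATHS = 2_000                                 # hypothetical number of sample paths

# Toy process: u(x) = a sin(2 pi x) + b with a, b ~ N(0, 1) independent.
a = rng.standard_normal((N_PATHS, 1))
b = rng.standard_normal((N_PATHS, 1))
paths = a * np.sin(2 * np.pi * x) + b           # shape (N_PATHS, len(x))

m_hat = paths.mean(axis=0)                      # empirical mean function on the grid
centred = paths - m_hat
k_hat = centred.T @ centred / N_PATHS           # empirical covariance function k_u(x_i, x_j)

# True covariance function: k(x, y) = sin(2 pi x) sin(2 pi y) + 1.
k_true = np.outer(np.sin(2 * np.pi * x), np.sin(2 * np.pi * x)) + 1.0
print(np.max(np.abs(m_hat)), np.max(np.abs(k_hat - k_true)))   # both small up to MC error
```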
Gaussian processes
A stochastic process $u:X\times\Omega\to\mathbb{R}$ is called a Gaussian process (GP) with mean func. $m:X\to\mathbb{R}$ and cov. func. $k:X\times X\to\mathbb{R}$, denoted $u\sim\mathrm{GP}(m,k)$, if, for all $x_1,\ldots,x_n\in X$, $n\in\mathbb{N}$,
\[\begin{pmatrix} u(x_1)\\ \vdots\\ u(x_n) \end{pmatrix} \sim \mathcal{N}\!\left( \begin{pmatrix} m(x_1)\\ \vdots\\ m(x_n) \end{pmatrix}, \begin{pmatrix} k(x_1,x_1) & \cdots & k(x_1,x_n)\\ \vdots & \ddots & \vdots\\ k(x_n,x_1) & \cdots & k(x_n,x_n) \end{pmatrix} \right).\]
Examples
Some commonly-used covariance functions include (a short sampling sketch follows the list):
- Brownian kernel: $k(x,y):=\min(x,y)$ on $X=[0,T]$.
- Squared exp. kernel: for $X=\mathbb{R}^d$, $k(x,y):=\exp\!\big(-\|x-y\|^2/(2\ell^2)\big)$ with length-scale $\ell>0$.
- Exp. kernel: $k(x,y):=\exp\!\big(-\|x-y\|/\ell\big)$.
- Matérn kernels, which interpolate in smoothness between the exponential and squared exponential kernels.
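A minimal NumPy sketch drawing sample paths of $\mathrm{GP}(0,k)$ for the squared-exponential kernel, using exactly the finite-dimensional distributions above on a grid; the grid, length-scale and number of paths are hypothetical choices, and a symmetric eigendecomposition is used for numerical robustness.

```python
import numpy as np

rng = np.random.default_rng(6)
x = np.linspace(0.0, 1.0, 200)                  # evaluation grid
ell = 0.1                                       # hypothetical length-scale

# Squared-exponential covariance matrix of the finite-dimensional distribution on the grid.
D = x[:, None] - x[None, :]
K = np.exp(-D ** 2 / (2 * ell ** 2))

# Sample u ~ N(0, K) via a symmetric eigendecomposition (robust even when K is near-singular).
w, V = np.linalg.eigh(K)
L = V * np.sqrt(np.clip(w, 0.0, None))          # K = L @ L.T up to round-off
paths = (L @ rng.standard_normal((x.size, 5))).T    # 5 sample paths, shape (5, len(x))
print(paths.shape)
```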
Karhunen–Loève theorem
There is a close relationship between “nice” stochastic processes and random basis expansions in the eigenbasis of a suitable integral operator.
Karhunen–Loève Theorem. Let $X\subseteq\mathbb{R}^d$ and let $u:X\times\Omega\to\mathbb{R}$ be centred (i.e. $m_u=0$) and square-integrable (i.e.
\[\mathbb{E}\!\left[\int_X |u(x)|^2\,dx\right] = \int_X \mathbb{E}[|u(x)|^2]\,dx < \infty\]), with continuous and square-integrable cov. func. $k$.
Then the induced integral op.
\[I_k:L^2(X)\to L^2(X), \qquad (I_k f)(x):=\int_X k(x,y)f(y)\,dy\]is the cov. op. $\mathrm{Cov}[u]$ of $u$. Let $\lambda_n\ge 0$ and $\psi_n$ be the eigenvalues and orthonormal eigenfunctions of $I_k$. Then
\[u=\sum_{n\in\mathbb{N}} z_n\,\psi_n,\]with
\[z_n=\int_X u(x)\psi_n(x)\,dx=\langle u,\psi_n\rangle_{L^2(X)},\]and the r.v.s $z_n$ are centred (i.e. $\mathbb{E}[z_n]=0$) and uncorrelated with variance $\lambda_n$ (i.e. $\mathbb{E}[z_n z_m]=\lambda_n\delta_{nm}$).
Furthermore, if $u\sim \mathrm{GP}(0,k)$, then
\[z_n\sim\mathcal{N}(0,\lambda_n).\]
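A minimal NumPy sketch of the Karhunen–Loève construction for the Brownian kernel $k(x,y)=\min(x,y)$ on $[0,1]$: the integral operator is discretised by midpoint quadrature (a Nyström-type approximation), its eigenpairs are compared with the known values $\lambda_n=\big((n-\tfrac12)\pi\big)^{-2}$, and a Gaussian sample path is drawn via $u=\sum_n\sqrt{\lambda_n}\,\xi_n\psi_n$. The grid size and truncation level are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
M = 400                                         # hypothetical grid size
x = (np.arange(M) + 0.5) / M                    # midpoint grid on [0, 1]
h = 1.0 / M                                     # quadrature weight

K = np.minimum.outer(x, x)                      # Brownian kernel k(x, y) = min(x, y)

# Nystrom-type approximation of the integral operator I_k: eigenpairs of h * K.
lam, V = np.linalg.eigh(h * K)
lam, V = lam[::-1], V[:, ::-1]                  # sort eigenvalues in decreasing order
psi = V / np.sqrt(h)                            # eigenfunctions, L^2-normalised on the grid

# Compare with the exact eigenvalues lambda_n = ((n - 1/2) pi)^{-2}, n = 1, 2, ...
n = np.arange(1, 6)
print(lam[:5], 1.0 / ((n - 0.5) ** 2 * np.pi ** 2))

# Draw a Gaussian sample path: u = sum_n sqrt(lambda_n) xi_n psi_n (truncated).
N_KL = 100                                      # hypothetical truncation level
xi = rng.standard_normal(N_KL)
u = psi[:, :N_KL] @ (np.sqrt(np.clip(lam[:N_KL], 0.0, None)) * xi)
print(u.shape)
```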
Smoothness of sample draws
How regular/smooth are the sample draws of a stochastic process / GP? Making this precise requires a fair amount of mathematical analysis; two representative results:
- If $k$ has all partial derivatives up to order $2n$ continuous, then $u$ is almost surely $n$ times differentiable.
- If the RKHS $\mathcal{H}_k$ induced by the cov. kernel $k$ is a space of $s$-times differentiable functions, then $u$ is almost surely not in $\mathcal{H}_k$, but it does almost surely have $s-\tfrac{d}{2}$ derivatives, as long as $s>d$.
Proof that $\mathrm{Cov}[u]=I_k$
For $f,g\in L^2(X)$, by definition of the covariance operator,
\[\langle f,\ \mathrm{Cov}[u]\,g\rangle_{L^2} = \mathbb{E}\big[\langle f,u\rangle_{L^2}\ \langle u,g\rangle_{L^2}\big] = \mathbb{E}\left[\left(\int_X f(x)u(x)\,dx\right)\left(\int_X u(y)g(y)\,dy\right)\right].\](Fubini)
\[= \mathbb{E}\left[\int_X\int_X f(x)u(x)u(y)g(y)\,dx\,dy\right] = \int_X\int_X f(x)\,\mathbb{E}[u(x)u(y)]\,g(y)\,dy\,dx\] \[= \int_X f(x)\left(\int_X k(x,y)g(y)\,dy\right)\,dx = \int_X f(x)\,(I_k g)(x)\,dx = \langle f,\ I_k g\rangle_{L^2}.\]$\square$
At many points now, we have a necessary step in implementing our mathematical ideas: we need to learn/fit coefficients in a model to best explain some data, or to best approximate some underlying function. This naturally leads us to our next topic: inverse problems.