
Appendix D
Complex Analysis and the Central Limit Theorem


One of the greatest challenges in a course is determining the level at which to pitch it. This is perhaps most apparent in deciding what level of detail to give for proofs. For us, the most important result is, as the name suggests, the Central Limit Theorem. The purpose of this chapter is to quickly introduce you to a subject which is beautiful and important in its own right, Complex Analysis, and see how it connects to Probability and the Central Limit Theorem.

Chapter 1
Complex Analysis and the Central Limit Theorem

In Chapter 20 we gave a proof of the Central Limit Theorem using generating functions; unfortunately that proof isn't complete as it assumed some results from Complex Analysis. Moreover, we had to assume the moment generating function existed, which isn't always true.

We tried again in Chapter 21; we proved the Central Limit Theorem by using Fourier analysis. Instead of using the moment generating function, which can fail to even exist, this time we used the Fourier transform (also called the characteristic function), which has the very nice and useful property of actually existing! Unfortunately, here too we needed to appeal to some results from Complex Analysis.

This leaves us in a quandary, where we have a few options.

1.
We can just accept as true some results from Complex Analysis and move on.
2.
We can try and find yet another proof, this time one that doesn't need Complex Analysis.
3.
We can drop everything and take a crash course in Complex Analysis.

This chapter is for those who like the third option. We'll explain some of the key ideas of complex analysis, in particular we'll show why it's such a different subject than real analysis. Obviously, it helps to have seen real analysis, but if you're comfortable with Taylor series and basic results on convergence you'll be fine.

It turns out that assuming a function of a real variable is differentiable doesn't mean too much, but assume a function of a complex variable is differentiable and all of a sudden doors are opening everywhere with additional, powerful facts that must be true. Obviously this chapter can't replace an entire course, nor is that our goal. We want to show you some of the key ideas of this beautiful subject, and hopefully when you finish reading you'll have a better sense of why the black-box results from Complex Analysis (Theorems 20.5.3 and 20.5.4) are true.

This chapter is meant to supplement our discussions on moment generating functions and proofs of the Central Limit Theorem. We thus assume the reader is familiar with the notation and concepts from Chapters 19 through 21.

1.1 Warnings from real analysis

The following example is one of my favorites from real analysis. It indicates why real analysis is hard, almost surely much harder than you might expect. Consider the function \(g:ℝ\to ℝ\) given by \begin{equation} g(x) \ = \ \begin{cases} e^{-1/x^2} & \text{if } x \neq 0\\ 0 & \text{otherwise.}\end{cases}\tag{D.1}\end{equation} Using the definition of the derivative and L'Hopital's rule, we can show that \(g\) is infinitely differentiable, and all of its derivatives at the origin vanish. For example,

\begin {eqnarray*} g'(0) & \ = \ & \lim _{h\to 0} \frac {e^{-1/h^2} - 0}{h} \nonumber \\ & = & \lim _{h\to 0} \frac {1/h}{e^{1/h^2}} \nonumber \\ &=& \lim _{k \to \infty } \frac {k}{e^{k^2}} \nonumber \\ &=& \lim _{k\to \infty } \frac {1}{2k e^{k^2}} \ = \ 0, \end {eqnarray*}

where we used L'Hopital's rule in the last step (\(\lim _{k\to \infty } A(k)/B(k)\) \(=\) \(\lim _{k\to \infty }\) \(A'(k)/B'(k)\) if \(\lim _{k\to \infty } A(k)\) \(=\) \(\lim _{k\to \infty } B(k) = \infty \)). (We replaced \(h\) with \(1/k\) as this allows us to re-express the quantities above in a familiar form, one where we can apply L'Hopital's rule.) A similar analysis shows that the \(n\)th derivative vanishes at the origin for all \(n\), i.e., \(g^{(n)}(0) = 0\) for all positive integers \(n\). If we consider the Taylor series for \(g\) about 0, we find \[ g(x) \ = \ g(0) + g'(0)x + \frac {g''(0) x^2}{2!} + \cdots \ = \ \sum _{n=0}^\infty \frac {g^{(n)}(0) x^n}{n!} \ = \ 0; \] however, clearly \(g(x) \neq 0\) if \(x \neq 0\). We are thus in the ridiculous case where the Taylor series (which converges for all \(x\)!) only agrees with the function when \(x=0\). This isn't that impressive, as the Taylor series is forced to agree with the original function at 0, as both are just \(g(0)\).
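If you'd like to see this numerically, here is a short Python sketch (the function name and the step sizes are our own choices); the finite differences below merely illustrate that \(g'(0) = g''(0) = 0\), they do not prove it.

\begin{verbatim}
import math

def g(x):
    # The function from (D.1): exp(-1/x^2) away from 0, and 0 at the origin.
    return math.exp(-1.0 / x**2) if x != 0 else 0.0

# Symmetric difference quotients for the first two derivatives at the origin.
# As h shrinks, both estimates rush to 0, consistent with g'(0) = g''(0) = 0.
for h in (0.5, 0.2, 0.1, 0.05):
    d1 = (g(h) - g(-h)) / (2 * h)
    d2 = (g(h) - 2 * g(0) + g(-h)) / h**2
    print(h, d1, d2)

# Yet g is clearly nonzero away from the origin, so its (identically zero)
# Taylor series at 0 cannot represent it.
print(g(1.0))   # exp(-1), about 0.3679
\end{verbatim}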

We can learn a lot from the above example. The first is that it's possible for a Taylor series to converge for all \(x\), but only agree with the function at one point! It's not too impressive to agree at just one point, as by construction the Taylor series has to agree at that point of expansion. The second, which is far more important, is that a Taylor series does not uniquely determine a function! For example, both \(\sin x\) and \(\sin x + g(x)\) (with \(g(x)\) the function from equation (D.1)) have the same Taylor series about \(x=0\).

The reason this is so important for us is that we want to understand when a moment generating function uniquely determines a probability distribution. If our distribution was discrete, there was no problem (Theorem 19.6.5). For continuous distributions, however, it's much harder, as we saw in equation (19.6.5) where we met two densities that had the same moments.

Apparently, we must impose some additional conditions for continuous random variables. For discrete random variables, it was enough to know all the moments; this doesn't suffice for continuous random variables. What should those conditions be?

Recall that if we have a random variable \(X\) with density \(f_X\), its \(k\)th moment, denoted by \(\mu _k'\), is defined by \[ \mu _k' \ = \ \int _{-\infty }^\infty x^k f_X(x) dx. \] Let's consider again the pair of functions in equation (19.6.5). A nice calculus exercise shows that \(\mu _k' = e^{k^2/2}\). This means that the moment generating function is \[ M_X(t) \ = \ \sum _{k=0}^\infty \frac {\mu _k' t^k}{k!} \ = \ \sum _{k=0}^\infty \frac {e^{k^2/2} t^k}{k!}. \] For what \(t\) does this series converge? Amazingly, this series converges only when \(t=0\)! To see this, it suffices to show that the terms do not tend to zero. As \(k! \le k^k\), for any fixed \(t\), for \(k\) sufficiently large \(t^k/k! \ge (t/k)^k\); moreover, \(e^{k^2/2} = (e^{k/2})^k\), so the \(k\)th term is at least as large as \((e^{k/2} t / k)^k\). For any \(t \neq 0\), this clearly does not tend to zero, and thus the moment generating function has a radius of convergence of zero!
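Here is a quick numerical sanity check of both claims in Python (using SciPy); the helper names and the change of variables \(x = e^u\) are our own choices, and the two densities are the ones from (19.6.5), written out explicitly at the end of this appendix.

\begin{verbatim}
import math
from scipy.integrate import quad

def phi(u):
    # Standard normal density; after substituting x = e^u the moment
    # integrals of the two densities from (19.6.5) become Gaussian integrals.
    return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)

def moment(k, perturbed=False):
    # k-th moment of f_1(x) = exp(-(log x)^2/2)/(x sqrt(2 pi)), or of
    # f_2(x) = f_1(x)(1 + sin(2 pi log x)) when perturbed=True.
    def integrand(u):
        weight = (1 + math.sin(2 * math.pi * u)) if perturbed else 1.0
        return math.exp(k * u) * phi(u) * weight
    value, _ = quad(integrand, k - 12, k + 12)   # the mass sits near u = k
    return value

for k in range(5):
    print(k, moment(k), moment(k, perturbed=True), math.exp(k * k / 2))

# Logarithms of the terms e^{k^2/2} t^k / k! of the would-be MGF:
# they grow without bound for any fixed t != 0, so the series diverges.
t = 0.01
for k in (10, 20, 40):
    print(k, k * k / 2 + k * math.log(t) - math.lgamma(k + 1))
\end{verbatim}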

This leads us to the following conjecture: If the moment generating function converges for \({|t|} < \delta \) for some \(\delta > 0\), then it uniquely determines a density. We'll explore this conjecture below.

1.2 Complex Analysis and Topology Definitions

Our purpose here is to give a flavor of what kind of inputs are needed to ensure that a moment generating function uniquely determines a probability density. We first collect some definitions, and then state some useful results from complex analysis.

Deļ¬nitionĀ 1.2.1Ā (Complex variable, complex function) Any complex number \(z\) can be written as \(z = x + iy\), with \(x\) and \(y\) real and \(i = \sqrt {-1}\). We denote the set of all complex numbers by \(ā„‚\). A complex function is a map \(f\) from \(ā„‚\) to \(ā„‚\); in other words \(f(z) \in ā„‚\). Frequently one writes \(x = \Re (z)\) for the real part, \(y = \Im (z)\) for the imaginary part, and \(f(z) = u(x,y) + iv(x,y)\) with \(u\) and \(v\) functions from \(ā„^2\) to \(ā„\).

There are many ways to write complex numbers. The most common is the definition above; however, a polar coordinate approach is sometimes useful. One of the most remarkable relations in all of mathematics is \begin {equation*} e^{i\theta }\ = \ \cos \theta + i \sin \theta . \end {equation*} There are several ways to see this, depending on how much math you want to assume. One way is to use the Taylor series expansions for the exponential, sine and cosine functions. This gives another way of writing complex numbers; instead of \(1 + i\) we could write \(\sqrt {2} \exp (i\pi /4)\). A particularly interesting choice of \(\theta \) is \(\pi \), which gives \(e^{i\pi } = -1\), a beautiful formula involving many of the most important constants in mathematics!
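Python's cmath module makes it easy to play with the polar form; the snippet below is only our own illustration, checking that \(1+i = \sqrt{2}\exp(i\pi/4)\) and that \(e^{i\pi} = -1\) up to rounding.

\begin{verbatim}
import cmath, math

z = 1 + 1j
print(abs(z), cmath.phase(z))                      # sqrt(2) and pi/4

# Polar form: 1 + i = sqrt(2) * exp(i pi / 4), by Euler's formula.
print(math.sqrt(2) * cmath.exp(1j * math.pi / 4))  # (1+1j) up to rounding

# The famous special case: e^{i pi} = -1.
print(cmath.exp(1j * math.pi))                     # -1 up to rounding error
\end{verbatim}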

Noting \(i^2=-1\), it isn't too hard to show that

\begin {eqnarray*} (a+ib) + (x+iy) & \ = \ & (a+x) + i(b+y)\nonumber \\ (a+ib) \cdot (x+iy) &=& (ax-by) + i(ay+bx). \end {eqnarray*}

The complex conjugate of \(z=x+iy\) is \(\overline {z} := x - iy\), and we define the absolute value (or the modulus or magnitude) of \(z\) to be \(\sqrt {z\overline {z}}\), and denote this by \(|z|\). This is real valued, and equals \(\sqrt {x^2+y^2}\). If we were to write \(z\) as a vector, it would be \(z = (x,y)\); note that in this case we see that \(|z|\) equals the length of the corresponding vector.

We can write almost anything as an example of a complex function; one possible function is \(f(z) = z^2 + |z|\). The question is when such a function is differentiable in \(z\), and what that differentiability entails. Actually, before we answer this we first need to state what it means for a complex function to be differentiable!

Deļ¬nitionĀ 1.2.2Ā (Diļ¬€erentiable) We say a complex function \(f\) is (complex) diļ¬€erentiable at \(z_0\) if itā€™s diļ¬€erentiable with respect to the complex variable \(z\), which means \[\lim_{h \to 0} \frac {f(z_0+h) - f(z_0)}{h} \] exists, where \(h\) tends to zero along any path in the complex plane. If the limit exists we write \(f'(z_0)\) for the limit. If \(f\) is diļ¬€erentiable, then \(f(x+iy) = u(x,y)+iv(x,y)\) satisļ¬es the Cauchy-Riemann equations: \[ f'(z) \ = \ \frac {\partial u}{\partial x} + i \frac {\partial v}{\partial x} \ = \ -i \frac {\partial u}{\partial y} + \frac {\partial v}{\partial y} \] (one direction is easy, arising from sending \(h\to 0\) along the paths \(\widetilde {h}\) and \(i\widetilde {h}\), with \(\widetilde {h} \in ā„\)).


Here's a quick hint to see why differentiability implies the Cauchy-Riemann equations; try and fill in the details. Since the derivative exists at \(z_0 = x_0 + iy_0\), the key limit is independent of the path we take to the point \(x_0 + iy_0\). Consider the path \(x + iy_0\) with \(x\to x_0\), and the path \(x_0 + i y\) with \(y\to y_0\), and use results from multivariable calculus on partial derivatives.
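Here is a small numerical illustration (in Python; not a proof, and the helper functions are our own) of the Cauchy-Riemann equations: we estimate the partial derivatives of \(u\) and \(v\) by central differences and check that \(f(z) = z^2\) satisfies them while \(f(z) = \overline{z}\) does not.

\begin{verbatim}
def partials(f, x, y, h=1e-6):
    # Central-difference estimates of u_x, u_y, v_x, v_y for f(x + iy) = u + iv.
    fx = (f(complex(x + h, y)) - f(complex(x - h, y))) / (2 * h)
    fy = (f(complex(x, y + h)) - f(complex(x, y - h))) / (2 * h)
    return fx.real, fy.real, fx.imag, fy.imag

def cauchy_riemann_gap(f, x, y):
    ux, uy, vx, vy = partials(f, x, y)
    # The Cauchy-Riemann equations demand u_x = v_y and u_y = -v_x.
    return abs(ux - vy), abs(uy + vx)

print(cauchy_riemann_gap(lambda z: z * z, 0.7, -0.3))          # roughly (0, 0)
print(cauchy_riemann_gap(lambda z: z.conjugate(), 0.7, -0.3))  # roughly (2, 0): fails
\end{verbatim}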

Let's explore a bit and see which functions are complex differentiable. We let \(h = h_1+ih_2\) below, with \(h\to 0 + 0i\). If \(f(z) = z\) then \begin {equation*} \lim_{h\to 0} \frac {f(z+h)-f(z)}{h} \ = \ \lim _{h\to 0} \frac {z+h-z}{h} \ = \ \lim _{h\to 0} 1 \ = \ 1; \end {equation*} thus the function is complex differentiable and the derivative is 1. If \(f(z) = z^2\) then

\begin {eqnarray*} \lim_{h\to 0} \frac {f(z+h) - f(z)}{h} & \ = \ & \lim _{h\to 0} \frac {(z+h)^2 - z^2}{h} \nonumber \\ &=& \lim _{h\to 0} \frac {z^2+2zh + h^2 - z^2}{h} \nonumber \\ &=& \lim _{h\to 0} \frac {2zh+h^2}{h} \nonumber \\ &=& \lim _{h\to 0} (2z+h) \nonumber \\ & = & \lim _{h\to 0} 2z + \lim _{h\to 0} h \nonumber \\ &=& 2z + 0 \ = \ 2z.\end {eqnarray*}

We're using the following properties of complex numbers: \(h/h = 1\) and \(2zh+h^2 = (2z+h)h\). Note how similar this is to the real valued analogue, \(f(x) = x^2\). If \(f(z) = \overline {z}\) then \begin {equation*} \lim_{h\to 0} \frac {f(z+h)-f(z)}{h} \ = \ \lim _{h\to 0} \frac {\overline {z+h} - \overline {z}}{h}. \end {equation*} Unlike the other limits, this one isn't immediately clear. Let's write \(z = x+iy\), \(h = h_1 + ih_2\) (and of course \(\overline {z} = x-iy\), \(\overline {h} = h_1-ih_2\)). The limit is \begin {equation*} \lim_{h\to 0} \frac {x-iy + h_1-ih_2 - (x - iy)}{h_1+ih_2} \ = \ \lim _{h\to 0} \frac {h_1-ih_2}{h_1+ih_2}. \end {equation*} This limit does not exist; depending on how \(h\to 0\) we obtain different answers. For example, if \(h_2 = 0\) (traveling along the \(x\)-axis) the limit is just \(\lim _{h\to 0} h_1/h_1 = 1\), while if \(h_1 = 0\) (traveling along the \(y\)-axis) the limit is just \(\lim _{h\to 0} -ih_2/ih_2 = -1\). Thus this function isn't complex differentiable anywhere, even though it's a fairly straightforward function to define.
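The path dependence is easy to see on a computer as well; in the sketch below (again just our own illustration) the difference quotient for \(f(z) = \overline{z}\) settles to a different value along each direction of approach.

\begin{verbatim}
import cmath, math

def quotient(z, h):
    # Difference quotient for f(z) = conjugate(z).
    return ((z + h).conjugate() - z.conjugate()) / h

z0 = 0.5 + 0.25j
for angle in (0.0, math.pi / 4, math.pi / 2):
    h = 1e-8 * cmath.exp(1j * angle)   # approach z0 along the direction e^{i angle}
    print(angle, quotient(z0, h))
# Along the real axis the quotient is 1, along the imaginary axis it is -1,
# and along the diagonal it is -i: there is no single limit.
\end{verbatim}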

If we continue to argue along these lines, we find that a function is complex differentiable if the \(x\) and \(y\) dependence is in a very special form, namely everything is a function of \(z=x+iy\). In other words, we don't allow our function to depend on \(\overline {z} = x - iy\). If we could depend on both, we could isolate out \(x\) (which is \((z+\overline {z})/2\)) and \(y\) (which is \((z-\overline {z})/2i\)). We can begin to see why being complex differentiable once implies that we're complex differentiable infinitely often, namely because of the very special dependence on \(x\) and \(y\). Also, on the real line there are really only two ways to approach a point: from above, or from below. In the complex plane, the situation is strikingly different. There are so many ways we can move in two dimensions, and each path must give the same answer if we're to be complex differentiable. This is why differentiability means far more for a complex variable than for a real variable.

To state the needed results from Complex Analysis, we also require some terminology from Point Set Topology. In particular, many of the theorems below deal with open sets. We briefly review their definition and give some examples.

Deļ¬nitionĀ 1.2.3Ā (Open set, closed set) A subset \(U\) of \(ā„‚\) is an open set if for any \(z_0 \in U\) thereā€™s a \(\delta \) such that whenever \({|z-z_0|} < \delta \) then \(z\in U\) (note \(\delta \) is allowed to depend on \(z_0\)). A set \(C\) is closed if its complement, \(ā„‚\setminus C\), is open.

The following are examples of open sets in \(ℂ\).

1.
\(U_1 = \{z: |z| < r\}\) for any \(r > 0\). This is usually called the open ball of radius \(r\) centered at the origin.
2.
\(U_2 = \{z: \Re (z) > 0\}\). To see this is open, if \(z_0 \in U_2\) then we can write \(z_0 = x_0 + i y_0\), with \(x_0 > 0\). Letting \(\delta = x_0/2\), for \(z = x+iy\) we see that if \(|z-z_0| < \delta \) then \(|x-x_0| < x_0/2\), which implies \(x > x_0/2 > 0\); \(U_2\) is often called the open right half-plane.

For examples of closed sets, consider the following.

1.
\(C_1 = \{z: |z| \le r\}\). Note that if we take \(z_0\) to be any point on the boundary, then the ball of radius \(\delta \) centered at \(z_0\) will contain points more than \(r\) units from the origin, and thus \(C_1\) isn't open. A little work shows, however, that \(C_1\) is closed (in fact, \(C_1\) is called the closed ball of radius \(r\) about the origin). We prove it's closed by showing its complement is open. What we need to do is show that, given any point in the complement, there's a small ball about that point entirely contained in the complement. I urge you to draw a picture for the following argument. If \(z_0 \in ℂ\setminus C_1\) then \(|z_0| > r\) (as otherwise it would be inside \(C_1\)). If we take \(\delta < \frac {|z_0| - r}2\) and \(|z-z_0| < \delta \), then the triangle inequality gives \(|z| \ge |z_0| - |z - z_0| > |z_0| - \delta > r\), so \(z \in ℂ\setminus C_1\). Thus \(ℂ\setminus C_1\) is open, so \(C_1\) is closed.
2.
\(C_2 = \{z: \Re (z) \ge 0\}\). To see this set isn't open, consider any \(z_0 = iy\) with \(y \in ℝ\). A similar calculation as the one we did for \(U_2\) or \(C_1\) shows \(C_2\) is closed.

For a set that is neither open nor closed, consider \(S = U_1 \cup C_2\).

We now state two of the most important properties a complex function could have. One of the most important results in the subject is that these two seemingly very different properties are actually equivalent!

Deļ¬nitionĀ 1.2.4Ā (Holomorphic, analytic) Let \(U\) be an open subset of \(ā„‚\), and let \(f\) be a complex function. We say \(f\) is holomorphic on \(U\) if \(f\) is diļ¬€erentiable at every point \(z \in U\), and we say \(f\) is analytic on \(U\) if \(f\) has a series expansion that converges and agrees with \(f\) on \(U\). This means that for any \(z_0 \in U\), for \(z\) close to \(z_0\) we can choose \(a_n\)ā€™s such that \[ f(z) \ = \ \sum _{n=0}^\infty a_n (z-z_0)^n. \]

As alluded to above, saying a function of a complex variable is differentiable turns out to imply far more than saying a function of a real variable is differentiable, as the following theorem shows us.

Theorem 1.2.5 Let \(f\) be a complex function and \(U\) an open set. Then \(f\) is holomorphic on \(U\) if and only if \(f\) is analytic on \(U\), and the series expansion for \(f\) is its Taylor series.

The above theorem is amazing; its result seems too good to be true. Namely, as soon as we know \(f\) is differentiable once, it's infinitely differentiable and \(f\) agrees with its Taylor series expansion! This is very different than what happens in the case of functions of a real variable. For instance, the function \begin {equation} h(x)\ =\ x^3 \sin (1/x) \tag{D.2} \end {equation} is differentiable once and only once at \(x=0\), and while the function \(g(x)\) from (D.1) is infinitely differentiable, the Taylor series expansion only agrees with \(g(x)\) at \(x=0\). Complex analysis is a very different subject than real analysis!
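To get a feel for the function in (D.2), here is a rough numerical sketch (our own, and only suggestive): the difference quotient for \(h'(0)\) tends to zero, while the difference quotient for \(h''(0)\) oscillates and never settles down.

\begin{verbatim}
import math

def h(x):
    # The function from (D.2), extended by h(0) = 0.
    return x**3 * math.sin(1.0 / x) if x != 0 else 0.0

def hprime(x):
    # For x != 0 the product and chain rules give h'(x) = 3x^2 sin(1/x) - x cos(1/x);
    # at the origin |h(x)/x| <= x^2 forces h'(0) = 0.
    return 3 * x**2 * math.sin(1 / x) - x * math.cos(1 / x) if x != 0 else 0.0

for x in (1e-2, 1e-4, 1e-6):
    print(x, h(x) / x)          # difference quotient for h'(0): tends to 0

for x in (1e-2, 1e-3, 1e-4):
    print(x, hprime(x) / x)     # difference quotient for h''(0): oscillates, no limit
\end{verbatim}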

The next theorem provides a very nice condition for when a function is identically zero. It involves the notion of a limit or accumulation point, which we define first.

Deļ¬nitionĀ 1.2.6Ā (Limit or accumulation point) We say \(z\) is a limit (or an accumulation) point of a sequence \(\{z_n\}_{n=0}^\infty \) if there exists a subsequence \(\{z_{n_k}\}_{k=0}^\infty \) converging to \(z\).

Let's do some examples to clarify the definitions.

1.
If \(z_n = 1/n\), then \(0\) is a limit point.
2.
If \(z_n = \cos (\pi n)\) then there are two limit points, namely \(1\) and \(-1\). (If \(z_n = \cos (n)\) then every point in \([-1,1]\) is a limit point of the sequence, though this is harder to show.)
3.
If \(z_n = (1 + (-1)^n)^n + 1/n\), then \(0\) is a limit point. We can see this by taking the subsequence \(\{z_1,z_3,z_5,z_7,\dots \}\); note the subsequence \(\{z_0,z_2,z_4,\dots \}\) diverges to inļ¬nity.
4.
Let \(z_n\) denote the number of distinct prime factors of \(n\). Then every positive integer is a limit point! For example, let's show \(5\) is a limit point. The first five primes are 2, 3, 5, 7 and 11; consider \(N = 2 \cdot 3 \cdot 5 \cdot 7 \cdot 11 = 2310\). Consider the subsequence \(\{z_N, z_{N^2}, z_{N^3}, z_{N^4}, \dots \}\); as \(N^k\) has exactly 5 distinct prime factors for each \(k\), \(5\) is a limit point.
5.
If \(z_n = n^2\) then there are no limit points, as \(\lim _{n\to \infty } z_n = \infty \).
6.
Let \(z_0\) be any odd, positive integer, and set\[ z_{n+1} \ = \ \begin {cases} 3 z_n + 1 & \text {if $z_n$ is odd}\\ z_n/2 &\text {if $z_n$ is even.} \end {cases} \] It's conjectured that 1 is always a limit point (and if some \(z_m = 1\), then the next few terms have to be \(4, 2, 1, 4, 2, 1, 4, 2, 1, \dots \), and hence the sequence cycles); a short computational sketch follows this list. This is the famous \(3x+1\) problem. Kakutani called it a conspiracy to slow down American mathematics because of the amount of time people spent on this; Erdős said mathematics isn't yet ready for such problems. See [Lag1, Lag2, Lag3] for some nice expositions, but be warned that this problem can be addictive!
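Here is a minimal sketch of the \(3x+1\) iteration (the function name and the cap on the number of steps are our own choices); every starting value we try falls into the cycle \(4, 2, 1\), though of course no amount of computation settles the conjecture.

\begin{verbatim}
def collatz_steps(z0, max_steps=10**6):
    # Iterate the 3x+1 map until we reach 1 (or give up after max_steps).
    z, steps = z0, 0
    while z != 1 and steps < max_steps:
        z = 3 * z + 1 if z % 2 else z // 2
        steps += 1
    return steps

for z0 in (7, 27, 97, 871):
    print(z0, collatz_steps(z0))   # number of steps needed to reach 1
\end{verbatim}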


We can now state the theorem which, for us, is the most important result from Complex Analysis. It's the basis of the black-box results.

Theorem 1.2.7 Let \(f\) be an analytic function on an open set \(U\), with infinitely many zeros \(z_1, z_2, z_3, \dots \). If \(\lim _{n\to \infty } z_n \in U\), then \(f\) is identically zero on \(U\). In other words, if a function is zero along a sequence in \(U\) whose accumulation point is also in \(U\), then that function is identically zero in \(U\).

Note the above is very different than what happens in real analysis. Consider again the function from (D.2), \[ h(x) \ = \ x^3 \sin (1/x). \] This function is continuous and differentiable. It's zero whenever \(x = 1/(\pi n)\) with \(n\) an integer. If we let \(z_n = 1/(\pi n)\), we see this sequence has \(0\) as a limit point, and our function is also zero at \(0\) (see Figure 1.1).

Figure 1.1: Plot of \(x^3 \sin (1/x)\).

It's clear, however, that this function is not identically zero. Yet again, we see a stark difference between real and complex valued functions. As a nice exercise, show that \(x^3 \sin (1/x)\) is not complex differentiable. It will help if you recall \(e^{i\theta } = \cos \theta + i\sin \theta \), or \(\sin \theta = (e^{i\theta } - e^{-i\theta })/(2i)\).

1.3 Complex analysis and moment generating functions

We conclude our technical digression by stating a few more very useful facts. The proof of these requires properties of the Laplace transform, which is defined by \((\mathcal {L}f)(s) = \int _0^\infty e^{-sx} f(x)dx\). The reason the Laplace transform plays such an important role in the theory is apparent when we recall the definition of the moment generating function of a random variable \(X\) with density \(f\): \[ M_X(t) = 𝔼 [e^{tX}] = \int _{-\infty }^\infty e^{tx} f(x)dx; \] in other words, the moment generating function is the Laplace transform of the density evaluated at \(s=-t\).
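As a quick check of this relation, the sketch below (our own illustration in Python with SciPy, using an Exp(1) density chosen purely for convenience) computes both \(M_X(t)\) and \((\mathcal{L}f)(-t)\) numerically and compares them with the closed form \(1/(1-t)\).

\begin{verbatim}
import math
from scipy.integrate import quad

def f(x):
    # Density of an Exp(1) random variable (chosen just for illustration).
    return math.exp(-x) if x >= 0 else 0.0

def laplace_transform(s):
    value, _ = quad(lambda x: math.exp(-s * x) * f(x), 0, math.inf)
    return value

def mgf(t):
    value, _ = quad(lambda x: math.exp(t * x) * f(x), 0, math.inf)
    return value

for t in (-1.0, 0.0, 0.5):
    # M_X(t) is the Laplace transform at s = -t; both should equal 1/(1 - t) here.
    print(t, mgf(t), laplace_transform(-t), 1 / (1 - t))
\end{verbatim}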

Remember that if \(F_X\) and \(G_Y\) are the cumulative distribution functions of the random variables \(X\) and \(Y\) with densities \(f\) and \(g\), then

\begin {eqnarray*} F_X(x) & \ = \ & \int _{-\infty }^x f(t) dt \nonumber \\ G_Y(y) &=& \int _{-\infty }^y g(v)dv. \end {eqnarray*}

We remind the reader of the two important results we assumed in the text (Theorems 20.5.3 and 20.5.4), which we restate below. After stating them we discuss their proofs.

Theorem 1.3.1 Assume the moment generating functions \(M_X(t)\) and \(M_Y(t)\) exist in a neighborhood of zero (i.e., there's some \(\delta \) such that both functions exist for \({|t|} < \delta \)). If \(M_X(t) = M_Y(t)\) in this neighborhood, then \(F_X(u) = F_Y(u)\) for all \(u\). As the densities are the derivatives of the cumulative distribution functions, we have \(f=g\).

Theorem 1.3.2 Let \(\{X_i\}_{i \in I}\) be a sequence of random variables with moment generating functions \(M_{X_i}(t)\). Assume there's a \(\delta > 0\) such that when \({|t|} < \delta \) we have \(\lim _{i\to \infty } M_{X_i}(t) = M_X(t)\) for some moment generating function \(M_X(t)\), and all moment generating functions converge for \({|t|} < \delta \). Then there exists a unique cumulative distribution function \(F_X\) whose moments are determined from \(M_X(t)\) and, for all \(x\) where \(F_X(x)\) is continuous, \(\lim _{i\to \infty } F_{X_i}(x) = F_X(x)\).
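To see Theorem 1.3.2 in action numerically, here is a small sketch (our own illustration): the moment generating functions of standardized Binomial\((n,1/2)\) random variables converge, as \(n\) grows, to \(e^{t^2/2}\), the moment generating function of the standard normal.

\begin{verbatim}
import math

def mgf_standardized_binomial(t, n):
    # MGF of (S_n - n/2)/(sqrt(n)/2) with S_n ~ Binomial(n, 1/2):
    # M(t) = e^{-t sqrt(n)} ((1 + e^{2t/sqrt(n)})/2)^n.
    u = 2 * t / math.sqrt(n)
    return math.exp(-t * math.sqrt(n)) * ((1 + math.exp(u)) / 2) ** n

t = 0.7
for n in (10, 100, 1000, 10000):
    print(n, mgf_standardized_binomial(t, n))
print(math.exp(t * t / 2))   # the limiting (standard normal) MGF at this t
\end{verbatim}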

The proofs of these theorems follow from results in complex analysis, specifically the Laplace and Fourier inversion formulas. To give an example as to how the results from complex analysis allow us to prove results such as these, we give most of the details in the proof of the next theorem. We deliberately do not try and prove the following result in as great generality as possible!

Theorem 1.3.3 Let \(X\) and \(Y\) be two continuous random variables on \([0,\infty )\) with continuous densities \(f\) and \(g\), all of whose moments are finite and agree. Suppose further that:

1.
There is some \(C > 0\) such that for all \(c \le C\), \(e^{(c+1)t} f(e^t)\) and \(e^{(c+1)t} g(e^t)\) are Schwartz functions (see Definition 21.1.3). This isn't a terribly restrictive assumption; \(f\) and \(g\) need to have decay in order for all moments to exist and be finite. As we're evaluating \(f\) and \(g\) at \(e^t\) and not \(t\), there's enormous decay here. The meat of the assumption is that \(f\) and \(g\) are infinitely differentiable and their derivatives decay.
2.
The (not necessarily integral) moments \[ \mu _{r_n}'(f) \ = \ \int _{0}^\infty x^{r_n} f(x)dx \ \ \ {\rm and} \ \ \ \mu _{r_n}'(g) \ = \ \int _0^\infty x^{r_n} g(x)dx \] agree for some sequence of non-negative real numbers \(\{r_n\}_{n=0}^\infty \) which has a finite accumulation point (i.e., \(\lim _{n\to \infty } r_n = r < \infty \)).

Then \(f=g\) (in other words, knowing all these moments uniquely determines the probability density).

Proof: We sketch the proof, which is long and sadly a bit technical. Remember the purpose of this proof is to highlight why our needed results from Complex Analysis are true. Feel free to skim or skip the proof, but we urge you to read the example at the end of this section, where we return to the two densities that are causing us so much heartache. Let \(h(x) = f(x) - g(x)\), and define \[ A(z)\ =\ \int _0^\infty x^z h(x)dx. \] Note that \(A(z)\) exists for all \(z\) with real part non-negative. To see this, let \(\Re (z)\) denote the real part of \(z\), and let \(k\) be the unique non-negative integer with \(k \le \Re (z) < k+1\). Then \(x^{\Re(z)} \le x^k + x^{k+1}\), and

\begin {eqnarray*} {|A(z)|} & \ \le \ & \int _0^\infty x^{\Re(z)} \left [{|f(x)|}+{|g(x)|}\right ]dx \\ & \ \le \ & \int _0^\infty (x^k + x^{k+1}) f(x)dx + \int _0^\infty (x^k+x^{k+1}) g(x)dx \ = \ 2\mu _k' + 2\mu _{k+1}'. \end {eqnarray*}

Results from analysis now imply that \(A(z)\) exists for all such \(z\). The key point is that \(A\) is also differentiable. Interchanging the derivative and the integration (which can be justified; see Theorem ??), we find \[ A'(z) \ = \ \int _0^\infty x^z (\log x) h(x) dx. \] To show that \(A'(z)\) exists, we just need to show this integral is well-defined. There are only two potential problems with the integral, namely when \(x\to \infty \) and when \(x\to 0\). For \(x\) large, \(x^z \log x \le x^{\Re (z)+1}\) and thus the rapid decay of \(h\) gives \(\left |\int _1^\infty x^z (\log x) h(x)dx \right | < \infty \). For \(x\) near \(0\), \(h(x)\) looks like \(h(0)\) plus a small error (remember we're assuming \(f\) and \(g\) are continuous); thus there's a \(C\) so that \(|h(x)| \le C\) for \(|x| \le 1\). Note

\begin {eqnarray*} \lim_{\epsilon \to 0} \left |\int _{\epsilon }^1 x^z (\log x) h(x)dx \right | & \ \le \ & \lim _{\epsilon \to 0} \int _{\epsilon }^1 1 \cdot (-\log x) \cdot C\, dx. \end {eqnarray*}

The anti-derivative of \(\log x\) is \(x\log x - x\), and \(\lim _{\epsilon \to 0} (\epsilon \log \epsilon - \epsilon ) = 0\). This is enough to prove that this integral is bounded, and thus from results in analysis we get \(A'(z)\) exists.

We (ļ¬nally!) use our results from complex analysis. As \(A\) is diļ¬€erentiable once, itā€™s inļ¬nitely diļ¬€erentiable and it equals its Taylor series for \(z\) with \(\Re (z) > 0\). Therefore \(A\) is an analytic function which is zero for a sequence of \(z_n\)ā€™s with an accumulation point, and thus itā€™s identically zero. This is spectacular ā€“ initially we only knew \(A(z)\) was zero if \(z\) was a positive integer or if \(z\) was in the sequence \(\{r_n\}\); we now know itā€™s zero for all \(z\) with \(\Re (z) > 0\). This remarkable conclusion comes from complex analysis; itā€™s here that we use it.

We change variables, and replace \(x\) with \(e^t\) and \(dx\) with \(e^tdt\). The range of integration is now \(-\infty \) to \(\infty \), and we set \(\mathfrak {h}(t)dt = h(e^t)e^tdt\). We now have \[ A(z) \ = \ \int _{-\infty }^\infty e^{tz} \mathfrak {h}(t)dt \ = \ 0. \] Choosing \(z = c + 2\pi i y\) with \(c\) less than the \(C\) from our hypotheses gives \[ A(c+2\pi i y) \ = \ \int _{-\infty }^\infty e^{2\pi i ty} \left [e^{ct} \mathfrak {h}(t)\right ]dt \ = \ 0. \] Our assumptions imply that \(e^{ct}\mathfrak {h}(t)\) is a Schwartz function, and thus it has a unique inverse Fourier transform. As we know this transform is zero, it implies that \(e^{ct} \mathfrak {h}(t) = 0\), or \(h(x) = 0\), or \(f(x) = g(x)\). \(\Box \)

We needed the analysis at the end on the inverse Fourier transform as our goal is to show that \(f(x) = g(x)\), not that \(A(z) = 0\). It seems absurd that \(A(z)\) could identically vanish without \(f=g\), but we must rigorously show this.

What if we lessen our restrictions on \(f\) and \(g\); perhaps one of them isn't continuous?

Perhaps there's a unique continuous probability distribution attached to a given sequence of moments such as in the above theorem, but if we allow non-continuous distributions there could be additional possibilities. This topic is beyond the scope of this book, requiring more advanced results from analysis; however, we wanted to point out where the dangers lie, where we need to be careful.

After proving Theorem 1.3.3, it's natural to go back to the two densities that are causing so much trouble, namely (see (??))

\begin {eqnarray*} f_1(x) & \ = \ & \frac 1{\sqrt {2\pi x^2}}\ e^{-(\log ^2 x) / 2} \nonumber \\ f_2(x) & = & f_1(x) \left [1 + \sin (2\pi \log x)\right ]. \end {eqnarray*}

We know these two densities have the same integral moments (their \(k\)th moments are \(e^{k^2/2}\) for \(k\) a non-negative integer). These functions have the correct decay; note \[ e^{(c+1)t} f_1(e^t) \ = \ e^{(c+1)t} \cdot \frac {e^{-t^2/2}}{\sqrt {2\pi } e^{t}}, \] which decays fast enough for any \(c\) to satisfy the assumptions of Theorem 1.3.3. As these two densities are not the same, some condition must be violated. The only condition left to check is whether or not we have a sequence of numbers \(\{r_n\}_{n=0}^\infty \) with an accumulation point \(r>0\) such that the \(r_n\)th moments agree. Using more results from Complex Analysis (specifically, contour integration), we can calculate the \((a+ib)\)th moments. We find

\[(a+ib)^\text{th}\ {\rm moment\ of\ } f_1\ {\rm is}\ \ \ e^{(a+ib)^2/2}\]

and

\[(a+ib)^\text{th}\ {\rm moment\ of\ } f_2\ {\rm is} \ \ \ e^{(a+ib)^2/2} +\frac {i}{2} \left (e^{(a+i(b-2\pi ))^2/2}-e^{(a+i (b+2 \pi ))^2/2}\right ).\]

While these moments agree for \(b=0\) and \(a\) a positive integer, there's no sequence of real moments having an accumulation point where they agree. To see this, note that when \(b=0\) the \(a\)th moment of \(f_2\) is \begin {equation*}e^{a^2/2} + \frac{i}{2}\, e^{(a - 2 i \pi )^2/2} \left (1 - e^{4 i a \pi }\right ), \end {equation*} and the second term (the difference between the \(a\)th moments of \(f_2\) and \(f_1\)) is never zero unless \(a\) is a half-integer (i.e., \(a = k/2\) for some integer \(k\)). In fact, the reason we wrote (??) as we did was to highlight the fact that it's only zero when \(a\) is a half-integer. Exponentials of real or complex numbers are never zero, and thus the only way this term can vanish is if \(1 = e^{4ia\pi }\). Recalling that \(e^{i\theta } = \cos \theta + i \sin \theta \), we see that the vanishing of the difference of the \(a\)th moments is equivalent to \(1 - \cos (4\pi a) - i \sin (4\pi a) = 0\); the only way this can happen is if \(a = k/2\) for some integer \(k\). If this happens, the cosine term is 1 and the sine term is 0.
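As a final sanity check, we can evaluate the difference of the two displayed moment formulas numerically; the code below is our own illustration and simply takes those formulas as given. The gap vanishes at integers and half-integers and nowhere else on the real line.

\begin{verbatim}
import cmath, math

def moment_gap(a, b=0.0):
    # Difference between the (a+ib)-th moments of f_2 and f_1, read off from
    # the displayed formulas above.
    z_minus = (a + 1j * (b - 2 * math.pi)) ** 2 / 2
    z_plus = (a + 1j * (b + 2 * math.pi)) ** 2 / 2
    return 0.5j * (cmath.exp(z_minus) - cmath.exp(z_plus))

for a in (1, 2, 3, 3.5, 0.25, 1.3):
    # For real a the gap simplifies to e^{(a^2 - 4 pi^2)/2} sin(2 pi a),
    # which vanishes exactly when a is an integer or a half-integer.
    simplified = math.exp((a * a - 4 * math.pi ** 2) / 2) * math.sin(2 * math.pi * a)
    print(a, moment_gap(a), simplified)
\end{verbatim}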

1.4 Exercises

Problem 1.4.1 Let \(f(x) = x^3 \sin (1/x)\) for \(x \neq 0\) and set \(f(0) = 0\). (a) Show that \(f\) is differentiable once when viewed as a function of a real variable, but that it is not differentiable twice. (b) Show that \(f\) is not differentiable when viewed as a function of a complex variable \(z\); it might be useful to note that \(\sin u = (e^{iu} - e^{-iu})/2i\).

Problem 1.4.2 If we're told that all the moments of \(f\) are finite and \(f\) is infinitely differentiable, must there be some \(C\) such that for all \(c < C\) we have \(e^{(c+1)t} f(e^t)\) is a Schwartz function?