Probability theory

Let {\Omega} be a nonempty set, {\mathcal F} a {\sigma}-algebra on {\Omega}, and {\mathbb P} a measure defined on {\mathcal F} such that {\mathbb P(\Omega)=1}. The triple {(\Omega, \mathcal F, \mathbb P)} is called a probability space; {\Omega} is the sample space, {\mathcal F} the set of events, and {\mathbb P} the probability measure. For an event {E\in\mathcal F}, we call {\mathbb P(E)} the probability of {E}.

If {(\Omega, \mathcal F,\mathbb P)} is a probability space and {(S,\mathcal S)} a measurable space, a measurable function {X:\Omega\rightarrow S} is called a random element. In particular, when {(S,\mathcal S)=(\mathbb R, \mathcal B)}, where {\mathcal B} is the Borel {\sigma}-algebra on {\mathbb R}, {X} is called a random variable. For any {B\in\mathcal B}, we define

\displaystyle \mathbb P(X\in B) := \mathbb P\{\omega\in\Omega: X(\omega)\in B \}.

The expectation of {X} is defined, whenever the integral exists, by

\displaystyle \mathbb E[X] = \int_{\Omega}X\mathsf d\mathbb P.

The cumulative distribution function of {X} is defined by {F(x) = \mathbb P\{X\in(-\infty, x] \}} for {x\in\mathbb R}.

Theorem 1 Let {F} be the distribution function of a random variable {X}. Then {F} is nondecreasing, {F} is right-continuous, {\lim_{x\rightarrow-\infty}F(x)=0}, and {\lim_{x\rightarrow\infty}F(x)=1}.

Proof: If {x<y}, then {\{\omega : X(\omega)\leqslant x \}\subset \{\omega : X(\omega)\leqslant y \}}, so {F(x) = \mathbb P(X\leqslant x)\leqslant\mathbb P(X\leqslant y)=F(y)}.

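Fix {x\in\mathbb R} and let {\{x_n\}} be a sequence of real numbers such that {x_{n+1}<x_n}, {x_n>x}, and {\lim_{n\rightarrow\infty}x_n=x}. Since {F} is nondecreasing, the right-hand limit of {F} at {x} equals {\lim_{n\rightarrow\infty}F(x_n)}. The events {\{X\leqslant x_n\}} decrease to {\{X\leqslant x\}}, so by continuity of {\mathbb P} from above,

\displaystyle \lim_{t\rightarrow x^+}F(t) = \lim_{n\rightarrow\infty}\mathbb P(X\leqslant x_n)=\mathbb P\left(\bigcap_{n=1}^\infty \{X\leqslant x_n\} \right)=\mathbb P(X\leqslant x)=F(x).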

Let {\{x_n\}} be a sequence of real numbers such that {x_{n+1}<x_n} and {\lim_{n\rightarrow\infty}x_n=-\infty}. Then

\displaystyle \lim_{x\rightarrow-\infty}F(x) = \mathbb P\left(\bigcap_{n=1}^\infty \{X\leqslant x_n\} \right) = \mathbb P(\varnothing) = 0.

Let {\{x_n\}} be a sequence of real numbers such that {x_n<x_{n+1}} and {\lim_{n\rightarrow\infty}x_n=\infty}. Then

\displaystyle \lim_{x\rightarrow\infty}F(x) = \mathbb P\left(\bigcup_{n=1}^\infty \{X\leqslant x_n\} \right) = \mathbb P(\Omega) = 1.

\Box
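
As an illustration of Theorem 1 (this example is an assumption added for concreteness and is not used in the argument), suppose {X} takes the values {0} and {1} with probability {1/2} each. Then

\displaystyle F(x) = \begin{cases} 0, & x<0,\\ 1/2, & 0\leqslant x<1,\\ 1, & x\geqslant 1. \end{cases}

This {F} is nondecreasing, tends to {0} as {x\rightarrow-\infty} and to {1} as {x\rightarrow\infty}, and is right-continuous at its jumps: {\lim_{t\rightarrow 0^+}F(t)=1/2=F(0)}. It is not left-continuous at {0}, since {\lim_{t\rightarrow 0^-}F(t)=0\neq F(0)}, so right-continuity is the most one can expect in general.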

Theorem 2 If {X} is a random variable with {\mathbb E[|X|]<\infty} and {\mathbb P(X\geqslant 0)=1}, then

\displaystyle \mathbb E[X] = \int_0^\infty (1-F(x))\mathsf dx.

Proof: Using Tonelli’s theorem to justify the interchange in order of integration, we have

\displaystyle  \begin{aligned} \int_0^\infty (1-F(x))\mathsf dx &= \int_0^\infty \mathbb P(X>x)\mathsf dx\\ &= \int_0^\infty \mathbb E[ 1_{\{X>x\}}]\mathsf dx\\ &= \int_0^\infty \int_\Omega 1_{\{X(\omega)>x \} }\mathsf d\mathbb P(\omega)\mathsf dx\\ &= \int_\Omega \int_0^{X(\omega)}\mathsf dx\mathsf d\mathbb P(\omega)\\ &= \int_\Omega X(\omega)\mathsf d\mathbb P(\omega)\\ &= \mathbb E[X]. \end{aligned}

\Box
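
As a quick check of Theorem 2, consider an assumed example: let {X} be exponentially distributed with rate {\lambda>0} (the parameter {\lambda} is introduced here purely for illustration), so that {F(x)=1-e^{-\lambda x}} for {x\geqslant 0}. Then

\displaystyle \int_0^\infty (1-F(x))\mathsf dx = \int_0^\infty e^{-\lambda x}\mathsf dx = \frac{1}{\lambda},

which agrees with the familiar value {\mathbb E[X]=1/\lambda}.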

Theorem 3 If {X} is a random variable with {\mathbb E[|X|]<\infty} and {\mathbb P(X\leqslant 0)=1}, then

\displaystyle \mathbb E[X] = \int_{-\infty}^0 -F(x)\mathsf dx.

Proof: The computation is analogous to that in the proof of Theorem 2. \Box
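
For concreteness, the analogous computation can be sketched as follows, applying Tonelli's theorem to the nonnegative integrand {1_{\{X\leqslant x\}}} and using that {X\leqslant 0} almost surely:

\displaystyle  \begin{aligned} \int_{-\infty}^0 -F(x)\mathsf dx &= -\int_{-\infty}^0 \mathbb P(X\leqslant x)\mathsf dx\\ &= -\int_{-\infty}^0 \int_\Omega 1_{\{X(\omega)\leqslant x \} }\mathsf d\mathbb P(\omega)\mathsf dx\\ &= -\int_\Omega \int_{X(\omega)}^0\mathsf dx\mathsf d\mathbb P(\omega)\\ &= -\int_\Omega (-X(\omega))\mathsf d\mathbb P(\omega)\\ &= \mathbb E[X]. \end{aligned}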

Theorem 4 If {X} is a random variable with {\mathbb E[|X|]<\infty}, then

\displaystyle \mathbb E[X] = \int_{\mathbb R}x\mathsf dF(x).

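Proof: Write {X=X^+-X^-}, where {X^+=\max(X,0)} and {X^-=\max(-X,0)}. The distribution function of {X^+} agrees with {F} on {[0,\infty)}, and that of {-X^-} agrees with {F} on {(-\infty,0)}. Applying Theorem 2 to {X^+} and Theorem 3 to {-X^-}, and then integrating by parts, we have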

\displaystyle  \begin{aligned} \mathbb E[X] &= \int_0^\infty(1-F(x))\mathsf dx + \int_{-\infty}^0 -F(x)\mathsf dx\\ &= x(1-F(x))|_0^\infty + \int_0^\infty x\mathsf dF(x) - \left( xF(x)|_{-\infty}^0 - \int_{-\infty}^0 x\mathsf dF(x) \right)\\ &= 0 + \int_0^\infty x\mathsf dF(x) - 0 + \int_{-\infty}^0 x\mathsf dF(x)\\ &= \int_{\mathbb R}x\mathsf dF(x). \end{aligned}

Note that

\displaystyle 0 \leqslant \lim_{x\rightarrow\infty}x(1-F(x)) = \lim_{x\rightarrow\infty} x\int_x^\infty \mathsf dF(t) \leqslant \lim_{x\rightarrow\infty}\int_x^\infty t\mathsf dF(t)=0

and similarly

\displaystyle 0 \geqslant \lim_{x\rightarrow-\infty}xF(x) = \lim_{x\rightarrow-\infty}x\int_{-\infty}^x\mathsf dF(t)\geqslant \lim_{x\rightarrow-\infty}\int_{-\infty}^x t\mathsf dF(t) = 0,

so the limits used in the above computation are justified. \Box
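
To see the decomposition in the first line of the proof at work, here is a small worked example under the assumption that {X} is uniformly distributed on {[-1,1]}, so that {F(x)=(x+1)/2} for {-1\leqslant x\leqslant 1}, {F(x)=0} for {x<-1}, and {F(x)=1} for {x>1}. Then

\displaystyle \int_0^\infty(1-F(x))\mathsf dx + \int_{-\infty}^0 -F(x)\mathsf dx = \int_0^1\frac{1-x}{2}\mathsf dx - \int_{-1}^0\frac{x+1}{2}\mathsf dx = \frac14-\frac14 = 0,

and likewise {\int_{\mathbb R}x\mathsf dF(x)=\int_{-1}^1 \frac{x}{2}\mathsf dx=0}, so both expressions give {\mathbb E[X]=0}.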
