My beef with “discrete” random variables

In this post I just want to address a pet peeve of mine from introductory probability courses – the distinction between “discrete” and “continuous” random variables. The definitions are usually stated like this (assume for the rest of the post that we are working in some probability space {(\Omega,\mathcal F,\mathbb P)}):

Definition 1 A discrete random variable is a function {X:\Omega\rightarrow\mathbb R} which takes on at most countably many values, i.e. there is a sequence {\{x_n\}} of real numbers such that

\displaystyle  \sum_{n=0}^\infty \mathbb P(X=x_n) =1

Generally this sequence {\{x_n\}} is taken to be either the nonnegative integers or the positive integers for convenience, but the specific values are immaterial. The point is that it is emphasized that the distribution of a “discrete” random variable is characterized by a probability mass function, namely the sequence {p_n:=\mathbb P(X=x_n)}.
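To make this concrete, here is a quick sanity check (a Python sketch of mine, not part of the course definitions) that a pmf really does sum to 1, using a fair six-sided die where {p_n = 1/6} for {n=1,\dots,6}:

```python
from fractions import Fraction

# pmf of a fair six-sided die: p_n = P(X = n) = 1/6 for n = 1, ..., 6
pmf = {n: Fraction(1, 6) for n in range(1, 7)}

# a valid pmf is nonnegative and sums to exactly 1
assert all(p >= 0 for p in pmf.values())
print(sum(pmf.values()))  # 1
```

Exact rational arithmetic sidesteps the floating-point noise that a sum of six `1/6`'s would otherwise introduce.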

The “other type” of random variable is typically defined like this:

Definition 2 A continuous random variable is a function {X:\Omega\rightarrow\mathbb R} which takes on uncountably many values, and admits a probability density function {f} which satisfies

\displaystyle  F_X(x) := \mathbb P(X\leqslant x) = \int_{-\infty}^{x}f(t)\ \mathsf dt

for {x\in \mathbb R}.
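To see the definition in action, here is a small numeric sketch (illustrative only, with truncation and grid parameters chosen by me): recovering {F_Y(x)=x} on {(0,1)} for the {U(0,1)} density {\mathsf 1_{(0,1)}} via a midpoint Riemann sum.

```python
# Density of U(0,1): the indicator function of (0, 1).
def f(t):
    return 1.0 if 0 < t < 1 else 0.0

# Approximate F(x) = ∫_{-∞}^x f(t) dt by a midpoint Riemann sum,
# truncating the lower limit at lo (f vanishes below 0 anyway).
def F(x, lo=-1.0, n=100_000):
    h = (x - lo) / n
    return sum(f(lo + (k + 0.5) * h) for k in range(n)) * h

print(round(F(0.5), 3))  # 0.5
print(round(F(2.0), 3))  # 1.0
```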

Notice that the requirement of measurability is left out of the definition. Granted, measure theory is beyond the scope of an undergraduate probability course. But here is the problem. Suppose {X\sim\mathrm{Ber}(p)}, that is,

\displaystyle \mathbb P(X=0) = 1-p = 1 - \mathbb P(X=1)

and {Y\sim U(0,1)}, that is, {Y} admits the density {\mathsf 1_{(0,1)}}. These random variables certainly look different – {\mathbb P(Y=x)=0} for any {x\in\mathbb R}, and it doesn’t look like we can express the distribution of {X} in terms of an integral. But we can!

The key is that the fundamental definition is really this:

Definition 3 A random variable is a function {X:\Omega\rightarrow\mathbb R} that satisfies {X^{-1}((-\infty,x])\in\mathcal F} for all {x\in \mathbb R}.

This allows us to characterize any random variable in the following manner:

Definition 4 The distribution function {F_X:\mathbb R\rightarrow[0,1]} of a random variable {X} is defined by {F_X(x) = \mathbb P\circ X^{-1}((-\infty,x])}.

Recall that

\displaystyle  X^{-1}((-\infty,x]) = \{\omega\in\Omega: X(\omega)\in (-\infty,x] \} = \{\omega\in\Omega: X(\omega)\leqslant x \}.


so that

\displaystyle \mathbb P\circ X^{-1}((-\infty,x]) = \mathbb P(X\leqslant x).
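On a finite sample space this is all concrete set manipulation. A hypothetical toy example (the setup is my own, not from the post above): {\Omega} consists of two fair coin tosses with the uniform measure, and {X} counts heads.

```python
from fractions import Fraction

# Ω: two fair coin tosses with the uniform probability measure.
omega = ["HH", "HT", "TH", "TT"]

def X(w):          # X counts the number of heads
    return w.count("H")

def P(A):          # uniform measure: P(A) = |A| / |Ω|
    return Fraction(len(A), len(omega))

def preimage(x):
    """X^{-1}((-inf, x]) = {w in Omega : X(w) <= x}"""
    return {w for w in omega if X(w) <= x}

print(sorted(preimage(1)))   # ['HT', 'TH', 'TT']
print(P(preimage(1)))        # F_X(1) = 3/4
```

Since the σ-algebra here is the full power set of {\Omega}, every preimage is automatically measurable – the subtlety only bites on uncountable spaces.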

This isn’t anything new, of course; in undergrad probability we would have been taught that

\displaystyle  F_X(x) = (1-p)\mathsf 1_{[0,1)}(x) + \mathsf 1_{[1,\infty)}(x)

(probably in equivalent but less compact notation) and

\displaystyle F_Y(y) = y\,\mathsf 1_{(0,1)}(y) + \mathsf 1_{[1,\infty)}(y).
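Both distribution functions can be encoded directly (a sketch; {p=0.3} is an arbitrary choice of mine):

```python
p = 0.3  # arbitrary Bernoulli parameter, for illustration only

def F_X(x):
    """CDF of X ~ Ber(p): a step function with jumps at 0 and 1."""
    if x < 0:
        return 0.0
    if x < 1:
        return 1 - p
    return 1.0

def F_Y(y):
    """CDF of Y ~ U(0,1): ramps linearly from 0 to 1."""
    if y <= 0:
        return 0.0
    if y < 1:
        return y
    return 1.0

# F_X(0.5) = 1 - p, F_X(2) = 1; F_Y(0.25) = 0.25, F_Y(2) = 1
```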

This is true, of course. But probability theory really is just measure theory on finite measure spaces (relax, that was a joke). So let’s be a bit more precise. {Y} has density {\mathsf 1_{(0,1)}(y)} with respect to Lebesgue measure, i.e. the unique {\sigma}-additive translation-invariant measure {m} on {\mathbb R} with {m([0,1])=1} (which for all intents and purposes is simply the length of an interval, aside from the pathological so-called “non-measurable sets” which we need not worry about because we defined random variables to be measurable functions!). When we write e.g. {\mathsf dx} or {\mathsf dt} in an integral, we are implicitly assuming that the integration is with respect to Lebesgue measure. So what if we integrate with respect to a different measure?

Consider the counting measure {\mu} on {\mathbb N\cup\{0\}} (equipped with the power set of the nonnegative integers as its {\sigma}-algebra) defined by

\displaystyle  \mu(S) = \begin{cases} \# S,& S\text{ finite}\\ +\infty,& S\text{ infinite} \end{cases}

where {\#} denotes cardinality. It’s clear that {\mu(\varnothing)=0} since the empty set has zero elements, and that {\mu} is {\sigma}-additive because the disjoint union of finitely many finite sets is finite and the disjoint union of an infinite set with any set is infinite. So what is the density {f_X} of {X} with respect to {\mu}? Since {\mu(\{0\})=\mu(\{1\})=1}, the density at each point is just the probability mass there, so we have

\displaystyle f_X(x) = p^x(1-p)^{1-x}\mathsf 1_{\{0,1\}}(x),

and we may compute for subsets {S\subset \mathbb N\cup\{0\}}

\displaystyle \mathbb P(X\in S) = \int_S f_X(x)\,\mathsf d\mu = (1-p)\mathsf 1_{0\in S} + p\mathsf 1_{1\in S},

For example,

\displaystyle \mathbb P(X\in\{1,3\}) = \int_{\{1,3\}}p^x(1-p)^{1-x}\mathsf 1_{\{0,1\}}(x)\,\mathsf d\mu = p.
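Integration against counting measure is just summation, so the computation above is literally a two-term sum (another sketch, again with an arbitrary choice of {p}):

```python
p = 0.3  # arbitrary Bernoulli parameter, for illustration only

def f_X(x):
    """Density of X ~ Ber(p) w.r.t. counting measure on {0, 1, 2, ...}."""
    # `x in (0, 1)` tests membership in the two-point set {0, 1}
    return p**x * (1 - p)**(1 - x) if x in (0, 1) else 0.0

def integrate_counting(f, S):
    """∫_S f dμ for counting measure μ: just sum f over the points of S."""
    return sum(f(x) for x in S)

print(integrate_counting(f_X, {1, 3}))   # 0.3  (only x = 1 contributes)
# integrating over {0, 1} recovers the total mass (1 - p) + p = 1
```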

So it turns out that a “discrete” random variable has a probability density after all! We can compute probabilities by (Lebesgue) integration; the only actual distinction is that we are integrating with respect to counting measure instead of Lebesgue measure.

Moral of the story – if someone had explained this to me 5 years ago in my intro probability course, I would not have comprehended a word of it. So if you didn’t, you have been trolled (sorry readers, I am in a silly mood today), and if you did (or you go on to study real analysis and probability in grad school), I hope you understand my frustration.
