# My beef with “discrete” random variables

In this post I just want to address a pet peeve of mine in introductory probability courses – the distinction between “discrete” and “continuous” random variables. The definitions are usually stated like this (assume for the rest of the post that we are working in some probability space ${(\Omega,\mathcal F,\mathbb P)}$):

Definition 1 A discrete random variable is a function ${X:\Omega\rightarrow\mathbb R}$ which takes on at most countably many values, i.e. there is a sequence ${\{x_n\}}$ of real numbers such that

$\displaystyle \sum_{n=0}^\infty \mathbb P(X=x_n) =1$

Generally this sequence ${\{x_n\}}$ is taken to be either the nonnegative integers or the positive integers for convenience, but the specific values are immaterial. The point is that it is emphasized that the distribution of a “discrete” random variable is characterized by a probability mass function, namely the sequence ${p_n:=\mathbb P(X=x_n)}$.
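As a quick numerical aside (my own sanity check, not part of the definitions), a geometric random variable is a standard instance of Definition 1, and its probability mass function really does sum to ${1}$ over its countable support:

```python
# Sanity check for Definition 1: a geometric pmf p_n = (1-p)^n * p,
# for n = 0, 1, 2, ..., sums to 1 over its countable support.
p = 0.3
partial_sum = sum((1 - p) ** n * p for n in range(1000))
print(partial_sum)  # the tail beyond n = 1000 is negligible
```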

The “other type” of random variable is typically defined like this:

Definition 2 A continuous random variable is a function ${X:\Omega\rightarrow\mathbb R}$ which takes on uncountably many values, and admits a probability density function ${f}$ which satisfies

$\displaystyle F_X(x) := \mathbb P(X\leqslant x) = \int_{-\infty}^{x}f(t)\ \mathsf dt$

for ${x\in \mathbb R}$.
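To make Definition 2 concrete, here is a small Python sketch (mine, with an arbitrary grid size) that recovers ${F(x)=x}$ for a ${U(0,1)}$ random variable by numerically integrating its density:

```python
# Check Definition 2 for a U(0,1) density f = 1_(0,1):
# F(x) = ∫_{-∞}^x f(t) dt should equal x for x in (0, 1).
def f(t):
    return 1.0 if 0.0 < t < 1.0 else 0.0

def F(x, steps=100_000):
    # midpoint Riemann sum; f vanishes below 0, so start the sum at 0
    h = x / steps
    return sum(f((i + 0.5) * h) for i in range(steps)) * h

print(F(0.37))  # approximately 0.37
```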

Notice that the requirement of measurability is left out of the definition. Granted, measure theory is beyond the scope of an undergraduate probability course. But here is the problem. Suppose ${X\sim\mathrm{Ber}(p)}$, that is,

$\displaystyle \mathbb P(X=0) = 1-p = 1 - \mathbb P(X=1)$

and ${Y\sim U(0,1)}$, that is, ${Y}$ admits the density ${\mathsf 1_{(0,1)}}$. These random variables certainly look different – ${\mathbb P(Y=x)=0}$ for any ${x\in\mathbb R}$, and it doesn’t look like we can express the distribution of ${X}$ in terms of an integral. But we can!

The key is that the fundamental definition is really this:

Definition 3 A random variable is a function ${X:\Omega\rightarrow\mathbb R}$ that satisfies ${X^{-1}((-\infty,x])\in\mathcal F}$ for all ${x\in \mathbb R}$.

This allows us to characterize any random variable in the following manner:

Definition 4 The distribution function ${F_X:\mathbb R\rightarrow[0,1]}$ of a random variable ${X}$ is defined by ${F_X(x) = \mathbb P\circ X^{-1}((-\infty,x])}$.

Recall that

$\displaystyle X^{-1}((-\infty,x]) = \{\omega\in\Omega: X(\omega)\in (-\infty,x] \} = \{\omega\in\Omega: X(\omega)\leqslant x \}.$

Hence

$\displaystyle \mathbb P\circ X^{-1}((-\infty,x]) = \mathbb P(X\leqslant x).$

This isn’t anything new, of course; in undergrad probability we would have been taught that

$\displaystyle F_X(x) = (1-p)\,\mathsf 1_{[0,1)}(x) + \mathsf 1_{[1,\infty)}(x)$

(probably in equivalent but less compact notation) and

$\displaystyle F_Y(y) = y\,\mathsf 1_{(0,1)}(y) + \mathsf 1_{[1,\infty)}(y).$
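Written out as plain functions (a Python sketch of my own, with ${p}$ fixed arbitrarily for the checks), the two distribution functions are:

```python
# The Bernoulli(p) and U(0,1) distribution functions as plain Python.
# p is arbitrary in (0,1); 0.25 is chosen only for illustration.
p = 0.25

def F_X(x):
    # Bernoulli(p): jump of size 1-p at x = 0 and size p at x = 1
    if x < 0:
        return 0.0
    if x < 1:
        return 1 - p
    return 1.0

def F_Y(y):
    # Uniform(0,1): F_Y(y) = y on (0,1), clamped to [0,1] outside
    return min(max(y, 0.0), 1.0)

print(F_X(0.5), F_Y(0.5))  # prints 0.75 0.5
```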

This is true, of course. But probability theory really is just measure theory on finite measure spaces (relax, that was a joke). So let’s be a bit more precise. ${Y}$ has density ${\mathsf 1_{(0,1)}(y)}$ with respect to Lebesgue measure, i.e. the unique ${\sigma}$-additive translation-invariant measure ${m}$ on ${\mathbb R}$ with ${m([0,1])=1}$ (which for all intents and purposes is simply the length of an interval, aside from the pathological so-called “non-measurable sets” which we need not worry about because we defined random variables to be measurable functions!). When we write e.g. ${\mathsf dx}$ or ${\mathsf dt}$ in an integral, we are implicitly assuming that the integration is with respect to Lebesgue measure. So what if we integrate with respect to a different measure?

Consider the counting measure ${\mu}$, defined on the power set of the nonnegative integers ${\mathbb N\cup\{0\}}$ by

$\displaystyle \mu(S) = \begin{cases} \# S,& S\text{ finite}\\ +\infty,& S\text{ infinite} \end{cases}$

where ${\#}$ denotes cardinality. It’s clear that ${\mu(\varnothing)=0}$ since the empty set has zero elements, and that ${\mu}$ is ${\sigma}$-additive, since the cardinality of a countable disjoint union is the sum of the cardinalities. So what is the density ${f_X}$ of ${X}$ with respect to ${\mu}$? Since each singleton has counting measure ${1}$, i.e. ${\#\{0\}=\#\{1\}=1}$, the density is just the probability mass function:

$\displaystyle f_X(x) = p^x(1-p)^{1-x}\mathsf 1_{\{0,1\}}(x),$

and we may compute, for subsets ${S\subset \mathbb N\cup\{0\}}$,

$\displaystyle \mathbb P(X\in S) = \int_S f_X(x)\,\mathsf d\mu = (1-p)\,\mathsf 1_S(0) + p\,\mathsf 1_S(1).$

For example,

$\displaystyle \mathbb P(X\in\{1,3\}) = \int_{\{1,3\}}p^x(1-p)^{1-x}\mathsf 1_{\{0,1\}}(x)\,\mathsf d\mu = p.$
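Since integration against counting measure is just summation, the computation above is easy to mirror in code (a Python sketch; the helper names are mine):

```python
# Integral of the Bernoulli density f_X against counting measure μ:
# ∫_S f_X dμ is simply the sum of f_X over the points of S.
p = 0.25

def f_X(x):
    # density w.r.t. counting measure: p^x (1-p)^(1-x) on {0, 1}, else 0
    return p ** x * (1 - p) ** (1 - x) if x in (0, 1) else 0.0

def integrate_counting(f, S):
    return sum(f(x) for x in S)

print(integrate_counting(f_X, {1, 3}))  # prints 0.25, i.e. p
```

The total mass check `integrate_counting(f_X, {0, 1})` returns ${1}$, as a density must.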

So it turns out that a “discrete” random variable has a probability density after all! We can compute probabilities by (Lebesgue-Stieltjes) integration, and the only actual distinction is that we are integrating with respect to counting measure instead of Lebesgue measure.

Moral of the story – if someone had explained this to me 5 years ago in my intro probability course, I would not have comprehended a word of it. So if you didn’t, you have been trolled (sorry readers, I am in a silly mood today), and if you did (or you go on to study real analysis and probability in grad school), I hope you understand my frustration.