In this post I just want to address a pet peeve of mine in introductory probability courses – the distinction between “discrete” and “continuous” random variables. The definitions are usually stated like this (assume for the rest of the post that we are working in some probability space $(\Omega, \mathcal{F}, P)$):

**Definition 1.** A **discrete random variable** is a function $X : \Omega \to \mathbb{R}$ which takes on at most countably many values, i.e. there is a sequence of real numbers $(x_n)$ such that

$$P\Big(\bigcup_n \{X = x_n\}\Big) = 1.$$

Generally this sequence is taken to be either the nonnegative integers or the positive integers for convenience, but the specific values are immaterial. The point is that it is emphasized that the distribution of a “discrete” random variable is characterized by a **probability mass function**, namely the sequence $p_n = P(X = x_n)$.
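For instance, here is a quick numerical sketch of a probability mass function (I'm using a Poisson pmf as a concrete stand-in; the function names are mine):

```python
from math import exp, factorial

def poisson_pmf(n, lam):
    """p_n = P(X = n) for X ~ Poisson(lam), a typical probability mass function."""
    return exp(-lam) * lam**n / factorial(n)

# The pmf values over the countable support sum to 1;
# numerically we truncate the sum, and the tail is negligible.
lam = 2.5
total = sum(poisson_pmf(n, lam) for n in range(50))
print(total)  # ≈ 1.0
```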

The “other type” of random variable is typically defined like this:

**Definition 2.** A **continuous random variable** is a function $Y : \Omega \to \mathbb{R}$ which takes on uncountably many values, and admits a **probability density function** $f_Y$ which satisfies

$$P(a \le Y \le b) = \int_a^b f_Y(y)\,dy$$

for $a \le b$.

Notice that the requirement of measurability is left out of the definition. Granted, measure theory is beyond the scope of an undergraduate probability course. But here is the problem. Suppose $X \sim \mathrm{Poisson}(\lambda)$, that is,

$$P(X = n) = e^{-\lambda}\frac{\lambda^n}{n!}, \qquad n = 0, 1, 2, \ldots,$$

and $Y \sim \mathrm{Exp}(\lambda)$, that is, $Y$ admits the density $f_Y(y) = \lambda e^{-\lambda y}$ for $y \ge 0$. These random variables certainly look different – $P(X = n) > 0$ for every nonnegative integer $n$, while $P(Y = y) = 0$ for any $y$ – and it doesn’t look like we can express the distribution of $X$ in terms of an integral. But we can!
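To make the contrast concrete, here is a small numerical sketch (taking $X \sim \mathrm{Poisson}(\lambda)$ and $Y \sim \mathrm{Exp}(\lambda)$ as stand-ins, with $\lambda = 1$):

```python
from math import exp, factorial

lam = 1.0

def pmf_X(n):
    """P(X = n) for X ~ Poisson(lam): strictly positive at each nonnegative integer."""
    return exp(-lam) * lam**n / factorial(n)

def pdf_Y(y):
    """Density of Y ~ Exp(lam); single points carry zero probability."""
    return lam * exp(-lam * y)

print(pmf_X(1) > 0)  # True: X has an atom at 1
# P(Y = 1) is the integral of pdf_Y over the single point {1}, which is 0.

# Probabilities for Y come only from integrating the density:
# P(0 <= Y <= 1) = 1 - e^{-lam}; check with a midpoint Riemann sum.
a, b, steps = 0.0, 1.0, 100_000
h = (b - a) / steps
riemann = h * sum(pdf_Y(a + (k + 0.5) * h) for k in range(steps))
print(riemann, 1 - exp(-lam))  # both ≈ 0.6321
```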

The key is that the fundamental definition is really this:

**Definition 3.** A **random variable** is a function $X : \Omega \to \mathbb{R}$ that satisfies $\{X \le x\} \in \mathcal{F}$ for all $x \in \mathbb{R}$.

This allows us to characterize any random variable in the following manner.

**Definition 4.** The **distribution function** of a random variable $X$ is defined by $F_X(x) = P(X \le x)$.

Recall that, by Definition 1, the events $\{X = x_n\}$ are disjoint and together carry all of the probability:

$$P\Big(\bigcup_n \{X = x_n\}\Big) = 1.$$

Hence

$$F_X(x) = P(X \le x) = \sum_{n \,:\, x_n \le x} P(X = x_n).$$

This isn’t anything new, of course; in undergrad probability we would have been taught that

$$F_X(x) = \sum_{n \,:\, x_n \le x} p_n$$

(probably in equivalent but less compact notation) and

$$F_Y(y) = \int_{-\infty}^{y} f_Y(t)\,dt.$$
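The qualitative difference between the two distribution functions is easy to see numerically. A sketch (again with hypothetical Poisson and exponential examples): $F_X$ is a step function that jumps by $p_n$ at each support point, while $F_Y$ is continuous.

```python
from math import exp, factorial, floor

lam = 1.0

def F_X(x):
    """CDF of X ~ Poisson(lam): a step function, the sum of p_n over integers n <= x."""
    if x < 0:
        return 0.0
    return sum(exp(-lam) * lam**n / factorial(n) for n in range(floor(x) + 1))

def F_Y(y):
    """CDF of Y ~ Exp(lam): the integral of the density up to y, i.e. 1 - e^{-lam*y}."""
    return 1.0 - exp(-lam * y) if y > 0 else 0.0

# F_X is flat between integers and jumps by P(X = n) at each integer n;
# F_Y has no jumps anywhere.
print(F_X(0.5), F_X(0.99), F_X(1.0))  # flat on [0, 1), jump of size e^{-1} at x = 1
print(F_Y(0.99), F_Y(1.0))            # nearly equal: continuous
```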

This is true, of course. But probability theory really is just measure theory on finite measure spaces (relax, that was a joke). So let’s be a bit more precise. $Y$ has density $f_Y$ *with respect to* Lebesgue measure $m$, i.e. the unique $\sigma$-additive translation-invariant measure on the Borel sets of $\mathbb{R}$ with $m([0,1]) = 1$ (which for all intents and purposes is simply the length of an interval, aside from the pathological so-called “non-measurable sets” which we need not worry about because we defined random variables to be measurable functions!). When we write e.g. $dx$ or $dy$ in an integral, we are implicitly assuming that the integration is with respect to Lebesgue measure. So what if we integrate with respect to a different measure?

Consider the counting measure $\mu$ on $2^{\mathbb{N}_0}$ (the power set of the nonnegative integers) defined by

$$\mu(A) = |A|,$$

where $|\cdot|$ denotes cardinality. It’s clear that $\mu(\emptyset) = 0$ since the empty set has zero elements, and that $\mu$ is $\sigma$-additive because the disjoint union of finitely many finite sets is finite and the disjoint union of an infinite set with any set is infinite. So what is the density of $X$ with respect to $\mu$? Since $P(X = n) = e^{-\lambda}\lambda^n/n!$, we have

$$f_X(n) = e^{-\lambda}\frac{\lambda^n}{n!},$$

and we may compute, for subsets $A \subseteq \mathbb{N}_0$,

$$P(X \in A) = \int_A f_X \, d\mu = \sum_{n \in A} f_X(n).$$

For example,

$$P(X \le 2) = \int_{\{0,1,2\}} f_X \, d\mu = e^{-\lambda}\Big(1 + \lambda + \frac{\lambda^2}{2}\Big).$$

So it turns out that a “discrete” random variable has a probability density after all! We can compute probabilities by (Lebesgue-Stieltjes) integration, and the only actual distinction is that we are integrating with respect to counting measure instead of Lebesgue measure.
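Since integration against counting measure is literally summation, the “density” viewpoint for a discrete random variable can be coded in a few lines (a sketch, using a hypothetical Poisson example; `integrate_counting` is my name for the operation):

```python
from math import exp, factorial

lam = 2.0

def f_X(n):
    """Density of X ~ Poisson(lam) with respect to counting measure on {0, 1, 2, ...}."""
    return exp(-lam) * lam**n / factorial(n)

def integrate_counting(f, A):
    """Integral of f over A against counting measure: literally a sum over A."""
    return sum(f(n) for n in A)

# P(X <= 2) computed as an integral of the density over the set {0, 1, 2}
p = integrate_counting(f_X, {0, 1, 2})
closed_form = exp(-lam) * (1 + lam + lam**2 / 2)
print(p, closed_form)  # both ≈ 0.6767
```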

Moral of the story – if someone had explained this to me 5 years ago in my intro probability course, I would not have comprehended a word of it. So if you didn’t, you have been trolled (sorry readers, I am in a silly mood today), and if you did (or you go on to study real analysis and probability in grad school), I hope you understand my frustration.