Random Variables

Discrete random variables, probability mass functions, cumulative distribution functions

Concept Acquisition

Discrete random variables
Distribution tables
Probability mass functions
Cumulative distribution functions
Bernoulli, binomial, hypergeometric, Poisson distributions
Relationships between Bernoulli & binomial, binomial & hypergeometric, binomial & Poisson

Tool Acquisition

How to recognize binomial distribution
dbinom(), pbinom(), dhyper(), phyper(), choose() etc.
connections to boxes of tickets
How to compute probabilities

Concept Application

Defining random variables
Computing probabilities for random variables
Using R to simulate probabilities

In the last chapter, we saw that if we want to simulate tossing a fair coin $10$ times and compute the proportion of times that the coin lands heads, we could use the function sample() to sample from the values $0$ and $1$, where the outcome “Heads” is represented by the number $1$ and the outcome “Tails” is represented by the number $0$. Our code might look like:

set.seed(1)
tosses_10 <- sample(c(0,1), 10, replace = TRUE)
tosses_10
mean(tosses_10)

 [1] 0 1 0 0 1 0 0 0 1 1

[1] 0.4

We used the same idea, assigning numbers to outcomes, when we looked at the probability distribution of the number of heads in three tosses, and when we defined the Bernoulli, binomial, and hypergeometric distributions.

Random variables

In each of the above scenarios, we had an outcome space $\Omega$, and then defined a function that assigned a real number to each possible outcome in $\Omega$. In our simulation above, if $\Omega$ is the set of outcomes $\{\text{``Heads'', ``Tails''}\}$, we mapped the outcome $\text{``Heads''}$ to the real number $1$, and the outcome $\text{``Tails''}$ to the real number $0$. By sampling over and over again from $\{0, 1\}$, we got a randomly generated sequence of $0$’s and $1$’s. Once we had numbers (instead of sequences of heads and tails), we could do operations with them, such as computing the proportion of times we sampled $1$, or representing the probabilities as a histogram. Moving from non-numeric outcomes in an outcome space $\Omega$ to numbers on the real line is enormously useful.

In mathematical notation:

\[ X : \Omega \rightarrow \mathbb{R}\]

$X$ is called a random variable: variable, because it takes different values on the real line, and random, because it inherits the randomness from the generating process (in this case, the process is tossing a coin).

Random variable: A random variable is a function that associates real numbers with outcomes from a random experiment where $\Omega$ is the associated outcome space.

The values on the real line that are determined by $X$ have probabilities coming from the probability distribution on $\Omega$. The range of the random variable $X$ is the set of all the possible values that $X$ can take. We usually denote random variables by $X, Y, \ldots$ or capital letters towards the end of the alphabet. We write statements about the values $X$ takes, such as $X = 1$ or $X = 0$. Note that $X = 1$ is an event in $\Omega$ consisting of all the outcomes that are mapped to the number $1$ on the real line. The probability of such events is written as $P(X = x)$, where $x$ is a real number.

Note that we are just formalizing the association of outcomes in $\Omega$ with numbers - an association we have seen before, while defining probability distributions. Earlier we defined probability distributions as how the total probability of $1$ or $100$% was distributed among all possible outcomes of the random experiment. Now we can extend this definition to the associated real numbers as defined by $X$.

Probability distribution of a random variable $X$: The set of possible values of $X$, along with the associated probabilities, is called the probability distribution for the random variable $X$.

For example, consider the familiar example of the outcome space of three tosses of a fair coin. We can define the random variable $X$ to be the number of heads, and represent it as in the picture below:

You may notice this is a binomial distribution, as discussed yesterday, with parameters $n = 3$ and $p = 0.5$. To describe it, we would say “$X$ is a random variable with a Binomial$(3, 0.5)$ distribution”.

Just as with data types, random variables can be classified as discrete or continuous. We will also split discrete random variables into finite discrete and countably infinite discrete.

Discrete and continuous random variables:

Finite discrete random variables are restricted to particular values; they cannot take just any value in an interval. These values are specified at the start of the problem, and there is a fixed, finite number of possible values.

For example, binomial and hypergeometric random variables with $n$ trials (or draws) take values in $\{0, 1, 2, 3, \ldots, n\}$: a fixed and finite set of possible values.

Countably infinite discrete random variables have infinitely many possible values, but the values can still be listed one by one. For example, the possible values might be $\{0, 1, 2, 3, \ldots\}$, with no upper limit. The values are still whole numbers, though: we cannot get, say, $0.5$ or $\sqrt{2}$.

We learned one such distribution yesterday: the geometric distribution. If we repeat a trial with success probability $p$ over and over again, the geometric distribution describes how long we wait until the first success occurs. So $X$ here is the trial on which we first see a success, and in theory we could have to wait $1000$ trials, $1$ million trials, or more until the first success (even if that is quite unlikely). We will learn another countably infinite distribution later today, called the Poisson distribution.

Continuous random variables can take non-whole-number values like $0.6743, -2.7, \pi, \sqrt{2}$; they are not restricted to whole numbers. A continuous random variable can have a bounded range, say any value in $[-10, 10]$, or an unbounded range, where any value, no matter how large, is possible (if quite unlikely). We will cover continuous random variables in more depth on Wednesday, including one of the most important distributions in statistics, the Normal distribution.

Examples of random variables

Discrete random variables

The number of heads in $3$ tosses of a fair coin: The assignment is similar to the outcomes from a single toss, except now the outcomes come from tossing a coin three times. For example, the outcome $HHH$ is assigned the number 3, the outcomes $HHT, HTH, THH$ are all assigned the number 2, etc. Note that even though we should write $X(HHH) = 3$, we don’t. The practice is to ignore the outcome space and just write $X = 3$.
The number of tosses until the coin lands heads for the first time: If $X$ is the random variable representing the number of tosses until a coin lands heads, the smallest value $X$ can take is 1 (you need at least 1 toss), and there is no upper bound, since in theory, one could keep tossing the coin forever and it could land tails every single time.
The number of people that arrive at an ATM in a day
The number of people in the United States who will have read at least one book in 2024
The number of typos in the Stat 20 notes

Continuous random variables

In each of the following, the random variable is not restricted to whole-number values.

Time between consecutive people arriving at an ATM
Price of a stock
Height of a randomly selected Stat 20 student
The weight of a randomly selected newborn baby in California
The amount of rain that falls each March in the Western United States

Example: Making bets on red in Roulette

Recall that American roulette wheels have $38$ slots: $36$ are numbered from $1$ to $36$, of which $18$ are colored red and $18$ black, and two green slots are numbered $0$ and $00$. As the wheel spins, a ball is sent spinning in the opposite direction. When the wheel slows, the ball lands in one of the slots. Players can make various bets on where the ball lands, such as betting on whether the ball will land in a red slot or a black slot. If a player bets one dollar on red and the ball lands on red, they win a dollar, in addition to getting their stake of one dollar back. If the ball does not land on red, they lose their dollar to the casino. Suppose a player bets on red on each of six consecutive spins. Their net gain is defined as the amount they won minus the amount they lost. Is net gain a random variable? What are its possible values (think about their net gain if they win all 6 times, or win 5 times and lose once, etc.)?

Check your answer

Yes, net gain is a random variable, and its possible values are: $-6, -4, -2, 0, 2, 4, 6$. (Why?)
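One way to see why: if $W$ is the number of the 6 spins that land on red, the player wins $W$ dollars and loses $6 - W$ dollars, so the net gain is $W - (6 - W) = 2W - 6$. Assuming the spins are independent, $W \sim$ Bin$(6, 18/38)$, since $18$ of the $38$ slots are red. The short R sketch below (variable names are our own) tabulates the resulting distribution of net gain.

```r
# Assumes independent spins; a bet on red wins with probability 18/38.
p_red <- 18 / 38
wins <- 0:6                      # possible numbers of wins W in 6 bets
net_gain <- 2 * wins - 6         # net gain = W - (6 - W)
prob <- dbinom(wins, size = 6, prob = p_red)

# Distribution table of the net gain
roulette_table <- data.frame(net_gain = net_gain, prob = round(prob, 4))
roulette_table
sum(prob)                        # should be 1
```

Only even values appear because every win adds $1$ and every loss subtracts $1$, so changing one loss into a win moves the net gain by $2$.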

The probability distribution of a discrete random variable $X$

The list of probabilities associated with each of its values is called the probability distribution of the random variable $X$. We can list the values and corresponding probability in a table. This table is called the distribution table of the random variable. For example, let $X$ be the number of heads in $3$ tosses of a fair coin. The probability distribution table for $X$ is shown below. The first column should have the possible values that $X$ can take, denoted by $x$, and the second column should have $P(X = x)$. Make sure that the probabilities add up to 1! $\displaystyle \sum_x P(X = x) = 1$.

$x$	$P(X = x)$
$0$	$\displaystyle \frac{1}{8}$
$1$	$\displaystyle \frac{3}{8}$
$2$	$\displaystyle \frac{3}{8}$
$3$	$\displaystyle \frac{1}{8}$
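To connect this table back to the definition of $X$ as a function on $\Omega$, here is a small R sketch (object names are our own) that lists all $8$ equally likely outcomes of three tosses, applies $X$ to each outcome, and tabulates $P(X = x)$.

```r
# Outcome space: all 2^3 = 8 sequences of three tosses, one per row
omega <- expand.grid(toss1 = c("H", "T"), toss2 = c("H", "T"), toss3 = c("H", "T"),
                     stringsAsFactors = FALSE)

# X maps each outcome (row) to its number of heads
X <- rowSums(omega == "H")

# Each outcome has probability 1/8, so P(X = x) = (# outcomes with X = x) / 8
dist_table <- table(X) / nrow(omega)
dist_table
```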

The probability mass function or pmf of a discrete random variable

Probability mass function (pmf) of a discrete random variable $X$: The pmf of a discrete random variable $X$ is defined to be the function $f(x) = P(X = x)$.

We can write down the definition of the function $f(x)$ and it gives the same information as in the table:

\[f(x) = \begin{cases} \frac{1}{8}, \; x = 0, 3 \\ \frac{3}{8}, \; x = 1, 2 \end{cases}\]

We see here that $f(x) > 0$ for only $4$ real numbers, and is $0$ otherwise. We can think of the total probability mass as $1$, and $f(x)$ describes how this mass of $1$ is distributed among the real numbers. It is often easier and more compact to define the probability distribution of $X$ using $f$ rather than the table.

Let’s revisit the special distributions that we have seen so far.

Some special discrete random variables and their distributions

For each of the named distributions that we defined earlier, we can define a random variable with that probability distribution.

The Discrete Uniform Distribution

Let $X$ take the values $1, 2, 3, \ldots, n$ with $P(X = k) = \displaystyle \frac{1}{n}$ for each $k$ from $1$ to $n$. We call $X$ a discrete uniform random variable. Recall that $n$ is the parameter of the discrete uniform distribution, and we write $X \sim$ Discrete Uniform$(n)$.
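In R, a discrete uniform random variable can be simulated with sample(), as in the last chapter. A sketch, assuming $n = 6$ (a fair die) and an arbitrary seed: the simulated proportions should each be close to $1/6$.

```r
set.seed(20)                     # arbitrary seed, for reproducibility
n <- 6                           # parameter of the Discrete Uniform(n)
draws <- sample(1:n, size = 10000, replace = TRUE)

# Empirical proportion of each value 1, ..., n; each should be near 1/n
props <- prop.table(table(draws))
round(props, 3)
```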

Example: Rolling a pair of dice and summing the spots

Suppose we roll a pair of dice and sum the spots, and let $X$ be the sum. Is $X$ a discrete uniform random variable?

Check your answer

No. $X$ takes discrete values: $2, 3, 4, \ldots, 12$, but these are not equally likely.
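A quick way to see this, assuming two fair dice, is to list all $36$ equally likely ordered pairs of faces. In the sketch below, outer() adds every face of the first die to every face of the second.

```r
# 6 x 6 matrix of sums: entry [i, j] is i + j
sums <- outer(1:6, 1:6, FUN = "+")

# Each ordered pair has probability 1/36
pmf_sum <- table(sums) / 36
round(pmf_sum, 4)
```

A sum of $7$ can happen in $6$ ways, so it has probability $6/36$, while $2$ and $12$ can each happen in only one way and have probability $1/36$.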

The Bernoulli Distribution

Recall that the Bernoulli distribution describes the probabilities associated with random binary outcomes that we designate as success and failure, where $p$ is the probability of a success. We can define $X$ to be a random variable that takes the value $1$ with probability $p$ and the value $0$ with probability $1-p$; then $X$ is called a Bernoulli random variable, and it indicates whether the outcome of the random experiment or trial was a success or not. We say that $X$ is Bernoulli with parameter $p$, and write $X \sim$ Bernoulli$(p)$. Below are the same probability histograms that we have seen in the last chapter, but now they describe the probability mass function of $X$.
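Base R does not include separate Bernoulli functions; one convenient approach is to use the binomial functions with size = 1, since a Bernoulli$(p)$ random variable is a binomial with a single trial. A sketch with an illustrative $p = 0.3$:

```r
p <- 0.3                                          # illustrative success probability
bern_pmf <- dbinom(c(0, 1), size = 1, prob = p)   # P(X = 0) = 1 - p, P(X = 1) = p
bern_pmf

set.seed(1)                                       # arbitrary seed
rbinom(n = 10, size = 1, prob = p)                # ten Bernoulli(0.3) trials
```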

The Binomial Distribution

Recall that the binomial distribution describes the probabilities of the total number of successes in $n$ independent Bernoulli trials. We let $X$ be this total number of successes (think tossing a coin $n$ times, and counting the number of heads). Then we say that $X$ has the binomial distribution with parameters $n$ and $p$, and write $X \sim Bin(n,p)$, where $X$ takes the values in $\{0, 1, 2, \ldots, n\}$, and \[P(X = k ) = \binom{n}{k} p^k (1-p)^{n-k}. \]

So this “function” we learned yesterday to compute probabilities of the binomial distribution is properly called the probability mass function, or pmf, of the binomial distribution.
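As a check, the pmf formula can be typed directly into R with choose() and compared with the built-in dbinom(). The sketch uses $n = 10$ and $p = 0.4$ purely as example values.

```r
n <- 10; p <- 0.4                 # example parameters
k <- 0:n                          # all possible values of X

# pmf from the formula: choose(n, k) * p^k * (1 - p)^(n - k)
pmf_formula <- choose(n, k) * p^k * (1 - p)^(n - k)

# pmf from R's built-in binomial function
pmf_builtin <- dbinom(k, size = n, prob = p)

all.equal(pmf_formula, pmf_builtin)   # TRUE: the two agree
sum(pmf_formula)                      # total probability should be 1
```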

The Hypergeometric Distribution

Suppose we have a population of size $N$ with two types of individuals, type $A$ and type $B$. If $G \leq N$ people are of type $A$, then $N-G$ people are of type $B$. If we draw $n$ times without replacement from this population, we can track the number of people from group $A$ in our sample. Let the random variable $X$ be the number of successes (people from group $A$) in a simple random sample of $n$ draws from this population. Then $X$ has the hypergeometric distribution with parameters $\left(N, G, n\right)$, where $G$ is the total number of successes in the population, and we write $X \sim HG(N, G, n)$. The pmf is \[ P(X = k) = \frac{\binom{G}{k} \times \binom{N-G}{n-k}}{\binom{N}{n}}, \] where $N$ is the size of the population, $G$ is the total number of successes in the population, and $n$ is the sample size (so $k$ can take the values $0, 1, \ldots, n$, or only $0, 1, \ldots, G$ if the number of successes in the population is smaller than the sample size).
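The same kind of check works for the hypergeometric pmf. Assuming a box with $N = 10$ tickets, $G = 6$ of them successes, and $n = 3$ draws (example values), the formula and R's dhyper() agree. Note that dhyper() names its arguments differently: m is $G$, n is $N - G$, and k is the number of draws; the variable names n_draws and k_vals below are our own, chosen to avoid clashing with those argument names.

```r
N <- 10; G <- 6; n_draws <- 3        # example population size, successes, draws
k_vals <- 0:n_draws                  # possible numbers of successes in the sample

# pmf from the formula: choose(G, k) * choose(N - G, n - k) / choose(N, n)
pmf_formula <- choose(G, k_vals) * choose(N - G, n_draws - k_vals) / choose(N, n_draws)

# R's version: m = successes, n = failures, k = number of draws
pmf_builtin <- dhyper(k_vals, m = G, n = N - G, k = n_draws)

rbind(k = k_vals, formula = pmf_formula, dhyper = pmf_builtin)
```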

The Poisson Distribution

There are three very important distributions that we see over and over again in many situations. One of them is the binomial distribution, which we have discussed above. The reason this distribution is so ubiquitous is that we use it for classifying things into binary outcomes and counting the number of “successes”. The second important discrete distribution is used to model many different things - from the number of people arriving at an ATM in a given period of time, to the frequency with which officers in the Prussian army were accidentally kicked to death by their horses¹. This distribution is called the Poisson distribution after a French mathematician, Siméon-Denis Poisson, who developed the theory in the nineteenth century. Interestingly, he was not the first French mathematician to develop this theory. That honor belonged to the seventeenth century mathematician who was a contemporary (and friend) of Isaac Newton, Abraham de Moivre.² (The third important distribution is called the Normal distribution, also first discovered by de Moivre, which we will introduce later in the course.)

The Poisson distribution appears in situations when we have a very large number of trials in which we are checking the occurrence or not of a particular event which has a very low probability. That is, we have a very large number $n$ of Bernoulli trials, each with a very small probability of success $p$, such that the product $np$ is neither too small nor too large. We call the number of successes $X$; it counts the occurrences of events whose counts tend to be small. Note that $X \sim Bin(n, p)$, and \[P(X = 0) = \binom{n}{0}\times p^0\times (1-p)^n = (1-p)^n.\] For large $n$ and small $p$, it turns out that $(1-p)^n \approx e^{-\lambda}$, where $\lambda = np$. We won’t derive the distribution here, but we will use the Poisson distribution for random variables that count the number of occurrences of events in a given period of time when the events result from a very large number of independent trials, each with the same probability of success (so the probability of success does not change over time).
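The approximation is easy to check numerically. The sketch below picks illustrative values $n = 1000$ and $p = 0.002$ (so $\lambda = np = 2$), compares $(1-p)^n$ with $e^{-\lambda}$, and then compares the binomial and Poisson pmfs for small counts using dbinom() and dpois().

```r
n <- 1000; p <- 0.002          # many trials, small success probability (illustrative)
lambda <- n * p                # lambda = np = 2

(1 - p)^n                      # exact P(X = 0) for Binomial(n, p)
exp(-lambda)                   # Poisson approximation e^(-lambda)

# Side-by-side pmfs for k = 0, ..., 6
k <- 0:6
comparison <- cbind(k = k,
                    binomial = dbinom(k, size = n, prob = p),
                    poisson  = dpois(k, lambda = lambda))
round(comparison, 4)
```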

Poisson distribution: We say that such a random variable $X$ has the Poisson distribution with parameter $\lambda$, and write it as $X \sim Poisson(\lambda)$, if \[ P(X = k) = e^{-\lambda} \frac{\lambda^k}{k!},\] where $k = 0, 1, 2, \ldots$. That is, the possible values of $X$ are non-negative integers, and since $k!$ grows much faster than $\lambda^k$, the probability that $X$ takes large values is very small. The parameter $\lambda$ is called the rate of the distribution, and represents how many “successes” we expect in the given time unit.
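To see the formula in action, here is a sketch with an illustrative rate $\lambda = 2$. It evaluates $e^{-\lambda}\lambda^k/k!$ directly (using factorial()), compares the result with dpois(), and checks that the probabilities of $0, 1, \ldots, 30$ already add up to essentially $1$, because the terms for large $k$ are tiny.

```r
lambda <- 2                    # illustrative rate
k <- 0:30

pmf_formula <- exp(-lambda) * lambda^k / factorial(k)
pmf_builtin <- dpois(k, lambda = lambda)

all.equal(pmf_formula, pmf_builtin)   # the formula matches dpois()
sum(pmf_builtin)                      # essentially 1
dpois(10, lambda = lambda)            # already very small at k = 10
```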

Example: The number of soldiers kicked to death by their horses each year in each corps in the Prussian army

This example was made famous by Ladislaus Bortkiewicz in 1898, when he discussed how well a Poisson distribution fit the data he obtained from 14 corps in the Prussian cavalry over a period of 20 years. Bortkiewicz recorded the number of deaths per corps per year, giving 280 observations. Death by horse kick was quite rare, with fewer than 1 death per corps per year, and the majority of the corps-year observations recorded no deaths at all. Here is the empirical histogram of the data.

This is the shape we expect from a Poisson random variable with a small rate: very low counts have much higher probability than larger counts, so the distribution is right-skewed.

Geometric vs Poisson Distribution

The geometric and Poisson distributions both have countably infinite sets of possible values: there is no upper bound on their values, and they can only take whole-number values ($\{1, 2, 3, \ldots\}$ for the geometric as defined here, and $\{0, 1, 2, 3, \ldots\}$ for the Poisson). The geometric distribution suits the question: how long do I have to wait until I see the first success in a repeated series of Bernoulli trials? The Poisson suits the question: if each Bernoulli trial is very unlikely to succeed, but I have a very large number of trials, how many times will I see the event? Note that a Poisson distribution has no parameter for the number of trials; it just behaves “like” a binomial with a very large $n$ and a very small $p$.

The pmf functions are

\[ \begin{aligned} \text{Geometric}_{p}(k) &= (1-p)^{k-1}p\\ \text{Poisson}_{\lambda}(k) &= e^{-\lambda} \frac{\lambda^{k}}{k!} \end{aligned} \]

They have slightly different shapes and different properties that make them better suited to certain situations. The Poisson distribution is often used to model a “random arrival” problem. How many people will visit my website in the next hour? How many cars will pass through this intersection in the next 10 minutes? How many soldiers in a Prussian army corps will be killed by horse kicks in one year? The army is so large, and each individual soldier’s probability of being kicked by a horse is so small, that modeling the count with a binomial is just not practical, so we use a Poisson.

For a geometric distribution, the probabilities always get smaller as you move away from 1. A Poisson distribution, depending on its parameter $\lambda$, can have a “hump” away from 1, so its most likely value can be made to be some number larger than 1, whereas the most likely value of a geometric is always 1. It is up to the statistician to decide whether a geometric or a Poisson distribution better suits the problem.
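A sketch comparing the two shapes numerically, with illustrative parameters $p = 0.3$ and $\lambda = 3.5$. One caution: R's dgeom() counts the number of failures before the first success (values $0, 1, 2, \ldots$), whereas the geometric pmf above counts the trial on which the first success occurs (values $1, 2, 3, \ldots$), so $P(X = k)$ in our convention is dgeom(k - 1, p).

```r
p <- 0.3
k <- 1:15
# dgeom(x, prob) = P(x failures before the first success); shift by one
geom_pmf <- dgeom(k - 1, prob = p)
all.equal(geom_pmf, (1 - p)^(k - 1) * p)   # agrees with the formula above

lambda <- 3.5
j <- 0:15
pois_pmf <- dpois(j, lambda = lambda)

k[which.max(geom_pmf)]   # most likely geometric value: 1
j[which.max(pois_pmf)]   # most likely Poisson value: 3, away from 1
```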

Binomial vs Hypergeometric distributions

Both the binomial and the hypergeometric distributions deal with counting the number of successes in a fixed number of trials with binary outcomes. The difference is that for a binomial random variable, the probability of a success stays the same for each trial, and for a hypergeometric random variable, the probability changes with each trial. If we use a box of tickets to describe these random variables, both distributions can be modeled by sampling from boxes with each ticket marked with $0$ or $1$, but for the binomial distribution, we sample $n$ times with replacement and count the number of successes by summing the draws; and for the hypergeometric distribution, we sample $n$ times without replacement, and count the number of successes by summing the draws.
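The box-of-tickets description can be simulated directly with sample(). The sketch below assumes a box with six $\fbox{1}$ tickets and four $\fbox{0}$ tickets and $3$ draws (illustrative numbers); it sums the draws with and without replacement many times and compares the simulated proportions with dbinom() and dhyper().

```r
set.seed(2024)                              # arbitrary seed
box <- c(rep(1, 6), rep(0, 4))              # 6 tickets marked 1, 4 marked 0
reps <- 20000

# Number of successes = sum of the draws
with_repl    <- replicate(reps, sum(sample(box, size = 3, replace = TRUE)))
without_repl <- replicate(reps, sum(sample(box, size = 3, replace = FALSE)))

# Proportion of simulations giving 0, 1, 2, 3 successes
sim_prop <- function(x) as.numeric(prop.table(table(factor(x, levels = 0:3))))

comparison <- rbind(
  simulated_with    = sim_prop(with_repl),
  binomial          = dbinom(0:3, size = 3, prob = 6 / 10),
  simulated_without = sim_prop(without_repl),
  hypergeometric    = dhyper(0:3, m = 6, n = 4, k = 3)
)
colnames(comparison) <- 0:3
round(comparison, 3)
```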

For a hypergeometric distribution with a very large population size $N$ and a much smaller number of draws $n$, $HG(N, G, n) \approx Bin\left(n, \frac{G}{N}\right)$. That is, if we have a population of $N$ individuals, of whom $G$ are of type $A$, and we sample with replacement, the probability of picking someone of type $A$ is $\frac{G}{N}$ on each of the $n$ draws.

If we sample without replacement, then technically the population is slightly different on each draw, but in practice, when $N$ is much bigger than $n$, the hypergeometric is nearly equal to the binomial.

For example, say $N = 1000$, $G = 500$, $n = 10$. Then

\[ \text{Hypergeometric}_{N, G, n}(2)= \frac{\binom{500}{2}\binom{500}{8}}{\binom{1000}{10}}= 0.04337161\]

\[ \text{Binomial}_{n, \frac{G}{N}}(2) = \binom{10}{2}(0.5)^{10} = 0.04394531\]

The binomial is much simpler to work with mathematically, so we often use this binomial approximation for large populations. We use the hypergeometric when the probability changes meaningfully from draw to draw, that is, when the sample is not small relative to the population (for example, when the number of successes drawn, $k$, is close to $G$); then the binomial and hypergeometric probabilities differ meaningfully.
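These numbers can be reproduced with dhyper() and dbinom(). The sketch also keeps $n = 10$ and the success proportion $G/N = 0.5$ fixed while shrinking the population $N$ (the smaller populations are our own illustrative choices), to show the approximation getting worse as $n$ becomes a larger fraction of $N$.

```r
# Example from the text: N = 1000, G = 500, n = 10, k = 2
dhyper(2, m = 500, n = 500, k = 10)          # hypergeometric
dbinom(2, size = 10, prob = 500 / 1000)      # binomial approximation

# Same n = 10 and G/N = 0.5, smaller populations
for (N in c(1000, 100, 20)) {
  hg <- dhyper(2, m = N / 2, n = N / 2, k = 10)
  bn <- dbinom(2, size = 10, prob = 0.5)
  cat("N =", N, " hypergeometric:", round(hg, 5), " binomial:", round(bn, 5), "\n")
}
```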

Now we will define another important quantity related to random variables. This quantity, called the cumulative distribution function, is another way to describe the probability distribution of a random variable.

The cumulative distribution function $F(x)$

The pmf can be thought of as giving the height of one bar of the probability histogram. For example, for a random variable $X \sim \text{Bin}(10, 0.6)$, $f(4) = P(X = 4)$ is the height of the bar at $4$.

If we added up the heights of all the bars, the total would be one.

What about the probability that $X$ is less than or equal to some number? $P(X \leq 4)$ is the sum of the heights of all the bars up to and including $4$.

We have a special function, called the cumulative distribution function (cdf), which gives us this number.

Cumulative distribution function (cdf) $F(x)$: The cumulative distribution function of a random variable $X$ is defined for every real number, and gives, for each $x$, the amount of probability or mass that has been accumulated up to (and including) the point $x$, that is, $F(x) = P(X \le x)$.

We usually abbreviate this function to cdf. It is a very important function since it also describes the probability distribution of $X$. In order to compute $F(x)$ for any real number $x$, we just add up all the probability so far: \[F(x) = \sum_{y \le x} f(y)\]
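Continuing the $X \sim$ Bin$(10, 0.6)$ example, the sum $F(x) = \sum_{y \le x} f(y)$ is a running total of bar heights, which R's cumsum() computes. A sketch (variable names are our own): the running totals match R's binomial cdf function pbinom() at every value.

```r
k <- 0:10
f <- dbinom(k, size = 10, prob = 0.6)     # bar heights f(k) = P(X = k)
sum(f)                                    # all the bars together: 1

F_vals <- cumsum(f)                       # F(k) = f(0) + f(1) + ... + f(k)
F_vals[k == 4]                            # P(X <= 4) as a running total
pbinom(4, size = 10, prob = 0.6)          # same value from pbinom()
```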

For example, if $X$ is the number of heads in $3$ tosses of a fair coin, recall that:

\[ f(x) = \begin{cases} \displaystyle \frac{1}{8}, \; x = 0, 3 \\ \displaystyle \frac{3}{8}, \; x = 1, 2 \end{cases} \]

In this case, $F(x) = P(X\le x) = 0$ for all $x < 0$ since the first positive probability is at $0$. Then, $F(0) = P(X \le 0) = 1/8$ after which it stays at $1/8$ until $x = 1$. Look at the graph below:

Notice that $F(x)$ is a step function, and right continuous. The jumps are at exactly the values for which $f(x) > 0$. We can get $F(x)$ from $f(x)$ by adding the values of $f$ up to and including $x$, and we can get $f(x)$ from $F(x)$ by looking at the size of the jumps.
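Base R's stepfun() builds exactly this kind of right-continuous step function from the jump locations and the levels between them. A sketch for the number of heads in 3 tosses, where the object cdf plays the role of $F$; calling plot(cdf) on the result draws the graph.

```r
# Jumps at 0, 1, 2, 3; levels: 0 before 0, then 1/8, 4/8, 7/8, and 1 from 3 on
cdf <- stepfun(x = 0:3, y = c(0, 1/8, 4/8, 7/8, 1))

# Right-continuous: cdf(0) already includes the jump at 0
cdf(c(-1, 0, 0.5, 1, 2.9, 3, 10))

# Recover the pmf from the sizes of the jumps
diff(c(0, cdf(0:3)))
```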

Example: Writing down the cdf of a Bernoulli random variable

Suppose $X \sim$ Bernoulli$(0.5)$. Then we know that $P(X = 0) = P(X=1) = 0.5$. The cdf of $X$, $F(x)$, gives us the total probability up to and including $x$. For example, if $x = -3$, $F(x) =0$, since the first point with positive probability for $X$ is $0$. At $x = 0$, $F(x) = 0.5$, and it stays there until $x = 1$, where it “accumulates” another $0.5$ of probability. Here is the figure:

Notice where the function is open and closed ($\circ$ vs $\bullet$).

Exercise: Drawing the graph of the cdf

Let $X$ be the random variable defined by the distribution table below. Find the cdf of $X$, and draw the graph, making sure to define $F(x)$ for all real numbers $x$. Before you do that, you will have to determine the value of $f(x)$ for $x = 4$.

$x$	$P(X = x)$
$-1$	$0.2$
$1$	$0.3$
$2$	$0.4$
$4$	??

Check your answer

Since $\displaystyle \sum_x P(X = x) = \sum_x f(x) = 1$, $f(4) = 1-(0.2+0.3+0.4) = 0.1.$ Therefore $F(x)$ is as shown below.
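A hand-drawn answer can be checked with base R's stepfun(), which builds a right-continuous step function from the jump locations and the cumulative levels. A sketch (object names are our own, with cdf playing the role of $F$):

```r
x_vals <- c(-1, 1, 2, 4)
f_vals <- c(0.2, 0.3, 0.4, 0.1)             # f(4) = 1 - (0.2 + 0.3 + 0.4) = 0.1

# Levels: 0 to the left of -1, then the running totals 0.2, 0.5, 0.9, 1
cdf <- stepfun(x_vals, c(0, cumsum(f_vals)))

# Evaluate the cdf at a few points, including between and at the jumps
cdf(c(-2, -1, 0, 1, 1.5, 2, 3, 4, 5))
```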

Computing Interval Probabilities Using the CDF

CDFs are very useful for computing probabilities of the form $P(a < X \leq b)$ for numbers $a < b$. The events $\{X \le a\}$ and $\{a < X \le b\}$ are mutually exclusive and together make up $\{X \le b\}$, so by the addition rule,

\[P(a < X \leq b) = P(X \leq b)-P(X \leq a) = F(b)-F(a).\]

In terms of the pmf, the total height of the bars at values above $a$ and at most $b$ equals the total height of all bars up to and including $b$, minus the total height of all bars up to and including $a$, which is exactly what the difference of cdf values computes. We often have ready-made ways to evaluate a cdf, so to compute, say, $P(2< X \leq 30)$, rather than adding up $28$ values of the pmf, we can just take the difference of two values of the cdf.
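A sketch of this shortcut in R, assuming for illustration that $X \sim$ Bin$(50, 0.4)$: summing the $28$ pmf values at $3, 4, \ldots, 30$ gives the same answer as $F(30) - F(2)$.

```r
# Illustrative choice: X ~ Binomial(50, 0.4)
n <- 50; p <- 0.4

# P(2 < X <= 30) by adding up the pmf at 3, 4, ..., 30 (28 terms)
by_pmf <- sum(dbinom(3:30, size = n, prob = p))

# The same probability as a difference of two cdf values: F(30) - F(2)
by_cdf <- pbinom(30, size = n, prob = p) - pbinom(2, size = n, prob = p)

c(by_pmf = by_pmf, by_cdf = by_cdf)
```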

Ideas in code

R knows all the special distributions we have discussed, e.g. binomial, hypergeometric, Poisson. In R, there are three things we might want to do with a distribution:

Generate a random value from that distribution
Evaluate the pmf $f(x) = P(X = x)$ for that distribution
Evaluate the cdf $F(x) = P(X \leq x)$ for that distribution

R has “short form” names for these distributions, e.g. binom, hyper, pois. The general code structure is:

rname_of_dist(n, params) generates n random draws from the distribution (r stands for random)
dname_of_dist(x, params) evaluates the pmf at location x (d stands for density)
pname_of_dist(x, params) evaluates the cdf at location x (p stands for probability)

Let’s see some examples to learn this pattern.

Bernoulli$(p)$ and Binomial$(n,p)$

dbinom computes the pmf of $X$, $f(k) = P(X = k)$, for $k = 0, 1, \ldots, n$.

Arguments:
- x: the value of $k$ in $f(k)$
- size: the parameter $n$, the number of trials
- prob: the parameter $p$, the probability of success

pbinom computes the cdf $F(x) = P(X \le x)$

Arguments:
- q: the value of $x$ in $F(x)$
- size: the parameter $n$, the number of trials
- prob: the parameter $p$, the probability of success

rbinom generates a sample (random numbers) from the Binomial$(n,p)$ distribution.

Arguments:
- n: the sample size
- size: the parameter $n$, the number of trials
- prob: the parameter $p$, the probability of success

Example

Suppose we consider $n = 3$, $p= 0.5$, that is, $X$ is the number of successes in 3 independent Bernoulli trials.

# probability that we see exactly 1 success = f(1)
dbinom(x = 1, size = 3, prob = 0.5)

[1] 0.375

# probability that we see at most 1 success = F(1) = f(0) + f(1)
pbinom(q = 1, size = 3, prob = 0.5)

[1] 0.5

# check f(0) + f(1)
dbinom(x = 0, size = 3, prob = 0.5) + dbinom(x = 1, size = 3, prob = 0.5)

[1] 0.5

# generate a sample of size 5 where each element in sample 
# represents number of successes in 3 trials (like number of heads in 3 tosses)
rbinom(n = 5, size = 3, prob = 0.5)

[1] 1 1 2 1 2

# if we want to generate a sequence of 10 tosses of a fair coin, for example:
rbinom(n = 10, size = 1, prob = 0.5)

 [1] 0 1 1 0 1 1 0 1 0 0

Exercise

In the section on the Binomial distribution above, we had an exercise where $X \sim Bin(10, 0.4)$. Using the functions described above, compute:

$P(X = 5)$
$P(X \le 5)$
$P(3 \le X \le 8)$

Check your answer

# P(X = 5)
dbinom(x = 5, size = 10, prob = 0.4)

[1] 0.2006581

# P(X = 5)
pbinom(5, 10, 0.4) - pbinom(4, 10, 0.4)

[1] 0.2006581

# P(X <= 5)
dbinom(x = 0, size = 10, prob = 0.4) + dbinom(x = 1, size = 10, prob = 0.4) + 
  dbinom(x = 2, size = 10, prob = 0.4) + dbinom(x = 3, size = 10, prob = 0.4) +
  dbinom(x = 4, size = 10, prob = 0.4) + dbinom(x = 5, size = 10, prob = 0.4)

[1] 0.8337614

# P(X <= 5)
pbinom(5, 10, 0.4)

[1] 0.8337614

# P(3 <= X <= 8)
dbinom(x = 3, size = 10, prob = 0.4) + dbinom(x = 4, size = 10, prob = 0.4) + 
  dbinom(x = 5, size = 10, prob = 0.4) + dbinom(x = 6, size = 10, prob = 0.4) +
  dbinom(x = 7, size = 10, prob = 0.4) + dbinom(x = 8, size = 10, prob = 0.4)

[1] 0.8310325

# P(3 <= X <= 8)
pbinom(8, 10, 0.4) - pbinom(2, 10, 0.4)

[1] 0.8310325

What is going on in the last expression? Why is $P(3 \le X \le 8) = F(8) - F(2)$?

Check your answer

$P(3 \le X \le 8)$ consists of all the probability at the points $3, 4, 5, 6, 7, 8$.

$F(8) = P(X \le 8)$ is all the probability up to $8$, including any probability at $8$. We subtract off all the probability up to and including $2$ from $F(8)$ and are left with the probability at the values $3$ up to and including $8$, which is what we want.

Hypergeometric $(N, G, n)$

R’s argument names are a bit confusing. Remember that x (or q) is the number of successes $k$ that you want the probability for; m and n are the numbers of successes ($G$) and failures ($N - G$) in the population, so m + n $= N$ is the population size; and k is the number of draws (our $n$).

dhyper computes the pmf of $X$, $f(k) = P(X = k)$, for $k = 0, 1, \ldots, n$.

Arguments:
- x: the value of $k$ in $f(k)$
- m: the parameter $G$, the number of successes in the population
- n: the value $N-G$, the number of failures in the population
- k: the sample size (number of draws $n$, note that $0 \le k \le m+n$)

phyper computes the cdf $F(x) = P(X \le x)$

Arguments:
- q: the value of $x$ in $F(x)$
- m: the parameter $G$, the number of successes in the population
- n: the value $N-G$, the number of failures in the population
- k: the sample size (number of draws $n$)

rhyper generates a sample (random numbers) from the hypergeometric$(N, G, n)$ distribution.

Arguments:
- nn: the number of random numbers desired
- m: the parameter $G$, the number of successes in the population
- n: the value $N-G$, the number of failures in the population
- k: the sample size (number of draws $n$)

Example

Suppose we consider $N = 10, G = 6, n = 3$, that is, $X$ is the number of successes in 3 draws without replacement from a box that has 6 tickets marked $\fbox{1}$ and 4 tickets marked $\fbox{0}$

# probability that we see exactly 1 success = f(1)
dhyper(x = 1, m = 6, n = 4, k = 3)

[1] 0.3

# you can compute this by hand as well to check. 
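For instance, a by-hand check of $f(1)$ uses the hypergeometric pmf with $N = 10$, $G = 6$, $n = 3$, and $k = 1$, which R's choose() can evaluate:

```r
# f(1) = choose(G, 1) * choose(N - G, 2) / choose(N, 3) with N = 10, G = 6
choose(6, 1) * choose(4, 2) / choose(10, 3)   # 6 * 6 / 120 = 0.3
```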

# probability that we see at most 1 success = F(1) = f(0) + f(1)
phyper(q = 1, m = 6, n = 4, k = 3)

[1] 0.3333333

# check f(0) + f(1)
dhyper(x = 0, m = 6, n = 4, k = 3) + dhyper(x = 1, m = 6, n = 4, k = 3)

[1] 0.3333333

# generate a sample of size 5 where each element in sample 
# represents number of successes in 3 draws
rhyper(nn = 5, m = 6, n = 4, k = 3)

[1] 2 3 2 1 2

Poisson($\lambda$)

dpois computes the pmf of $X$, $f(k) = P(X = k)$, for $k = 0, 1, 2, \ldots$.

Arguments:
- x: the value of $k$ in $f(k)$
- lambda: the parameter $\lambda$

ppois computes the cdf $F(x) = P(X \le x)$

Arguments:
- q: the value of $x$ in $F(x)$
- lambda: the parameter $\lambda$

rpois generates a sample (random numbers) from the Poisson($\lambda$) distribution.

Arguments:
- n: the desired sample size
- lambda: the parameter $\lambda$

Example

Suppose we consider $\lambda = 1$, that is, $X \sim$ Poisson$(1)$.

# probability that we see exactly 1 event = f(1)
dpois(x = 1, lambda = 1)

[1] 0.3678794

#check f(1) = exp(-lambda)*lambda = exp(-1)*1
exp(-1)

[1] 0.3678794

# probability that we see at most 1 event = F(1) = f(0) + f(1)
ppois(q = 1, lambda = 1)

[1] 0.7357589

# check f(0) + f(1)
dpois(x = 0, lambda = 1) + dpois(x = 1, lambda = 1)

[1] 0.7357589

# generate a sample of size 5 where each element in sample 
# represents a random count from the Poisson(1) distribution
rpois(n = 5, lambda = 1)

[1] 1 1 1 0 2

Summary

In these notes, we defined random variables, and described discrete and continuous random variables.
For a discrete random variable, there is an associated probability distribution, and it is described by the probability mass function or pmf $f(x)$.
We also defined a function that, for a random variable $X$, and any real number $x$, describes all the probability that is to the left of $x$. This function is called the cumulative distribution function (cdf) of $X$ and is denoted $F(x)$.
We looked at some special distributions (discrete uniform, Bernoulli, binomial, hypergeometric, and Poisson)
We used functions in R that compute the pmf and cdf for the named distributions (except for the discrete uniform, which doesn’t need a special function since we can just use sample()).


Footnotes