To estimate how many times I sneeze per hour, I wait and measure the time until my fourth sneeze.
\(X\) - unobserved (parameter) - my rate of sneezes per hour
\(Y\) - observed - the time until my fourth sneeze
Prior: \(X \sim \Gamma(2,6)\)
Likelihood: \(Y \mid X \sim \Gamma(4,X)\)
Posterior: \(X \mid Y \sim \Gamma(2+4,\,6+Y)\)
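As a sanity check, the conjugate update above can be verified numerically: the posterior density is proportional to prior times likelihood. This is a sketch; the observed waiting time \(y = 10\) hours is just an assumed illustration value.

```r
# Numerical check that prior * likelihood is proportional to the
# Gamma(2+4, 6+y) posterior, on a grid of candidate rate values.
y <- 10                                   # assumed observed waiting time (hours)
x <- seq(0.01, 3, length.out = 500)       # grid of candidate rates
unnorm <- dgamma(x, 2, 6) * dgamma(y, 4, rate = x)  # prior * likelihood
post   <- unnorm / sum(unnorm)            # normalize on the grid
conj   <- dgamma(x, 2 + 4, 6 + y)
conj   <- conj / sum(conj)                # normalize the same way
max(abs(post - conj))                     # ~ 0
```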
Waiting for one event of rate \(x\) is distributed \(\mathrm{Exp}(x)\), and \(\mathrm{Exp}(x) = \Gamma(1,x)\) (1 - “shape”, \(x\) - “rate”).
Waiting for 4 events of rate \(x\) is distributed \(\Gamma(4,x)\) (4 - shape, \(x\) - rate).
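This can be illustrated by simulation (a sketch, with an arbitrary assumed rate): summing 4 independent exponential waiting times gives the same distribution as a single Gamma(4, rate) draw.

```r
# The sum of 4 independent Exp(x) waiting times is distributed Gamma(4, x).
set.seed(1)
x <- 0.5                                   # assumed rate, for illustration
sums   <- replicate(10000, sum(rexp(4, rate = x)))
direct <- rgamma(10000, shape = 4, rate = x)
c(mean(sums), mean(direct))                # both close to 4 / x = 8
```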
\(\Omega\) - a set
Each \(\omega\) in \(\Omega\) is one possible outcome in our uncertain world.
In computational uses, \(\Omega\) is finite.
But when we phrase a statistical model, it is often an infinite mathematical construct.
library(tibble)
library(dplyr)
library(knitr)

N <- 100
set.seed(31)
Omega_finite <- tibble(X = rgamma(N, 2, 6),
                       Y = rgamma(N, 4, X))
Omega_finite %>% kable()
X | Y |
---|---|
0.2614719 | 6.375819 |
0.2138068 | 13.882671 |
0.2260581 | 31.840656 |
0.6520004 | 3.498244 |
0.1674620 | 31.098390 |
0.3659386 | 24.811430 |
0.2862107 | 16.811617 |
0.0819444 | 23.694354 |
0.0397472 | 210.982678 |
0.2405861 | 31.212829 |
0.4348955 | 6.498370 |
0.1159619 | 49.549171 |
1.2929998 | 2.958485 |
0.1931940 | 37.932899 |
0.2535873 | 34.449692 |
0.2134041 | 9.452900 |
0.2050184 | 23.788414 |
0.4423407 | 4.345751 |
0.4228450 | 14.891966 |
0.0856366 | 26.527150 |
0.0741577 | 53.736278 |
0.4213621 | 12.743175 |
0.1966911 | 20.973970 |
0.0180597 | 193.221812 |
0.1155568 | 34.940974 |
0.2431928 | 17.340992 |
0.2724593 | 10.477737 |
0.0442079 | 118.209978 |
0.2262941 | 17.727235 |
0.7087579 | 4.953093 |
0.3510747 | 6.627882 |
0.4372039 | 8.045685 |
0.3626183 | 4.192516 |
0.4547042 | 3.198129 |
0.2749661 | 16.162527 |
0.5460077 | 5.778064 |
0.2386606 | 11.486853 |
0.1236775 | 41.201705 |
0.5858767 | 7.095772 |
0.3355599 | 9.935324 |
0.2707641 | 16.941205 |
0.0692056 | 48.188094 |
0.5268671 | 4.399818 |
0.8421145 | 3.800355 |
0.4928962 | 6.256317 |
0.1952022 | 53.689639 |
0.4827437 | 11.122870 |
0.4343758 | 8.980154 |
0.9312470 | 8.886762 |
0.1548861 | 19.690230 |
0.6090559 | 9.066599 |
0.5128833 | 10.457306 |
0.2859819 | 7.267587 |
0.1985435 | 14.223205 |
0.6939549 | 4.034420 |
0.5590719 | 4.592477 |
0.2787074 | 10.043072 |
0.2683953 | 17.215678 |
0.3662903 | 15.457434 |
0.5969167 | 2.958815 |
0.1678776 | 9.688477 |
0.3537971 | 7.852522 |
0.4488835 | 14.703632 |
0.1498778 | 39.694465 |
0.1815088 | 16.617960 |
0.5520281 | 7.827161 |
0.5138644 | 11.601048 |
0.8351950 | 2.462199 |
0.8163396 | 4.603685 |
0.1543613 | 8.869716 |
0.1891107 | 19.166799 |
0.1784519 | 21.352133 |
0.5623886 | 8.372720 |
0.2314933 | 11.100859 |
0.3914930 | 9.101875 |
0.1936554 | 30.328917 |
0.3082267 | 16.012841 |
0.6272152 | 6.813958 |
0.3532382 | 11.866320 |
0.3038131 | 10.103965 |
0.1858272 | 18.287051 |
0.1946054 | 15.774182 |
0.2667366 | 25.697924 |
0.1965498 | 33.094091 |
0.0441445 | 36.165160 |
0.2619324 | 9.368694 |
0.1944936 | 25.235366 |
0.2751434 | 5.828970 |
0.2183396 | 13.909186 |
0.2330591 | 4.418328 |
0.4273974 | 12.804499 |
0.3440438 | 19.628506 |
0.2807141 | 21.522079 |
0.2276425 | 15.781731 |
0.1098608 | 43.928327 |
0.2856966 | 9.183774 |
0.4002635 | 11.916558 |
0.1680154 | 13.956158 |
0.3150993 | 10.917061 |
0.1901269 | 15.844127 |
Usually we do not actually care about the elements \(\omega\) in \(\Omega\).
We think in terms of events and random variables.
The sample space behind them is implicit.
“all outcomes \(\omega\) in \(\Omega_{finite}\) in which I waited at least five hours”
X | Y |
---|---|
0.2614719 | 6.375819 |
0.2138068 | 13.882671 |
0.2260581 | 31.840656 |
0.1674620 | 31.098390 |
0.3659386 | 24.811430 |
0.2862107 | 16.811617 |
0.0819444 | 23.694354 |
0.0397472 | 210.982678 |
0.2405861 | 31.212829 |
0.4348955 | 6.498370 |
0.1159619 | 49.549171 |
0.1931940 | 37.932899 |
0.2535873 | 34.449692 |
0.2134041 | 9.452900 |
0.2050184 | 23.788414 |
0.4228450 | 14.891966 |
0.0856366 | 26.527150 |
0.0741577 | 53.736278 |
0.4213621 | 12.743175 |
0.1966911 | 20.973970 |
0.0180597 | 193.221812 |
0.1155568 | 34.940974 |
0.2431928 | 17.340992 |
0.2724593 | 10.477737 |
0.0442079 | 118.209978 |
0.2262941 | 17.727235 |
0.3510747 | 6.627882 |
0.4372039 | 8.045685 |
0.2749661 | 16.162527 |
0.5460077 | 5.778064 |
0.2386606 | 11.486853 |
0.1236775 | 41.201705 |
0.5858767 | 7.095772 |
0.3355599 | 9.935324 |
0.2707641 | 16.941205 |
0.0692056 | 48.188094 |
0.4928962 | 6.256317 |
0.1952022 | 53.689639 |
0.4827437 | 11.122870 |
0.4343758 | 8.980154 |
0.9312470 | 8.886762 |
0.1548861 | 19.690230 |
0.6090559 | 9.066599 |
0.5128833 | 10.457306 |
0.2859819 | 7.267587 |
0.1985435 | 14.223205 |
0.2787074 | 10.043072 |
0.2683953 | 17.215678 |
0.3662903 | 15.457434 |
0.1678776 | 9.688477 |
0.3537971 | 7.852522 |
0.4488835 | 14.703632 |
0.1498778 | 39.694465 |
0.1815088 | 16.617960 |
0.5520281 | 7.827161 |
0.5138644 | 11.601048 |
0.1543613 | 8.869716 |
0.1891107 | 19.166799 |
0.1784519 | 21.352133 |
0.5623886 | 8.372720 |
0.2314933 | 11.100859 |
0.3914930 | 9.101875 |
0.1936554 | 30.328917 |
0.3082267 | 16.012841 |
0.6272152 | 6.813958 |
0.3532382 | 11.866320 |
0.3038131 | 10.103965 |
0.1858272 | 18.287051 |
0.1946054 | 15.774182 |
0.2667366 | 25.697924 |
0.1965498 | 33.094091 |
0.0441445 | 36.165160 |
0.2619324 | 9.368694 |
0.1944936 | 25.235366 |
0.2751434 | 5.828970 |
0.2183396 | 13.909186 |
0.4273974 | 12.804499 |
0.3440438 | 19.628506 |
0.2807141 | 21.522079 |
0.2276425 | 15.781731 |
0.1098608 | 43.928327 |
0.2856966 | 9.183774 |
0.4002635 | 11.916558 |
0.1680154 | 13.956158 |
0.3150993 | 10.917061 |
0.1901269 | 15.844127 |
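The subset above can be computed by filtering the sample-space tibble. This is a sketch that regenerates the same simulated data so it is self-contained.

```r
library(tibble)
library(dplyr)

set.seed(31)
N <- 100
Omega_finite <- tibble(X = rgamma(N, 2, 6),
                       Y = rgamma(N, 4, X))
# the event "I waited at least five hours" as a subset of outcomes
Omega_event <- Omega_finite %>% filter(Y >= 5)
nrow(Omega_event)
```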
In this sample space:
if \(U\) is part of our model of the world, but \(V\) is not, then our event space contains events such as \(U=1\) but not events such as \(V=3\).
\((X,Y)\) may be considered a random vector, viewed as a function \(\Omega_{finite} \to \mathbb{R}^2\) \[\omega \mapsto (X(\omega), Y(\omega))\]
\[Y>5\] means the subset of \(\Omega_{finite}\): \[\{\omega \in \Omega_{finite} \vert Y(\omega)>5\}\]
\[(Y \in [5,9], X<0.3)\] means the subset of \(\Omega_{finite}\): \[\{\omega \in \Omega_{finite} \,\vert\, Y(\omega) \in [5,9],\; X(\omega)<0.3\}\] \[= \{\omega \in \Omega_{finite} \,\vert\, 5 \leq Y(\omega) \leq 9,\; X(\omega)<0.3\}\]
For our finite example, we may define probabilities proportional to the number of outcomes.
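In R, with equal weight on each outcome, such probabilities are just proportions of rows (a sketch, regenerating the same sample):

```r
# Probabilities in Omega_finite as proportions of outcomes.
set.seed(31)
N <- 100
X <- rgamma(N, 2, 6)
Y <- rgamma(N, 4, X)
mean(Y > 5)                          # P(Y > 5): proportion of outcomes
mean(Y >= 5 & Y <= 9 & X < 0.3)      # P(Y in [5,9], X < 0.3)
```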
A random variable \(Y: \Omega \to \mathbb{R}\) pushes a probability measure \(\mathbb{P}\) over \(\Omega\) to a probability measure \(P_Y\) over \(\mathbb{R}\), called its distribution.
\[P_X((0,0.3)) =\] \[\mathbb{P}(X \in (0,0.3)) =\] \[\mathbb{P}(0 < X < 0.3)\]
Similarly, the distribution of a random vector \((X,Y)\) is a probability measure over the plane \(\mathbb{R}^2\).
It is also called the joint distribution of \(X\) and \(Y\).
A probability distribution \(P_X\) is absolutely continuous, if there is a density function \(f_X\) such that for every region \(D\), \[P_X(D) = \int_D f_X(x) \mathrm{d} x\]
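As a check for the \(\Gamma(2,6)\) prior (a sketch using base R), integrating the density over a region agrees with the probability given by the CDF:

```r
# P(X in (0, 0.3)) for X ~ Gamma(2, 6): integrate the density over the
# region, or equivalently evaluate the CDF.
p_int <- integrate(function(x) dgamma(x, 2, 6), 0, 0.3)$value
p_cdf <- pgamma(0.3, 2, 6)
c(p_int, p_cdf)    # agree
```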
For a random variable \(X\) with density \(f_X\), the expectation is defined (when the integral is well-defined): \[\mathbb{E}(X) = \int x f_X(x) \,\mathrm{d} x\]
It is actually a special case of a more general notion (which does not assume a density exists).
Note that the expectation is determined by the distribution.
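For the \(\Gamma(2,6)\) prior, the expectation can be computed numerically (a sketch; the Gamma mean is shape/rate \(= 2/6\)):

```r
# E(X) for X ~ Gamma(2, 6), by integrating x * f_X(x) over (0, Inf).
ex <- integrate(function(x) x * dgamma(x, 2, 6), 0, Inf)$value
ex    # close to 2 / 6
```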
Given events \(A\), \(B\), such that \(\mathbb{P}(B)>0\), the conditional probability of \(A\) given \(B\) is \[ \mathbb{P}(A|B) = \frac {\mathbb{P}(A \cap B)} {\mathbb{P}(B)}\]
When also \(\mathbb{P}(A)>0\), we get Bayes’ formula: \[ \mathbb{P}(A|B) = \mathbb{P}(B|A) \frac {\mathbb{P}(A)} {\mathbb{P}(B)}\]
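In the finite sample space, with \(A = \{X<0.3\}\) and \(B = \{Y>5\}\) (an illustrative choice), Bayes' formula can be checked with proportions:

```r
# Bayes' formula as proportions of outcomes in Omega_finite.
set.seed(31)
N <- 100
X <- rgamma(N, 2, 6)
Y <- rgamma(N, 4, X)
A <- X < 0.3
B <- Y > 5
lhs <- mean(A & B) / mean(B)                         # P(A | B)
rhs <- (mean(A & B) / mean(A)) * mean(A) / mean(B)   # P(B | A) P(A) / P(B)
c(lhs, rhs)    # equal
```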
Given a random variable \(X:\Omega \to \mathbb{R}\) and an event \(B\) such that \(\mathbb{P}(B)>0\), the conditional distribution of \(X\) given \(B\) is defined by pushing the probability measure \(\mathbb{P}(\cdot |B)\) from \(\Omega\) to \(\mathbb{R}\): \[P_{X|B} (D) = \mathbb{P}(X \in D|B)\] for every region \(D \subset \mathbb{R}\).
For example: \[P_{X|Y>5} ((0,0.3)) = \mathbb{P}(0<X<0.3 \mid Y>5)\]
Given a random variable \(X:\Omega \to \mathbb{R}\) and an event \(B\) such that \(\mathbb{P}(B)>0\), the conditional expectation \(\mathbb{E}(X \vert B)\) is defined as the expectation of the conditional distribution.
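In the finite sample space this is just an average over the outcomes in the conditioning event (a sketch):

```r
# E(X | Y > 5): average X over the outcomes where Y > 5.
set.seed(31)
N <- 100
X <- rgamma(N, 2, 6)
Y <- rgamma(N, 4, X)
mean(X[Y > 5])    # compare with the unconditional mean(X)
```

Conditioning on a long wait selects the outcomes with small rates, so this conditional average comes out below the unconditional one.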
If a conditional distribution has a density, we call it “conditional density”.
Assume \((X,Y)\) is an absolutely continuous random vector whose joint distribution has a density \(f_{X,Y}\), and that \(\mathbb{P}(1.9<Y<2.1)>0\).
Then we can look into the conditional density of \(X\) given \(1.9<Y<2.1\): \[f_{X|1.9<Y<2.1}(x) = \frac {\int_{1.9}^{2.1} f_{X,Y}(x,y) \,\mathrm{d}y} {\mathbb{P}(1.9<Y<2.1)}\] for every \(x\).
Integrating it gives conditional probabilities: \[\mathbb{P}(a<X<b \mid 1.9<Y<2.1) = \frac {\int_a^b \int_{1.9}^{2.1} f_{X,Y}(x,y) \,\mathrm{d}y \,\mathrm{d}x} {\mathbb{P}(1.9<Y<2.1)}\]
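For the sneezing model this conditional density can be computed numerically (a sketch; the joint density is prior times likelihood):

```r
# f_{X | 1.9 < Y < 2.1}(x) for the sneezing model, by numerical integration.
f_joint <- function(x, y) dgamma(x, 2, 6) * dgamma(y, 4, rate = x)
num <- function(x) sapply(x, function(xx)
  integrate(function(y) f_joint(xx, y), 1.9, 2.1)$value)
den <- integrate(num, 0, Inf)$value       # P(1.9 < Y < 2.1)
f_cond <- function(x) num(x) / den        # the conditional density
integrate(f_cond, 0, Inf)$value           # integrates to 1
```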
Now what happens when we replace \(1.9\) and \(2.1\) with numbers which get closer to a limit \(y_0\)?
Intuitively, for a given \(x\) and \(y_0\),
\[\frac {\int_{y_0-\delta}^{y_0+\delta} f_{X,Y}(x,y) \,\mathrm{d}y} {\mathbb{P}(y_0-\delta<Y<y_0+\delta)} = \frac {\int_{y_0-\delta}^{y_0+\delta} f_{X,Y}(x,y) \,\mathrm{d}y} {\int_{y_0-\delta}^{y_0+\delta} f_Y(y) \,\mathrm{d}y} \approx \frac {2 \delta \, f_{X,Y}(x,y_0)} {2 \delta \, f_Y(y_0)} = \frac {f_{X,Y}(x,y_0)} {f_Y(y_0)}\] for small \(\delta>0\).
Assume \((X,Y)\) is a random vector whose joint distribution has a density \(f_{X,Y}\), then for every \(y\) where \(f_Y(y)>0\), we can define the conditional density of \(X\) given \(Y=y\) by \[f_{X|Y=y}(x) = \frac {f_{X,Y}(x,y)}{f_Y(y)}\] for every \(x\).
Note this is just a name; remember that \(\mathbb{P}(Y=y)=0\) for every \(y\).
Now, for every \(x\) we compose the mapping \(y \mapsto f_{X|Y=y}(x)\) with the random variable \(Y\): \[\omega \xrightarrow[]{Y} y \xrightarrow[]{} f_{X|Y=y}(x) = \frac {f_{X,Y}(x,y)}{f_Y(y)}\]
This way, we get a random variable that we call \(f_{X|Y}(x)\): \[f_{X|Y}(x) = \frac {f_{X,Y}(x,Y)}{f_Y(Y)}\]
And we may view this as a random density function.
This way, we can also get conditional distribution, probability, and expectation conditioned on a random variable.
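For instance, in the sneezing model the conditional distribution of \(X\) given \(Y\) is \(\Gamma(6, 6+Y)\), so \(\mathbb{E}(X \vert Y)\) is the random variable \(6/(6+Y)\) (a sketch):

```r
# E(X | Y) in the sneezing model: the mean of Gamma(6, 6 + Y),
# one value per outcome omega.
set.seed(31)
N <- 100
X <- rgamma(N, 2, 6)
Y <- rgamma(N, 4, X)
E_X_given_Y <- 6 / (6 + Y)
head(E_X_given_Y)
```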
These are all random objects defined in our probability space.
We can characterize them in a way that generalizes to more general cases (without a density).
G. Jay Kerns, Introduction to Probability and Statistics Using R, Third Edition, 2018 source: IPSUR package version 3.0 (thanks, Blaine Mooers!) – till subsection 7.3.1
F.M. Dekking, C. Kraaikamp, H.P. Lopuhaa, and L.E. Meester, A Modern Introduction to Probability and Statistics – Understanding Why and How, Springer Texts in Statistics, 2005 – till about chapter 9