To estimate how many times I sneeze per hour, I wait and measure the time until my fourth sneeze.
\(X\) - unobserved (parameter) - my rate of sneezes per hour
\(Y\) - observed - the time until my fourth sneeze
Prior: \(X \sim \Gamma(2,6)\)
Likelihood: \(Y \mid X \sim \Gamma(4,X)\)
Posterior: \(X \mid Y \sim \Gamma(2+4,\,6+Y)\)
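As a sanity check, the conjugate update above can be verified numerically: the posterior density is proportional to prior times likelihood. This is a sketch; the observed waiting time \(y = 10\) hours is just an assumed illustration value.

```r
# Numerical check that prior * likelihood is proportional to the
# Gamma(2+4, 6+y) posterior, on a grid of candidate rate values.
y <- 10                                   # assumed observed waiting time (hours)
x <- seq(0.01, 3, length.out = 500)       # grid of candidate rates
unnorm <- dgamma(x, 2, 6) * dgamma(y, 4, rate = x)  # prior * likelihood
post   <- unnorm / sum(unnorm)            # normalize on the grid
conj   <- dgamma(x, 2 + 4, 6 + y)
conj   <- conj / sum(conj)                # normalize the same way
max(abs(post - conj))                     # ~ 0
```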
Waiting for one event of rate \(x\) is distributed \(\mathrm{Exp}(x)\), and \(\mathrm{Exp}(x) = \Gamma(1,x)\) (1 - “shape”, \(x\) - “rate”).
Waiting for 4 events of rate \(x\) is distributed \(\Gamma(4,x)\) (4 - shape, \(x\) - rate).
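This can be illustrated by simulation (a sketch, with an arbitrary assumed rate): summing 4 independent exponential waiting times gives the same distribution as a single Gamma(4, rate) draw.

```r
# The sum of 4 independent Exp(x) waiting times is distributed Gamma(4, x).
set.seed(1)
x <- 0.5                                   # assumed rate, for illustration
sums   <- replicate(10000, sum(rexp(4, rate = x)))
direct <- rgamma(10000, shape = 4, rate = x)
c(mean(sums), mean(direct))                # both close to 4 / x = 8
```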
\(\Omega\) - a set
Each \(\omega\) in \(\Omega\) is one possible outcome in our uncertain world.
In computational uses, \(\Omega\) is finite.
But when we phrase a statistical model, it is often an infinite mathematical construct.
library(tibble)
library(dplyr)
library(knitr)

N <- 100
set.seed(31)
Omega_finite <- tibble(X = rgamma(N, 2, 6),
                       Y = rgamma(N, 4, X))
Omega_finite %>% kable()
X | Y |
---|---|
0.2614719 | 6.375819 |
0.2138068 | 13.882671 |
0.2260581 | 31.840656 |
0.6520004 | 3.498244 |
0.1674620 | 31.098390 |
0.3659386 | 24.811430 |
0.2862107 | 16.811617 |
0.0819444 | 23.694354 |
0.0397472 | 210.982678 |
0.2405861 | 31.212829 |
0.4348955 | 6.498370 |
0.1159619 | 49.549171 |
1.2929998 | 2.958485 |
0.1931940 | 37.932899 |
0.2535873 | 34.449692 |
0.2134041 | 9.452900 |
0.2050184 | 23.788414 |
0.4423407 | 4.345751 |
0.4228450 | 14.891966 |
0.0856366 | 26.527150 |
0.0741577 | 53.736278 |
0.4213621 | 12.743175 |
0.1966911 | 20.973970 |
0.0180597 | 193.221812 |
0.1155568 | 34.940974 |
0.2431928 | 17.340992 |
0.2724593 | 10.477737 |
0.0442079 | 118.209978 |
0.2262941 | 17.727235 |
0.7087579 | 4.953093 |
0.3510747 | 6.627882 |
0.4372039 | 8.045685 |
0.3626183 | 4.192516 |
0.4547042 | 3.198129 |
0.2749661 | 16.162527 |
0.5460077 | 5.778064 |
0.2386606 | 11.486853 |
0.1236775 | 41.201705 |
0.5858767 | 7.095772 |
0.3355599 | 9.935324 |
0.2707641 | 16.941205 |
0.0692056 | 48.188094 |
0.5268671 | 4.399818 |
0.8421145 | 3.800355 |
0.4928962 | 6.256317 |
0.1952022 | 53.689639 |
0.4827437 | 11.122870 |
0.4343758 | 8.980154 |
0.9312470 | 8.886762 |
0.1548861 | 19.690230 |
0.6090559 | 9.066599 |
0.5128833 | 10.457306 |
0.2859819 | 7.267587 |
0.1985435 | 14.223205 |
0.6939549 | 4.034420 |
0.5590719 | 4.592477 |
0.2787074 | 10.043072 |
0.2683953 | 17.215678 |
0.3662903 | 15.457434 |
0.5969167 | 2.958815 |
0.1678776 | 9.688477 |
0.3537971 | 7.852522 |
0.4488835 | 14.703632 |
0.1498778 | 39.694465 |
0.1815088 | 16.617960 |
0.5520281 | 7.827161 |
0.5138644 | 11.601048 |
0.8351950 | 2.462199 |
0.8163396 | 4.603685 |
0.1543613 | 8.869716 |
0.1891107 | 19.166799 |
0.1784519 | 21.352133 |
0.5623886 | 8.372720 |
0.2314933 | 11.100859 |
0.3914930 | 9.101875 |
0.1936554 | 30.328917 |
0.3082267 | 16.012841 |
0.6272152 | 6.813958 |
0.3532382 | 11.866320 |
0.3038131 | 10.103965 |
0.1858272 | 18.287051 |
0.1946054 | 15.774182 |
0.2667366 | 25.697924 |
0.1965498 | 33.094091 |
0.0441445 | 36.165160 |
0.2619324 | 9.368694 |
0.1944936 | 25.235366 |
0.2751434 | 5.828970 |
0.2183396 | 13.909186 |
0.2330591 | 4.418328 |
0.4273974 | 12.804499 |
0.3440438 | 19.628506 |
0.2807141 | 21.522079 |
0.2276425 | 15.781731 |
0.1098608 | 43.928327 |
0.2856966 | 9.183774 |
0.4002635 | 11.916558 |
0.1680154 | 13.956158 |
0.3150993 | 10.917061 |
0.1901269 | 15.844127 |
Usually we do not actually care about the elements \(\omega\) in \(\Omega\).
We think in terms of events and random variables.
The sample space behind them is implicit.
“all outcomes \(\omega\) in \(\Omega_{finite}\) in which I waited at least five hours”
X | Y |
---|---|
0.2614719 | 6.375819 |
0.2138068 | 13.882671 |
0.2260581 | 31.840656 |
0.1674620 | 31.098390 |
0.3659386 | 24.811430 |
0.2862107 | 16.811617 |
0.0819444 | 23.694354 |
0.0397472 | 210.982678 |
0.2405861 | 31.212829 |
0.4348955 | 6.498370 |
0.1159619 | 49.549171 |
0.1931940 | 37.932899 |
0.2535873 | 34.449692 |
0.2134041 | 9.452900 |
0.2050184 | 23.788414 |
0.4228450 | 14.891966 |
0.0856366 | 26.527150 |
0.0741577 | 53.736278 |
0.4213621 | 12.743175 |
0.1966911 | 20.973970 |
0.0180597 | 193.221812 |
0.1155568 | 34.940974 |
0.2431928 | 17.340992 |
0.2724593 | 10.477737 |
0.0442079 | 118.209978 |
0.2262941 | 17.727235 |
0.3510747 | 6.627882 |
0.4372039 | 8.045685 |
0.2749661 | 16.162527 |
0.5460077 | 5.778064 |
0.2386606 | 11.486853 |
0.1236775 | 41.201705 |
0.5858767 | 7.095772 |
0.3355599 | 9.935324 |
0.2707641 | 16.941205 |
0.0692056 | 48.188094 |
0.4928962 | 6.256317 |
0.1952022 | 53.689639 |
0.4827437 | 11.122870 |
0.4343758 | 8.980154 |
0.9312470 | 8.886762 |
0.1548861 | 19.690230 |
0.6090559 | 9.066599 |
0.5128833 | 10.457306 |
0.2859819 | 7.267587 |
0.1985435 | 14.223205 |
0.2787074 | 10.043072 |
0.2683953 | 17.215678 |
0.3662903 | 15.457434 |
0.1678776 | 9.688477 |
0.3537971 | 7.852522 |
0.4488835 | 14.703632 |
0.1498778 | 39.694465 |
0.1815088 | 16.617960 |
0.5520281 | 7.827161 |
0.5138644 | 11.601048 |
0.1543613 | 8.869716 |
0.1891107 | 19.166799 |
0.1784519 | 21.352133 |
0.5623886 | 8.372720 |
0.2314933 | 11.100859 |
0.3914930 | 9.101875 |
0.1936554 | 30.328917 |
0.3082267 | 16.012841 |
0.6272152 | 6.813958 |
0.3532382 | 11.866320 |
0.3038131 | 10.103965 |
0.1858272 | 18.287051 |
0.1946054 | 15.774182 |
0.2667366 | 25.697924 |
0.1965498 | 33.094091 |
0.0441445 | 36.165160 |
0.2619324 | 9.368694 |
0.1944936 | 25.235366 |
0.2751434 | 5.828970 |
0.2183396 | 13.909186 |
0.4273974 | 12.804499 |
0.3440438 | 19.628506 |
0.2807141 | 21.522079 |
0.2276425 | 15.781731 |
0.1098608 | 43.928327 |
0.2856966 | 9.183774 |
0.4002635 | 11.916558 |
0.1680154 | 13.956158 |
0.3150993 | 10.917061 |
0.1901269 | 15.844127 |
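The subset above can be computed by filtering the sample-space tibble. This is a sketch that regenerates the same simulated data so it is self-contained.

```r
library(tibble)
library(dplyr)

set.seed(31)
N <- 100
Omega_finite <- tibble(X = rgamma(N, 2, 6),
                       Y = rgamma(N, 4, X))
# the event "I waited at least five hours" as a subset of outcomes
Omega_event <- Omega_finite %>% filter(Y >= 5)
nrow(Omega_event)
```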
In this sample space:
if \(U\) is part of our model of the world, but \(V\) is not, then our event space contains events such as \(U=1\) but not events such as \(V=3\).
\((X,Y)\) may be considered a random vector, viewed as a function \(\Omega_{finite} \to \mathbb{R}^2\) \[\omega \mapsto (X(\omega), Y(\omega))\]
\[Y>5\] means the subset of \(\Omega_{finite}\): \[\{\omega \in \Omega_{finite} \vert Y(\omega)>5\}\]
\[(Y \in [5,9], X<0.3)\] means the subset of \(\Omega_{finite}\): \[\{\omega \in \Omega_{finite} \,\vert\, Y(\omega) \in [5,9],\; X(\omega)<0.3\}\] \[= \{\omega \in \Omega_{finite} \,\vert\, 5 \leq Y(\omega) \leq 9,\; X(\omega)<0.3\}\]
For our finite example, we may define probabilities proportional to the number of outcomes.
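In R, with equal weight on each outcome, such probabilities are just proportions of rows (a sketch, regenerating the same sample):

```r
# Probabilities in Omega_finite as proportions of outcomes.
set.seed(31)
N <- 100
X <- rgamma(N, 2, 6)
Y <- rgamma(N, 4, X)
mean(Y > 5)                          # P(Y > 5): proportion of outcomes
mean(Y >= 5 & Y <= 9 & X < 0.3)      # P(Y in [5,9], X < 0.3)
```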
A random variable \(Y: \Omega \to \mathbb{R}\) pushes a probability measure \(\mathbb{P}\) over \(\Omega\) to a probability measure \(P_Y\) over \(\mathbb{R}\), called its distribution.
\[P_X((0,0.3)) =\] \[\mathbb{P}(X \in (0,0.3)) =\] \[\mathbb{P}(0 < X < 0.3)\]
Similarly, the distribution of a random vector \((X,Y)\) is a probability measure over the plane \(\mathbb{R}^2\).
It is also called the joint distribution of \(X\) and \(Y\).
A probability distribution \(P_X\) is absolutely continuous, if there is a density function \(f_X\) such that for every region \(D\), \[P_X(D) = \int_D f_X(x) \mathrm{d} x\]
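As a check for the \(\Gamma(2,6)\) prior (a sketch using base R), integrating the density over a region agrees with the probability given by the CDF:

```r
# P(X in (0, 0.3)) for X ~ Gamma(2, 6): integrate the density over the
# region, or equivalently evaluate the CDF.
p_int <- integrate(function(x) dgamma(x, 2, 6), 0, 0.3)$value
p_cdf <- pgamma(0.3, 2, 6)
c(p_int, p_cdf)    # agree
```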
For a random variable \(X\) with density \(f_X\), the expectation is defined (when the integral is well-defined): \[\mathbb{E}(X) = \int x f_X(x) \,\mathrm{d} x\]
It is actually a special case of a more general notion (which does not assume a density exists).
Note that the expectation is determined by the distribution.
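For the \(\Gamma(2,6)\) prior, the expectation can be computed numerically (a sketch; the Gamma mean is shape/rate \(= 2/6\)):

```r
# E(X) for X ~ Gamma(2, 6), by integrating x * f_X(x) over (0, Inf).
ex <- integrate(function(x) x * dgamma(x, 2, 6), 0, Inf)$value
ex    # close to 2 / 6
```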
Given events \(A\), \(B\), such that \(\mathbb{P}(B)>0\), the conditional probability of \(A\) given \(B\) is \[ \mathbb{P}(A|B) = \frac {\mathbb{P}(A \cap B)} {\mathbb{P}(B)}\]
When also \(\mathbb{P}(A)>0\), we get Bayes’ formula: \[ \mathbb{P}(A|B) = \mathbb{P}(B|A) \frac {\mathbb{P}(A)} {\mathbb{P}(B)}\]
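In the finite sample space, with \(A = \{X<0.3\}\) and \(B = \{Y>5\}\) (an illustrative choice), Bayes' formula can be checked with proportions:

```r
# Bayes' formula as proportions of outcomes in Omega_finite.
set.seed(31)
N <- 100
X <- rgamma(N, 2, 6)
Y <- rgamma(N, 4, X)
A <- X < 0.3
B <- Y > 5
lhs <- mean(A & B) / mean(B)                         # P(A | B)
rhs <- (mean(A & B) / mean(A)) * mean(A) / mean(B)   # P(B | A) P(A) / P(B)
c(lhs, rhs)    # equal
```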
Given a random variable \(X:\Omega \to \mathbb{R}\) and an event \(B\) such that \(\mathbb{P}(B)>0\), the conditional distribution of \(X\) given \(B\) is defined by pushing the probability measure \(\mathbb{P}(\cdot |B)\) from \(\Omega\) to \(\mathbb{R}\): \[P_{X|B} (D) = \mathbb{P}(X \in D|B)\] for every region \(D \subset \mathbb{R}\).
For example: \[P_{X|Y>5} ((0,0.3)) = \mathbb{P}(0<X<0.3 \mid Y>5)\]
Given a random variable \(X:\Omega \to \mathbb{R}\) and an event \(B\) such that \(\mathbb{P}(B)>0\), the conditional expectation \(\mathbb{E}(X \vert B)\) is defined as the expectation of the conditional distribution.
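In the finite sample space this is just an average over the outcomes in the conditioning event (a sketch):

```r
# E(X | Y > 5): average X over the outcomes where Y > 5.
set.seed(31)
N <- 100
X <- rgamma(N, 2, 6)
Y <- rgamma(N, 4, X)
mean(X[Y > 5])    # compare with the unconditional mean(X)
```

Conditioning on a long wait selects the outcomes with small rates, so this conditional average comes out below the unconditional one.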
If a conditional distribution has a density, we call it “conditional density”.
Assume \((X,Y)\) is an absolutely continuous random vector whose joint distribution has a density \(f_{X,Y}\), and that \(\mathbb{P}(1.9<Y<2.1)>0\).
Then we can look into the conditional density of \(X\) given \(1.9<Y<2.1\): \[f_{X|1.9<Y<2.1}(x) = \frac {\int_{1.9}^{2.1} f_{X,Y}(x,y) \,\mathrm{d}y} {\mathbb{P}(1.9<Y<2.1)}\] for every \(x\).
Integrating it gives conditional probabilities: \[\mathbb{P}(a<X<b \mid 1.9<Y<2.1) = \frac {\int_a^b \int_{1.9}^{2.1} f_{X,Y}(x,y) \,\mathrm{d}y \,\mathrm{d}x} {\mathbb{P}(1.9<Y<2.1)}\]
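For the sneezing model this conditional density can be computed numerically (a sketch; the joint density is prior times likelihood):

```r
# f_{X | 1.9 < Y < 2.1}(x) for the sneezing model, by numerical integration.
f_joint <- function(x, y) dgamma(x, 2, 6) * dgamma(y, 4, rate = x)
num <- function(x) sapply(x, function(xx)
  integrate(function(y) f_joint(xx, y), 1.9, 2.1)$value)
den <- integrate(num, 0, Inf)$value       # P(1.9 < Y < 2.1)
f_cond <- function(x) num(x) / den        # the conditional density
integrate(f_cond, 0, Inf)$value           # integrates to 1
```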
Now what happens when we replace \(1.9\) and \(2.1\) with numbers which get closer to a limit \(y_0\)?
Intuitively, for a given \(x\) and \(y_0\),
\[\frac {\int_{y_0-\delta}^{y_0+\delta} f_{X,Y}(x,y) \,\mathrm{d}y} {\mathbb{P}(y_0-\delta<Y<y_0+\delta)} = \frac {\int_{y_0-\delta}^{y_0+\delta} f_{X,Y}(x,y) \,\mathrm{d}y} {\int_{y_0-\delta}^{y_0+\delta} f_Y(y) \,\mathrm{d}y} \approx \frac {2 \delta \, f_{X,Y}(x,y_0)} {2 \delta \, f_Y(y_0)} = \frac {f_{X,Y}(x,y_0)} {f_Y(y_0)}\] for small \(\delta>0\).
Assume \((X,Y)\) is a random vector whose joint distribution has a density \(f_{X,Y}\), then for every \(y\) where \(f_Y(y)>0\), we can define the conditional density of \(X\) given \(Y=y\) by \[f_{X|Y=y}(x) = \frac {f_{X,Y}(x,y)}{f_Y(y)}\] for every \(x\).
Note this is just a name; remember that \(\mathbb{P}(Y=y)=0\) for every \(y\).
Now, for every \(x\) we compose the mapping \(y \mapsto f_{X|Y=y}(x)\) with the random variable \(Y\): \[\omega \xrightarrow[]{Y} y \xrightarrow[]{} f_{X|Y=y}(x) = \frac {f_{X,Y}(x,y)}{f_Y(y)}\]
This way, we get a random variable that we call \(f_{X|Y}(x)\): \[f_{X|Y}(x) = \frac {f_{X,Y}(x,Y)}{f_Y(Y)}\]
And we may view this as a random density function.
This way, we can also get conditional distribution, probability, and expectation conditioned on a random variable.
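For instance, in the sneezing model the conditional distribution of \(X\) given \(Y\) is \(\Gamma(6, 6+Y)\), so \(\mathbb{E}(X \vert Y)\) is the random variable \(6/(6+Y)\) (a sketch):

```r
# E(X | Y) in the sneezing model: the mean of Gamma(6, 6 + Y),
# one value per outcome omega.
set.seed(31)
N <- 100
X <- rgamma(N, 2, 6)
Y <- rgamma(N, 4, X)
E_X_given_Y <- 6 / (6 + Y)
head(E_X_given_Y)
```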
These are all random objects defined in our probability space.
We can characterize them in a way that generalizes to more general cases (without a density).
G. Jay Kerns, Introduction to Probability and Statistics Using R, Third Edition, 2018 source: IPSUR package version 3.0 (thanks, Blaine Mooers!) – till subsection 7.3.1
F.M. Dekking, C. Kraaikamp, H.P. Lopuhaa, and L.E. Meester, A Modern Introduction to Probability and Statistics – Understanding Why and How, Springer Texts in Statistics, 2005 – till about chapter 9