---
title: 'Practical 2: Normal data'
author: "VIBASS"
output:
  html_vignette:
    fig_caption: yes
    number_sections: yes
    toc: yes
    fig_width: 6
    fig_height: 4
vignette: >
  %\VignetteIndexEntry{Practical 2: Normal data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  collapse = TRUE,
  comment = "#>"
)
pacman::p_load(
  knitr,
  plotly,
  vibass
)

```


# How tall the VIBASS' participants are?

One of the most commonly used examples of Normal data are heights and weights. We want to continue this tradition. We will dedicate this practice to work with the heights of the VIBASS participants of the past editions. We start first with the height of the women. 

```{r fig.cap = cap, echo = FALSE, fig.align='center'}
cap <- "Some VIBASS' participants."
include_graphics("vibass_participants.jpg")
```


# Bayesian inference for the mean height of the women VIBASS' participants. The variance of the sampling Normal model is known.


## The data

We have the following height data, in metres, for a random sample of $n=15$ women participants from previous editions of VIBASS.

```{r, eval=TRUE}
hwomen <- data.frame(height = c(1.73, 1.65, 1.65, 1.76, 1.65, 1.63, 1.70, 1.58, 1.57, 1.65, 1.74, 1.68, 1.67, 1.58, 1.66))
```

We can summarize the data and represent them by means of a histogram.


```{r summary-table-women-height}
summary_table(
  mean = mean(hwomen$height),
  var = var(hwomen$height),
  quant = quantile(hwomen$height, probs = c(0, 0.25, 0.5, 0.75, 1)),
  label = "Women height",
  digits = 3
)
```

```{r, fig.align='center', out.width="65%"}
hist(hwomen$height,
     main = NULL, xlab="Height in metres",
     col="gray89", xlim=c(1.50, 1.85)
)
```

The histogram obtained does not remind us much of the Normal distribution.
But we know that when we work with small amounts of data, even if they are Normal, we cannot expect their histogram to look like a symmetrical bell.
In fact, if we were to generate small Normal data samples, few of them would resemble the Gaussian ideal. 

## The sampling model is approximately Normal

Let $Y$ be the random variable that describes the height of the women in VIBASS.
We assume that given the mean of the height of the VIBASS women $\mu$, the distribution of $Y$ is Normal with mean $\mu$ and standard deviation $\sigma= 0.1$. 

$$Y \mid \mu \sim \mathcal{N}(\mu,\, \sigma = 0.1),$$
whose conditional density function, expectation and variance are: 

- $f(y \mid \mu )= \frac{1}{0.1 \,\sqrt{2 \pi}}\, \mbox{e}^{\big\{ \frac{-1}{2 \,\cdot \, 0.1^2}\,(y-\mu)^2\big\}}$
- $\text{E}(Y \mid \mu)= \mu$, 
- $\text{Var}(Y \mid \mu) = \sigma^2 = 0.01.$


## A prior distribution for $\mu$

Recall that the Normal distribution is conjugate with respect to the Normal probability model with $\sigma$ known.
If we elicit a Normal prior distribution $\mathcal{N}(\mu_0,\, \sigma_0)$ for $\mu$, its density is 

$$\pi(\mu)=\frac{1}{\sigma_0 \,\sqrt{2 \pi}}\, \mbox{e}^{\, \frac{-1}{2\sigma_{0}^2}\,(\mu-\mu_0)^2 }$$

with prior mean and and variance

- $\text{E}(\mu)= \mu_0$, 
- $\text{Var}(\mu)= \sigma_0.$

We are going to work with a prior distribution based on the information from two of the VIBASS lecturers. 
This prior distribution is 

$$\pi(\mu) = \mathcal{N}(\mu_0=1.70,\, \sigma_0=0.07).$$

On the basis of this distribution these two teachers think that the mean of the height of women VIBASS participants is centred on $1.70$ m with a standard deviation of $0.07$ m.
According to it, we can compute the following percentile and probabilities 

```{r}
qnorm(c(0.005, 0.995), 1.70, 0.07)
pnorm(c(1.50, 1.60, 1.70, 1.80, 1.90), 1.70, 0.07)
```

Consequently, a 99$\%$ credible interval for $\mu$ is $(1.52, \,1.88)$ which means that the probability that a VIBASS participant is neither taller than 1.88 metres nor shorter than 1.52 is 0.99. The probability that she is shorter than 1.50 is 0.0021, the probability that her height is between 1.60 and 1.80 is 0.9234-0.0766= 0.8468 or the probability that is taller than 1.80 is 1-0.9234=0.0766.
Next we show the graphic of the density of this prior distribution. 


```{r, fig.align='center', out.width="60%", fig.cap = cap}
cap <- "Prior distribution for the mean height."
m0 <- 1.7; s0 <- 0.07
curve(
  dnorm(x, m0, s0),
  xlab = expression(paste(mu)), ylab = "prior",
  xlim = c(1.40, 2), lwd = 4, col = "dodgerblue", yaxt = "n"
)
```

## The likelihood function of $\mu$

The likelihood is a function of $\mu$ for the data $\mathcal D=\{y_1, y_2,\ldots, y_{15} \}$. It is defined as follows

\begin{align*}
L(\mu \mid \mathcal D)=& f(y_1, y_2, \ldots, y_{15} \mid \mu) =
 \prod_{i=1}^{15}\, f(y_i \mid \mu) =
 \prod_{i=1}^{15}\, \frac{1}{0.1\,\sqrt{2 \pi}}\, \exp\Big\{\, \frac{-1}{2 \cdot 0.1^2}\,(y_i-\mu)^2\Big\} \\
= & \, \frac{1}{(0.1\,\cdot\sqrt{2 \pi})^{15}}\,\,\exp\Big\{\, \frac{-1}{2 \cdot 0.1^2}\,\sum_{i=1}^{15}\,(y_i-\mu)^2\Big \} \\
 = &\,\frac{1}{(0.1\,\cdot\sqrt{2 \pi})^{15}}\,\,\exp\Big\{\, \frac{-1}{2 \cdot 0.1^2}\,\Big(\sum_{i=1}^{15} y_i^{2} - 2 \cdot 15 \, \bar{y} \, \mu + 15 \mu^2\Big)\Big \}
\end{align*}


where $f(y_1, y_2, \ldots, y_{15} \mid \mu)$ is the joint density of the $Y_i$'s given $\mu$ evaluated in $\mathcal D$.
Since the $Y_i$'s are __conditionally independent__ given $\mu$, the joint density is the product of the marginal densities.
In our case, the likelihood function of $\mu$ is

```{r, fig.align='center', out.width="60%", fig.cap = cap}
cap <- "Likelihood function."
Lnorm <- function(mu, y, sigma) {
  # Essentially:
  # prod(dnorm(y, mean = mu, sd = sigma))
  # but we make it numerically more stable working on the log scale
  # and we vectorise over mu
  vapply(
    mu,
    function(.) exp(sum(dnorm(y, mean = ., sd = sigma, log = TRUE))),
    1
  )
}
curve(
  Lnorm(x, hwomen$height, 0.1),
  xlab = expression(paste(mu)), ylab = "likelihood", 
  xlim = c(1.4, 2), col = "darkorange", lwd = 4, yaxt = "n"
)
```


## The posterior distribution of $\mu$

The posterior distribution of $\mu$ is a Normal distribution with parameters

\begin{align*}
 \pi(\mu \mid \mathcal{D}) & = \mathcal{N}(\mu_n,\, \sigma_n), \, \text{where}\\
 \sigma^2_n & =\frac{\sigma_0^2\,\sigma^2}{\sigma^2 + n \,\sigma_0^2}=
        \frac{0.07^2 \cdot 0.1^2}{0.1^2 + 15 \cdot 0.07^2}= 0.000587,
        \leadsto \sigma_n=0.0242\\
 \mu_n & = \sigma_n^2 \,\Big (\frac{\mu_0}{\sigma_0^2}\, +
       \, \frac{n \bar{y}}{\sigma^2}\Big) = 
      0.000587 \Big(\frac{1.70}{0.07^2} +\frac{15 \cdot 1.66}{0.1^2}\Big) = 1.6648 
\end{align*}

Next, we plot on the same graph the prior (in blue) and the posterior distribution (in green) of $\mu$
\begin{align*}
 \pi(\mu &)=\mathcal{N}(\mu_0=1.70, \sigma_0=0.07)\\
 \pi(\mu &\mid \mathcal D)=\mathcal{N}(\mu_n=1.6648, \sigma_n=0.0242)
\end{align*}

```{r, fig.align='center', out.width="60%", fig.cap = cap, echo = -1}
cap <- "Prior (blue) and posterior (green) distributions for the mean height."
m0 <- 1.7; s0 <- 0.07
m1 <- 1.6648; s1 <- 0.0242
curve(
  dnorm(x, m1, s1),
  xlab = expression(paste(mu)), ylab = "density",
  xlim = c(1.40, 2), lwd = 4, col = "darkgreen", yaxt = "n"
)
curve(
  dnorm(x, m0, s0),
  lwd = 4, col = "dodgerblue", add = TRUE
)

```

The visual difference between the two distributions is very clear: The posterior distribution has a very low variability compared to the prior ($\sigma_n \approx `r round(s1, 2)`$ m vs. $\sigma_0 =`r s0`$ m) and is slightly shifted to the left because the posterior mean is slightly lower than the prior mean ($\mu_n\approx `r round(m1, 1)`$ vs. $\mu_0 = `r m0`$). 

On the basis of the posterior distribution, the two teachers now think that the mean of the height of women VIBASS participants is centred on $\mu_n$ with a standard deviation of $\sigma_n$ metres.
According to it, we can compute the following percentiles and probabilities

```{r}
qnorm(c(0.005, 0.995), m1, s1)
pnorm(c(1.50, 1.60, 1.70, 1.80, 1.90), m1, s1)
```

Consequently, a 99% credible interval for $\mu$ is $(`r paste(ic95_post <- round(qnorm(c(0.005, 0.995), m1, s1), 2), collapse = ", ")`)$, which means that the probability that a VIBASS participant is between `r ic95_post[1]` m and `r ic95_post[2]` m is 0.99.
The probability that she is shorter than 1.50 m is $0.0000$, the probability that her height is between 1.60 m and 1.80 m is $1.0000-0.0040=0.9960$ or the probability that is taller than 1.80 m is 0.0000. 

## The posterior predictive distribution for the height of a new VIBASS participant

We are interested in predicting the height of Aninè, a new female VIBASS participant who has not participated in the sample of the inferential process.
In this case, the posterior predictive distribution for the Aninè's height $Y_{16}$ is a Normal distribution

$$\mathcal{N}\Big(\mu_n, \sqrt{\sigma_n^2+\sigma^2}\Big)$$

with standard deviation $\sqrt{\sigma_n^2+\sigma^2} = \sqrt{0.000587+0.01}=0.1029$.

It is important to note that the mean of this predictive distribution coincides with the mean of the posterior distribution of $\mu$.
However, the enormous variability of the predictive distribution, which depends on the variability of the sampling model and the variability of the a posteriori distribution, is very striking.
The following figure shows the posterior distribution (in green) of $\mu$ and the posterior predictive distribution (in purple) for Aninè's height illustrating the previous comments.

```{r posterior-predictive, fig.align='center', out.width="60%", fig.cap = cap, echo = -1}
cap <- "Predictive (purple) and posterior (green) distributions for the mean height."
mp <- m1; sp <- sqrt(s1^2 + 0.1^2)
curve(
  dnorm(x, m1, s1),
  xlab = expression(paste(mu)), ylab = "density",
  xlim = c(1.40, 2), lwd = 4, col = "darkgreen", yaxt = "n"
)
curve(
  dnorm(x, mp, sp),
  lwd = 4, col = "purple", add = TRUE
)

```

We can calculate prediction intervals for the height of Aninè. A prediction interval at 95% would be

```{r}
qnorm(c(0.025, 0.975), 1.6642, 0.1029)
```
This indicates that the probability that Aninè's height is between 1.46 and 1.87 metres is 0.95. 


# Bayesian inference for the mean height of the women VIBASS participants. The variance of the sampling Normal model is unknown.


If we follow the same scheme as in the previous case where the variance was known, we would have a first section dedicated to the data which would be the same. The sampling model is also approximately Normal but now we will work with a Normal with unknown mean and variance. 


## The sampling model is approximately Normal

Let $Y$ the random variable that describes the height of the women in VIBASS.
We assume that the distribution of $Y$ is Normal with unknown mean $\mu$ and unknown standard deviation $\sigma$ 

$$Y \mid \mu \sim \mathcal{N}(\mu,\, \sigma),$$
whose conditional density function, expectation and variance are: 

- $f(y \mid \mu )= \frac{1}{\sigma \,\sqrt{2 \pi}}\, \mbox{e}^{\big\{ -\frac12 \big(\frac{y-\mu}{\sigma}\big)^2\big\}}$
- $\text{E}(Y\mid \mu)= \mu$, 
- $\text{Var}(Y\mid \mu)=\sigma^2.$


## A prior distribution for $(\mu,\, \sigma^2)$

Our basic quantity of interest is bi-dimensional $(\mu,\, \sigma^2)$. We work in a non-informative prior scenario and use the improper reference prior distribution 

$$\pi(\mu,\, \sigma^2) \propto 1/\sigma^2$$


## The likelihood function of $(\mu,\, \sigma^2)$

The likelihood is a function of $(\mu,\, \sigma^2)$ for the data $\mathcal D=\{y_1, y_2,\ldots, y_{15} \}$. It is defined as follows

\begin{align*}
L(\mu,\, \sigma^2 \mid \mathcal D)=& f(y_1, y_2, \ldots, y_{15} \mid \mu,\, \sigma^2) =
 \prod_{i=1}^{15}\, f(y_i \mid \mu,\, \sigma^2) =
 \prod_{i=1}^{15}\, \frac{1}{\sigma \,\sqrt{2 \pi}}\, \exp\Big\{\, \frac{-1}{2\sigma^2}\,(y_i-\mu)^2\Big\} \\
= & \, \frac{1}{(\sigma\,\sqrt{2 \pi})^{15}}\,\,\exp\Big\{\, \frac{-1}{2\sigma^2}\,\sum_{i=1}^{15}\,(y_i-\mu)^2\Big \} \\
 = &\,\frac{1}{(\sigma\,\sqrt{2 \pi})^{15}}\,\,\exp\Big\{\, \frac{-1}{2  \sigma^2}\,\big(
 \sum_{i=1}^{15} y_i^{2} - 2 \cdot 15 \, \bar{y} \, \mu + 15 \mu^2
 \big) \Big\},
\end{align*}


where $f(y_1, y_2, \ldots, y_{15})$ is the joint density of the $Y_i$'s given $\mu$ and $\sigma^2$ evaluated in $\mathcal D$. Since $Y_i$'s are independent given $\mu$ and $\sigma^2$, the joint density is the product of the marginal densities. In our case, the graphic of the likelihood function of $(\mu,\, \sigma^2)$ will be


```{r likelihood-3d, fig.cap = cap, echo = FALSE}
## Note: plotly breaks LaTeX rendering in rmarkdown
## Need to enclose output in a iframe. 
## https://github.com/ropensci/plotly/blob/master/inst/examples/rmd/MathJax/index.Rmd

cap <- "Bi-variate likelihood function"

lik_norm <- function(m, s) {
  exp(-length(hwomen$height) * log(s) - sum((hwomen$height - m) ** 2) / 2 / s**2)
}

mu_vals <- seq(1.60, 1.75, by = 0.002)
s_vals <- seq(0.03, 0.10, by = 0.002)

surf_lik <- expand.grid(
  mu = mu_vals,
  s = s_vals
)
surf_lik$z <- mapply(lik_norm, surf_lik$mu, surf_lik$s)

p <- plot_ly() %>% 
  add_surface(
    x = ~ mu_vals,
    y = ~ s_vals,
    z = ~ t(matrix(surf_lik$z, length(mu_vals), length(s_vals))),
    showscale = FALSE
  ) %>% 
  layout(
    scene = list(
      xaxis = list(title = "mu"),
      yaxis = list(title = "s2"),
      zaxis = list(visible = FALSE)
    )
  )

htmlwidgets::saveWidget(p, "like_surf.html")

```


<iframe src="like_surf.html" width="100%" height="400" id="igraph" scrolling="no" seamless="seamless" frameBorder="0"> </iframe>


## The posterior distribution of $(\mu,\, \sigma^2)$

The posterior distribution of $(\mu,\, \sigma^2)$ is a bivariant probability distribution whose joint posterior density function can be expressed in terms of the _conditional_ posterior distribution of $\mu$ given $\sigma^2$ and the _marginal_ posterior distribution of $\sigma^2$ as follows

$$
 \pi(\mu, \sigma^2 \mid \mathcal{D}) = 
 \pi(\mu \mid \sigma^2, \mathcal{D}) \, 
 \pi(\sigma^2 \mid \mathcal{D}),
$$

where

- $\pi(\mu \mid \sigma^2, \mathcal D) =\mathcal{N}(\mu_n, \sigma^2_n = \sigma^2/15)$

- $\pi(\sigma^2 \mid \mathcal D)$ is such that $\pi((n-1)s^2/\sigma^2 \mid \mathcal D) = \chi^2(n-1)$, where $s^2 = \sum(y_i - \bar{y})^2 / (n-1)$

- The posterior marginal of $\mu$ is $\pi(\mu \mid \mathcal D)= \text{St}(\bar{y}, s^2/n, n-1)$ 


In our case, the more relevant posterior distributions are the marginal ones:

- The posterior marginal $\pi(\sigma^2 \mid \mathcal D)$ is such that $\pi(0.045598/\sigma^2 \mid \mathcal D) = \chi^2(14)$\footnote{$(14 \cdot 0.003257= 0.045598)$}

- The posterior marginal of $\pi(\mu \mid \mathcal D)= \text{St}(1.66, 0.003257/15=0.000217, 14)$ 

We start working with the posterior $\pi(\mu \mid \mathcal D)$. The posterior mean and scale of $\mu$ are $\bar{y} = 1.66$ and $\sqrt{s^2/n} = 0.014735$ m.
A 99$\%$ credible interval for $\mu$ is $(1.62, 1.70)$ and the posterior probabilities that the mean of the height of the women VIBASS is less than $1.50, 1.60. 1.70, 1.80, \text{and } 1.90$ are $0.0000, 0.0006, 0.9916, 1.0000, \text{and } 1.0000$, respectively. 

```{r}
ny <- length(hwomen$height)
ybar <- mean(hwomen$height)
s2 <- sum((hwomen$height - ybar) ** 2) / (ny - 1)

post_mu <- list(
  mean = ybar,
  scale = sqrt(s2 / ny)
)
post_mu$scale * qt(c(0.005, 0.995), ny - 1) + post_mu$mean
pt((c(1.50, 1.60, 1.70, 1.80, 1.90) - post_mu$mean) / post_mu$scale, ny - 1)
```
The graphics of that posterior density is

```{r posterior-bivariate, fig.align='center', out.width="60%", fig.cap = cap, echo = -1}
cap <- "Posterior (green) distribution for the mean height."
curve(
  dt((x - post_mu$mean) / post_mu$scale, ny - 1) / post_mu$scale,
  xlab = expression(paste(mu)), ylab = "density",
  xlim = c(1.40, 2), lwd = 4, col = "darkgreen", yaxt = "n"
)
```


We focus now on the variance of the sampling model. Its posterior distribution is such that $\pi(0.045598/\sigma^2 \mid \mathcal D) = \chi^2(14)$.
This is not the posterior distribution of $\sigma^2$ but a function, $0.045598/\sigma^2$, of it. Consequently, we can approximate the posterior distribution for $\sigma^2$ by simulation as follows:

- If $$\{a^{(1)}, a^{(2)}, \ldots, a^{(M)}\}$$ is a random sample from $\pi(0.045598/\sigma^2 \mid \mathcal D) = \chi^2(14)$ then 
$$\{0.045598/a^{(1)}, 0.045598/a^{(2)}, \ldots, 0.045598/a^{(M)}\}$$ is a random sample from $\pi(\sigma^2 \mid \mathcal D)$. 

Next we show the approximate graph and some posterior characteristics of the posterior $\pi(\sigma^2 \mid \mathcal D)$. 

```{r, fig.align='center', out.width="70%"}
y <- seq(0, 40, 0.001)
simuchi <- rchisq(y, ny - 1)
simu.sigma <- 0.045598 / simuchi 
hist(
  simu.sigma,
  breaks = 300, freq = FALSE, col = "gray99",
  xlim = c(0, 0.02), ylim = c(0, 400),
  main = NULL, ylab = "density", xlab = expression(paste(sigma2))
)

summary(simu.sigma)
var(simu.sigma)
sqrt(var(simu.sigma))
quantile(simu.sigma, probs = c(0.005,0.995))
```
We observe that $\text{E}(\sigma^2 \mid \mathcal D)=0.0038$ and a $99$% credible interval for $\sigma^2$ is $(0.00147, 0.01118)$. 


# Time to individual work

We propose below an individual exercise that pursues to consolidate the basic concepts that we have learned in the previous theoretical session and that we have been practising in this session. 


**Exercice** 

We focus now on the height of the VIBASS men participants. 

```{r, eval=TRUE}
hmen <- data.frame(height = c(1.92, 1.82, 1.69, 1.75, 1.72, 1.71, 1.73, 1.69, 1.70, 1.78, 1.88, 1.82, 1.86, 1.65))
```

1. Construct a Bayesian inferential process for the mean of the height of the VIBASS men participants. Assume a non-informative prior scenario and a sampling model approximately Normal with
    
    - Known standard deviation $\sigma=0.1$. 
    
    - Unknown variance. 
  
2. Compare the mean of the height between men and women. Is it possible to compute the posterior probability associated to $\mu_\text{m}-\mu_\text{w}$? and to  $\mu_\text{m}/\mu_\text{w}$?