---
title: 'Practical 2: Count data'
author: "VIBASS"
output:
  html_vignette:
    fig_caption: yes
    number_sections: yes
    toc: yes
    fig_width: 6
    fig_height: 4
vignette: >
  %\VignetteIndexEntry{Practical 2: Count data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  collapse = TRUE,
  comment = "#>"
)
```


# Introduction 


According to Wikipedia: 

> *A Game of Thrones* is the first novel in *A Song of Ice and Fire*, a series of fantasy novels by the American author George R. R. Martin. In the novel, recounting events from various points of view, Martin introduces the plot-lines of the noble houses of Westeros, the Wall, and the Targaryens. The novel has inspired several spin-off works, including several games. It is also the namesake and basis for the first season of the television series *Game of Thrones*.

If you want to get a bit of atmosphere, you can see a trailer of the series [here](https://www.youtube.com/watch?v=KPLWWIOCOOQ) and if you want the music of this series to accompany you throughout the practice you can go [here](https://www.youtube.com/watch?v=Hf9u3jPvkkI).

# Bayesian inference for the expected number of _u_'s in a page of *A Game of Thrones* 


## The experiment: Number of _u_'s in a page of *A Game of Thrones*

In this practice we need to have imagination, possibly not as much as George R. R. Martin but a little. We have an English edition of the first novel, *A Game of Thrones*, and we are interested in learning about the mean number of letters _u_ we can find on a page of this book. We select pages of the book at random and count the number of _u_'s on each page. The random sampling is _with replacement_ in the sense that the same page could be selected more than once. If we each had the book we could physically conduct the experiment. But as this is not the case, we can use the following data 

```{r}
udata <- data.frame(Us = c(25, 29, 27, 27, 25, 27, 22, 26, 27, 29, 23, 28, 25, 24, 22, 
25, 23, 29, 23, 28, 21, 29, 28, 23, 28))
```

that we have obtained as a consequence of carrying out an experiment consisting of randomly selecting $n=25$ pages of the book and counting the number of _u_'s in each one of them. 
 
We can summarize the results obtained by means of a table of frequencies and its graphical representation 

```{r}
(table.udata <- table(udata))
```

```{r, fig.align='center', out.width="70%"}
barplot(table.udata, col="gray89", xlab="Number of letters u in a page")
```

## The sampling model is approximately Poisson

Let $Y$ the random variable that describes the number of _u_'s in a page of the novel *A Game of Thrones*. We consider in this practice that, given $\lambda$, the distribution of the number of _u_'s is Poisson with rate $\lambda$ 

$$Y \mid \lambda \sim \text{Po}(\lambda),$$
 
whose conditional probability function, expectation and variance are: 

- $P(Y=y \mid \lambda)= \frac{\lambda^y\, e^{-\lambda}}{y!}, \;y=0,1,\ldots$ 
- $\mathbb{E}(Y\mid \lambda) = \lambda$, 
- $\mathbb{V}(Y\mid \lambda) = \lambda.$


We know that this distribution is not exactly Poisson because the number of _u_'s 
that can possibly be printed in a page is necessarily limited, while the potential values of a Poisson variable are not bounded.
Nevertheless, the Poisson distribution can be a useful model in practice.


## A prior distribution for $\lambda$

Recall that the gamma distribution is conjugate with respect to the Poisson probability model. If we elicit a prior gamma distribution Ga$(\alpha_0, \beta_0)$ for $\lambda$, its density is 

$$\pi(\lambda \mid \alpha_0, \beta_0)=\frac{ \beta_0^{\alpha_0}}{\Gamma(\alpha_0) }\, \lambda^{(\alpha_0-1)}\, e^{-\lambda\,\beta_0}, \,\, \lambda>0$$
with expectation and variance

- $\mathbb{E}(\lambda \mid \alpha_0, \beta_0)=\alpha_0 / \beta_0$, 
- $\mathbb{V}(\lambda \mid \alpha_0, \beta_0)= \alpha_0 / \beta_0^2.$

We are going to work with two different prior distributions to learn a bit more about this Bayesian inferential process. We will work with a _non-informative_ prior distribution (Prior 1) and with the prior distribution (Prior 2) provided by a friend of ours, Joan, who has read all books by George R. R. Martin.

\begin{align*}
\text{Prior 1: }& \pi_1(\lambda) \propto\lambda^{-1/2}\\
\text{Joan, Prior 2: }& \pi_2(\lambda)=\text{Ga}(\alpha_0=105.8, \beta_0=4.6).
\end{align*}

The non-informative prior distribution is intended to be a neutral distribution that leaves the data in the spotlight. Let's try to understand which views are expressed in Joan's prior distribution. To do so, we  calculate the mean and standard deviation and graphically represent his prior.

```{r, fig.align='center', out.width="60%"}
curve(
  dgamma(x, 105.8, 4.6), 
  col="dodgerblue", lwd=4,
  xlim=c(15,35), ylim=c(0,0.5),
  xlab = expression(paste("Expected number of u's in a page ", lambda)),
  ylab = 'Prior 2 (Joan)'
  )
```

According to Joan, the prior expectation and standard deviation of $\lambda$ is 
$$\mathbb{E}(\lambda)=105.8/4.6= 23, \quad \mathbb{SD}(\lambda)= \sqrt{105.8/4.6^2}=2.236$$
and the prior probability that $\lambda$ is less than 20, 23, 25, 30 and 35 is:
```{r}
setNames(nm = c(20, 23, 25, 30, 35)) |> pgamma(105.8, 4.6) |> round(3)
```

## The likelihood function of $\lambda$

The likelihood of $\lambda$ is a function of $\lambda$ whose expression requires the observed data $\mathcal D=\{n, \sum y_i\}$, where $\sum y_i$ is the total number of _u_'s observed in the _n_ sampled pages. It is defined as follows

\begin{align*}
L(\lambda \mid \mathcal D)=& P(Y_1=y_1, Y_2=y_2, \ldots, Y_n=y_n \mid \lambda)\\
=& \prod_{i=1}^{n}\, P(Y_i=y_i \mid \lambda) = \frac{\lambda^{\sum y_i}\, e^{-n \lambda}}{\prod_{i=1}^{n}\, y_i!}
\end{align*}
where $Y_i$ represents the random Poisson variable that describes the number $y_i$ of _u_'s in the page $i$. 
Note that the denominator is a scaling factor independent of $\lambda$, that we
can ignore, since we are only interested in the _relative_ likelihood of each possible
value of $\lambda$. The rest of the expression, only depends on the total sum
of observed values, and the number of pages examined.

In our case, we have sampled $n=25$ pages and have registered $\sum y_i= 643$ _u_'s, and consequently, the likelihood function is (up to the scaling factor)

\begin{align*}
L(\lambda \mid \mathcal D)\propto & \lambda^{643}\, e^{-25\, \lambda}
\end{align*}

```{r, fig.align='center', out.width="60%"}
sum_y <- sum(udata$Us)
n <- nrow(udata)
scale_y <- sum(log(factorial(udata)))
curve(
  exp(sum_y * log(x) - n * x - scale_y),
  col = "darkorange", lwd = 4,
  xlim = c(15, 35),
  xlab = expression(paste("Expected number of u's in a page ", lambda)),
  ylab = 'likelihood'
)
```

## The posterior distribution of $\lambda$

The posterior distributions of $\lambda$ for priors 1 and 2 are also gamma distributions with parameters

\begin{align*}
\text{Posterior 1: }& \pi_1(\lambda \mid \mathcal D) =\text{Ga}(\alpha = 643+0.5= 643.5 , \beta =25 )\\
\text{Posterior 2: }& \pi_2(\lambda \mid \mathcal D)=\text{Ga}(\alpha = 643+105.8=748.8, \beta =25+4.6=29.6 )
\end{align*}

Next, we plot on the same graph the informative prior distribution $\text{Ga}(\alpha_0=105.8, \beta_0=4.6)$ from Joan (in blue) and the two posterior distributions, 
$\pi_1(\lambda \mid \mathcal D) =\text{Ga}(\alpha = 643.5, \beta =25 )$ in dark green and $\pi_2(\lambda \mid \mathcal D)=\text{Ga}(\alpha = 748.8, \beta =29.6)$ in light green. 

```{r, fig.align='center', out.width="60%"}
curve(
  dgamma(x, 748.8, 29.6),
  col = "darkgreen", lwd = 4,
  xlim = c(15, 35),
  xlab = expression(paste("Expected number of u's in a page ", lambda)),
  ylab = 'prior and posteriors'
)
curve(
  dgamma(x, 643.5, 25),
  col = "green3", lwd = 4,
  add = TRUE
)
curve(
  dgamma(x, 105.8, 4.6),
  col = "dodgerblue", lwd = 4,
  add = TRUE
)

```

Interestingly, the two posterior distributions are quite similar despite the fact that the informative prior distribution is compatible with smaller lambda values than indicated by the data.
The posterior mean and standard deviation of $\lambda$ according to each posterior distribution are
\begin{align*}
\text{Posterior 1: }& \pi_1(\lambda \mid \mathcal D) =\text{Ga}(643.5, 25), \,\,\mathbb{E} (\lambda \mid \mathcal D)=`r 643.5/25`, \,\,\mathbb{SD}(\lambda \mid \mathcal D)= `r round(sqrt(643.5/25^2), 2)`\\
\text{Posterior 2: }& \pi_2(\lambda \mid \mathcal D)=\text{Ga}(748.8, 29.6 ), \,\,\mathbb{E} (\lambda \mid \mathcal D)= `r round(748.8/29.6, 1)`, \,\,\mathbb{SD}(\lambda \mid \mathcal D)=`r round(sqrt(748.8/29.6^2), 2)`
\end{align*}


95$\%$ credible intervals for $\lambda$ according to posteriors 1 and 2 are 

```{r}
setNames(nm = c(0.025, 0.975)) |> qgamma(643.5, 25) |> round(1)
setNames(nm = c(0.025, 0.975)) |> qgamma(748.8, 29.6) |> round(1)
```

and the subsequent posterior probabilities that the mean number of letters _u_'s in a page of *A Game of Thrones* is between 24 and 26 letters are
```{r}
round(pgamma(26, 643.5, 25) - pgamma(23, 643.5, 25), 3)
round(pgamma(26, 748.8, 29.6) - pgamma(23, 748.8, 29.6), 3)
```


## The posterior predictive distribution for the number of _u_'s in a new page of *A Game of Thrones*

Remember that the posterior predictive distribution for the result of a new observation $Y_{n+1}$ is a Gamma-Poisson distribution GaPo$(\alpha, \beta, 1)$ with probability distribution
\begin{align*}
P(Y_{n+1}&=r \mid \mathcal D) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, \frac{\Gamma(r+\alpha)}{r!}\,\frac{1}{(\beta+1)^{(r+\alpha)}}, \,r=0,1,2,\ldots, \nonumber \\
\end{align*}
where $\mathbb{E}(Y_{n+1}\mid \mathcal D) = \alpha/\beta$ and $\mathbb{V}(Y_{n+1} \mid \mathcal D) = \frac{\alpha}{\beta} (1 + 1/\beta)$.

Suppose now that we randomly select a new page (page 26th) in the novel and want to predict the number of letters _u_ we will find in it. The subsequent posterior predictive distribution is $\text{GaPo}(643.5,\, 25, 1)$ if we work in the framework of the posterior 1 or $\text{GaPo}(748.8,\, 29.6, 1)$ if we work in the framework of the posterior 2.
Now we represent both predictive distributions

```{r, fig.align='center', out.width="55%"}
library(extraDistr)
x <- c(10:40)
pred1 <- dgpois(10:40, 643.5, 25)
pred2 <- dgpois(10:40, 748.8, 29.6)
plot(
  x, pred1,
  type = "h", lwd = 2, col = "purple", 
  xlim = c(10, 40), ylim = c(0, 0.1), 
  xlab = "number of letters u in a new page", 
  ylab = "probability"
)
plot(
  x, pred2,
  type = "h", lwd = 2, col = "purple",
  xlim = c(10, 40), ylim = c(0, 0.1),
  xlab = "number of u's in a new page",
  ylab = "probability"
)
```
The graphs of both distributions are visually almost identical. The means and standard deviations are

\begin{align*}
\text{Predictive 1: }& \text{GaPo}(643.5, 25, 1 ), \,\, \mathbb{E}(Y_{26} \mid \mathcal D)=25.74, \,\,SD(Y_{26} \mid \mathcal{D})=5.17\\
\text{Predictive 2: }& \text{GaPo}(748.8, 29.6, 1), \,\, \mathbb{E}(Y_{26} \mid \mathcal D)=25.30, , \,\,SD(Y_{26} \mid \mathcal{D})=5.11
\end{align*}

It is important to note the large variability associated with the prediction process relative to the estimation process. This characteristic of the estimation and prediction processes is general to all inferential processes and statistical methodologies. 

# Time to individual work

We propose below an individual exercise that pursues to consolidate the basic concepts that we have learned in the previous theoretical session and that we have been practising in this session. 


**Exercice** 

You just remembered that you have another friend, Tauriel, who is also very enthusiastic of the novel *A Game of Thrones* and you have thought to ask her opinion about the number of _u_'s in the novel. Her prior distribution for $\lambda$ is very different from that of your other friend Joan. It is $\text{Ga}(90, 3)$ and we will call it prior 3.
\begin{align*}
\text{Tauriel, Prior 3: } & \pi_3(\lambda) = \text{Ga}(90,3) 
\end{align*}

1. How different are your two friends' opinions on $\lambda$? A good idea to answer this question would be to plot both densities and calculate the mean, standard deviation and some relevant probabilities from the subsequent prior distributions.

2. From the results of the previous experiment ($\sum y_i= 643$ _u_'s in a total of $n=25$), compare the posterior distribution for $\lambda$ that Joan and Tauriel would obtain. 

3. Joan and Taurel select $n^{\prime}=1$ new page from the novel. Before they start counting the _u_'s on the page, they ask you to calculate the predicted distribution of each of them over the number of _u_'s they will find on this page. Can you compute them? And since they are asking you to do some work, you could also represent them graphically and describe them numerically.