---
title: 'Practical 7: Bayesian Hierarchical Modelling'
author: "VIBASS"
output:
  html_vignette:
    fig_caption: yes
    number_sections: yes
    toc: yes
    fig_width: 6
    fig_height: 4
vignette: >
  %\VignetteIndexEntry{Practical 7: Bayesian Hierarchical Modelling}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# Bayesian Hierarchical Modelling

In this last practical we will consider the analysis of Bayesian hierarchical
models. As explained in the previous lecture, hierarchical models provide a
convenient tool to define models so that the different sources of variation in
the data are clearly identified. Bayesian inference for highly structured
hierarchical models can be difficult and may require the use of Markov chain
Monte Carlo methods.

However, packages such as `BayesX` and `INLA`, to mention
just two, provide a very convenient way to fit and make inference about certain
types of Bayesian hierarchical models. We will continue using BayesX, although
you can try out using `INLA` if you look at the optional, additional material
in Practical 8.

# Linear Mixed Models

Linear mixed models were defined in the lecture as follows:

$$
Y_{ij} = X_{ij}\beta +\phi_i+\epsilon_{ij}
$$

Here, Y_{ij} represents observation $j$ in group $i$, X_{ij} are a vector of
covariates with coefficients $\beta$, $\phi_i$ i.i.d. random effects and
$\epsilon_{ij}$ a Gaussian error term. The distribution of the random effects
$\phi_i$ is Gaussian with zero mean and precision $\tau_{\phi}$.


# Multilevel Modelling

Multilevel models are a particular type of mixed-effects models in which
observations are nested within groups, so that group effects are modelled using
random effects. A typical example is that of students nested within classes.

For the next example, the `nlschools` data set (in package `MASS`) will be used.
This data set records data about students' performance (in particular, about a
language score test) and other variables. The variables in this data set are:

* `lang`, language score test.

* `IQ`, verbal IQ.

* `class`, class ID.

* `GS`, class size as number of eighth-grade pupils recorded in the class.

* `SES`, social-economic status of pupil’s family.

* `COMB`, whether the pupils are taught in the multi-grade class with 7th-grade students.

The data set can be loaded and summarised as follows:

```{r}
library("MASS")
data("nlschools")
summary(nlschools)
```

The model to fit will take `lang` as the response variable and include
`IQ`, `GS`, `SES` and `COMB` as covariates (i.e., fixed effects). This model
can easily be fit with `BayesX` (using the `R2BayesX` package) as follows:

```{r message = FALSE, warning = FALSE}
library("R2BayesX")
m1 <- bayesx(lang ~ IQ + GS +  SES + COMB, data = nlschools)

summary(m1)
```

Note that the previous model only includes fixed effects. The data set includes
`class` as the class ID to which each student belongs. Class effects can have an
impact on the performance of the students, with students in the same class
performing similarly in the language test.

Very conveniently, `Bayesx` can include random effects in the model by adding a
term in the right hand side of the formula that defined the model. Specifically, 
the term to add is `sx(class, bs = "re")` (see code below for the
full model).
This will create a random effect indexed over variable `class` and which is of
type `re`, i.e., it provides random effects which are independent and
identically distributed using a normal distribution with zero mean and
precision $\tau$ - there are clear similarities with a residual terms here
(although that is at the observation level, rather than the group level).

Before fitting the model, the between-class variability can be explored by
means of boxplots:

```{r fig = TRUE, fig.width = 15, fig.height = 5}
boxplot(lang ~ class, data = nlschools, las = 2)
```

The code to fit the model with random effects is:

```{r}
m2 <- bayesx(
  lang ~ IQ + GS +  SES + COMB + sx(class, bs = "re"),
  data = nlschools
)

summary(m2)
```

You can compare this Bayesian fit with a non-Bayesian analysis, this time making
use of the `lmer()' function in the package `lme4`. (Here, `lme` means `linear
effects.) We present the code only for comparison, and we do not describe
`lmer()` or `lme4` more fully.

```{r}
library(lme4)
m2lmer <- lmer(
  lang ~ IQ + GS +  SES + COMB + (1 | class), # Fit a separate intercept for each level of class
  data = nlschools
)

summary(m2lmer)
```

## Exercises

* Are the model fits from `bayesx()` and from `lmer()` similar?

* What are the relative sizes of the variances estimated in the random effect
term for `class` versus the residual variance? Does this make sense looking at
the boxplot graph above?


# Generalised Linear Mixed Models

Mixed effects models can also be considered within the context of generalised 
linear models. In this case, the linear predictor of observation $i$, $\eta_i$,
can be defined as

$$
\eta_i = X_{ij}\beta +\phi_i
$$

Compared to the previous setting of linear mixed effects models, note that now
the distribution of the response could be other than Gaussian and that
observations are not necessarily nested within groups.

# Poisson regression

In this practical we will use the North Carolina Sudden Infant Death Syndrome
(SIDS) data set. It is available in the `spData` package and it can be loaded
using:

```{r message = FALSE, warning = FALSE}
library(spData)
data(nc.sids)
summary(nc.sids)
```

A full description of the data set is provided in the associated manual page
(check with `?nc.sids`) but in this practical we will only consider these
variables:

* `BIR74`,  number of births (1974-78).

* `SID74`,  number of SID deaths (1974-78).

* `NWBIR74`, number of non-white births (1974-78).

These variables are measured at the county level in North Carolina, of which
there are 100.

Because `SID74` records the number of SID deaths, the model is Poisson:

$$
O_i \mid \mu_i \sim Po(\mu_i),\ i=1,\ldots, 100
$$
Here, $O_i$ represents the number of cases in county $i$ and $\mu_i$ the mean.
In addition, mean $\mu_i$ will be written as $\mu_i = E_i \theta_i$, where $E_i$
is the *expected* number of cases and $\theta_i$ the relative risk in county $i$.

The relative risk $\theta_i$ is often modelled, on the log-scale, to be equal
to a linear predictor:

$$
\log(\theta_i) = \beta_0 + \ldots
$$

The expected number of cases is computed by multiplying the number of births in
county $i$ to the overall mortality rate

$$
r = \frac{\sum_{i=1}^{100}O_i}{\sum_{i=1}^{100}B_i}
$$
where $B_i$ represents the total number of births in country $i$. Hence,
the expected number of cases in county $i$ is $E_i = r B_i$.


```{r}
# Overall mortality rate
r74 <- sum(nc.sids$SID74) / sum(nc.sids$BIR74)
# Expected cases
nc.sids$EXP74 <- r74 * nc.sids$BIR74
```

A common measure of relative risk is the *standardised mortality ratio*
($O_i / E_i$):

```{r}
nc.sids$SMR74 <- nc.sids$SID74 / nc.sids$EXP74
```

A summary of the SMR can be obtained:

```{r fig = TRUE}
hist(nc.sids$SMR, xlab = "SMR")
```

Values above 1 indicate that the county has more observed deaths than expected
and that there might be an increased risk in the area.


As a covariate, we will compute the proportion of non-white births:

```{r}
nc.sids$NWPROP74 <- nc.sids$NWBIR74/ nc.sids$BIR74
```

There is a clear relationship between the SMR and the proportion of non-white
births in a county:

```{r fig = TRUE}
plot(nc.sids$NWPROP74, nc.sids$SMR74)

# Correlation
cor(nc.sids$NWPROP74, nc.sids$SMR74)
```


A simple Poisson regression can be fit by using the following code:

```{r}
m1nc <- bayesx(
  SID74 ~ 1 + NWPROP74,
  family = "poisson",
  offset = log(nc.sids$EXP74),
  data = nc.sids
)
summary(m1nc)
```

Random effects can also be included to account for intrinsic differences between
the counties:

```{r}
# Index for random effects
nc.sids$ID <- 1:nrow(nc.sids)

# Model WITH covariate
m2nc <- bayesx(
  SID74 ~  1 + NWPROP74 + sx(ID, bs = "re"),
  family = "poisson",
  offset = log(nc.sids$EXP74),
  data = as.data.frame(nc.sids)
)

summary(m2nc)
```

We can see the fitted relationship with `NWPROP74`:
```{r fig = TRUE}
x.predict <- seq(0,1,length=1000)
y.predict <- exp(coef(m2nc)["(Intercept)","Mean"]+coef(m2nc)["NWPROP74","Mean"]*x.predict)
par(mfrow = c(1, 1))
plot(nc.sids$NWPROP74, nc.sids$SMR74)
lines(x.predict,y.predict)
```

The role of the covariate can be explored by fitting a model without it:

```{r}
# Model WITHOUT covariate
m3nc <- bayesx(
  SID74 ~  1 + sx(ID, bs = "re"),
  family = "poisson",
  offset = log(nc.sids$EXP74),
  data = as.data.frame(nc.sids)
)

summary(m3nc)
```

Now, notice the decrease in the estimate of the precision of the random effects
(i.e., the variance increases). This means that values of the random effects
are now larger than in the previous case as the random effects pick some of the
effect explained by the covariate.


```{r fig = TRUE}
par(mfrow = c(1, 2))
boxplot(m2nc$effects$`sx(ID):re`$Mean, ylim = c(-1, 1), main = "With NWPROP74")
boxplot(m3nc$effects$`sx(ID):re`$Mean, ylim = c(-1, 1), main = "Without NWPROP74")
```

# Further Extensions

Spatial random effects can be defined not to be independent and identically
distributed. Instead, spatial or temporal correlation can be considered when
defining them. For example, in the North Carolina SIDS data set, it is common
to consider that two counties that are neighbours (i.e., share a boundary)
will have similar relative risks. This can be taken into account in the model
but assuming that the random effects are spatially autocorrelated. This is out
of the scope of this introductory course but feel free to ask about this!!

# Final Exercises (Optional!!)

We tend to use the final Practical session for catching up and to give you the
opportunity to ask us any questions you may have. So please ask away!

If you would like some further exercises to practice, please look at Practical
8 for more material and examples, including the use of the non-MCMC package
`INLA`.