Radon

  • radioactive gas that enters homes through the ground
  • radon levels vary greatly from household to household
  • carcinogenic
  • mediocre yelp reviews

Household radon contamination data

  • 80,000 houses
  • Two predictors:
    • measument taken in the basement?
    • county uranium level (positive corr with radon levels)
  • We’ll focus on Minnesota, which has 919 households in 85 counties

Getting to know the data

df = read.csv("radon/data.csv")
str(df)
'data.frame':   919 obs. of  4 variables:
 $ log_radon  : num  0.8329 0.8329 1.0986 0.0953 1.1632 ...
 $ basement   : int  0 1 1 1 1 1 1 1 1 1 ...
 $ county     : int  1 1 1 1 2 2 2 2 2 2 ...
 $ log_uranium: num  -0.689 -0.689 -0.689 -0.689 -0.847 ...

Distribution of radon levels in MN (log scale)

Start simple: complete pooling

  • Estimate a radon level for basement vs non-basement readings, \(y_i \sim \mathcal{N}(\alpha + \beta x_i, \sigma)\)
  • \(\sigma\) is the standard deviation of the error, capturing all of the variance we aren’t modeling

Complete pooling Stan model

\(y_i \sim \mathcal{N}(\alpha + \beta x_i, \sigma)\) translates to:

data {
  int<lower=1> N;
  vector[N] basement;
  vector[N] log_radon;
}
parameters {
  real basement_effect;
  real<lower=0> sigma;
  real alpha;
}
model {
  log_radon ~ normal(alpha + basement_effect * basement, sigma);
}

Intermezzo: vectorization

With vector[N] basement;

  log_radon ~ normal(alpha + basement_effect * basement, sigma);

is equivalent to, and faster than

  for (i in 1:N)
    log_radon[i] ~ normal(basement_effect * basement[i] + a, sigma);

The default priors are not ideal

By default, Stan will act as if there is a uniform prior on any parameters not given explicit distributions. We can do better.

  • log_radon is between -3 and 4 ish
  • Seems reasonable to think the effect of a basement measurement is somewhere in that range
  • Same for intercept
  • \(\sigma\) should capture unexplained variance, so we can use the same range here and be only weakly informative

Pooled model with priors

Let’s choose \(\mathcal{N}(0, 5)\) for all of these. This gives us

model {
  log_radon ~ normal(alpha + basement_effect * basement, sigma);
  alpha ~ normal(0, 5);
  basement_effect ~ normal(0, 5);
  sigma ~ normal(0, 5);
}

Pooled fit in Stan

Inference for Stan model: pooled.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

                mean se_mean   sd 2.5%  50% 97.5% n_eff Rhat
basement_effect 0.59       0 0.07 0.46 0.59  0.72  1626    1
sigma           0.79       0 0.02 0.76 0.79  0.83  2085    1
alpha           0.78       0 0.06 0.66 0.78  0.90  1655    1

Samples were drawn using NUTS(diag_e) at Tue May  1 17:50:16 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Pooled fit, visualized

What does this teach us?

  • basement_effect
  • \(\alpha\) + 0 = distribution of non-basement measurements
  • \(\alpha\) + basement_effect = distribution of basement measurements
  • \(\sigma\) = variation
  • What about counties?

Learning about counties

  • Adding county-level intercepts is another conventional approach
  • \(y_i \sim \mathcal{N}(\alpha_{j[i]} + \beta x_i, \sigma)\), where \(j = 1, ..., 85\)
  • As before both \(\sigma\) and \(\beta\) are scalars, so they are shared among the entire population (here, all homes in MN)
  • Each intercept \(\alpha_j\) is independent and inferred based on data from that county alone

Complete pooling:

              No pooling:

Exercise: create the unpooled model

  • Separate intercept for each county
  • Still sharing beta and sigma at the population level
  • Will now need to feed in J and county to the sampling call.

\(y_i \sim ~ \mathcal{N}(\alpha_{j[i]} + \beta x_i, \, \sigma)\)

data {
  int<lower=0> N;                 // number of houses
  int<lower=0> J;                 // number of counties
  vector[N] basement;             // 0 = basement, 1 = not basement
  int<lower=1,upper=J> county[N]; // county id for each house
  vector[N] log_radon;            // radon measurement
}

Unpooled Stan model

data {
  int<lower=0> N;                 // number of houses
  int<lower=0> J;                 // number of counties
  vector[N] basement;             // 0 = basement, 1 = not basement
  int<lower=1,upper=J> county[N]; // county id for each house
  vector[N] log_radon;            // radon measurement
}
parameters {
  real basement_effect;
  real<lower=0> sigma;
  vector[J] a;
}
model {
  log_radon ~ normal(basement_effect * basement + a[county], sigma);
  a ~ normal(0, 5);
  basement_effect ~ normal(0, 5);
  sigma ~ normal(0, 5);
}

The most and least certain county estimates

   county         sd
67     67 0.09091059
26     26 0.09507768
43     43 0.73486469
82     82 0.73924232

Distribution of measurements by county

Estimates with county measurement density

How can we share insight across counties?

  • Ideally we want some compromise between complete and no pooling
  • Seems reasonable to ask for a system where there is some shift towards the population distribution
  • That shift should scale inversely with the amount of data we have for that county

Adding hierarchy

  • Principled way of determining how much to pool
  • Pull estimates towards population (state) mean based on amount of variation in population
    • low variance population: more pooling
    • high variance population: less pooling
  • In limit:
    • as variance goes to 0, complete pooling
    • as variance goes to \(\infty\), no pooling

      No pooling:

Hierarchical:

Exercise: adding hierarchy

Let’s take our current model and add hierarchical priors on alpha:

\(y_i \sim ~ \mathcal{N}(\alpha_{j[i]} + \beta x_i, \, \sigma)\)

\(\alpha_{j} \sim \mathcal{N}(\mu_\alpha, \sigma_\alpha)\)

\(\mu_\alpha, \sigma_\alpha \sim \mathcal{N}(0, 5)\)

data {
  int<lower=0> N;                 // number of houses
  int<lower=0> J;                 // number of counties
  vector[N] basement;             // 0 = basement, 1 = not basement
  int<lower=1,upper=J> county[N]; // county id for each house
  vector[N] log_radon;            // radon measurement
}

Hierarchical model on counties

parameters {
  real basement_effect;
  real<lower=0> sigma;
  real<lower=0> sigma_a;
  real mu_a;
  vector[J] a;
}
model {
  mu_a ~ normal(0, 5);
  sigma_a ~ normal(0, 5);
  a ~ normal(mu_a, sigma_a);
  basement_effect ~ normal(0, 5);
  sigma ~ normal(0, 5);
  log_radon ~ normal(a[county] + basement_effect * basement, sigma);
}

Hierarchical model fit

Inference for Stan model: partial_cp0.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

                mean se_mean   sd 2.5%  50% 97.5% n_eff Rhat
a[67]           0.28    0.00 0.09 0.10 0.29  0.46  1469    1
a[82]           0.96    0.01 0.32 0.33 0.96  1.59  4000    1
a[50]           1.00    0.00 0.29 0.44 0.99  1.59  4000    1
sigma           0.73    0.00 0.02 0.69 0.73  0.76  4000    1
basement_effect 0.66    0.00 0.07 0.53 0.66  0.79   828    1

Samples were drawn using NUTS(diag_e) at Tue May  1 17:50:47 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Unpooled fit for reference:

Inference for Stan model: unpooled.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

                mean se_mean   sd 2.5%  50% 97.5% n_eff Rhat
a[67]           0.22    0.00 0.09 0.04 0.22  0.40  2108    1
a[82]           1.51    0.01 0.74 0.10 1.50  2.96  4000    1
a[50]           1.76    0.01 0.72 0.36 1.76  3.15  4000    1
sigma           0.73    0.00 0.02 0.69 0.73  0.76  4000    1
basement_effect 0.70    0.00 0.07 0.56 0.70  0.84  1403    1

Samples were drawn using NUTS(diag_e) at Tue May  1 17:50:35 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Fit lines for each county

Hierarchical fit, visualized

Unpooled fit for reference

Hierarchical vs unpooled

Posterior predictive checks

  • Generate fake measurements to see how well our model reproduces the actual measurements
  • Can use these to do visual model comparison - add this snippet to the end of both the unpooled and hierarchical models
generated quantities {
  vector[N] yppc;
  for (i in 1:N)
    yppc[i] = normal_rng(a[county[i]] + basement_effect * basement[i], sigma);
}

For each measurement, we obtain a distribution

Hierarchical vs unpooled PPC - min house

Hierarchical vs unpooled PPC - max house

Shinystan is useful for PPCs

launch_shinystan(unpooled.fit)
launch_shinystan(partial.cp0.fit)

PPC model comparison with densities

Improving on the model

  • We also have a county-level uranium measurement available
  • This can potentially give us much more information on counties with few measurements
  • Better estimates save money and lives

Group + individual covariates = “multilevel”

Adding a county-level uranium covariate gives us

\(y_i \sim ~ \mathcal{N}(\alpha_{j[i]} + \beta x_i + \gamma u_j, \, \sigma)\)

We can use the same weak prior:

\(\gamma \sim \mathcal{N}(0, 5)\)

Exercise: add uranium data and parameters

  • \(y_i \sim ~ \mathcal{N}(\alpha_{j[i]} + \beta x_i + \gamma u_j, \, \sigma)\)
  • Add vector[N] u; to your data block
  • Pass in u = df$log_uranium in your call to sampling like the rest of the data
data {
  int<lower=0> N;
  int<lower=0> J;
  vector[N] log_radon;
  vector[N] basement;
  int county[N];
  vector[N] u;
}

Now with a group-level predictor, u_effect

model {
  mu_a ~ normal(0, 100);
  sigma ~ normal(0, 5);
  sigma_a ~ normal(0, 20);
  a ~ normal(mu_a, sigma_a);
  u_effect ~ normal(0, 5);
  basement_effect ~ normal(0, 2);
  log_radon ~ normal(a[county] + basement_effect * basement + u_effect * u, sigma);
}
Warning: There were 58 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
Warning: There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-low
Warning: Examine the pairs() plot to diagnose sampling problems

pairs()

pairs(group.hier.fit, pars=c('sigma', 'sigma_a', 'mu_a', 'u_effect', 'a[1]'))

Shinystan

launch_shinystan(group.hier.fit)

Neal’s funnel in small-data hierarchical params

\(\mathrm{log\, y} \sim \mathcal{N}(0, 1.5)\)

\(\mathrm{x}_i \sim \mathcal{N}(0, \mathrm{y})\)

adapt_delta makes HMC more careful

  • Increasing adapt_delta ends up decreasing step size
  • Helps HMC deal with funnels and other sharp geometric areas
  • But, makes it take much longer to explore larger and less dense areas of the typical set (like the mouth of our funnel)
  • control=list(adapt_delta=0.99) or control=dict(adapt_delta=0.99)
  • In this case, we can get rid of the divergences with adapt_delta but we still get BFMI warnings

Centered vs. non-centered parameterizations

  • Our funnel is caused by tight coupling between hierarchical scale and individual mean in small-data regime
  • Can attack this directly with a “non-centered parameterization,” which decouples the two

Remember z-score tables?

Can convert any normal to a standardized normal

\(\mathrm{X} \sim \mathcal{N}(\mu, \sigma)\)

\(\mathrm{Z} = \frac{\mathrm{X} - \mu}{\sigma}\)

\(\mathrm{Z} \sim \mathcal{N}(0, 1)\)

Exercise: non-center \(\alpha\) our model

\(\sigma_\alpha \sim \mathcal{N}(0, 5)\)

\(\mu_\alpha \sim \mathcal{N}(0, 5)\)

\(\alpha_j^{std} \sim \mathcal{N}(0, 1)\)

\(\alpha_j = \mu_\alpha + \sigma_\alpha \cdot \alpha_j^{std}\)

parameters {
  real mu_a;
  real<lower=0> sigma_a;
  vector[J] a_std;
  real basement_effect;
  real u_effect;

Non-centered parameterization of \(\alpha\)

  real<lower=0> sigma;
}
transformed parameters {
  vector[J] a = mu_a + sigma_a * a_std;
}
model {
  log_radon ~ normal(a[county] + basement_effect * basement + u_effect * u, sigma);
  mu_a ~ normal(0, 5);
  sigma_a ~ normal(0, 5);
  a_std ~ normal(0, 1);
  sigma ~ normal(0, 5);

County uranium fit lines by basement