- radioactive gas that enters homes through the ground
- radon levels vary greatly from household to household
- carcinogenic
- mediocre Yelp reviews
df = read.csv("radon/data.csv")  # household radon measurements
str(df)
'data.frame': 919 obs. of 4 variables:
 $ log_radon  : num  0.8329 0.8329 1.0986 0.0953 1.1632 ...
 $ basement   : int  0 1 1 1 1 1 1 1 1 1 ...
 $ county     : int  1 1 1 1 2 2 2 2 2 2 ...
 $ log_uranium: num  -0.689 -0.689 -0.689 -0.689 -0.847 ...
The complete-pooling model \(y_i \sim \mathcal{N}(\alpha + \beta x_i, \, \sigma)\) translates to:
data {
  int<lower=1> N;
  vector[N] basement;
  vector[N] log_radon;
}
parameters {
  real basement_effect;
  real<lower=0> sigma;
  real alpha;
}
model {
  log_radon ~ normal(alpha + basement_effect * basement, sigma);
}
With vector[N] basement;, the vectorized statement

log_radon ~ normal(alpha + basement_effect * basement, sigma);

is equivalent to, and faster than, the loop

  for (i in 1:N)
    log_radon[i] ~ normal(alpha + basement_effect * basement[i], sigma);
By default, Stan will act as if there is a uniform prior on any parameters not given explicit distributions. We can do better.
Let’s choose \(\mathcal{N}(0, 5)\) for all of these (for sigma, with its lower bound of zero, this is effectively a half-normal). This gives us
model {
  log_radon ~ normal(alpha + basement_effect * basement, sigma);
  alpha ~ normal(0, 5);
  basement_effect ~ normal(0, 5);
  sigma ~ normal(0, 5);
}
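Fitting this from R might look like the following: a minimal sketch assuming the rstan package and that the model above is saved as pooled.stan (the file and object names are assumptions).

library(rstan)

# Compile the pooled model and sample with the defaults (4 chains, iter=2000);
# df is the data frame loaded above.
pooled.model <- stan_model("pooled.stan")
pooled.fit <- sampling(pooled.model,
                       data = list(N = nrow(df),
                                   basement = df$basement,
                                   log_radon = df$log_radon))
print(pooled.fit)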
Inference for Stan model: pooled.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.
                mean se_mean   sd 2.5%  50% 97.5% n_eff Rhat
basement_effect 0.59       0 0.07 0.46 0.59  0.72  1626    1
sigma           0.79       0 0.02 0.76 0.79  0.83  2085    1
alpha           0.78       0 0.06 0.66 0.78  0.90  1655    1
Samples were drawn using NUTS(diag_e) at Tue May  1 17:50:16 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
Complete pooling: one shared intercept \(\alpha\) for every house, as in the model above.

No pooling: a separate, independent intercept for each county. Add J and county to the sampling call.

\(y_i \sim \mathcal{N}(\alpha_{j[i]} + \beta x_i, \, \sigma)\)
data {
  int<lower=0> N;                 // number of houses
  int<lower=0> J;                 // number of counties
  vector[N] basement;             // 1 = basement, 0 = not basement
  int<lower=1,upper=J> county[N]; // county id for each house
  vector[N] log_radon;            // radon measurement
}
parameters {
  real basement_effect;
  real<lower=0> sigma;
  vector[J] a;
}
model {
  log_radon ~ normal(basement_effect * basement + a[county], sigma);
  a ~ normal(0, 5);
  basement_effect ~ normal(0, 5);
  sigma ~ normal(0, 5);
}
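The fit is the same sampling call as before, with J and county joining the data; a sketch with an assumed file name (unpooled.fit matches the output and shinystan calls below):

# Unpooled fit: an independent intercept a[j] per county.
unpooled.model <- stan_model("unpooled.stan")
unpooled.fit <- sampling(unpooled.model,
                         data = list(N = nrow(df),
                                     J = max(df$county),
                                     basement = df$basement,
                                     county = df$county,
                                     log_radon = df$log_radon))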
The posterior standard deviations of the unpooled county intercepts vary widely (two smallest and two largest shown):

   county         sd
67     67 0.09091059
26     26 0.09507768
43     43 0.73486469
82     82 0.73924232
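A table like that can be pulled straight out of the fit; a sketch assuming the unpooled.fit object from above:

# Posterior sd of each county intercept: counties with few houses
# get wide, noisy estimates.
a.draws <- rstan::extract(unpooled.fit)$a
county.sd <- data.frame(county = seq_len(ncol(a.draws)),
                        sd = apply(a.draws, 2, sd))
county.sd <- county.sd[order(county.sd$sd), ]
rbind(head(county.sd, 2), tail(county.sd, 2))  # two smallest, two largest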
No pooling: each county's intercept is estimated from its own houses alone, so counties with few measurements get very noisy estimates.

Hierarchical: the county intercepts share a common prior whose parameters are themselves estimated, partially pooling small counties toward the overall mean.

\(y_i \sim \mathcal{N}(\alpha_{j[i]} + \beta x_i, \, \sigma)\)

\(\alpha_{j} \sim \mathcal{N}(\mu_\alpha, \sigma_\alpha)\)

\(\mu_\alpha, \sigma_\alpha \sim \mathcal{N}(0, 5)\)
data {
  int<lower=0> N;                 // number of houses
  int<lower=0> J;                 // number of counties
  vector[N] basement;             // 1 = basement, 0 = not basement
  int<lower=1,upper=J> county[N]; // county id for each house
  vector[N] log_radon;            // radon measurement
}
parameters {
  real basement_effect;
  real<lower=0> sigma;
  real<lower=0> sigma_a;
  real mu_a;
  vector[J] a;
}
model {
  mu_a ~ normal(0, 5);
  sigma_a ~ normal(0, 5);
  a ~ normal(mu_a, sigma_a);
  basement_effect ~ normal(0, 5);
  sigma ~ normal(0, 5);
  log_radon ~ normal(a[county] + basement_effect * basement, sigma);
}
Inference for Stan model: partial_cp0.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.
                mean se_mean   sd 2.5%  50% 97.5% n_eff Rhat
a[67]           0.28    0.00 0.09 0.10 0.29  0.46  1469    1
a[82]           0.96    0.01 0.32 0.33 0.96  1.59  4000    1
a[50]           1.00    0.00 0.29 0.44 0.99  1.59  4000    1
sigma           0.73    0.00 0.02 0.69 0.73  0.76  4000    1
basement_effect 0.66    0.00 0.07 0.53 0.66  0.79   828    1
Samples were drawn using NUTS(diag_e) at Tue May  1 17:50:47 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
Inference for Stan model: unpooled.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.
                mean se_mean   sd 2.5%  50% 97.5% n_eff Rhat
a[67]           0.22    0.00 0.09 0.04 0.22  0.40  2108    1
a[82]           1.51    0.01 0.74 0.10 1.50  2.96  4000    1
a[50]           1.76    0.01 0.72 0.36 1.76  3.15  4000    1
sigma           0.73    0.00 0.02 0.69 0.73  0.76  4000    1
basement_effect 0.70    0.00 0.07 0.56 0.70  0.84  1403    1
Samples were drawn using NUTS(diag_e) at Tue May  1 17:50:35 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
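The shrinkage is easy to see by comparing the two fits county by county; a sketch using the fit objects above:

# Counties with little data (wide unpooled intervals, like a[82] and a[50])
# get pulled toward the overall mean under partial pooling.
unpooled.a <- summary(unpooled.fit, pars = "a")$summary[, "mean"]
partial.a  <- summary(partial.cp0.fit, pars = "a")$summary[, "mean"]
plot(unpooled.a, partial.a, xlab = "no pooling", ylab = "partial pooling")
abline(0, 1, lty = 2)  # points off the diagonal show shrinkage toward the mean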
For posterior predictive checks, add a generated quantities block that simulates a new observation for every house at each posterior draw:

generated quantities {
  // one simulated log-radon value per house, drawn from the fitted likelihood
  vector[N] yppc;
  for (i in 1:N)
    yppc[i] = normal_rng(a[county[i]] + basement_effect * basement[i], sigma);
}
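ShinyStan (below) plots these interactively, but a quick manual check is easy too; a sketch:

# Overlay a handful of posterior predictive densities on the observed data.
yppc <- rstan::extract(unpooled.fit)$yppc
hist(df$log_radon, freq = FALSE, col = "grey",
     main = "posterior predictive check", xlab = "log radon")
for (s in sample(nrow(yppc), 20))
  lines(density(yppc[s, ]), col = rgb(0, 0, 1, 0.2))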
launch_shinystan(unpooled.fit)
launch_shinystan(partial.cp0.fit)
Adding a county-level uranium covariate gives us
\(y_i \sim \mathcal{N}(\alpha_{j[i]} + \beta x_i + \gamma u_{j[i]}, \, \sigma)\)
We can use the same weak prior:
\(\gamma \sim \mathcal{N}(0, 5)\)
Add vector[N] u; to your data block, and pass u = df$log_uranium in your call to sampling like the rest of the data:

data {
  int<lower=0> N;
  int<lower=0> J;
  vector[N] log_radon;
  vector[N] basement;
  int county[N];
  vector[N] u;
}
Add a real u_effect; parameter, then:

model {
  mu_a ~ normal(0, 100);
  sigma ~ normal(0, 5);
  sigma_a ~ normal(0, 20);
  a ~ normal(mu_a, sigma_a);
  u_effect ~ normal(0, 5);
  basement_effect ~ normal(0, 2);
  log_radon ~ normal(a[county] + basement_effect * basement + u_effect * u, sigma);
}
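The sampling call now carries u as well; a sketch with assumed names (group.hier.fit matches the diagnostics calls below):

group.hier.model <- stan_model("group_hier.stan")
group.hier.fit <- sampling(group.hier.model,
                           data = list(N = nrow(df),
                                       J = max(df$county),
                                       basement = df$basement,
                                       county = df$county,
                                       u = df$log_uranium,
                                       log_radon = df$log_radon))

Sampling this model is what produces the warnings below.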
Warning: There were 58 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
Warning: There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See http://mc-stan.org/misc/warnings.html#bfmi-low
Warning: Examine the pairs() plot to diagnose sampling problems
pairs(group.hier.fit, pars=c('sigma', 'sigma_a', 'mu_a', 'u_effect', 'a[1]'))
launch_shinystan(group.hier.fit)
This is a funnel: the scale of every \(x_i\) depends on \(y\), so the posterior develops a narrow neck that the sampler struggles to enter. A minimal version:

\(\log y \sim \mathcal{N}(0, 1.5)\)

\(x_i \sim \mathcal{N}(0, y)\)
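Simulating the funnel directly makes the problem visible; a sketch in plain R:

# Draws from the funnel: as log y shrinks, the x's collapse toward zero,
# leaving a neck too narrow for a single global step size to explore.
log.y <- rnorm(4000, 0, 1.5)
x1 <- rnorm(4000, 0, exp(log.y))
plot(x1, log.y, pch = ".", xlab = "x[1]", ylab = "log y")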
- adapt_delta makes HMC more careful
- increasing adapt_delta ends up decreasing the step size
- control=list(adapt_delta=0.99) in RStan, or control=dict(adapt_delta=0.99) in PyStan
- raising adapt_delta helps, but we still get BFMI warnings
- we can convert any normal to a standardized normal:
\(\mathrm{X} \sim \mathcal{N}(\mu, \sigma)\)
\(\mathrm{Z} = \frac{\mathrm{X} - \mu}{\sigma}\)
\(\mathrm{Z} \sim \mathcal{N}(0, 1)\)
\(\sigma_\alpha \sim \mathcal{N}(0, 5)\)
\(\mu_\alpha \sim \mathcal{N}(0, 5)\)
\(\alpha_j^{std} \sim \mathcal{N}(0, 1)\)
\(\alpha_j = \mu_\alpha + \sigma_\alpha \cdot \alpha_j^{std}\)
parameters {
  real mu_a;
  real<lower=0> sigma_a;
  vector[J] a_std;
  real basement_effect;
  real u_effect;
  real<lower=0> sigma;
}
transformed parameters {
  vector[J] a = mu_a + sigma_a * a_std;
}
model {
  log_radon ~ normal(a[county] + basement_effect * basement + u_effect * u, sigma);
  mu_a ~ normal(0, 5);
  sigma_a ~ normal(0, 5);
  a_std ~ normal(0, 1);
  sigma ~ normal(0, 5);
  // priors for the remaining parameters: assumed to match the others above
  basement_effect ~ normal(0, 5);
  u_effect ~ normal(0, 5);
}
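Refitting the non-centered model, with adapt_delta raised as above, might look like this sketch (file and object names assumed):

ncp.model <- stan_model("group_hier_ncp.stan")
ncp.fit <- sampling(ncp.model,
                    data = list(N = nrow(df), J = max(df$county),
                                basement = df$basement, county = df$county,
                                u = df$log_uranium, log_radon = df$log_radon),
                    control = list(adapt_delta = 0.99))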