- radioactive gas that enters homes through the ground
- radon levels vary greatly from household to household
- carcinogenic
- mediocre Yelp reviews
```r
df = read.csv("radon/data.csv")
str(df)
```

```
'data.frame': 919 obs. of 4 variables:
 $ log_radon  : num  0.8329 0.8329 1.0986 0.0953 1.1632 ...
 $ basement   : int  0 1 1 1 1 1 1 1 1 1 ...
 $ county     : int  1 1 1 1 2 2 2 2 2 2 ...
 $ log_uranium: num  -0.689 -0.689 -0.689 -0.689 -0.847 ...
```
\(y_i \sim \mathcal{N}(\alpha + \beta x_i, \sigma)\) translates to:
```stan
data {
  int<lower=1> N;
  vector[N] basement;
  vector[N] log_radon;
}
parameters {
  real basement_effect;
  real<lower=0> sigma;
  real alpha;
}
model {
  log_radon ~ normal(alpha + basement_effect * basement, sigma);
}
```
With `vector[N] basement;`,

```stan
log_radon ~ normal(alpha + basement_effect * basement, sigma);
```

is equivalent to, and faster than,

```stan
for (i in 1:N)
  log_radon[i] ~ normal(alpha + basement_effect * basement[i], sigma);
```
By default, Stan will act as if there is a uniform prior on any parameters not given explicit distributions. We can do better.
Let’s choose \(\mathcal{N}(0, 5)\) for all of these (for `sigma`, the `<lower=0>` constraint turns this into a half-normal). This gives us:
```stan
model {
  log_radon ~ normal(alpha + basement_effect * basement, sigma);
  alpha ~ normal(0, 5);
  basement_effect ~ normal(0, 5);
  sigma ~ normal(0, 5);
}
```
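For completeness, a minimal RStan sketch for compiling and fitting this program; the file name `pooled.stan` and the data-list construction are assumptions, not from the original:

```r
library(rstan)

pooled.fit <- stan(
  file = "pooled.stan",          # assumed file name for the program above
  data = list(
    N = nrow(df),
    basement = df$basement,
    log_radon = df$log_radon
  ),
  chains = 4, iter = 2000
)
print(pooled.fit)
```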
```
Inference for Stan model: pooled.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

                mean se_mean   sd 2.5%  50% 97.5% n_eff Rhat
basement_effect 0.59       0 0.07 0.46 0.59  0.72  1626    1
sigma           0.79       0 0.02 0.76 0.79  0.83  2085    1
alpha           0.78       0 0.06 0.66 0.78  0.90  1655    1

Samples were drawn using NUTS(diag_e) at Tue May 1 17:50:16 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains
(at convergence, Rhat=1).
```
[Figure: fitted radon levels under complete pooling vs. no pooling]
For the no-pooling model, add `J` and `county` to the `sampling` call:

\(y_i \sim \mathcal{N}(\alpha_{j[i]} + \beta x_i, \, \sigma)\)
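For example, the call might look like this (a sketch; the file name `unpooled.stan` and the compilation step are assumptions):

```r
unpooled.model <- stan_model("unpooled.stan")
unpooled.fit <- sampling(
  unpooled.model,
  data = list(
    N = nrow(df),
    J = max(df$county),    # counties are coded 1..J
    county = df$county,
    basement = df$basement,
    log_radon = df$log_radon
  )
)
```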
```stan
data {
  int<lower=0> N;                  // number of houses
  int<lower=0> J;                  // number of counties
  vector[N] basement;              // 0 = basement, 1 = not basement
  int<lower=1,upper=J> county[N];  // county id for each house
  vector[N] log_radon;             // radon measurement
}
parameters {
  real basement_effect;
  real<lower=0> sigma;
  vector[J] a;
}
model {
  log_radon ~ normal(basement_effect * basement + a[county], sigma);
  a ~ normal(0, 5);
  basement_effect ~ normal(0, 5);
  sigma ~ normal(0, 5);
}
```
In the no-pooling fit, the posterior uncertainty of the county intercepts varies widely:

```
   county         sd
67     67 0.09091059
26     26 0.09507768
43     43 0.73486469
82     82 0.73924232
```
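A table like this can be pulled straight from the posterior draws; a sketch using the `unpooled.fit` name from above:

```r
a.draws <- extract(unpooled.fit)$a               # iterations x J matrix
sds <- data.frame(county = seq_len(ncol(a.draws)),
                  sd = apply(a.draws, 2, sd))    # posterior sd per county
sds <- sds[order(sds$sd), ]
rbind(head(sds, 2), tail(sds, 2))                # smallest and largest sds
```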
[Figure: county intercept estimates, no pooling vs. hierarchical]
\(y_i \sim \mathcal{N}(\alpha_{j[i]} + \beta x_i, \, \sigma)\)
\(\alpha_{j} \sim \mathcal{N}(\mu_\alpha, \sigma_\alpha)\)
\(\mu_\alpha, \sigma_\alpha \sim \mathcal{N}(0, 5)\)
```stan
data {
  int<lower=0> N;                  // number of houses
  int<lower=0> J;                  // number of counties
  vector[N] basement;              // 0 = basement, 1 = not basement
  int<lower=1,upper=J> county[N];  // county id for each house
  vector[N] log_radon;             // radon measurement
}
parameters {
  real basement_effect;
  real<lower=0> sigma;
  real<lower=0> sigma_a;
  real mu_a;
  vector[J] a;
}
model {
  mu_a ~ normal(0, 5);
  sigma_a ~ normal(0, 5);
  a ~ normal(mu_a, sigma_a);
  basement_effect ~ normal(0, 5);
  sigma ~ normal(0, 5);
  log_radon ~ normal(a[county] + basement_effect * basement, sigma);
}
```
The hierarchical fit:

```
Inference for Stan model: partial_cp0.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

                mean se_mean   sd 2.5%  50% 97.5% n_eff Rhat
a[67]           0.28    0.00 0.09 0.10 0.29  0.46  1469    1
a[82]           0.96    0.01 0.32 0.33 0.96  1.59  4000    1
a[50]           1.00    0.00 0.29 0.44 0.99  1.59  4000    1
sigma           0.73    0.00 0.02 0.69 0.73  0.76  4000    1
basement_effect 0.66    0.00 0.07 0.53 0.66  0.79   828    1

Samples were drawn using NUTS(diag_e) at Tue May 1 17:50:47 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains
(at convergence, Rhat=1).
```
And the no-pooling fit, for comparison:

```
Inference for Stan model: unpooled.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

                mean se_mean   sd 2.5%  50% 97.5% n_eff Rhat
a[67]           0.22    0.00 0.09 0.04 0.22  0.40  2108    1
a[82]           1.51    0.01 0.74 0.10 1.50  2.96  4000    1
a[50]           1.76    0.01 0.72 0.36 1.76  3.15  4000    1
sigma           0.73    0.00 0.02 0.69 0.73  0.76  4000    1
basement_effect 0.70    0.00 0.07 0.56 0.70  0.84  1403    1

Samples were drawn using NUTS(diag_e) at Tue May 1 17:50:35 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains
(at convergence, Rhat=1).
```

Note how the sparsely observed counties (`a[82]`, `a[50]`) are pulled toward the group mean with much smaller posterior sd, while the well-observed `a[67]` barely moves.
To simulate replicated data for posterior predictive checks, add a generated quantities block:

```stan
generated quantities {
  vector[N] yppc;
  for (i in 1:N)
    yppc[i] = normal_rng(a[county[i]] + basement_effect * basement[i], sigma);
}
```
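Once the model is refit with this block, a quick graphical check compares a few replicated datasets to the observed `log_radon` (a sketch; the fit name follows the document's convention):

```r
yppc <- extract(partial.cp0.fit)$yppc    # iterations x N matrix of replicates

plot(density(df$log_radon), main = "posterior predictive check")
for (s in sample(nrow(yppc), 20))        # overlay 20 replicated datasets
  lines(density(yppc[s, ]), col = adjustcolor("blue", alpha.f = 0.2))
```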
Or explore the fits interactively in ShinyStan:

```r
launch_shinystan(unpooled.fit)
launch_shinystan(partial.cp0.fit)
```
Adding a county-level uranium covariate gives us
\(y_i \sim \mathcal{N}(\alpha_{j[i]} + \beta x_i + \gamma u_j, \, \sigma)\)
We can use the same weak prior:
\(\gamma \sim \mathcal{N}(0, 5)\)
Add `vector[N] u;` to your data block and pass `u = df$log_uranium` in your call to `sampling`, like the rest of the data:

```stan
data {
  int<lower=0> N;
  int<lower=0> J;
  vector[N] log_radon;
  vector[N] basement;
  int county[N];
  vector[N] u;
}
```
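Putting it together (a sketch; `group.hier.model` is an assumed compiled model object):

```r
stan_data <- list(
  N = nrow(df),
  J = max(df$county),
  county = df$county,
  basement = df$basement,
  log_radon = df$log_radon,
  u = df$log_uranium
)
group.hier.fit <- sampling(group.hier.model, data = stan_data)
```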
Add a `real u_effect;` parameter, and the model becomes:

```stan
model {
  mu_a ~ normal(0, 100);
  sigma ~ normal(0, 5);
  sigma_a ~ normal(0, 20);
  a ~ normal(mu_a, sigma_a);
  u_effect ~ normal(0, 5);
  basement_effect ~ normal(0, 2);
  log_radon ~ normal(a[county] + basement_effect * basement + u_effect * u, sigma);
}
```
Sampling from this model produces warnings:

```
Warning: There were 58 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
Warning: There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-low
Warning: Examine the pairs() plot to diagnose sampling problems
```
```r
pairs(group.hier.fit, pars=c('sigma', 'sigma_a', 'mu_a', 'u_effect', 'a[1]'))
launch_shinystan(group.hier.fit)
```
These symptoms are characteristic of a funnel-shaped posterior. The classic example:

\(\log y \sim \mathcal{N}(0, 1.5)\)

\(x_i \sim \mathcal{N}(0, y)\)
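A direct transcription into Stan (a sketch, not from the original; the dimension 9 for \(x\) is an arbitrary choice) reproduces the pathology when sampled with default settings:

```stan
parameters {
  real log_y;          // sample on the log scale, so no Jacobian is needed
  vector[9] x;
}
transformed parameters {
  real<lower=0> y = exp(log_y);
}
model {
  log_y ~ normal(0, 1.5);
  x ~ normal(0, y);    // the scale of x collapses as y -> 0: a funnel
}
```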
- Increasing `adapt_delta` makes HMC more careful: a higher `adapt_delta` ends up decreasing the step size
- In RStan, pass `control=list(adapt_delta=0.99)`; in PyStan, `control=dict(adapt_delta=0.99)`
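In RStan, for example (reusing the assumed names from the earlier sketch):

```r
group.hier.fit <- sampling(
  group.hier.model,                    # compiled model from before
  data = stan_data,
  control = list(adapt_delta = 0.99)   # default is 0.8
)
```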
Raising `adapt_delta` helps, but we still get BFMI warnings. Fortunately, we can convert any normal to a standardized normal:
\(\mathrm{X} \sim \mathcal{N}(\mu, \sigma)\)
\(\mathrm{Z} = \frac{\mathrm{X} - \mu}{\sigma}\)
\(\mathrm{Z} \sim \mathcal{N}(0, 1)\)
Applying this to the county intercepts:

\(\sigma_\alpha \sim \mathcal{N}(0, 5)\)
\(\mu_\alpha \sim \mathcal{N}(0, 5)\)
\(\alpha_j^{std} \sim \mathcal{N}(0, 1)\)
\(\alpha_j = \mu_\alpha + \sigma_\alpha \cdot \alpha_j^{std}\)
```stan
parameters {
  real mu_a;
  real<lower=0> sigma_a;
  vector[J] a_std;
  real basement_effect;
  real u_effect;
  real<lower=0> sigma;
}
transformed parameters {
  vector[J] a = mu_a + sigma_a * a_std;
}
model {
  log_radon ~ normal(a[county] + basement_effect * basement + u_effect * u, sigma);
  mu_a ~ normal(0, 5);
  sigma_a ~ normal(0, 5);
  a_std ~ normal(0, 1);
  sigma ~ normal(0, 5);
}
```
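Refitting with this non-centered parameterization is the standard fix for the funnel geometry above and typically removes the divergences without raising `adapt_delta`. A sketch, with assumed file and object names:

```r
ncp.model <- stan_model("ncp.stan")   # assumed file name for the program above
ncp.fit <- sampling(ncp.model, data = stan_data)
check_hmc_diagnostics(ncp.fit)        # expect no divergence or BFMI warnings
```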