- radioactive gas that enters homes through the ground
- radon levels vary greatly from household to household
- carcinogenic
- mediocre yelp reviews
df = read.csv("radon/data.csv")
str(df)
'data.frame':	919 obs. of  4 variables:
 $ log_radon  : num  0.8329 0.8329 1.0986 0.0953 1.1632 ...
 $ basement   : int  0 1 1 1 1 1 1 1 1 1 ...
 $ county     : int  1 1 1 1 2 2 2 2 2 2 ...
 $ log_uranium: num  -0.689 -0.689 -0.689 -0.689 -0.847 ...
\(y_i \sim \mathcal{N}(\alpha + \beta x_i, \sigma)\) translates to:
data {
  int<lower=1> N;
  vector[N] basement;
  vector[N] log_radon;
}
parameters {
  real basement_effect;
  real<lower=0> sigma;
  real alpha;
}
model {
  log_radon ~ normal(alpha + basement_effect * basement, sigma);
}
With vector[N] basement; declared, the vectorized statement

log_radon ~ normal(alpha + basement_effect * basement, sigma);

is equivalent to, and faster than,

for (i in 1:N)
  log_radon[i] ~ normal(alpha + basement_effect * basement[i], sigma);
By default, Stan will act as if there is a uniform prior on any parameters not given explicit distributions. We can do better.
Let’s choose \(\mathcal{N}(0, 5)\) for all of these. This gives us
model {
  log_radon ~ normal(alpha + basement_effect * basement, sigma);
  alpha ~ normal(0, 5);
  basement_effect ~ normal(0, 5);
  sigma ~ normal(0, 5);
}
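A minimal RStan call to fit this might look like the sketch below; the file name pooled.stan and the object name pooled.fit are assumptions, chosen to match the output that follows.

library(rstan)

# Sketch: compile and sample the pooled model (file name assumed)
pooled.fit <- stan(
  file = "pooled.stan",
  data = list(N = nrow(df),
              basement = df$basement,
              log_radon = df$log_radon),
  chains = 4,
  iter = 2000
)
print(pooled.fit, probs = c(0.025, 0.5, 0.975))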
Inference for Stan model: pooled.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
mean se_mean sd 2.5% 50% 97.5% n_eff Rhat
basement_effect 0.59 0 0.07 0.46 0.59 0.72 1626 1
sigma 0.79 0 0.02 0.76 0.79 0.83 2085 1
alpha 0.78 0 0.06 0.66 0.78 0.90 1655 1
Samples were drawn using NUTS(diag_e) at Tue May 1 17:50:16 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
Complete pooling fits a single intercept to every county. No pooling gives each county its own: add J and county to the data passed to the sampling call.

\(y_i \sim \mathcal{N}(\alpha_{j[i]} + \beta x_i, \, \sigma)\)
data {
  int<lower=0> N;                  // number of houses
  int<lower=0> J;                  // number of counties
  vector[N] basement;              // 0 = basement, 1 = not basement
  int<lower=1,upper=J> county[N];  // county id for each house
  vector[N] log_radon;             // radon measurement
}
parameters {
  real basement_effect;
  real<lower=0> sigma;
  vector[J] a;
}
model {
  log_radon ~ normal(basement_effect * basement + a[county], sigma);
  a ~ normal(0, 5);
  basement_effect ~ normal(0, 5);
  sigma ~ normal(0, 5);
}
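Fitting it looks much the same; the file name unpooled.stan is an assumption, and the object name unpooled.fit matches the shinystan call later on.

# Sketch: fit the no-pooling model (file name assumed)
unpooled.fit <- stan(
  file = "unpooled.stan",
  data = list(N = nrow(df),
              J = length(unique(df$county)),
              basement = df$basement,
              county = df$county,
              log_radon = df$log_radon),
  chains = 4,
  iter = 2000
)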
The posterior standard deviations of the county intercepts vary widely:

   county         sd
67     67 0.09091059
26     26 0.09507768
43     43 0.73486469
82     82 0.73924232
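One way to produce a table like this, as a sketch that assumes the unpooled.fit object from above:

# Posterior sd of each county intercept from the no-pooling fit
a.draws <- rstan::extract(unpooled.fit, pars = "a")$a  # draws x J matrix
a.sd <- data.frame(county = 1:ncol(a.draws), sd = apply(a.draws, 2, sd))
a.sd[order(a.sd$sd), ]  # counties with many houses are pinned down tightly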
Hierarchical:
\(y_i \sim \mathcal{N}(\alpha_{j[i]} + \beta x_i, \, \sigma)\)
\(\alpha_{j} \sim \mathcal{N}(\mu_\alpha, \sigma_\alpha)\)
\(\mu_\alpha, \sigma_\alpha \sim \mathcal{N}(0, 5)\)
data {
  int<lower=0> N;                  // number of houses
  int<lower=0> J;                  // number of counties
  vector[N] basement;              // 0 = basement, 1 = not basement
  int<lower=1,upper=J> county[N];  // county id for each house
  vector[N] log_radon;             // radon measurement
}
parameters {
  real basement_effect;
  real<lower=0> sigma;
  real<lower=0> sigma_a;
  real mu_a;
  vector[J] a;
}
model {
  mu_a ~ normal(0, 5);
  sigma_a ~ normal(0, 5);
  a ~ normal(mu_a, sigma_a);
  basement_effect ~ normal(0, 5);
  sigma ~ normal(0, 5);
  log_radon ~ normal(a[county] + basement_effect * basement, sigma);
}
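Fitting it, as a sketch; the file name partial_cp0.stan is an assumption chosen to match the model name in the output below, and partial.cp0.fit matches the later shinystan call.

# Sketch: fit the hierarchical (partial-pooling) model
partial.cp0.fit <- stan(
  file = "partial_cp0.stan",
  data = list(N = nrow(df),
              J = length(unique(df$county)),
              basement = df$basement,
              county = df$county,
              log_radon = df$log_radon),
  chains = 4,
  iter = 2000
)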
Inference for Stan model: partial_cp0.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
mean se_mean sd 2.5% 50% 97.5% n_eff Rhat
a[67] 0.28 0.00 0.09 0.10 0.29 0.46 1469 1
a[82] 0.96 0.01 0.32 0.33 0.96 1.59 4000 1
a[50] 1.00 0.00 0.29 0.44 0.99 1.59 4000 1
sigma 0.73 0.00 0.02 0.69 0.73 0.76 4000 1
basement_effect 0.66 0.00 0.07 0.53 0.66 0.79 828 1
Samples were drawn using NUTS(diag_e) at Tue May 1 17:50:47 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
Inference for Stan model: unpooled.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
mean se_mean sd 2.5% 50% 97.5% n_eff Rhat
a[67] 0.22 0.00 0.09 0.04 0.22 0.40 2108 1
a[82] 1.51 0.01 0.74 0.10 1.50 2.96 4000 1
a[50] 1.76 0.01 0.72 0.36 1.76 3.15 4000 1
sigma 0.73 0.00 0.02 0.69 0.73 0.76 4000 1
basement_effect 0.70 0.00 0.07 0.56 0.70 0.84 1403 1
Samples were drawn using NUTS(diag_e) at Tue May 1 17:50:35 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
To check the fit, we can generate posterior predictive draws with a generated quantities block:

generated quantities {
  vector[N] yppc;
  for (i in 1:N)
    yppc[i] = normal_rng(a[county[i]] + basement_effect * basement[i], sigma);
}
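In R, the replicated data can be pulled out and compared against the observations; a rough sketch, assuming the partial.cp0.fit object from above:

# Sketch: crude posterior predictive check for the partial-pooling fit
yppc <- rstan::extract(partial.cp0.fit, pars = "yppc")$yppc  # draws x N
hist(df$log_radon, freq = FALSE, main = "log radon: data vs. replicated")
lines(density(as.vector(yppc)), col = "red")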
launch_shinystan(unpooled.fit)
launch_shinystan(partial.cp0.fit)
Adding a county-level uranium covariate gives us
\(y_i \sim \mathcal{N}(\alpha_{j[i]} + \beta x_i + \gamma u_{j[i]}, \, \sigma)\)
We can use the same weak prior:
\(\gamma \sim \mathcal{N}(0, 5)\)
Add vector[N] u; to your data block, and pass u = df$log_uranium in your call to sampling like the rest of the data:

data {
  int<lower=0> N;
  int<lower=0> J;
  vector[N] log_radon;
  vector[N] basement;
  int county[N];
  vector[N] u;
}
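On the R side, the data list might be assembled like this (the name radon.data is illustrative):

# Sketch: the uranium column joins the data list like everything else
radon.data <- list(N = nrow(df),
                   J = length(unique(df$county)),
                   log_radon = df$log_radon,
                   basement = df$basement,
                   county = df$county,
                   u = df$log_uranium)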
Declare a real u_effect; parameter, and the model becomes:

model {
  mu_a ~ normal(0, 100);
  sigma ~ normal(0, 5);
  sigma_a ~ normal(0, 20);
  a ~ normal(mu_a, sigma_a);
  u_effect ~ normal(0, 5);
  basement_effect ~ normal(0, 2);
  log_radon ~ normal(a[county] + basement_effect * basement + u_effect * u, sigma);
}
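Fitting it, again as a sketch; the file name group_hier.stan is assumed, and group.hier.fit matches the pairs() and shinystan calls below.

# Sketch: fit the model with the group-level uranium covariate
group.hier.fit <- stan(
  file = "group_hier.stan",
  data = radon.data,
  chains = 4,
  iter = 2000
)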
Warning: There were 58 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
Warning: There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See http://mc-stan.org/misc/warnings.html#bfmi-low
Warning: Examine the pairs() plot to diagnose sampling problems
pairs(group.hier.fit, pars=c('sigma', 'sigma_a', 'mu_a', 'u_effect', 'a[1]'))
launch_shinystan(group.hier.fit)
The culprit is funnel-shaped posterior geometry, e.g.:

\(\log y \sim \mathcal{N}(0, 1.5)\)

\(x_i \sim \mathcal{N}(0, y)\)
- adapt_delta makes HMC more careful
- raising adapt_delta ends up decreasing the step size
- pass control=list(adapt_delta=0.99) (RStan) or control=dict(adapt_delta=0.99) (PyStan); see the sketch below
- we raised adapt_delta but we still get BFMI warnings
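In RStan, reusing the assumed names from above:

# Re-run with a higher acceptance target; smaller steps, fewer divergences
group.hier.fit <- stan(
  file = "group_hier.stan",
  data = radon.data,
  control = list(adapt_delta = 0.99)
)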
We can convert any normal to a standardized normal:

\(\mathrm{X} \sim \mathcal{N}(\mu, \sigma)\)
\(\mathrm{Z} = \frac{\mathrm{X} - \mu}{\sigma}\)
\(\mathrm{Z} \sim \mathcal{N}(0, 1)\)
Applying this to the county intercepts gives the non-centered parameterization:

\(\sigma_\alpha \sim \mathcal{N}(0, 5)\)
\(\mu_\alpha \sim \mathcal{N}(0, 5)\)
\(\alpha_j^{std} \sim \mathcal{N}(0, 1)\)
\(\alpha_j = \mu_\alpha + \sigma_\alpha \cdot \alpha_j^{std}\)
parameters {
  real mu_a;
  real<lower=0> sigma_a;
  vector[J] a_std;
  real basement_effect;
  real u_effect;
  real<lower=0> sigma;
}
transformed parameters {
  vector[J] a = mu_a + sigma_a * a_std;
}
model {
  log_radon ~ normal(a[county] + basement_effect * basement + u_effect * u, sigma);
  mu_a ~ normal(0, 5);
  sigma_a ~ normal(0, 5);
  a_std ~ normal(0, 1);
  sigma ~ normal(0, 5);
  u_effect ~ normal(0, 5);
  basement_effect ~ normal(0, 2);
}
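Fitting the non-centered version and re-checking the diagnostics, as a sketch; the file name group_hier_ncp.stan and the object name ncp.fit are assumptions.

# Sketch: fit the non-centered model and re-check sampler diagnostics
ncp.fit <- stan(
  file = "group_hier_ncp.stan",
  data = radon.data,
  control = list(adapt_delta = 0.99)
)
check_hmc_diagnostics(ncp.fit)  # reports divergences, treedepth, E-BFMI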