Hamiltonian Monte Carlo with Turing.jl
In this section, we use the Turing.jl
package to set up and solve the inference problem of Gaussian process regression. We advise the reader to check the Turing.jl documentation for a more detailed explanation of the package and its capabilities.
Modelling function
In Turing, we can write an observation model with the @model macro as follows:
using Turing
using Pioran
@model function inference_model(y, t, σ)
    # Prior distribution for the parameters
    α₁ ~ Uniform(0., 1.25)
    f₁ ~ LogUniform(min_f_b, max_f_b)
    α₂ ~ Uniform(1, 4)
    variance ~ LogNormal(log(0.5), 1.25)
    ν ~ Gamma(2, 0.5)
    μ ~ Normal(0, 2)
    # Rescale the measurement variance
    σ² = ν .* σ .^ 2
    # Define power spectral density function
    𝓟 = SingleBendingPowerLaw(α₁, f₁, α₂)
    # Approximation of the PSD to form a covariance function
    𝓡 = approx(𝓟, f0, fM, 20, variance)
    # Build the GP
    f = ScalableGP(μ, 𝓡)
    # sample the conditioned distribution
    return y ~ f(t, σ²) # <- this means that our data y is distributed according
    # to the GP f conditioned with input t and variance σ²
end
inference_model (generic function with 2 methods)
The order of the statements in the @model block is important: first, we define the prior distributions of the parameters; then we rescale the measurement variance and define the power spectral density function and its approximation to form a covariance function. Finally, we build the Gaussian process.
The last line states that the data y is distributed according to the Gaussian process f conditioned on the input times t and the measurement variances σ².
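Before running any inference, we can perform a quick prior predictive check with Turing's Prior sampler. The sketch below assumes that t, y, yerr and the frequency bounds f0, fM, min_f_b and max_f_b have already been computed from the observing window (as in the full example at the end of this page):
using Turing
using MCMCChains
# Draw 100 samples from the prior to sanity-check the model and the priors
prior_chain = sample(inference_model(y, t, yerr), Prior(), 100)
summarystats(prior_chain)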
Prior distributions
In practice, if we have several slopes $\alpha_i$ and frequencies $f_i$, we would like to order them such that $\alpha_i < \alpha_{i+1}$ and $f_i < f_{i+1}$. However, in Turing.jl it is not yet possible to sample from distributions with dynamic support; see these issues [1], [2], [3].
The solution proposed in these issues is to define a custom multivariate distribution with bijectors to map from the constrained space to the unconstrained space.
Here we provide three distributions: TwoUniformDependent, ThreeUniformDependent and TwoLogUniformDependent.
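As a quick check of the ordering constraint, we can draw from these distributions directly. This is a minimal sketch assuming they behave like standard multivariate distributions (which they must, since Turing draws initial values from the prior):
using Pioran
d = ThreeUniformDependent(0, 1.25, 4)
α = rand(d) # a vector of three ordered values, α[1] < α[2] < α[3]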
In a double-bending power-law model they can be used as follows:
@model function inference_model(y, t, σ)
    α ~ ThreeUniformDependent(0, 1.25, 4)
    α₁, α₂, α₃ = α
    fb ~ TwoLogUniformDependent(min_f_b, max_f_b)
    f₁, f₂ = fb
    variance ~ LogNormal(log(0.5), 1.25)
    ν ~ Gamma(2, 0.5)
    μ ~ LogNormal(log(3), 1)
    σ² = ν .* σ .^ 2
    𝓟 = DoubleBendingPowerLaw(α₁, f₁, α₂, f₂, α₃)
    𝓡 = approx(𝓟, f0, fM, 20, variance)
    f = ScalableGP(μ, 𝓡)
    return y ~ f(t, σ²)
end
Sampling
Once the model is defined, we can choose a sampler. Here we use the No-U-Turn Sampler (NUTS), which is a variant of Hamiltonian Monte Carlo (HMC). In Turing.jl we can define the sampler with a target acceptance probability of $0.8$ as follows:
sampler = NUTS(0.8)
or we can use the AdvancedHMC.jl
package to define the sampler as follows:
using AdvancedHMC
tap = 0.8 # target acceptance probability
nuts = AdvancedHMC.NUTS(tap)
sampler = externalsampler(nuts)
For more information on the AdvancedHMC.jl package, see its documentation. Using AdvancedHMC.jl directly allows more control over the sampler, for instance the choice of metric.
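For instance, a dense mass matrix could be requested instead of the default diagonal one. This is only a sketch: the metric keyword of the convenience constructor is assumed here and may differ between AdvancedHMC.jl versions, so check the documentation of the version you use.
using AdvancedHMC
using Turing
# Assumed keyword: recent AdvancedHMC versions accept metric = :diagonal, :unit or :dense
nuts_dense = AdvancedHMC.NUTS(0.8; metric=:dense)
sampler = externalsampler(nuts_dense)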
We can then sample 2000
points from the posterior distribution using the sample
function. We can also specify the number of adaptation steps with the n_adapts
keyword argument. The progress
keyword argument is used to display the progress of the sampling process.
mychain = sample(inference_model(y, t, yerr), sampler, 2000; n_adapts=1000, progress=true)
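The returned object is an MCMCChains.Chains, so individual parameters can be extracted and summarised directly; for example (the parameter names match the model above):
using MCMCChains
using Statistics
α₁_samples = vec(mychain[:α₁]) # iterations (and chains) flattened into a vector
mean(α₁_samples), std(α₁_samples)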
Sampling several chains
In practice, we may want to sample from the posterior distribution using multiple chains, which is essential for convergence diagnostics such as $\hat{R}$ and the effective sample size (ESS). We can use the Distributed package or the MCMCDistributed() function to run several chains in parallel. For a more detailed explanation, see the Turing.jl Guide.
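Once a multi-chain Chains object is available (as produced by either approach below), the summary statistics report the bulk and tail ESS as well as $\hat{R}$, for example:
using MCMCChains
# total_chainHMC is assumed to hold several chains, see the scripts below
summarystats(total_chainHMC) # includes ess_bulk, ess_tail and rhat columns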
With Distributed.jl
We can use the Distributed package to run several chains in parallel as follows:
using Distributed
using Turing
num_chains = nworkers();
@everywhere filename = $(ARGS[1]);
@everywhere begin
    using Turing
    using MCMCChains
    using AdvancedHMC
    using Pioran
    using DelimitedFiles
    data = readdlm(filename, comments=true)
    t, y, yerr = ...
    # do something
end
@everywhere @model function ...
    # Define the model here
end
@everywhere begin
    sampler = ...
end
HMCchains = pmap(c -> sample(inference_model(y, t, yerr), sampler, 2000; n_adapts=1000, progress=true), 1:num_chains)
total_chainHMC = chainscat(HMCchains...) # concatenate the chains
This script is run with the following command:
julia -p 6 script.jl data.txt
where 6 is the number of worker processes (one chain per worker) and data.txt is the file containing the time series.
With MCMCDistributed()
When using the MCMCDistributed() function, the script is essentially the same; only the last two lines are replaced by:
total_chainHMC = sample(inference_model(y, t, yerr), sampler, MCMCDistributed(), 2000, num_chains; n_adapts=n_adapts, progress=true)
Saving the chains
As mentioned in the documentation
of the MCMCChains
package, we can save the chains using MCMCChainsStorage
as follows:
using HDF5
using MCMCChains
using MCMCChainsStorage
total_chainHMC = sample(inference_model(y, t, yerr), sampler, MCMCDistributed(), 2000, num_chains; n_adapts=n_adapts, progress=true)
h5open("total_chain.h5", "w") do file
    write(file, total_chainHMC)
end
Example
Here is an example of a script, which can be found in the example directory of the Pioran.jl package. This script samples from the posterior distribution using multiple chains.
# to run with 6 workers: julia -p 6 single_pl.jl data/simu.txt
using Distributed
using Turing
using HDF5
using MCMCChains
using MCMCChainsStorage
num_chains = nworkers();
@everywhere filename = $(ARGS[1]);
@everywhere begin
    using Turing
    using MCMCChains
    using AdvancedHMC
    using Pioran
    using DelimitedFiles
    fname = replace(split(filename, "/")[end], ".txt" => "_single")
    dir = "inference/" * fname
    data = readdlm(filename, comments=true)
    t, y, yerr = data[:, 1], data[:, 2], data[:, 3]
    # Frequency range for the approx and the prior
    f_min, f_max = 1 / (t[end] - t[1]), 1 / minimum(diff(t)) / 2
    f0, fM = f_min / 20.0, f_max * 20.0
    min_f_b, max_f_b = f0 * 4.0, fM / 4.0
    # F var^2 is distributed as a log-normal
    μᵥ, σᵥ = -1.5, 1.0
    μₙ, σₙ² = 2μᵥ, 2(σᵥ)^2
    σₙ = sqrt(σₙ²)
    # options for the approximation
    basis_function = "SHO"
    n_components = 20
    model = SingleBendingPowerLaw
    prior_checks = true
end
@everywhere @model function inference_model(y, t, σ)
    # Prior distribution for the parameters
    α₁ ~ Uniform(0.0, 1.25)
    f₁ ~ LogUniform(min_f_b, max_f_b)
    α₂ ~ Uniform(1, 4)
    # Prior on the variance, using the log-normal parameters defined above
    variance ~ LogNormal(μₙ, σₙ)
    ν ~ Gamma(2, 0.5)
    μ ~ Normal(0, 2)
    c ~ LogUniform(1e-6, minimum(y) * 0.99)
    # Rescale the measurement variance
    σ² = ν .* σ .^ 2 ./ (y .- c) .^ 2
    # Make the flux Gaussian
    y = log.(y .- c)
    # Define power spectral density function
    𝓟 = model(α₁, f₁, α₂)
    # Approximation of the PSD to form a covariance function
    𝓡 = approx(𝓟, f0, fM, n_components, variance, basis_function=basis_function)
    # Build the GP
    f = ScalableGP(μ, 𝓡)
    # sample the conditioned distribution
    return y ~ f(t, σ²) # <- this means that our data y is distributed according
    # to the GP f conditioned with input t and variance σ²
end
@everywhere begin
    n_adapts = 500 # number of adaptation steps
    tap = 0.8 # target acceptance probability
    nuts = AdvancedHMC.NUTS(tap)
end
# either
# HMCchains = sample(inference_model(y, t, yerr), externalsampler(nuts), MCMCDistributed(), 1000, num_chains; n_adapts=n_adapts, progress=true)
# or
HMCchains = pmap(c -> sample(inference_model(y, t, yerr), externalsampler(nuts), 1000; n_adapts=n_adapts, save_state=true, progress=true), 1:num_chains)
total_chainHMC = chainscat(HMCchains...) # not needed in the previous case
if !isdir("inference/")
    mkpath("inference/")
end
h5open(dir * ".h5", "w") do file
    write(file, total_chainHMC)
end
The results of the sampling can be found in the inference directory. The chains are saved in HDF5 format and can be loaded as shown below:
using MCMCChains
using MCMCChainsStorage
using HDF5
total_chain = h5open("data/subset_simu_single.h5", "r") do f
    read(f, Chains)
end
total_chain
Chains MCMC chain (1000×20×9 Array{Float64, 3}):
Iterations = 1:1:1000
Number of chains = 9
Samples per chain = 1000
parameters = c, f₁, variance, α₁, α₂, μ, ν
internals = acceptance_rate, hamiltonian_energy, hamiltonian_energy_error, is_accept, is_adapt, log_density, lp, max_hamiltonian_energy_error, n_steps, nom_step_size, numerical_error, step_size, tree_depth
Summary Statistics
parameters mean std mcse ess_bulk ess_tail rhat ⋯
Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯
c 0.0026 0.0251 0.0008 2302.5962 2565.4883 1.0034 ⋯
f₁ 0.0569 0.5556 0.0263 1231.6547 1492.6846 1.0073 ⋯
variance 0.3183 0.4488 0.0105 2015.4446 2567.1401 1.0039 ⋯
α₁ 0.8357 0.3470 0.0064 2575.4719 2559.6196 1.0016 ⋯
α₂ 2.8681 0.3514 0.0083 1993.6386 1826.6556 1.0048 ⋯
μ 0.2425 0.4385 0.0095 2747.2149 2143.5901 1.0041 ⋯
ν 1.0608 0.2021 0.0042 3981.0973 3547.5757 1.0012 ⋯
1 column omitted
Quantiles
parameters 2.5% 25.0% 50.0% 75.0% 97.5%
Symbol Float64 Float64 Float64 Float64 Float64
c 0.0000 0.0000 0.0001 0.0005 0.0067
f₁ 0.0004 0.0018 0.0038 0.0076 0.0192
variance 0.0514 0.1081 0.1771 0.3438 1.4875
α₁ 0.0864 0.5936 0.9353 1.1318 1.2422
α₂ 2.3458 2.6187 2.8075 3.0707 3.7105
μ -0.5953 0.0204 0.2273 0.4440 1.1821
ν 0.7589 0.9418 1.0504 1.1659 1.4209
We can then use the StatsPlots
package to visualize the chains. The first 50 points are discarded.
using StatsPlots
plot(total_chain[50:end,:,:])
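We can also plot only a subset of the parameters and save the figure to disk; a short sketch (the parameter names match the chain above, and the output filename is arbitrary):
using StatsPlots
plot(total_chain[50:end, [:α₁, :f₁, :α₂], :])
savefig("chains_subset.png") # filename is arbitrary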