Description
The Pathfinder implementation fails for any model of moderate size when run on a GPU, quickly running out of GPU memory:
```
XlaRuntimeError: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 4464000000 bytes.
BufferAssignment OOM Debugging.
BufferAssignment stats:
    parameter allocation:         4.16GiB
    constant allocation:          0B
    maybe_live_out allocation:    4.16GiB
    preallocated temp allocation: 0B
    total allocation:             8.31GiB
    total fragmentation:          0B (0.00%)
Peak buffers:
    Buffer 1:
        Size: 4.16GiB
        Entry Parameter Subshape: f64[31,200,300,300]
    ==========================
    Buffer 2:
        Size: 4.16GiB
        Operator: op_name="jit(fn)/jit(main)/mul" source_file="/usr/local/lib/python3.10/dist-packages/pytensor/link/jax/dispatch/scalar.py" source_line=103
        XLA Label: fusion
        Shape: f64[31,200,300,300]
    ==========================
    Buffer 3:
        Size: 8B
        Entry Parameter Subshape: s64[]
    ==========================
```
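For reference, the failing allocation is fully accounted for by that single `f64[31,200,300,300]` buffer at 8 bytes per element:

```python
import math

# Shape reported in the BufferAssignment dump, float64 = 8 bytes per element
shape = (31, 200, 300, 300)
nbytes = math.prod(shape) * 8

print(nbytes)          # 4464000000 bytes, the exact figure in the error message
print(nbytes / 2**30)  # ~4.16 GiB, matching the "parameter allocation" stat
```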
I've tried setting some appropriate XLA environment variables as follows:
```python
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = ".10"
os.environ["XLA_PYTHON_CLIENT_ALLOCATOR"] = "platform"
```
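One thing worth double-checking (an assumption on my part about what could go wrong, not a confirmed cause): these XLA client flags are only read when JAX first initializes its backend, so they must be set before anything imports JAX, including libraries that pull it in transitively. A sketch of the required ordering:

```python
import os

# Must run before the first `import jax` (or any import that triggers it,
# e.g. a JAX-backed sampler); changing them afterwards has no effect.
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"
os.environ["XLA_PYTHON_CLIENT_ALLOCATOR"] = "platform"

# import jax  # backend would initialize here, picking up the flags above
```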
But this has no effect on the behavior. For a reproducible example, try running the classification model in the latent GP example notebook:
```python
with pm.Model() as model:
    ell = pm.InverseGamma("ell", mu=1.0, sigma=0.5)
    eta = pm.Exponential("eta", lam=1.0)
    cov = eta**2 * pm.gp.cov.ExpQuad(1, ell)
    gp = pm.gp.Latent(cov_func=cov)
    f = gp.prior("f", X=x[:, None])

    # logit link and Bernoulli likelihood
    p = pm.Deterministic("p", pm.math.invlogit(f))
    y_ = pm.Bernoulli("y", p=p, observed=y)

    # idata = pm.sample(1000, chains=2, cores=2, nuts_sampler="numpyro")
    idata = pmx.fit(method="pathfinder")
```
I've yet to run pathfinder successfully on anything but toy models. Given that VI is primarily useful as an approximation for large models that cannot be sampled quickly, this severely limits pathfinder's usefulness.
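One partial mitigation I considered (an assumption, not a verified fix) is building the model in single precision, which would halve every one of the buffers above, including the 4.16 GiB `f64[31,200,300,300]` one:

```python
import pytensor

# Sketch only: force float32 so each allocation needs half the memory.
# Whether pathfinder remains numerically stable at single precision for
# this GP model is an open question, not something I've confirmed.
pytensor.config.floatX = "float32"
```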