Description
The Pathfinder implementation fails for any model of moderate size when run on a GPU, quickly running out of GPU memory:
```
XlaRuntimeError: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 4464000000 bytes.
BufferAssignment OOM Debugging.
BufferAssignment stats:
    parameter allocation:         4.16GiB
    constant allocation:          0B
    maybe_live_out allocation:    4.16GiB
    preallocated temp allocation: 0B
    total allocation:             8.31GiB
    total fragmentation:          0B (0.00%)
Peak buffers:
    Buffer 1:
        Size: 4.16GiB
        Entry Parameter Subshape: f64[31,200,300,300]
    ==========================
    Buffer 2:
        Size: 4.16GiB
        Operator: op_name="jit(fn)/jit(main)/mul" source_file="/usr/local/lib/python3.10/dist-packages/pytensor/link/jax/dispatch/scalar.py" source_line=103
        XLA Label: fusion
        Shape: f64[31,200,300,300]
    ==========================
    Buffer 3:
        Size: 8B
        Entry Parameter Subshape: s64[]
    ==========================
```
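For reference, the failing allocation is fully accounted for by that single `f64[31,200,300,300]` buffer at 8 bytes per element:

```python
import math

# Shape reported in the BufferAssignment dump, float64 = 8 bytes per element
shape = (31, 200, 300, 300)
nbytes = math.prod(shape) * 8

print(nbytes)          # 4464000000 bytes, the exact figure in the error message
print(nbytes / 2**30)  # ~4.16 GiB, matching the "parameter allocation" stat
```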
I've tried setting some appropriate XLA environment variables as follows:
```python
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = ".10"
os.environ["XLA_PYTHON_CLIENT_ALLOCATOR"] = "platform"
```
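One thing worth double-checking (an assumption on my part about what could go wrong, not a confirmed cause): these XLA client flags are only read when JAX first initializes its backend, so they must be set before anything imports JAX, including libraries that pull it in transitively. A sketch of the required ordering:

```python
import os

# Must run before the first `import jax` (or any import that triggers it,
# e.g. a JAX-backed sampler); changing them afterwards has no effect.
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"
os.environ["XLA_PYTHON_CLIENT_ALLOCATOR"] = "platform"

# import jax  # backend would initialize here, picking up the flags above
```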
But this has no effect on the behavior. For a reproducible example, try running the classification model in the latent GP example notebook:
```python
with pm.Model() as model:
    ell = pm.InverseGamma("ell", mu=1.0, sigma=0.5)
    eta = pm.Exponential("eta", lam=1.0)
    cov = eta**2 * pm.gp.cov.ExpQuad(1, ell)
    gp = pm.gp.Latent(cov_func=cov)
    f = gp.prior("f", X=x[:, None])

    # logit link and Bernoulli likelihood
    p = pm.Deterministic("p", pm.math.invlogit(f))
    y_ = pm.Bernoulli("y", p=p, observed=y)

    # idata = pm.sample(1000, chains=2, cores=2, nuts_sampler="numpyro")
    idata = pmx.fit(method="pathfinder")
```
I've yet to run pathfinder successfully on anything but toy models. Given that VI is primarily useful as an approximation for large models that cannot be sampled quickly, this severely limits pathfinder's usefulness.
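One partial mitigation I considered (an assumption, not a verified fix) is building the model in single precision, which would halve every one of the buffers above, including the 4.16 GiB `f64[31,200,300,300]` one:

```python
import pytensor

# Sketch only: force float32 so each allocation needs half the memory.
# Whether pathfinder remains numerically stable at single precision for
# this GP model is an open question, not something I've confirmed.
pytensor.config.floatX = "float32"
```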