
Commit 6e2de8a

committed
cmdstanpy func spec updated per Bob comments
1 parent 13fcdc3 commit 6e2de8a

File tree

1 file changed

+135
-147
lines changed

designs/0002-cmdstanpy_func_spec.md

Lines changed: 135 additions & 147 deletions
Original file line number | Diff line number | Diff line change
@@ -41,10 +41,140 @@ for CmdStanPy sampler function.
4141

4242
- Other packages will be used to analyze the posterior sample.
4343

44+
## Workflow
45+
46+
* Specify Stan model - function `compile_file`
47+
48+
* Assemble input data
49+
+ as Python `dict`, use `StanData` object methods to serialize to file for CmdStan
50+
+ using existing data file - use directly
51+
52+
* Run sampler - function `sample` produces `RunSet` object
53+
54+
* Check posterior - functions `summary`, `diagnose`
55+
56+
* Create `PosteriorSample` object for downstream analysis.
57+
58+
4459
## CmdStanPy API
4560

4661
The CmdStanPy interface is implemented as a Python package
47-
with the following classes and functions.
62+
with the following functions and classes.
63+
64+
## Functions
65+
66+
### compile_file
67+
68+
Compile Stan model, returning immutable instance of a compiled model.
69+
This is done in two steps:
70+
71+
* call the `stanc` compiler, which translates the Stan program to C++
72+
* call the C++ compiler to compile and link the generated C++ code
73+
74+
The `compile_file` function must allow the user to specify
75+
default settings for the C++ compiler and ways to override those settings.
76+
77+
```
78+
model = compile_file(path = None,
79+
optimization_flag = 3,
80+
...)
81+
```
82+
83+
In case of compilation failure, this function returns `None`
84+
and reports the compiler error messages.
85+
86+
87+
#### parameters
88+
89+
* `path` - string, must be a valid pathname to a Stan program file
90+
* `optimization_flag` - optimization level, the value of the `-O` flag for the C++ compiler, default value is `3`
91+
* additional flags for the C++ compiler
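
The two compilation steps described above can be sketched as a pair of command lines. This is a minimal sketch under stated assumptions, not the CmdStanPy implementation: the helper name `build_compile_commands`, the `extra_flags` parameter, the `stanc` output option `--o=`, the `g++` compiler name, and the output file naming are all chosen for illustration, and the include paths and libraries that a real build needs are omitted.

```python
from pathlib import Path


def build_compile_commands(path, optimization_flag=3, extra_flags=()):
    """Return the two command lines (stanc, then C++) for a Stan program.

    Hypothetical helper: executable names and flags are assumptions.
    """
    stan_file = Path(path)
    if stan_file.suffix != ".stan":
        raise ValueError(f"expected a .stan file, got {path!r}")
    hpp_file = stan_file.with_suffix(".hpp")  # generated C++ source
    exe_file = stan_file.with_suffix("")      # compiled model executable

    # Step 1: translate the Stan program to C++.
    stanc_cmd = ["stanc", f"--o={hpp_file}", str(stan_file)]
    # Step 2: compile and link the generated C++ code.
    cxx_cmd = ["g++", f"-O{optimization_flag}", *extra_flags,
               str(hpp_file), "-o", str(exe_file)]
    return [stanc_cmd, cxx_cmd]
```

Returning the commands instead of running them keeps the default settings and the user overrides easy to inspect before anything is executed.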
92+
93+
94+
### sample (using HMC/NUTS)
95+
96+
Condition the model on the data using HMC/NUTS with diagonal metric: `stan::services::sample::hmc_nuts_diag_e_adapt`
97+
and run one or more chains, producing a set of samples from the posterior.
98+
Returns a `RunSet` object which contains information on all runs for all chains.
99+
100+
```
101+
RunSet = sample(model,
102+
num_chains = 4,
103+
num_cores = 1,
104+
seed = None,
105+
data_file = None,
106+
init_param_values = None,
107+
csv_output_file = None,
108+
console_output_file = None,
109+
refresh = None,
110+
post_warmup_draws_per_chain = None,
111+
warmup_draws_per_chain = None,
112+
save_warmup = False,
113+
thin = None,
114+
do_adaptation = True,
115+
adapt_gamma = None,
116+
adapt_delta = None,
117+
adapt_kappa = None,
118+
adapt_t0 = None,
119+
nuts_max_depth = None,
120+
hmc_metric_file = None,
121+
hmc_stepsize = None)
122+
```
123+
The `model` and `csv_output_file` parameters are required; all other parameters are optional.
124+
125+
The `sample` command runs one or more sampler chains (argument `num_chains`), in parallel or sequentially.
126+
The `num_cores` argument specifies the maximum number of processes which can be run in parallel.
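
One way to bound the number of concurrent chains is a worker pool sized by `num_cores`. The following is a minimal sketch, not the specified implementation: `run_chains` and the `run_one_chain` callback are hypothetical names, and the callback stands in for whatever launches one CmdStan process. Threads are sufficient here on the assumption that each chain runs as an external process.

```python
from concurrent.futures import ThreadPoolExecutor


def run_chains(run_one_chain, num_chains=4, num_cores=1):
    """Call run_one_chain(chain_id) for chain ids 1..num_chains.

    At most num_cores chains run at once; num_cores=1 runs them one
    at a time. Results are returned in chain-id order.
    """
    if num_chains < 1 or num_cores < 1:
        raise ValueError("num_chains and num_cores must be positive integers")
    chain_ids = range(1, num_chains + 1)
    with ThreadPoolExecutor(max_workers=num_cores) as pool:
        # pool.map preserves input order regardless of completion order.
        return list(pool.map(run_one_chain, chain_ids))
```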
127+
128+
#### parameters
129+
130+
* `model` - required - CmdStanPy model object
131+
* `num_chains` - positive integer, default 4
132+
* `num_cores` - positive integer, default 1
133+
* `seed` - integer - random seed
134+
* `data_file` - string - full pathname of input data file in JSON or Rdump format
135+
* `init_param_values` - string - full pathname of file of initial values for some or all parameters in JSON or Rdump format
136+
* `csv_output_file` - string - full pathname of the sampler output file, in stan-csv format, each chain's output is written to its own file '<csv-output>-<chain_id>.csv'
137+
* `console_output_file` - string - full pathname of file of sampler messages to stdout and/or stderr, each chain's output is written to its own file '<console-output>-<chain_id>.txt'
138+
* `refresh` - integer - the number of iterations between progress message updates. When `refresh = -1`, the progress message is suppressed but not warning messages.
139+
* `post_warmup_draws_per_chain` - non-negative integer - number of post-warmup draws for each chain
140+
* `warmup_draws_per_chain` - non-negative integer - number of warmup draws for each chain
141+
* `save_warmup` - boolean, default False - whether or not warmup draws are written to output file
142+
* `thin` - non-negative integer - period between saved draws
143+
* `nuts_max_treedepth` - integer - NUTS maximum tree depth
144+
* `do_adaptation` - boolean, default True - whether or not NUTS algorithm updates sampler stepsize and metric during warmup, True implies num warmup draws > 0
145+
* `adapt_gamma` - non-negative double - adaptation regularization scale
146+
* `adapt_delta` - non-negative double - adaptation target acceptance statistic
147+
* `adapt_kappa` - non-negative double - adaptation relaxation exponent
148+
* `adapt_t0` - non-negative integer - adaptation iteration offset
149+
* `hmc_metric_file` - string - full pathname of file containing precomputed diagonal Euclidean metric in JSON or Rdump format
150+
* `hmc_stepsize` - positive double value - step size for discrete evolution
151+
152+
These arguments must be translated into a valid call to the CmdStan sampler.
153+
This requires assembling the arguments into a specific order and adding additional
154+
CmdStan arguments.
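
The translation step can be sketched for a subset of the arguments. This is an illustration under stated assumptions, not the specified implementation: the helper name `build_sampler_command` is hypothetical, the keyword grouping (`random`, `data`, `output`, `method=sample`, `algorithm=hmc engine=nuts`, `adapt`) follows CmdStan's command-line conventions as understood here, and arguments left as `None` are simply omitted so that CmdStan's own defaults would apply.

```python
def build_sampler_command(exe, chain_id, csv_output_file, seed=None,
                          data_file=None, post_warmup_draws_per_chain=None,
                          warmup_draws_per_chain=None, nuts_max_depth=None,
                          adapt_delta=None):
    """Assemble one chain's CmdStan call as an argument list (a sketch)."""
    cmd = [exe, f"id={chain_id}"]
    if seed is not None:
        cmd += ["random", f"seed={seed}"]
    if data_file is not None:
        cmd += ["data", f"file={data_file}"]
    # Each chain writes to its own file: <csv-output>-<chain_id>.csv
    cmd += ["output", f"file={csv_output_file}-{chain_id}.csv"]
    cmd.append("method=sample")
    if post_warmup_draws_per_chain is not None:
        cmd.append(f"num_samples={post_warmup_draws_per_chain}")
    if warmup_draws_per_chain is not None:
        cmd.append(f"num_warmup={warmup_draws_per_chain}")
    cmd += ["algorithm=hmc", "engine=nuts"]
    if nuts_max_depth is not None:
        cmd.append(f"max_depth={nuts_max_depth}")
    if adapt_delta is not None:
        cmd += ["adapt", f"delta={adapt_delta}"]
    return cmd
```

Building an argument list rather than a single shell string avoids quoting problems when pathnames contain spaces.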
155+
156+
### summary
157+
158+
Calls CmdStan's `summary` executable passing in the names of the per-chain output files
159+
stored in the `RunSet` object.
160+
Prints output to console or file.
161+
162+
```
163+
summary(runset = sampler_runset, output_file = "filename")
164+
```
165+
166+
### diagnose
167+
168+
Calls CmdStan's `diagnose` executable passing in the names of the per-chain output files
169+
stored in the `RunSet` object.
170+
If there are no diagnostic messages, prints a message that no problems were found.
171+
172+
Prints output to console or file.
173+
174+
```
175+
diagnose(runset = sampler_runset, output_file = "filename")
176+
```
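
Both `summary` and `diagnose` follow the same pattern: pass the per-chain output files to an external executable and send its report to the console or to a file. A minimal sketch of that pattern follows. The helper name `run_tool` is hypothetical, and a Python one-liner stands in for the CmdStan executable so that the example runs without a CmdStan installation.

```python
import subprocess
import sys


def run_tool(tool_cmd, csv_files, output_file=None):
    """Run tool_cmd on the per-chain CSV files; print or save its report."""
    result = subprocess.run([*tool_cmd, *csv_files],
                            capture_output=True, text=True, check=True)
    if output_file is None:
        print(result.stdout, end="")
    else:
        with open(output_file, "w") as f:
            f.write(result.stdout)
    return result.stdout


# Stand-in for a CmdStan tool: a Python one-liner that echoes its arguments.
echo_cmd = [sys.executable, "-c", "import sys; print(' '.join(sys.argv[1:]))"]
report = run_tool(echo_cmd, ["out-1.csv", "out-2.csv"])
```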
177+
48178

49179
## Classes
50180

@@ -107,15 +237,16 @@ Each run is one _chain_ and the set of draws for that chain is one _sample_.
107237

108238
The `RunSet` object records all information about the set of runs:
109239

110-
- CmdStan arguments
111240
- number of chains
241+
- per-chain call to CmdStan
112242
- per-chain output file name
243+
- per-chain transcript of output to stdout and stderr
244+
- per-chain return code
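
The items listed above map naturally onto a small record type. The following is a sketch only: the field names and the `all_succeeded` method are assumptions made for illustration and are not part of the spec.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RunSet:
    """Record of a set of sampler runs, one entry per chain (a sketch)."""
    num_chains: int
    commands: List[List[str]] = field(default_factory=list)  # per-chain call to CmdStan
    output_files: List[str] = field(default_factory=list)    # per-chain output file name
    transcripts: List[str] = field(default_factory=list)     # per-chain stdout and stderr
    return_codes: List[int] = field(default_factory=list)    # per-chain return code

    def all_succeeded(self) -> bool:
        """True when every chain has reported a zero return code."""
        return (len(self.return_codes) == self.num_chains
                and all(rc == 0 for rc in self.return_codes))
```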
113245

114246

115247
### PosteriorSample
116248

117-
The `PosteriorSample` object combines all outputs from a `RunSet`
118-
into a single object.
249+
The `PosteriorSample` object combines all outputs from a `RunSet` into a single object.
119250
The numpy module is used to manage this information in a memory-efficient fashion.
120251

121252
The `PosteriorSample` object
@@ -166,146 +297,3 @@ This requires transposing the information in the CmdStan csv output files where
166297
each file corresponds to the chain, each row of output corresponds to the iteration,
167298
and each column corresponds to a particular label.
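
The transposition can be sketched with numpy. This is an illustration under stated assumptions: the function names are hypothetical, the target layout (draws, chains, columns) is one possible choice and is not specified by this document, and the reader assumes that comment lines start with `#` and that the first remaining line is the header.

```python
import io

import numpy as np


def read_stan_csv(stream):
    """Return (column_labels, draws) from one chain's CSV text stream."""
    rows = [line.strip() for line in stream
            if line.strip() and not line.startswith("#")]
    labels = rows[0].split(",")
    draws = np.array([[float(x) for x in row.split(",")] for row in rows[1:]])
    return labels, draws


def stack_chains(streams):
    """Combine per-chain files into one array indexed (draw, chain, column)."""
    labels = None
    per_chain = []
    for stream in streams:
        chain_labels, draws = read_stan_csv(stream)
        if labels is None:
            labels = chain_labels
        elif chain_labels != labels:
            raise ValueError("chains have different column labels")
        per_chain.append(draws)
    # np.stack gives (chains, draws, columns); swap the first two axes.
    return labels, np.stack(per_chain).transpose(1, 0, 2)


chain1 = io.StringIO("# comment\nlp__,theta\n-1.0,0.2\n-1.5,0.3\n")
chain2 = io.StringIO("lp__,theta\n-2.0,0.4\n-2.5,0.5\n")
labels, sample = stack_chains([chain1, chain2])
```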
168299

169-
170-
## Functions
171-
172-
### compile_file
173-
174-
Compile Stan model, returning immutable instance of a compiled model.
175-
This is done in two steps:
176-
177-
* call the `stanc` compiler which translates the Stan program to c++
178-
* call c++ to compile and link the generated c++ code
179-
180-
The `compile_file` function must allow the user to specify
181-
default settings to the c++ compiler and ways to override those setting.
182-
183-
```
184-
model = compile_file(path = None,
185-
opt_level = 3,
186-
...)
187-
```
188-
189-
In case of compilation failure, this function returns `None`
190-
and the `compile_file` function reports the compiler error messages.
191-
192-
193-
#### parameters
194-
195-
* `path` = - string, must be valid pathname to Stan program file
196-
* `opt_level` = optimization level, the value of the `-o` flag for the c++ compiler, default value is `3`
197-
* additional flags for the c++ compiler
198-
199-
200-
### sample (using HMC/NUTS)
201-
202-
Condition the model on the data using HMC/NUTS with diagonal metric: `stan::services::sample::hmc_nuts_diag_e_adapt`
203-
to produce a posterior sample.
204-
205-
206-
```
207-
RunSet = sample(model = None,
208-
num_chains = 4,
209-
num_cores = 1,
210-
seed = None,
211-
data_file = "",
212-
init_param_values = "",
213-
output_file = "",
214-
diagnostic_file = "",
215-
refresh = 100,
216-
num_samples = 1000,
217-
num_warmup = 1000,
218-
save_warmup = False,
219-
thin_samples = 1,
220-
adapt_engaged = True,
221-
adapt_gamma = 0.05,
222-
adapt_delta = 0.65,
223-
adapt_kappa = 0.75,
224-
adapt_t0 = 10,
225-
nuts_max_depth = 10,
226-
hmc_diag_metric = "",
227-
hmc_stepsize = 1,
228-
hmc_stepsize_jitter = 0)
229-
```
230-
231-
The `sample` command can run chains in parallel or sequentially.
232-
The `num_cores` argument specifies the maximum number of processes which
233-
can be run in parallel.
234-
235-
If any of the runs fail for any reason, this function returns `None`
236-
and reports all error messages.
237-
238-
239-
#### CmdStanPy specific parameters
240-
241-
* `model` - CmdStanPy model object
242-
* `num_chains` - positive integer
243-
* `num_cores` - positive integer
244-
245-
#### CmdStan parameters
246-
247-
The named arguments must be translated into a valid call to the CmdStan sampler.
248-
This requires assembling the arguments into a specific order and adding additional
249-
CmdStan arguments.
250-
251-
* Random seed - CmdStan arg must be preceded by `random`
252-
+ `seed` - random seed
253-
254-
* Data Inputs - CmdStan args preceded by `data`
255-
+ `data_file` - string,
256-
+ `init_param_values` - string, default is empty string, must be valid pathname
257-
to file with read permissions in Rdump or JSON format which specifies initial values for some or all parameters.
258-
259-
* Outputs
260-
+ `output_file` - string value, default is empty string, must be valid pathname
261-
+ `diagnostic_file` - string value, default is empty string, must be valid pathname
262-
+ `refresh` - integer, the number of iterations between progress message updates.
263-
When `refresh = -1`, the progress message is suppressed but not warning messages.
264-
265-
* MCMC Sampling - CmdStan args must be preceded by `sample`
266-
+ `num_samples` Number of sampling iterations - non-negative integer, default 1000
267-
+ `num_warmup` Number of warmup iterations - non-negative integer, default 1000
268-
+ `save_warmup` Stream warmup samples to output? - True (1) False (0), default False
269-
+ `thin_samples` Period between saved samples - non-negative integer, default 1
270-
271-
* Warmup Adaptation controls: CmdStan args must be preceded by `adapt`
272-
+ `adapt_engaged` True (1) False (0), default True
273-
+ `adapt_gamma` Adaptation regularization scale, double > 0, default 0.05
274-
+ `adapt_delta` Adaptation target acceptance statistic, double > 0, default 0.65
275-
+ `adapt_kappa` Adaptation relaxation exponent, double > 0, default 0.75
276-
+ `adapt_t0` Adaptation iteration offset, double > 0, default 10
277-
278-
* HMC Sampler: CmdStan arg must be preceded by `algorithm=hmc engine=nuts`
279-
+ `NUTS_max_depth` - Maximum tree depth, int > 0, default 10
280-
281-
* HMC Metric: must be preceded by keywords `metric=diag`
282-
+ `HMC_diag_metric` - string value, default is empty string, must be valid pathname
283-
to file with read permissions in Rdump or JSON format which specifies precomputed Euclidian metric.
284-
+ `HMC_stepsize` - positive double value, step size for discrete evolution, double > 0, default 1
285-
+ `HMC_stepsize_jitter` Uniformly random jitter of the stepsize, values between 0,1, default 0
286-
287-
_note: CmdStan uses uppercase `NUTS` and `HMC` in argument names, but lowercase `algorithm=hmc engine=nuts`_
288-
289-
### summary
290-
291-
Calls CmdStan's `summary` executable passing in the names of the per-chain output files
292-
stored in the `RunSet` object.
293-
Prints output to console or file
294-
295-
```
296-
summary(runset = `sampler_runset`, output_file= "filename")
297-
```
298-
299-
300-
### diagnose
301-
302-
Calls CmdStan's `diagnose` executable passing in the names of the per-chain output files
303-
stored in the `RunSet` object.
304-
If there are no diagnostic messages, prints message that no problems were found.
305-
306-
Prints output to console or file
307-
308-
```
309-
diagnose(runset = `sampler_runset`, output_file= "filename")
310-
```
311-
