faster bootstrap iterations for large data? #168

markdanese · 2023-08-10T19:09:49Z

Is there an option or a way to run a multi-threaded version of the bootstrap? I am using standsurv and the data are pretty big (280,000 people). Each bootstrap iteration takes about a minute and a half, so running a few hundred is pretty expensive. The delta method ran for over an hour before I stopped it. I am happy to run multiple threads myself and pool the iterations, but it seems that only the summary statistics (i.e., the 95% CI) are returned, not the actual results for each iteration. Is there a creative way to work around this that I am not seeing?
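
The pooling idea in this question can be illustrated with a small sketch. It is written in Python with only the standard library, not in flexsurv code, and everything in it is an assumption made for illustration: a one-parameter exponential survival model, a bootstrap that redraws the log-rate from a normal approximation around its estimate, and the hypothetical names run_chunk and percentile_ci. The point it shows is that each worker has to return its raw replicates, because an interval computed inside one worker cannot be combined with the intervals from the others.

```python
import math
import random
from functools import partial
from math import exp


def run_chunk(seed, n_draws, est, se, t):
    """Return n_draws raw replicates of S(t) = exp(-rate * t).

    Each replicate redraws the log-rate from Normal(est, se), a
    hypothetical stand-in for one bootstrap iteration.
    """
    rng = random.Random(seed)  # distinct seed per chunk, so chunks do not repeat draws
    return [exp(-exp(rng.gauss(est, se)) * t) for _ in range(n_draws)]


def percentile_ci(replicates, level=0.95):
    """Percentile interval from pooled replicates (order statistics, no interpolation)."""
    ordered = sorted(replicates)
    alpha = (1.0 - level) / 2.0
    last = len(ordered) - 1
    lower = ordered[math.floor(alpha * last)]
    upper = ordered[math.ceil((1.0 - alpha) * last)]
    return lower, upper


# Four chunks of 100 replicates each; the estimate and standard error are made up.
worker = partial(run_chunk, n_draws=100, est=-2.0, se=0.1, t=5.0)

# The built-in map runs the chunks one after another. Replacing it with the
# map method of a concurrent.futures.ProcessPoolExecutor would run them on
# separate cores without changing the pooling step.
chunks = list(map(worker, [1, 2, 3, 4]))

# Pool the raw replicates first, then summarise once.
pooled = [value for chunk in chunks for value in chunk]
ci = percentile_ci(pooled)
```

The same pattern applies to chunks run as separate jobs and saved to disk: concatenate the saved replicates and summarise once at the end.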

chjackson · 2023-08-13T15:03:51Z

flexsurv::bootci.fmsm can do multicore bootstrapping for any user-defined output from a flexsurv model, but there is no multicore feature in standsurv (copied to @mikesweeting as the author of this function).

mikesweeting · 2023-08-14T13:55:05Z

Hi both. standsurv follows the way bootstrapping is done in summary.flexsurvreg which uses the normbootfn.flexsurvreg function. This doesn't seem to have the same multicore functionality that bootci.fmsm does.

@chjackson, could you perhaps explain the difference between these two functions? If you think we should switch to using bootci.fmsm instead, I could look into implementing this.

More generally, I've been considering whether standsurv should use bootstrapping as the default (rather than the delta method), as this would match summary.flexsurvreg. Thoughts?

chjackson · 2023-08-14T14:43:11Z

This is a bit messy unfortunately! These functions have grown organically rather than being meticulously designed.

normbootfn.flexsurvreg is not exposed to users, and is limited to bootstrapping the functions included in summary.fns. Hence it assumes the function handles the t and start arguments, which a user-supplied function might not.

bootci.fmsm was designed initially to handle multistate models (hence the name), but as a consequence it handles simple survival models too. It can deal with any function of the parameters, and it's user-visible. So it'd be more generalisable to use this instead. The only thing I can see it's missing is the rawsim feature added for doing causal contrasts in standsurv.

I have an intuitive preference for the parametric bootstrap over the delta method, but that is not based on any systematic comparison. There are some mixed results comparing these approaches in this paper.
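
As a toy illustration of how the two approaches can differ, the sketch below computes both intervals for the survival probability S(t) = exp(-exp(theta) * t) under a one-parameter exponential model. It is Python rather than flexsurv code, and the model, the made-up estimate and standard errors, and the names surv, delta_ci and normal_draw_ci are all assumptions for illustration. The delta method is applied on the untransformed probability scale here, which may not match what any given package does.

```python
import random
from math import exp


def surv(theta, t):
    """Survival probability at time t for an exponential model with log-rate theta."""
    return exp(-exp(theta) * t)


def delta_ci(est, se, t, z=1.96):
    """Delta-method interval: linearise surv around est, then add and subtract z * SE."""
    point = surv(est, t)
    gradient = -t * exp(est) * point  # d surv / d theta evaluated at est
    half_width = z * abs(gradient) * se
    return point - half_width, point + half_width


def normal_draw_ci(est, se, t, n_draws=20000, seed=1):
    """Percentile interval from surv evaluated at draws of theta ~ Normal(est, se)."""
    rng = random.Random(seed)
    draws = sorted(surv(rng.gauss(est, se), t) for _ in range(n_draws))
    return draws[int(0.025 * (n_draws - 1))], draws[int(0.975 * (n_draws - 1))]


# Small standard error: the function is nearly linear over the plausible
# range of theta, so the two intervals should be close.
delta_small = delta_ci(-2.0, 0.1, 5.0)
draws_small = normal_draw_ci(-2.0, 0.1, 5.0)

# Large standard error: the symmetric delta interval is not constrained to
# [0, 1], while every simulated value of surv lies strictly between 0 and 1.
delta_large = delta_ci(-2.0, 1.0, 5.0)
draws_large = normal_draw_ci(-2.0, 1.0, 5.0)
```

In this toy case, the large-standard-error delta interval extends below 0 and above 1, whereas the simulation-based interval cannot leave the range of the function. This says nothing about which method performs better on real models; it only shows one mechanism by which they can disagree.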
