faster bootstrap iterations for large data? #168

Open
markdanese opened this issue Aug 10, 2023 · 3 comments

Comments

@markdanese
markdanese commented Aug 10, 2023

Is there an option or a way to run a multi-threaded version of the bootstrap? I am using standsurv and the data is pretty big (280,000 people). Each bootstrap iteration takes about a minute and a half, so running a few hundred is pretty expensive. The delta method had been running for over an hour when I stopped it. I am happy to run multiple threads myself and pool the iterations, but it seems that only the summary statistics (i.e., the 95% CI) are returned, not the actual results for each iteration. Is there a creative way to work around this that I am not seeing?

@chjackson
Owner

flexsurv::bootci.fmsm can do multicore bootstrapping for any user-defined output from a flexsurv model, but there is no multicore feature in standsurv (copied to @mikesweeting as the author of this function).

@mikesweeting
Contributor

Hi both. standsurv follows the way bootstrapping is done in summary.flexsurvreg, which uses the normbootfn.flexsurvreg function. This doesn't seem to have the multicore functionality that bootci.fmsm has.

@chjackson, could you perhaps explain the difference between these two functions? If you think we should switch to using bootci.fmsm instead, I could look into implementing this.

More generally I've been considering whether standsurv should use bootstrapping as the default (rather than the delta method) as this would match summary.flexsurvreg. Thoughts?

@chjackson
Owner

This is a bit messy unfortunately! These functions have grown organically rather than being meticulously designed.

normbootfn.flexsurvreg is not exposed to users, and is limited to bootstrapping the functions included in summary.fns. Hence it assumes the function handles the t and start arguments, which a user-supplied function might not.

bootci.fmsm was designed initially to handle multistate models (hence the name), but as a consequence it handles simple survival models too. It can deal with any function of the parameters, and it's user-visible. So it'd be more generalisable to use this instead. The only thing I can see it's missing is the rawsim feature added for doing causal contrasts in standsurv.
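As a rough illustration of that route (a minimal sketch, not a tested recipe for standsurv), something like the following should return the individual bootstrap draws from bootci.fmsm rather than only the interval. The built-in bc data, the Weibull model, the helper surv_fn and the variable n_cores are all placeholders chosen for illustration, and whether values of cores above 1 actually run in parallel may depend on the platform.

```r
library(flexsurv)  # also attaches survival, which provides Surv()

# Small built-in dataset (bc) so the sketch runs in seconds
fit <- flexsurvreg(Surv(recyrs, censrec) ~ group, data = bc, dist = "weibull")

# Hypothetical user-defined output: survival at 1, 2 and 5 years for the
# "Good" group. bootci.fmsm expects the model argument to be named x.
surv_fn <- function(x, ...) {
  nd <- data.frame(group = factor("Good", levels = levels(bc$group)))
  summary(x, newdata = nd, t = c(1, 2, 5), type = "survival",
          ci = FALSE, tidy = TRUE)$est
}

n_cores <- 1  # assumption: raise this to spread the draws over more processes

set.seed(1)
# sample = TRUE asks for the simulated values themselves instead of quantiles
draws <- bootci.fmsm(fit, B = 50, fn = surv_fn, cores = n_cores, sample = TRUE)
str(draws)
```

Draws from separate runs could then be pooled by hand before taking quantiles, provided each run uses a different seed.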

I have an intuitive preference for the parametric bootstrap over the delta method, but that is not based on any systematic comparison. There are some mixed results comparing these approaches in this paper.
