This repo contains example implementations of dataloading for "small" dynamical-system datasets in JAX.
We compare the performance of:

- pure CPU dataloading,
- CPU dataloading with device prefetch (prefetching a few batches to the GPU in advance),
- CPU dataloading with device and host prefetch (additionally prefetching multiple batches on the host using a Python thread pool), and
- pure GPU dataloading (the dataset is transferred to the GPU once and kept there).
The comparison is tailored to dynamical-system data (datasets containing states `X`, action sequences `U`, and references `Y`).
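
For intuition, here is a minimal sketch of the device-prefetch pattern, assuming `batch_iter` yields `(X, U, Y)` tuples of NumPy arrays; the names and the queue depth are illustrative, not the repo's actual API:

```python
# Sketch only: keep a few batches "in flight" on the GPU ahead of the
# training loop. `batch_iter` and `size` are assumed names.
import collections
import jax

def device_prefetch(batch_iter, size=2):
    """Yield batches while keeping up to `size` of them on the GPU."""
    queue = collections.deque()
    for batch in batch_iter:
        # device_put dispatches the host-to-device copy asynchronously,
        # so the transfer overlaps with the current training step.
        queue.append(jax.device_put(batch))
        if len(queue) == size:
            yield queue.popleft()
    while queue:
        yield queue.popleft()
```

The host-prefetch variant layers a thread pool underneath this, so that `batch_iter` itself stays a few batches ahead of the consumer as well.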
TL;DR: ⏱️ Measured in "GPU wallclock hours" (the billing unit on many HPC and cloud providers), it is most effective to load the complete dataset onto the GPU for small models.
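
A minimal sketch of the pure-GPU strategy: transfer `(X, U, Y)` once, then gather random minibatches on-device inside a jitted function. All shapes and names below are illustrative assumptions, not the repo's actual code:

```python
import jax
import numpy as np

# Toy dataset on the host (shapes are illustrative).
X_host = np.random.randn(10_000, 8).astype(np.float32)  # states
U_host = np.random.randn(10_000, 4).astype(np.float32)  # action sequences
Y_host = np.random.randn(10_000, 8).astype(np.float32)  # references

# One-time host-to-device transfer; every batch afterwards is a
# cheap on-device gather with no PCIe traffic.
X, U, Y = jax.device_put((X_host, U_host, Y_host))

BATCH_SIZE = 256  # assumed constant so shapes stay static under jit

@jax.jit
def sample_batch(key):
    # Sample indices with replacement, entirely on the GPU.
    idx = jax.random.randint(key, (BATCH_SIZE,), 0, X.shape[0])
    return X[idx], U[idx], Y[idx]

key = jax.random.PRNGKey(0)
xb, ub, yb = sample_batch(key)
```

Since the dataset arrays are closed over by the jitted function, the per-batch cost is a single fused gather on the device, which is what makes this strategy win for small models.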
To sweep across different batch, model, and dataset sizes, run:
```bash
python main.py
```
π Result: π