v0.12.0 New doc, gather_for_metrics, balanced device map and M1 support
New documentation
The whole documentation has been revamped; go take a look!
- Complete revamp of the docs by @muellerzr in #495
New gather_for_metrics method
When doing distributed evaluation, the dataloader loops back to the beginning of the dataset to make batches that are a round multiple of the number of processes. This makes the gathered predictions slightly longer than the dataset, which used to require some manual truncation. This is now all done behind the scenes if you replace the gather you did in evaluation with gather_for_metrics.
- Reenable Gather for Metrics by @muellerzr in #590
- Fix gather_for_metrics by @muellerzr in #578
- Add a gather_for_metrics capability by @muellerzr in #540
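The wrap-around and truncation described above can be sketched in plain Python. This is only an illustration of the bookkeeping, not Accelerate's implementation: the helper names `padded_indices` and `gather_and_truncate` are hypothetical, and the sketch assumes the gathered predictions arrive in dataset order.

```python
import math


def padded_indices(dataset_len, num_processes, batch_size):
    """Sample indices seen across all processes when the loader wraps
    around to the start of the dataset to fill the last round of batches.
    (Hypothetical helper, for illustration only.)"""
    per_step = num_processes * batch_size
    total = math.ceil(dataset_len / per_step) * per_step
    return [i % dataset_len for i in range(total)]


def gather_and_truncate(gathered, dataset_len):
    """Drop the duplicated samples that were added by the wrap-around.
    (Hypothetical helper; assumes gathered items are in dataset order.)"""
    return gathered[:dataset_len]


dataset_len = 10
indices = padded_indices(dataset_len, num_processes=4, batch_size=1)

# Stand-in for model outputs gathered from every process.
predictions = [i * 2 for i in indices]

# Without truncation there are more predictions than samples.
print(len(predictions), "predictions for", dataset_len, "samples")

trimmed = gather_and_truncate(predictions, dataset_len)
print(len(trimmed), "predictions after truncation")
```

In a real evaluation loop, calling gather_for_metrics on the accelerator in place of gather is meant to make this manual truncation step unnecessary.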
Balanced device maps
When loading big models for inference, device_map="auto" used to fill the GPUs sequentially, making it hard to use a batch size > 1. It now balances the weights evenly across the GPUs, so if you have more GPU space than the model size, you can run predictions with a bigger batch size!
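To make the difference concrete, here is a toy comparison of sequential and balanced placement. It is a simplified sketch under stated assumptions, not the algorithm Accelerate uses: `sequential_map` and `balanced_map` are hypothetical helpers, layer sizes and GPU capacity are in arbitrary units, and activations and buffers are ignored.

```python
import math


def sequential_map(layer_sizes, gpu_capacity, num_gpus):
    """Fill GPU 0 up to gpu_capacity, then GPU 1, and so on.
    (Hypothetical helper, for illustration only.)"""
    device_map, device, used = {}, 0, 0
    for name, size in layer_sizes.items():
        if used + size > gpu_capacity:
            # Current GPU is full: move on to the next one.
            device, used = device + 1, 0
        if device >= num_gpus:
            raise ValueError("model does not fit on the available GPUs")
        device_map[name] = device
        used += size
    return device_map


def balanced_map(layer_sizes, gpu_capacity, num_gpus):
    """Same walk, but cap each GPU at an even share of the model so that
    every GPU keeps some free memory. (Hypothetical helper.)"""
    even_share = math.ceil(sum(layer_sizes.values()) / num_gpus)
    return sequential_map(layer_sizes, min(gpu_capacity, even_share), num_gpus)


# Eight layers of size 2 on two GPUs of capacity 16 (arbitrary units).
layers = {f"layer{i}": 2 for i in range(8)}

sequential = sequential_map(layers, gpu_capacity=16, num_gpus=2)
balanced = balanced_map(layers, gpu_capacity=16, num_gpus=2)

print("sequential:", sequential)
print("balanced:  ", balanced)
```

In this toy setup the sequential placement puts the whole model on the first GPU and leaves it with no room for larger batches, while the balanced placement leaves half of each GPU free.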
M1 GPU support
Accelerate now supports M1 GPUs. To learn more about how to set up your environment, see the documentation.
- M1 GPU mps device integration by @pacman100 in #596
What's new?
- Small fixed for balanced device maps by @sgugger in #583
- Add balanced option for auto device map creation by @sgugger in #534
- fixing deepspeed slow tests issue by @pacman100 in #604
- add more conditions on casting by @younesbelkada in #606
- Remove redundant .run in WandBTracker. by @zh-plus in #605
- Fix some typos + wordings by @muellerzr in #603
- reorg of test scripts and minor changes to tests by @pacman100 in #602
- Move warning by @muellerzr in #598
- Shorthand way to grab a tracker by @muellerzr in #594
- Pin deepspeed by @muellerzr in #595
- Improve docstring by @muellerzr in #591
- TESTS! by @muellerzr in #589
- Fix DispatchDataloader by @sgugger in #588
- Use main_process_first in the examples by @muellerzr in #581
- Skip and raise NotImplementedError for gather_for_metrics for now by @muellerzr in #580
- minor FSDP launcher fix by @pacman100 in #579
- Refine test in set_module_tensor_to_device by @sgugger in #577
- Fix set_module_tensor_to_device by @sgugger in #576
- Add 8 bit support - chapter II by @younesbelkada in #539
- Fix tests, add wandb to gitignore by @muellerzr in #573
- Fix step by @muellerzr in #572
- Speed up main CI by @muellerzr in #571
- ccl version check and import different module according to version by @sywangyi in #567
- set default num_cpu_threads_per_process to improve oob performance by @sywangyi in #562
- Add a tqdm helper by @muellerzr in #564
- Rename actions to be a bit more accurate by @muellerzr in #568
- Fix clean by @muellerzr in #569
- enhancements and fixes for FSDP and DeepSpeed by @pacman100 in #532
- fix: saving model weights by @csarron in #556
- add on_main_process decorators by @ZhiyuanChen in #488
- Update imports.py by @KimBioInfoStudio in #554
- unpin datasets by @lhoestq in #563
- Create good defaults in accelerate launch by @muellerzr in #553
- Fix a few minor issues with example code in docs by @BenjaminBossan in #551
- deepspeed version 0.6.7 fix by @pacman100 in #544
- Rename test extras to testing by @muellerzr in #545
- Add production testing + fix failing CI by @muellerzr in #547
- Add a gather_for_metrics capability by @muellerzr in #540
- Allow for kwargs to be passed to trackers by @muellerzr in #542
- Add support for downcasting bf16 on TPUs by @muellerzr in #523
- Add more documentation for device maps computations by @sgugger in #530
- Restyle prepare one by @muellerzr in #531
- Pick a better default for offload_state_dict by @sgugger in #529
- fix some parameter setting does not work for CPU DDP and bf16 fail in… by @sywangyi in #527
- Fix accelerate tests command by @sgugger in #528
Significant community contributions
The following contributors have made significant changes to the library over the last release: