Controller-based Privacy Engine for Better Transformer Compatibility #794
Conversation
@facebook-github-bot has imported this pull request. If you are a Meta employee, you can view this in D85086663. (Because this pull request was imported automatically, there will not be any future comments.)
@evgri243 thank you for this heavy-lifting change. I will take some time to digest it and also discuss internally with the team. In the meantime, I have some questions for you:
As an intermediate step, we could place your approach in the "research" folder, which we do not actively maintain, but we are happy to support you in maintaining it. This would allow some time for the method to be digested before moving it into the main opacus folder.
Thanks for the consideration. It is still a work in progress, but I'd love to know your opinion. Let me answer your questions in words first; I'll follow up with examples if needed:
I thought about "research" or "contrib". My main concern is making it packageable, but I guess it is not a major implementation issue to add yet another package.
Thank you for these explanations. I like your solution and have also experienced some annoyances with accessing attributes of the model post-wrapping. I also understand the use case better now. I believe we can minimize code duplication, which would make it more reasonable to introduce this into Opacus.
Do these make sense? Regarding ghost clipping and FSDP: you mention that you mostly use LoRA. Ghost clipping does not give any memory advantage with LoRA fine-tuning, since the effective linear layer width is small, so just a heads up that ghost clipping might not be needed for your use case. We did not implement FSDP with vanilla (non-ghost) clipping since this required more significant effort, though we did put some work into it, and if you're interested in extending FSDP + vanilla clipping, we'd welcome PRs here.
// I'm not sure how to make changes of this scale, or whether they are desired, but I'd like to highlight them for discussion. We should probably keep this out for a while, until the implementation is feature-complete and well-tested; meanwhile, we can use the PR to discuss the changes. //
Summary
This PR introduces PrivacyEngineGradSampleController, an alternative implementation of Opacus's PrivacyEngine that attaches hooks directly to models without wrapping them in GradSampleModule. This solves compatibility issues with transformers and other models that have complex attribute access patterns.

Motivation
The current PrivacyEngine wraps models in a GradSampleModule, which creates several issues:

- isinstance(model, BertModel) returns False after wrapping
- _module. prefixes appear in state dicts
- __getattr__ behavior in transformers can break

These issues are particularly problematic with HuggingFace transformers and other libraries that perform introspection on model objects.
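For concreteness, a hedged illustration of the problem (not taken from the PR; it assumes a HuggingFace BertModel, but any wrapped nn.Module behaves the same way):

```python
# Illustration of the wrapping problem described above (not from the PR).
from opacus import GradSampleModule
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
wrapped = GradSampleModule(model)

print(isinstance(model, BertModel))      # True
print(isinstance(wrapped, BertModel))    # False -- the wrapper hides the original type
print(next(iter(wrapped.state_dict())))  # keys gain a "_module." prefix
```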
Solution
Instead of wrapping the model, we:

- attach hooks directly via register_forward_hook() and register_full_backward_hook()
- manage the hook lifecycle in a GradSampleController class
- store per-sample gradients on the parameters via setattr() (e.g., param.grad_sample)

The model remains unchanged: no wrapper, no indirection, no type issues.
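As a rough sketch of the mechanism (illustrative only; the GradSampleControllerSketch name, the Linear-only handling, and the weight-only gradient are assumptions, not the PR's code):

```python
import torch
import torch.nn as nn


class GradSampleControllerSketch:
    """Illustrative controller: attaches hooks to the model, never wraps it."""

    def __init__(self, model: nn.Module):
        self.handles = []
        self.activations = {}
        for module in model.modules():
            if isinstance(module, nn.Linear):  # only layers with a known grad sampler
                self.handles.append(module.register_forward_hook(self._capture_activations))
                self.handles.append(module.register_full_backward_hook(self._capture_backprops))

    def _capture_activations(self, module, inputs, output):
        # save the layer input for the per-sample gradient computation
        self.activations[module] = inputs[0].detach()

    def _capture_backprops(self, module, grad_input, grad_output):
        a = self.activations.pop(module)  # forward activations (B x ... x in_features)
        b = grad_output[0]                # gradient w.r.t. the layer output (B x ... x out_features)
        # standard per-sample gradient formula for nn.Linear weights (B x out x in)
        setattr(module.weight, "grad_sample", torch.einsum("n...i,n...j->nij", b, a))

    def cleanup(self):
        for handle in self.handles:
            handle.remove()
        self.handles.clear()
```

A real implementation also needs bias handling, accumulation across multiple forward passes, and dispatch to the registered grad samplers for other layer types; the point here is only the hook-plus-setattr pattern that avoids wrapping.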
Implementation
Files Added
- opacus/grad_sample_controller.py (~480 lines): GradSampleController class that manages the hook lifecycle
- opacus/privacy_engine_gsc.py (~530 lines): PrivacyEngineGradSampleController class with the same API as PrivacyEngine, using GradSampleController instead of wrapping the model
- opacus/tests/privacy_engine_gsc_test.py (~260 lines)

Key Differences from Current Approach
Compared with the current approach, there is no GradSampleModule wrapper around the model and no _module. prefix in state dict keys.

Usage
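The usage example itself did not survive the page export. A hedged sketch of what the workflow presumably looks like, assuming the class lives in opacus.privacy_engine_gsc (per the file list above) and mirrors the standard make_private() keywords; build_model() and build_loader() are placeholders:

```python
import torch
from opacus.privacy_engine_gsc import PrivacyEngineGradSampleController  # import path assumed

model = build_model()         # placeholder: any nn.Module, e.g. a HF transformer
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
data_loader = build_loader()  # placeholder DataLoader

privacy_engine = PrivacyEngineGradSampleController()
model, optimizer, data_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    noise_multiplier=1.0,
    max_grad_norm=1.0,
)
# `model` is still the original object: isinstance checks and attribute access work as before.

# ... training loop ...

privacy_engine.cleanup()      # remove the registered hooks when training is done
```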
Validation
The code was used for LoRA alignment of a Zetta 7B transformer with DPOTrainer from the TRL library.
Correctness
- Hook logic mirrors GradSampleModule.add_hooks() exactly
- Reuses the create_or_accumulate_grad_sample() and promote_current_grad_sample() functions
- The optimizer only relies on the param.grad_sample attribute, which we provide

Key Implementation Details
- Reuses the grad samplers from GradSampleModule.GRAD_SAMPLERS (registered via decorators)
- Hook behavior follows GradSampleModule
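For reference, GradSampleModule.GRAD_SAMPLERS is the existing Opacus registry mapping layer types to per-sample-gradient functions; a controller can look samplers up by module type roughly like this (the dispatch shown is an assumption, not the PR's code):

```python
from opacus.grad_sample import GradSampleModule

def grad_sampler_for(module):
    # GRAD_SAMPLERS maps layer types (e.g. nn.Linear) to the per-sample-gradient
    # functions registered via the @register_grad_sampler decorator.
    return GradSampleModule.GRAD_SAMPLERS.get(type(module))
```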
API Compatibility

The new class maintains full API compatibility with PrivacyEngine:

- make_private() signature
- make_private_with_epsilon() signature
- save_checkpoint() and load_checkpoint() methods
- get_epsilon() method

Migration is trivial: just change the import statement.
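Concretely, the migration would look roughly like this (class name from this PR; the import path is assumed from the file list above):

```python
# Before:
# from opacus import PrivacyEngine
# privacy_engine = PrivacyEngine()

# After:
from opacus.privacy_engine_gsc import PrivacyEngineGradSampleController
privacy_engine = PrivacyEngineGradSampleController()

# make_private(), make_private_with_epsilon(), get_epsilon(),
# save_checkpoint() and load_checkpoint() are then called exactly as before.
```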
Testing
Comprehensive test suite covers:

- checkpoints and state dicts (no _module. prefix)

Benefits
- No __getattr__ issues
- No _module. prefix to handle
- isinstance(model, MyModel) returns True

Trade-offs
- Requires calling privacy_engine.cleanup() to remove the hooks
- Hooks stay attached to the model until explicitly removed (via cleanup())
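Since the hooks now outlive the training call, one defensive pattern (a sketch, not something the PR prescribes) is to tie cleanup() to a try/finally around the training loop:

```python
try:
    for batch in data_loader:        # placeholder training loop
        loss = model(**batch).loss   # assumes a HF-style model returning .loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
finally:
    privacy_engine.cleanup()         # always detach the hooks, even on failure
```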
Backward Compatibility

- No changes to the existing PrivacyEngine or DPOptimizer classes

Future Work
Checklist