Optimization to Model Script #499

Open · wants to merge 1 commit into main
Conversation

EvanCWallace

NOT FULLY TESTED (compiles, but I would like a review)

  • Appended Mixed Precision Training (FP16/BF16) (see the sketch after this list)
  • Generated Low-Rank Factorization (SVD) Functionality
  • Generated Attention Efficiency using Linformer
  • Reduced Memory & Computational Complexity using FlashAttention
  • Attached Functionality for Sparse Matrices using Butterfly Matrices (Structured Linear Layers)
  • Generated Function for Low-Rank Approximations
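Since the diff itself isn't reproduced here, this is a minimal sketch of the kind of FP16/BF16 training step the first bullet describes, using `torch.autocast` with a `GradScaler`. The model, optimizer, and tensor shapes are placeholders, not code from this PR, and it assumes a CUDA device:

```python
import torch
from torch import nn

# Hypothetical model/optimizer, not names from this PR; assumes a CUDA device.
model = nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()  # loss scaling is only needed for FP16; BF16 can skip it

x = torch.randn(32, 1024, device="cuda")
y = torch.randn(32, 1024, device="cuda")

optimizer.zero_grad(set_to_none=True)
# autocast runs matmuls in half precision while keeping numerically
# sensitive ops (reductions, softmax) in FP32
with torch.autocast(device_type="cuda", dtype=torch.float16):
    loss = nn.functional.mse_loss(model(x), y)

scaler.scale(loss).backward()  # scale the loss to avoid FP16 gradient underflow
scaler.step(optimizer)         # unscales gradients, skips the step on inf/nan
scaler.update()
```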

Changes to the Transformer Class:

  • Efficient Initialization:
    • Uses a list comprehension for self.layers instead of a loop (see the sketch after this list).
    • Consolidated distributed initialization logic.
  • Memory and Performance Enhancements:
    • Avoids unnecessary operations on tensors.
    • Uses .shape instead of .size() for clarity.
  • Code Clarity and Maintainability:
    • Removed redundant variables.
    • Used in-place operations where applicable.
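For illustration, a sketch of the list-comprehension initialization and `.shape` usage described above; `Block` and `TransformerSketch` are hypothetical stand-ins, not the repo's actual classes:

```python
import torch
from torch import nn

class Block(nn.Module):
    """Stand-in for the repo's transformer layer; illustrative only."""
    def __init__(self, dim: int):
        super().__init__()
        self.ffn = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + torch.relu(self.ffn(x))

class TransformerSketch(nn.Module):
    def __init__(self, dim: int, n_layers: int):
        super().__init__()
        # List comprehension replaces an explicit append loop
        self.layers = nn.ModuleList([Block(dim) for _ in range(n_layers)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # .shape rather than .size(), as noted above
        batch, seqlen, dim = x.shape
        for layer in self.layers:
            x = layer(x)
        return x

out = TransformerSketch(dim=64, n_layers=4)(torch.randn(2, 8, 64))
```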

Changes to the Gate Class:

  • Replaced linear(x, self.weight) with torch.matmul(x, self.weight.T): more efficient for this linear transformation.
  • Reduced Redundant Computations:
    • Avoided unnecessary reassignments.
    • Merged bias addition into a single step.
  • Optimized Group-Based Routing:
    • Used amax instead of unnecessary top-k and sum operations.
    • Applied an in-place scatter operation for memory efficiency.
  • Simplified Expert Selection:
    • Directly applied topk for selecting the top experts (see the routing sketch after this list).
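A hedged sketch of the routing path these bullets describe: matmul scoring with the bias merged in one step, amax group scores, an in-place scatter_ mask, and a direct topk over the surviving experts. All names, shapes, and the softmax placement are assumptions, not the repo's actual Gate code:

```python
import torch

def gate_sketch(x, weight, bias, n_groups, topk_groups, topk_experts):
    """Hypothetical routing path following the PR's description."""
    tokens, _ = x.shape
    n_experts = weight.shape[0]

    # matmul against the transposed weight, with bias merged in a single step
    scores = (torch.matmul(x, weight.T) + bias).softmax(dim=-1)

    # Group-based routing: amax scores each group by its best expert,
    # replacing a top-k + sum over group members
    grouped = scores.view(tokens, n_groups, n_experts // n_groups)
    group_scores = grouped.amax(dim=-1)
    group_idx = group_scores.topk(topk_groups, dim=-1).indices

    # In-place scatter builds the group mask without extra allocations
    mask = torch.zeros_like(group_scores)
    mask.scatter_(1, group_idx, 1.0)
    masked = grouped.masked_fill(mask.unsqueeze(-1) == 0, float("-inf")).flatten(1)

    # Directly apply topk to pick the final experts
    return masked.topk(topk_experts, dim=-1)

x = torch.randn(4, 16)                     # 4 tokens, hidden dim 16
w, b = torch.randn(8, 16), torch.zeros(8)  # 8 experts
weights, indices = gate_sketch(x, w, b, n_groups=4, topk_groups=2, topk_experts=2)
```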

Changes to the MLA Class:

  • Removed Redundant Computations:
    • Consolidated tensor operations into efficient sequences.
    • Used torch.einsum to optimize matrix multiplications (see the sketch after this list).
  • Reduced Repetitive if Conditions:
    • Moved conditional logic outside loops where applicable.
  • Refactored Caching Logic:
    • Used in-place assignments for cache updates.
    • Minimized unnecessary tensor copies.
  • Improved Readability:
    • Clearer separation of query, key, and value computations.
    • Concise variable naming and inline comments.
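As a rough illustration of the einsum and in-place caching points, a self-contained sketch with made-up shapes; the real MLA class's tensor layouts and names will differ:

```python
import torch

# Illustrative shapes only, not the actual MLA dimensions from this PR
bsz, seqlen, n_heads, head_dim, max_len = 2, 8, 4, 16, 32
start = 0  # position offset into the cache

q = torch.randn(bsz, seqlen, n_heads, head_dim)
k = torch.randn(bsz, seqlen, n_heads, head_dim)
v = torch.randn(bsz, seqlen, n_heads, head_dim)

# Refactored caching: in-place slice assignment updates the cache
# without reallocating or copying the whole tensor
k_cache = torch.zeros(bsz, max_len, n_heads, head_dim)
v_cache = torch.zeros(bsz, max_len, n_heads, head_dim)
k_cache[:, start:start + seqlen] = k
v_cache[:, start:start + seqlen] = v

# einsum expresses the attention-score contraction in one call
scores = torch.einsum("bshd,bthd->bsht", q, k_cache[:, :start + seqlen])
scores = (scores / head_dim ** 0.5).softmax(dim=-1)
out = torch.einsum("bsht,bthd->bshd", scores, v_cache[:, :start + seqlen])
```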
