[Docs]: Add Release Documentation for Version 1.20.0 #501
base: main
Signed-off-by: Abukhoyer Shaik <[email protected]>
@quic-amitraj Check the latest GitHub page: https://abukhoy.github.io/efficient-transformers/index.html
docs/source/validate.md
Outdated
#### Single QPC
In the **Single QPC** setup, the entire model, including both image encoding and text generation, runs within a **single quantized configuration**. There is no model splitting, and all components operate within the same execution environment.
"Single quantized configuration"? Please rephrase this.
docs/source/release_docs.md
Outdated
- **SpD & Multi-Projection Heads**: Token speculation via post-attention projections
- **I/O Encryption**: `--io-encrypt` flag support in compile/infer APIs
- **Separate Prefill/Decode Compilation**: For disaggregated serving
- **On-Device Sampling**: Reduces host-device latency for CausalLM models
It's supported using vLLM. Native QEff doesn't support on-device sampling; please update the point accordingly.
docs/source/release_docs.md
Outdated
- Gradient checkpointing, device-aware `GradScaler`, and CLI `--help` added
---
Thank you for using Efficient Transformers! For more details, refer to the full documentation.
Do we need this?
Will remove it.
docs/source/validate.md
Outdated
**Single QPC:**
In the **Single QPC** setup, the entire model, including both image encoding and text generation, runs within a **single Qualcomm Program Container**. There is no model splitting, and all components operate within the same execution environment.
- The single QPC approach introduces the flexibility to run the vision and language components independently.
The flexibility to run the vision and language components independently is not part of the single QPC approach; it belongs to the dual QPC approach.
In line 79, write "single QPC (Qualcomm Program Container)" instead of "single Qualcomm Program Container".
This PR introduces a new release documentation page (`release_docs.md`) for the Efficient Transformers library. The document outlines the key updates included in Release 1.20.0, including:
✅ Newly onboarded models (e.g., Llama-4-Scout, Grok-1, Gemma3, Granite Vision/MOE)
✨ New features and enhancements (e.g., io_encrypt support, flexible pooling, on-device sampling)
🛠️ Fine-tuning support and improvements
🔮 Upcoming models and planned features for future releases
This documentation aims to provide a comprehensive overview of the current release for internal teams and external users.
Credit: @quic-amitraj