
[Docs]: Add Release Documentation for Version 1.20.0 #501


Open
wants to merge 36 commits into
base: main

abukhoy
Contributor

@abukhoy abukhoy commented Jul 8, 2025

This PR introduces a new release documentation page (release_docs.md) for the Efficient Transformers library. The document outlines the key updates in Release 1.20.0, including:

✅ Newly onboarded models (e.g., Llama-4-Scout, Grok-1, Gemma3, Granite Vision/MOE)
✨ New features and enhancements (e.g., io_encrypt support, flexible pooling, on-device sampling)
🛠️ Fine-tuning support and improvements
🔮 Upcoming models and planned features for future releases
This documentation aims to provide a comprehensive overview of the current release for internal teams and external users.

Credit: @quic-amitraj

abukhoy added 22 commits May 21, 2025 08:32
Signed-off-by: Abukhoyer Shaik <[email protected]>
abukhoy added 2 commits July 8, 2025 09:53
Signed-off-by: Abukhoyer Shaik <[email protected]>

#### Single QPC

In the **Single QPC** setup, the entire model—including both image encoding and text generation—runs within a **single quantized configuration**. There is no model splitting, and all components operate within the same execution environment.
Contributor

"Single quantized configuration"? Please rephrase this.

- **SpD & Multi-Projection Heads**: Token speculation via post-attention projections
- **I/O Encryption**: `--io-encrypt` flag support in compile/infer APIs
- **Separate Prefill/Decode Compilation**: For disaggregated serving
- **On-Device Sampling**: Reduces host-device latency for CausalLM models
Contributor

It's supported using vLLM. Native QEff doesn't support on-device sampling; please update the point accordingly.
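
To make the latency claim under discussion concrete, here is a minimal, purely conceptual sketch of why sampling on the device can reduce host-device traffic. It does not use QEfficient or vLLM; every name in it (`host_side_step`, `on_device_step`, the toy vocabulary size) is hypothetical, and the "device" is simulated in plain Python.

```python
import math
import random

# Toy vocabulary size (an assumption for illustration; real models are far larger).
VOCAB_SIZE = 8


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, rng):
    """Draw one token id from the softmax distribution over the logits."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def host_side_step(logits, rng):
    """Simulated host-side sampling: the device ships every logit to the host.

    Returns (token_id, number_of_values_transferred).
    """
    transferred = len(logits)
    return sample_token(logits, rng), transferred


def on_device_step(logits, rng):
    """Simulated on-device sampling: only the chosen token id leaves the device.

    Returns (token_id, number_of_values_transferred).
    """
    return sample_token(logits, rng), 1
```

With the same logits and the same seeded generator, both paths pick the same token; only the amount of data crossing the host-device boundary differs (`VOCAB_SIZE` values versus one).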

- Gradient checkpointing, device-aware `GradScaler`, and CLI `--help` added

---
Thank you for using Efficient Transformers! For more details, refer to the full documentation.
Contributor

Do we need this?

Contributor Author

Will remove it.


**Single QPC:**
In the **Single QPC** setup, the entire model—including both image encoding and text generation—runs within a **single Qualcomm Program Container**. There is no model splitting, and all components operate within the same execution environment.
- The single QPC approach introduces the flexibility to run the vision and language components independently.
Contributor

The flexibility to run the vision and language components independently is not provided by the single QPC approach; it is provided by the dual QPC approach.
On line 79, write "single QPC (Qualcomm Program Container)" instead of "single Qualcomm Program Container".
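
The distinction drawn in this comment can be sketched as a small conceptual model. This is an illustration only: `Qpc`, `compile_single_qpc`, `compile_dual_qpc`, and `can_run_independently` are hypothetical names, not QEfficient APIs, and no real compilation takes place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Qpc:
    """Stand-in for one compiled program container (hypothetical type)."""
    components: tuple


def compile_single_qpc():
    """Single QPC: vision encoder and language decoder share one container."""
    return [Qpc(components=("vision_encoder", "language_decoder"))]


def compile_dual_qpc():
    """Dual QPC: each component is placed in its own container."""
    return [
        Qpc(components=("vision_encoder",)),
        Qpc(components=("language_decoder",)),
    ]


def can_run_independently(qpcs):
    """True only when every container holds exactly one component."""
    return all(len(q.components) == 1 for q in qpcs)
```

In this model, `can_run_independently(compile_dual_qpc())` holds while `can_run_independently(compile_single_qpc())` does not, which mirrors the correction requested above.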

abukhoy added 2 commits July 14, 2025 08:50
Signed-off-by: Abukhoyer Shaik <[email protected]>

3 participants