Fast-dLLM v2 uses the following generation process to speed up inference:
- Block-level Generation: Autoregressive at the block level
- Sub-block Parallelization: Parallel decoding within blocks for efficiency
- Hierarchical Caching: Block and sub-block level caching for speed optimization
Is this already supported? Thanks!
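
For reference, here is a toy sketch of how I read the three points above. It is not Fast-dLLM v2's actual code: `toy_predict`, `generate`, and all parameter names are hypothetical, the "model" is a deterministic stand-in, and the `committed` list only marks what a block-level cache would hold (no real KV caching is implemented). Within a sub-block, every masked position whose confidence reaches a threshold is accepted in the same step, with a fallback to the single most confident position so that decoding always makes progress.

```python
from typing import List, Optional, Tuple

MASK = None  # placeholder for a position that has not been decoded yet


def toy_predict(prefix: List[int], sub_block: List[Optional[int]]) -> List[Tuple[int, float]]:
    """Hypothetical stand-in for a model call: (token, confidence) per position."""
    preds = []
    for i in range(len(sub_block)):
        pos = len(prefix) + i
        token = pos % 7                          # arbitrary deterministic "token"
        conf = 0.6 if pos % 3 == 0 else 0.95     # arbitrary deterministic confidence
        preds.append((token, conf))
    return preds


def generate(prompt: List[int], num_blocks: int, block_size: int,
             sub_block_size: int, threshold: float = 0.9) -> Tuple[List[int], int]:
    """Return the generated tokens and the number of model calls used."""
    committed = list(prompt)  # finished context; a block-level cache would cover this
    steps = 0
    for _ in range(num_blocks):                  # blocks are produced one after another
        block: List[int] = []
        for start in range(0, block_size, sub_block_size):
            size = min(sub_block_size, block_size - start)
            sub: List[Optional[int]] = [MASK] * size
            while any(t is MASK for t in sub):   # parallel decoding inside the sub-block
                preds = toy_predict(committed + block, sub)
                steps += 1
                masked = [i for i, t in enumerate(sub) if t is MASK]
                accept = [i for i in masked if preds[i][1] >= threshold]
                if not accept:                   # fallback: take the most confident one
                    accept = [max(masked, key=lambda i: preds[i][1])]
                for i in accept:
                    sub[i] = preds[i][0]
            block.extend(sub)                    # a sub-block-level cache would cover this
        committed.extend(block)                  # block is final and never revisited
    return committed[len(prompt):], steps


tokens, steps = generate([1, 2, 3], num_blocks=2, block_size=4, sub_block_size=2)
print(tokens, steps)
```

With a lower `threshold`, more positions are accepted per step and fewer model calls are needed; with `threshold=1.0` the toy falls back to one token per call.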