feat: add OOM pre-check for vision models and fix InternVL image dime… by sufubao · Pull Request #1253 · ModelTC/LightLLM

sufubao · 2026-04-02T10:57:42Z

…nsion handling

gemini-code-assist

Code Review

This pull request introduces an OOM (Out of Memory) pre-check mechanism for Qwen-family vision models by performing a dummy forward pass with worst-case image dimensions during initialization. It updates several model implementations to support this check and refactors the ViT model to derive inference parameters from the configuration instead of environment variables. Feedback focuses on improving memory management within the pre-check function by explicitly deleting tensors and clearing the CUDA cache, as well as refining exception handling and logging practices.

gemini-code-assist · 2026-04-02T10:59:19Z

lightllm/models/qwen2_vl/vision_process.py

+        grid_thw = grid_thw.to("cuda", non_blocking=True)
+
+        result = model.forward(pixel_values, grid_thw=grid_thw)
+        del result


After the forward pass, the pixel_values and grid_thw tensors remain in GPU memory until the function returns. In an OOM pre-check context, it is best practice to explicitly delete these large tensors and call torch.cuda.empty_cache() to ensure that the memory is immediately available for the subsequent model initialization and KV cache allocation.

Suggested change

-        del result
+        del result, pixel_values, grid_thw
+        torch.cuda.empty_cache()
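The cleanup pattern described in this comment can be sketched as follows. This is a minimal illustration, not the PR's code: `run_dummy_forward` and the small `nn.Linear` stand-in model are hypothetical names, and the sketch assumes PyTorch is installed (it falls back to CPU when no GPU is present).

```python
import torch
from torch import nn


def run_dummy_forward(model: nn.Module, shape: tuple, device: str) -> tuple:
    """Hypothetical helper: one dummy forward pass, then release temporaries."""
    pixel_values = torch.zeros(shape, device=device)
    try:
        with torch.no_grad():
            result = model(pixel_values)
        out_shape = tuple(result.shape)
        # Drop the output so its memory can be reused right away.
        del result
    finally:
        # Drop the local reference to the input as well. This only frees the
        # tensor if no caller still holds a reference to it.
        del pixel_values
        if torch.cuda.is_available():
            # Return unused cached blocks to the CUDA driver.
            torch.cuda.empty_cache()
    return out_shape


device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(8, 4).to(device)  # stand-in for the real vision model
print(run_dummy_forward(model, (2, 8), device))
```

Note that PyTorch's caching allocator would reuse the freed blocks for later PyTorch allocations in the same process even without `empty_cache()`. The call matters mostly when other processes or non-PyTorch allocations need that memory.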

gemini-code-assist · 2026-04-02T10:59:20Z

lightllm/models/qwen2_vl/vision_process.py

+        del result
+        logger.info(f"vit check max_len {max_batch_size} infer ok")
+    except (RuntimeError, torch.OutOfMemoryError, ValueError) as e:
+        logger.exception(str(e))


Using logger.exception(str(e)) is redundant because logger.exception automatically captures the exception object and its traceback. It is better to provide a descriptive message about what failed.

Suggested change

-        logger.exception(str(e))
+        logger.exception("Qwen VL check max len infer failed")
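To illustrate the point, here is a small standard-library sketch showing that `logger.exception` appends the active exception and its traceback on its own, so the message argument is free to describe what failed. The logger name `vit_check_demo` and the simulated error are assumptions made for the demo.

```python
import io
import logging


def demo_exception_logging() -> str:
    """Log a simulated failure and return the captured log text."""
    stream = io.StringIO()
    logger = logging.getLogger("vit_check_demo")  # hypothetical logger name
    logger.setLevel(logging.ERROR)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    logger.addHandler(handler)
    try:
        try:
            raise RuntimeError("simulated out of memory")
        except RuntimeError:
            # The message says what failed; the exception type, its text and
            # the traceback are added by logger.exception automatically.
            logger.exception("Qwen VL check max len infer failed")
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()


log_text = demo_exception_logging()
print(log_text)
```

The captured text starts with the descriptive message and is followed by the traceback, which ends with the original `RuntimeError` text.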

gemini-code-assist · 2026-04-02T10:59:20Z

lightllm/models/qwen2_vl/vision_process.py

+            "1.Set the --visual_infer_batch_size to a smaller value."
+        )
+        logger.error(exception_str)
+        raise Exception(exception_str)


Raising a base Exception is generally discouraged as it makes it harder for calling code to distinguish between different types of failures. Using RuntimeError is more appropriate for an execution failure during a model check.

Suggested change

-        raise Exception(exception_str)
+        raise RuntimeError(exception_str)
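A short sketch of why the narrower type helps the caller. Both `check_vit_memory` and `start_server` are hypothetical stand-ins, not functions from the PR. The caller catches only `RuntimeError`, so an unrelated programming error would still surface instead of being mistaken for a failed memory check.

```python
def check_vit_memory(fits_in_memory: bool) -> None:
    """Hypothetical stand-in for the ViT pre-check."""
    if not fits_in_memory:
        raise RuntimeError(
            "ViT max-length check failed. "
            "Set the --visual_infer_batch_size to a smaller value."
        )


def start_server(fits_in_memory: bool) -> str:
    """Hypothetical caller that handles only the expected failure type."""
    try:
        check_vit_memory(fits_in_memory)
    except RuntimeError as err:
        # Only execution failures land here; a programming error such as
        # KeyError or TypeError would still propagate to the caller.
        return f"startup aborted: {err}"
    return "started"


print(start_server(True))
print(start_server(False))
```

Had the check raised a bare `Exception`, the caller would need `except Exception`, which also swallows every unrelated error raised inside the `try` block.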

feat: add OOM pre-check for vision models and fix InternVL image dime…

4af0cbb

…nsion handling

gemini-code-assist bot reviewed Apr 2, 2026

sufubao wants to merge 1 commit into main from fix_mm_check
