Conversation


@YanhuiDua YanhuiDua commented Nov 7, 2025

Users configure concurrency with rollout_max_batch_size_per_instance as the baseline, from which the dataflow and other concurrency settings are derived. See the documentation in xtuner/docs/zh_cn/rl/advanced_tutorial/efficiency.md.

The possible cases are:

  1. rollout_max_batch_size_per_instance is not provided: xtuner derives a recommended inference-engine concurrency from context_length.
  2. DataFlowConfig.max_concurrent is not provided: xtuner computes max_concurrent from rollout_max_batch_size_per_instance, the number of inference-engine instances, prompt_repeat_k, and related hyperparameters.
  3. The user provides both rollout_max_batch_size_per_instance and DataFlowConfig.max_concurrent: the user-provided values are used.
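
Case 2 above (deriving DataFlowConfig.max_concurrent when only the per-instance batch size is given) can be sketched roughly as follows. This is a minimal illustration, not xtuner's actual implementation: the function name, the num_instances parameter, and the exact formula are assumptions based on the factors the PR description lists.

```python
def derive_max_concurrent(rollout_max_batch_size_per_instance: int,
                          num_instances: int,
                          prompt_repeat_k: int) -> int:
    """Hypothetical derivation of DataFlowConfig.max_concurrent.

    Each dataflow sample fans out into prompt_repeat_k rollout requests,
    so the number of concurrent samples is the total engine capacity
    divided by the per-prompt fan-out.
    """
    total_capacity = rollout_max_batch_size_per_instance * num_instances
    return max(1, total_capacity // prompt_repeat_k)

# 4 engine instances, 64 requests each, 8 rollouts per prompt
print(derive_max_concurrent(64, 4, 8))  # → 32
```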

@YanhuiDua YanhuiDua force-pushed the update_rollout_worker branch from 7a90af8 to f0bc0f8 Compare November 7, 2025 10:03
@YanhuiDua YanhuiDua changed the title from "[feat] update "rollout_max_batch_size" to replace "max_concurrent" for user settings" to "update "rollout_max_batch_size" to replace "max_concurrent" for user settings" Nov 7, 2025
    ]
    self.dataloader_cfg = DataloaderConfig(
-       pack_max_length=self.max_prompt_length,
+       pack_max_length=self.max_prompt_length + self.max_response_length,
Remove this line.
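
If the reviewer's request targets the duplicated pack_max_length keyword in the diff above, the reason is mechanical: Python rejects a repeated keyword argument at compile time. A minimal demonstration (the call below is hypothetical, not xtuner code):

```python
# A repeated keyword argument is a SyntaxError in Python, so the two
# pack_max_length= lines cannot coexist in one call.
src = "DataloaderConfig(pack_max_length=1, pack_max_length=2)"
try:
    compile(src, "<snippet>", "eval")
    print("compiled")
except SyntaxError:
    print("SyntaxError: keyword argument repeated")
```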

@YanhuiDua YanhuiDua force-pushed the update_rollout_worker branch from 2890ff6 to e7b7f5b Compare November 7, 2025 12:09

RangiLyu commented Nov 7, 2025

The rollout worker's http_concurrency needs to be able to exceed rollout_max_batch_size, but it is still being computed from rollout_max_batch_size.


RangiLyu commented Nov 7, 2025

> The rollout worker's http_concurrency needs to be able to exceed rollout_max_batch_size, but it is still being computed from rollout_max_batch_size.

max_concurrent and http_concurrency can be tied together: both are client-side request counts, whereas rollout_max_batch_size is an inference-engine-side parameter. The two must be configured separately; otherwise GPU power cannot be fully utilized.
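
The separation RangiLyu describes can be sketched as a small config object: the engine-side batch cap and the client-side request concurrency are distinct knobs, with the client allowed to over-subscribe so new requests are already queued when a slot frees up. The class and the over_concurrency_ratio default below are illustrative assumptions, not xtuner's real API:

```python
from dataclasses import dataclass

@dataclass
class ConcurrencySketch:
    """Hypothetical sketch separating engine-side and client-side knobs."""
    rollout_max_batch_size: int          # engine side: max in-flight batch
    over_concurrency_ratio: float = 1.2  # client may over-subscribe

    @property
    def http_concurrency(self) -> int:
        # Client side, tied to max_concurrent; deliberately allowed to
        # exceed the engine batch size so the GPU stays saturated.
        return int(self.rollout_max_batch_size * self.over_concurrency_ratio)

cfg = ConcurrencySketch(rollout_max_batch_size=128)
print(cfg.http_concurrency)  # → 153
```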

-    api_key (Optional[Union[List[str], str]]): API keys for rollout service. åSupports single key or list of keys. Defaults to None.
+    api_key (Optional[Union[List[str], str]]): API keys for rollout service.
+        Supports single key or list of keys. Defaults to None.

Fix the typo (the stray "å") before "Supports".

@YanhuiDua YanhuiDua merged commit 0b9f1e8 into InternLM:main Nov 10, 2025
4 of 5 checks passed
@YanhuiDua
Collaborator Author

> The rollout worker's http_concurrency needs to be able to exceed rollout_max_batch_size, but it is still being computed from rollout_max_batch_size.

done, configured via http_concurrency = config.rollout_max_batch_size_per_instance * config.allow_over_concurrency_ratio
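
A worked instance of the merged formula, using the parameter names from the comment above; the numeric values are illustrative, not defaults from the PR:

```python
# http_concurrency is derived from the engine-side batch cap times an
# over-subscription ratio, so the client can queue more requests than
# the engine admits at once (per RangiLyu's review).
rollout_max_batch_size_per_instance = 256
allow_over_concurrency_ratio = 1.5

http_concurrency = int(rollout_max_batch_size_per_instance
                       * allow_over_concurrency_ratio)
print(http_concurrency)  # → 384
```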

