Conversation

@e-mon (Collaborator) commented Aug 9, 2025

To make gpt-oss usable, I updated the transformers and vllm versions.

@namgiH (Collaborator) commented Aug 9, 2025

I was trying this on mdx, but I have not been able to get gpt-oss-20b to work.
First, the following error appears:

Traceback (most recent call last):
  File "/model/hng88/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/inference.py", line 116, in <module>
    cfg = setup_cli(
          ^^^^^^^^^^
  File "/model/hng88/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/llm_jp_eval/cli.py", line 76, in setup_cli
    commands[args.command](base_model.model_validate(parsed_settings.model_dump()))
  File "/model/hng88/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/inference.py", line 26, in inference
    generator.main()
  File "/model/hng88/llm-jp-eval/llm-jp-eval-inference/src/llm_jp_eval_inference/generator.py", line 324, in main
    self.execute()
  File "/model/hng88/llm-jp-eval/llm-jp-eval-inference/src/llm_jp_eval_inference/generator.py", line 213, in execute
    self.load_model(dataset_profiles)
  File "/model/hng88/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/inference.py", line 78, in load_model
    self.model: vllm.LLM = vllm.LLM(**self.cfg.model.model_dump())
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/model/hng88/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/vllm/entrypoints/llm.py", line 277, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/model/hng88/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/vllm/engine/llm_engine.py", line 487, in from_engine_args
    vllm_config = engine_args.create_engine_config(usage_context)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/model/hng88/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/vllm/engine/arg_utils.py", line 1088, in create_engine_config
    cache_config = CacheConfig(
                   ^^^^^^^^^^^^
  File "/model/hng88/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/pydantic/_internal/_dataclasses.py", line 123, in __init__
    s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
  File "/model/hng88/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/vllm/config.py", line 1874, in __post_init__
    self._verify_prefix_caching()
  File "/model/hng88/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/vllm/config.py", line 1916, in _verify_prefix_caching
    raise NotImplementedError(
NotImplementedError: Prefix caching is not supported with sliding window. Run with --disable-sliding-window to use prefix caching.

When I then add disable_sliding_window: True to configs/vllm_inference.yaml,
the following error appears instead:

Traceback (most recent call last):
  File "/model/hng88/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/inference.py", line 116, in <module>
    cfg = setup_cli(
          ^^^^^^^^^^
  File "/model/hng88/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/llm_jp_eval/cli.py", line 76, in setup_cli
    commands[args.command](base_model.model_validate(parsed_settings.model_dump()))
  File "/model/hng88/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/inference.py", line 26, in inference
    generator.main()
  File "/model/hng88/llm-jp-eval/llm-jp-eval-inference/src/llm_jp_eval_inference/generator.py", line 324, in main
    self.execute()
  File "/model/hng88/llm-jp-eval/llm-jp-eval-inference/src/llm_jp_eval_inference/generator.py", line 213, in execute
    self.load_model(dataset_profiles)
  File "/model/hng88/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/inference.py", line 78, in load_model
    self.model: vllm.LLM = vllm.LLM(**self.cfg.model.model_dump())
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/model/hng88/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/vllm/entrypoints/llm.py", line 277, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/model/hng88/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/vllm/engine/llm_engine.py", line 487, in from_engine_args
    vllm_config = engine_args.create_engine_config(usage_context)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/model/hng88/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/vllm/engine/arg_utils.py", line 1043, in create_engine_config
    model_config = self.create_model_config()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/model/hng88/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/vllm/engine/arg_utils.py", line 889, in create_model_config
    return ModelConfig(
           ^^^^^^^^^^^^
  File "/model/hng88/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/pydantic/_internal/_dataclasses.py", line 123, in __init__
    s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
  File "/model/hng88/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/vllm/config.py", line 782, in __post_init__
    self.max_model_len = self.get_and_verify_max_len(self.max_model_len)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/model/hng88/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/vllm/config.py", line 1751, in get_and_verify_max_len
    max_model_len = _get_and_verify_max_len(
                    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/model/hng88/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/vllm/config.py", line 3727, in _get_and_verify_max_len
    raise NotImplementedError(
NotImplementedError: Disabling sliding window is not supported for models with rope_scaling. Please raise an issue so we can investigate.
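
For context, that override is a one-line addition to configs/vllm_inference.yaml, roughly as in the sketch below; this assumes the vLLM engine arguments sit under the model: block (inference.py passes cfg.model straight into vllm.LLM), and the rest of the file is omitted:

model:
  model: openai/gpt-oss-20b      # existing entry
  disable_sliding_window: true   # the override that triggers the rope_scaling error above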

Also, in llm-jp-eval-inference/inference-modules/vllm, I tried the command from the official vLLM documentation,
VLLM_ATTENTION_BACKEND=TRITON_ATTN_VLLM_V1 uv run vllm serve openai/gpt-oss-20b --async-scheduling,
but it failed with an entirely different set of errors.

That said, I cannot rule out that this is an environment issue on mdx,
so if it works without problems in your environment, Horie-san, I think this can be merged. What do you think?

@namgiH (Collaborator) commented Aug 16, 2025

I am currently trying this with enable_prefix_caching=False added, but it appears to fail with the following error:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/model/hng88/llm-jp-cluster/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/inference.py", line 116, in <module>
[rank0]:     cfg = setup_cli(
[rank0]:           ^^^^^^^^^^
[rank0]:   File "/model/hng88/llm-jp-cluster/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/llm_jp_eval/cli.py", line 76, in setup_cli
[rank0]:     commands[args.command](base_model.model_validate(parsed_settings.model_dump()))
[rank0]:   File "/model/hng88/llm-jp-cluster/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/inference.py", line 26, in inference
[rank0]:     generator.main()
[rank0]:   File "/model/hng88/llm-jp-cluster/llm-jp-eval/llm-jp-eval-inference/src/llm_jp_eval_inference/generator.py", line 324, in main
[rank0]:     self.execute()
[rank0]:   File "/model/hng88/llm-jp-cluster/llm-jp-eval/llm-jp-eval-inference/src/llm_jp_eval_inference/generator.py", line 213, in execute
[rank0]:     self.load_model(dataset_profiles)
[rank0]:   File "/model/hng88/llm-jp-cluster/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/inference.py", line 78, in load_model
[rank0]:     self.model: vllm.LLM = vllm.LLM(**self.cfg.model.model_dump())
[rank0]:                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/model/hng88/llm-jp-cluster/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/vllm/entrypoints/llm.py", line 277, in __init__
[rank0]:     self.llm_engine = LLMEngine.from_engine_args(
[rank0]:                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/model/hng88/llm-jp-cluster/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/vllm/engine/llm_engine.py", line 494, in from_engine_args
[rank0]:     return engine_cls.from_vllm_config(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/model/hng88/llm-jp-cluster/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/vllm/engine/llm_engine.py", line 470, in from_vllm_config
[rank0]:     return cls(
[rank0]:            ^^^^
[rank0]:   File "/model/hng88/llm-jp-cluster/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/vllm/engine/llm_engine.py", line 260, in __init__
[rank0]:     self.model_executor = executor_class(vllm_config=vllm_config)
[rank0]:                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/model/hng88/llm-jp-cluster/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 264, in __init__
[rank0]:     super().__init__(*args, **kwargs)
[rank0]:   File "/model/hng88/llm-jp-cluster/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 54, in __init__
[rank0]:     self._init_executor()
[rank0]:   File "/model/hng88/llm-jp-cluster/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/vllm/executor/mp_distributed_executor.py", line 126, in _init_executor
[rank0]:     self._run_workers("load_model",
[rank0]:   File "/model/hng88/llm-jp-cluster/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/vllm/executor/mp_distributed_executor.py", line 186, in _run_workers
[rank0]:     driver_worker_output = run_method(self.driver_worker, sent_method,
[rank0]:                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/model/hng88/llm-jp-cluster/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/vllm/utils/__init__.py", line 2948, in run_method
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/model/hng88/llm-jp-cluster/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/vllm/worker/worker.py", line 211, in load_model
[rank0]:     self.model_runner.load_model()
[rank0]:   File "/model/hng88/llm-jp-cluster/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/vllm/worker/multi_step_model_runner.py", line 650, in load_model
[rank0]:     self._base_model_runner.load_model()
[rank0]:   File "/model/hng88/llm-jp-cluster/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/vllm/worker/model_runner.py", line 1087, in load_model
[rank0]:     self.model = get_model(vllm_config=self.vllm_config)
[rank0]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/model/hng88/llm-jp-cluster/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/vllm/model_executor/model_loader/__init__.py", line 118, in get_model
[rank0]:     return loader.load_model(vllm_config=vllm_config,
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/model/hng88/llm-jp-cluster/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/vllm/model_executor/model_loader/base_loader.py", line 44, in load_model
[rank0]:     model = initialize_model(vllm_config=vllm_config,
[rank0]:             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/model/hng88/llm-jp-cluster/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/vllm/model_executor/model_loader/utils.py", line 63, in initialize_model
[rank0]:     return model_class(vllm_config=vllm_config, prefix=prefix)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/model/hng88/llm-jp-cluster/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/vllm/model_executor/models/gpt_oss.py", line 241, in __init__
[rank0]:     self.model = GptOssModel(
[rank0]:                  ^^^^^^^^^^^^
[rank0]:   File "/model/hng88/llm-jp-cluster/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/vllm/compilation/decorators.py", line 183, in __init__
[rank0]:     old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
[rank0]:   File "/model/hng88/llm-jp-cluster/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/vllm/model_executor/models/gpt_oss.py", line 214, in __init__
[rank0]:     TransformerBlock(
[rank0]:   File "/model/hng88/llm-jp-cluster/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/vllm/model_executor/models/gpt_oss.py", line 183, in __init__
[rank0]:     self.attn = OAIAttention(config, prefix=f"{prefix}.attn")
[rank0]:                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/model/hng88/llm-jp-cluster/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/vllm/model_executor/models/gpt_oss.py", line 110, in __init__
[rank0]:     self.attn = Attention(
[rank0]:                 ^^^^^^^^^^
[rank0]:   File "/model/hng88/llm-jp-cluster/llm-jp-eval/llm-jp-eval-inference/inference-modules/vllm/.venv/lib/python3.12/site-packages/vllm/attention/layer.py", line 176, in __init__
[rank0]:     self.impl = impl_cls(num_heads, head_size, scale, num_kv_heads,
[rank0]:                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: TypeError: FlashAttentionImpl.__init__() got an unexpected keyword argument 'sinks'

Just to be sure, is this running without problems in your environment, Horie-san?
I would like to try another environment that is available to me,
but if it fails only on the mdx VM, I think we should go ahead and merge this PR.
I will share any further progress here.

@e-mon (Collaborator, Author) commented Oct 11, 2025

@namgiH
I confirmed that the model runs in the mdx environment with the following settings:

uv run vllm serve openai/gpt-oss-20b --port 10101 --tensor-parallel-size 4 --gpu-memory-utilization 0.7
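
For anyone else reproducing this, a quick sanity check against that server could look like the sketch below (assuming the OpenAI-compatible API that vllm serve exposes under /v1 on the chosen port; the prompt is only an example):

from openai import OpenAI

# vllm serve exposes an OpenAI-compatible endpoint; the api_key value is arbitrary
# when no key is configured on the server.
client = OpenAI(base_url="http://localhost:10101/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=32,
)
print(resp.choices[0].message.content)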

@e-mon requested a review from @namgiH on October 11, 2025 at 12:42
@namgiH (Collaborator) commented Oct 12, 2025

I was verifying this as usual with task eval_inference inference_config=configs/vllm_inference.yaml eval_config=configs/config.yaml, but with the vllm version bump there seem to be some changes in how generation is invoked from Python:

  • In inference-modules/vllm/schemas.py, num_scheduler_steps: int = 8 at line 18 no longer seems to be accepted; it raised an unknown-argument error.
    • I confirmed that commenting it out makes the error go away.
  • In inference-modules/vllm/inference.py, line 103 has outputs = self.model.generate(sampling_params=sampling_params, prompt_token_ids=prompt_tokens), but prompt_token_ids appears to have been deprecated (a possible replacement is sketched below).
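
For the second point, a possible replacement for that call is sketched below. This is only a sketch against the newer vLLM prompt API (TokensPrompt); I have not verified it in inference.py itself, and prompt_tokens here stands for the same list of token-id lists as before:

from vllm import TokensPrompt  # also importable from vllm.inputs

# Pre-tokenized prompts are now passed as TokensPrompt objects instead of the
# deprecated prompt_token_ids= keyword argument.
outputs = self.model.generate(
    [TokensPrompt(prompt_token_ids=ids) for ids in prompt_tokens],
    sampling_params=sampling_params,
)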

Sorry for the trouble, but could you please check the points above and fix them if you run into the same problems? 🙇

@namgiH (Collaborator) commented Oct 14, 2025

This is not directly related to this PR, but
during verification I found that uv sync alone sometimes does not resolve triton to an official release
(in my case it was triton==3.4.0+git663e04e8),
and in that case the following issue, also reported upstream against vllm, sometimes occurred:
vllm-project/vllm#12219

Just to be safe, how about pinning the triton version in pyproject.toml?
I have confirmed that it starts up without problems on 3.5.0.
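
One way to pin it would be a constraint in the vllm module's pyproject.toml, roughly as in the sketch below (the exact section depends on how dependencies are declared there; constraint-dependencies avoids adding triton as a direct dependency):

[tool.uv]
# Keep the resolver on the official triton release instead of a +git build.
constraint-dependencies = ["triton==3.5.0"]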

That said, honestly this does not seem to be a problem on the llm-jp-eval-inference side,
so I think it is fine to leave it unaddressed; if you prefer not to address it, merging as-is works for me.
Thank you in advance.
