Docs/vllm ascend modelcar permissions#234
The vLLM Ascend runtime needs USER and HOME at the ClusterServingRuntime layer so services inherit the non-root UID workaround consistently. The InferenceService example now only carries the device-access security context, and each YAML block has its own callout explanations.

Constraint: torch_npu auto-load must remain enabled for Ascend NPU runtimes
Rejected: Keep HOME on each InferenceService | duplicates runtime-owned behavior and leaves USER undocumented
Confidence: high
Scope-risk: narrow
Tested: yarn lint
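As a rough illustration of the runtime-owned behavior described above, the sketch below checks whether every container in a ClusterServingRuntime manifest defines both HOME and USER. The helper name `missing_runtime_env`, the runtime name, the container name, and the `/tmp` value are hypothetical placeholders, not values taken from the documentation in this PR.

```python
# Hypothetical helper: report which required env vars each runtime container lacks.
REQUIRED_ENV = ("HOME", "USER")


def missing_runtime_env(runtime: dict, required=REQUIRED_ENV) -> dict:
    """Map each container name to the required env vars it does not define."""
    missing = {}
    for container in runtime.get("spec", {}).get("containers", []):
        defined = {entry.get("name") for entry in container.get("env", [])}
        absent = [name for name in required if name not in defined]
        if absent:
            missing[container.get("name", "<unnamed>")] = absent
    return missing


# Illustrative manifest only; names and values are assumptions for this sketch.
example_runtime = {
    "apiVersion": "serving.kserve.io/v1alpha1",
    "kind": "ClusterServingRuntime",
    "metadata": {"name": "vllm-ascend-example"},
    "spec": {
        "containers": [
            {
                "name": "kserve-container",
                "env": [{"name": "HOME", "value": "/tmp"}],  # USER left out on purpose
            }
        ]
    },
}

print(missing_runtime_env(example_runtime))
```

An empty result would mean the runtime already carries both variables, so individual InferenceService manifests do not need to repeat them.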
The runtime examples use fixed CPU and memory values only as placeholders. Each limit now has a callout so readers know to size the values for their model, runtime engine, hardware, and workload instead of copying the sample values blindly.

Constraint: Keep the change documentation-only and avoid changing request values or runtime semantics
Confidence: high
Scope-risk: narrow
Tested: yarn lint
The installation guide should present the current Knative Serving default and one clear Model Catalog password Secret workflow. The KnativeServing example now defaults to 1.19.6, and the Model Catalog section keeps only the user-created stringData Secret example while explaining that Kubernetes encodes it on creation.

Constraint: Keep ACP 4.0 guidance as a note while defaulting the YAML to ACP 4.1+
Rejected: Show both stringData and stored data Secret manifests | makes users think two Secret definitions are required
Confidence: high
Scope-risk: narrow
Tested: yarn lint
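To make the stringData point concrete, here is a minimal sketch of the encoding Kubernetes applies when it stores a Secret: each plain-text `stringData` value ends up base64-encoded under `data`. The function name `encode_string_data` and the key and password values are hypothetical and only illustrate why a single user-created manifest is enough.

```python
import base64


def encode_string_data(string_data: dict) -> dict:
    """Return the base64-encoded form that a Secret's `data` field would hold."""
    return {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in string_data.items()
    }


# Illustrative values only; not the key or password used by Model Catalog.
user_written = {"password": "example-password"}
stored = encode_string_data(user_written)

# Decoding the stored value recovers exactly what the user wrote.
recovered = base64.b64decode(stored["password"]).decode("utf-8")
print(recovered == user_written["password"])
```

Because the stored form is derived from the plain-text one, readers only need to author the `stringData` manifest; the encoded `data` version is what they would see when reading the Secret back.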
Document the cluster-level Modelcar UID tradeoff for Ascend 910 vLLM-ascend deployments so users can distinguish the platform default non-root path from the root compatibility path required by some multi-card HCCL scenarios.

Constraint: Alauda AI defaults Modelcar to UID 1000 for the platform security baseline and common GPU workloads
Constraint: Community vLLM-ascend single-card deployments validate under non-root mode, while multi-card HCCL paths may need root or a UID 1000 compatible image
Rejected: Make root mode the default | it would overstate a compatibility workaround and obscure the cluster-wide security impact
Confidence: medium
Scope-risk: narrow
Directive: Do not present root-mode Modelcar as a per-service switch; uidModelcar is cluster-level
Tested: yarn lint
Tested: git diff --check on changed docs
Not-tested: Runtime validation on Ascend 910 hardware
Walkthrough
This PR updates documentation across cluster installation, custom inference runtime guidance, and troubleshooting to improve clarity on Kubernetes configuration. Changes include a Knative version bump, clarification of Model Catalog secret setup, callout annotations for resource sizing across multiple runtimes, and comprehensive new guidance on vLLM-ascend security and permission modes for Ascend hardware.

Changes
Documentation Updates: Cluster Installation and Inference Runtime Guidance
Estimated code review effort
🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks | ✅ 5 passed
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json.
Deploying alauda-ai with
| Latest commit: | 2ad292c |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://4ee37638.alauda-ai.pages.dev |
| Branch Preview URL: | https://docs-vllm-ascend-modelcar-pe.alauda-ai.pages.dev |
🧹 Nitpick comments (1)
docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx (1)
1010-1010: 💤 Low value
Consider improving table cell readability.
The "Special Requirements" cell for vLLM-ascend is comprehensive but quite long (103 words). For better readability in the comparison table, consider either:
- Breaking this into bullet points within the cell (if your MDX table renderer supports it), or
- Using a shorter summary with a reference link to the detailed section, such as:
**Must** set `HOME` and `USER` environment variables and configure security context according to the [Modelcar permission mode](`#modelcar-permission-modes-for-ascend-910`) (UID 1000 group settings for non-root mode only)

The current content is accurate and complete, so this is an optional readability improvement.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx` at line 1010, Shorten the long "Special Requirements" cell for vLLM-ascend in the MDX table by either replacing the 103-word sentence with a concise summary and a link to the detailed section (e.g., reference the "Modelcar permission mode" anchor) or by splitting it into 2–3 bullet points if the table renderer supports inline lists; keep key symbols referenced exactly (ClusterServingRuntime, HOME, USER, Modelcar permission mode, UID 1000) so readers can find the detailed instructions in the longer section.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0eeccdd4-401a-43aa-b31c-b62754a2ff18
📒 Files selected for processing (3)
- docs/en/installation/ai-cluster.mdx
- docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx
- docs/en/model_inference/inference_service/how_to/using_modelcar.mdx
Summary by CodeRabbit
Documentation