Docs/vllm ascend modelcar permissions #234

Open
fyuan1316 wants to merge 4 commits into master from docs/vllm-ascend-modelcar-permissions

@fyuan1316 fyuan1316 commented May 26, 2026

Summary by CodeRabbit

Documentation

  • Updated installation guidance with latest Knative version and clarified Model Catalog password configuration steps.
  • Enhanced custom inference runtime documentation with detailed resource sizing guidance and Ascend 910 deployment mode specifications.
  • Expanded troubleshooting section with Ascend 910 vLLM-ascend permission mode guidance and cross-references.

fyuan1316 added 4 commits May 26, 2026 08:01
The vLLM Ascend runtime needs USER and HOME at the ClusterServingRuntime layer so services inherit the non-root UID workaround consistently. The InferenceService example now only carries the device-access security context, and each YAML block has its own callout explanations.

Constraint: torch_npu auto-load must remain enabled for Ascend NPU runtimes

Rejected: Keep HOME on each InferenceService | duplicates runtime-owned behavior and leaves USER undocumented

Confidence: high

Scope-risk: narrow

Tested: yarn lint
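
The runtime-level placement of `HOME` and `USER` described in this commit can be spot-checked with a small script. The following is a minimal sketch, not part of the PR: the helper `missing_runtime_env`, the runtime name, the container name, and the `/tmp` value are all hypothetical, and the manifest is modeled as a plain Python dict that follows the `spec.containers[].env` layout of a ClusterServingRuntime.

```python
from typing import Iterable


def missing_runtime_env(runtime: dict, required: Iterable[str] = ("HOME", "USER")) -> dict:
    """Return, per container name, the required env var names that are not set."""
    required = tuple(required)  # allow any iterable, including generators
    report = {}
    for container in runtime.get("spec", {}).get("containers", []):
        defined = {entry["name"] for entry in container.get("env", [])}
        missing = [name for name in required if name not in defined]
        if missing:
            report[container["name"]] = missing
    return report


# Hypothetical manifest: names and values are placeholders, not taken from the docs.
runtime = {
    "apiVersion": "serving.kserve.io/v1alpha1",
    "kind": "ClusterServingRuntime",
    "metadata": {"name": "vllm-ascend-example"},
    "spec": {
        "containers": [
            {
                "name": "kserve-container",
                "env": [{"name": "HOME", "value": "/tmp"}],  # USER left out on purpose
            }
        ]
    },
}

report = missing_runtime_env(runtime)
print(report)
```

Checking the runtime rather than each InferenceService mirrors the decision recorded above: the environment variables are owned by the runtime, so a single check covers every service that inherits from it.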
The runtime examples use fixed CPU and memory values only as placeholders. Each limit now has a callout so readers know to size the values for their model, runtime engine, hardware, and workload instead of copying the sample values blindly.

Constraint: Keep the change documentation-only and avoid changing request values or runtime semantics

Confidence: high

Scope-risk: narrow

Tested: yarn lint
The installation guide should present the current Knative Serving default and one clear Model Catalog password Secret workflow. The KnativeServing example now defaults to 1.19.6, and the Model Catalog section keeps only the user-created stringData Secret example while explaining that Kubernetes encodes it on creation.

Constraint: Keep ACP 4.0 guidance as a note while defaulting the YAML to ACP 4.1+

Rejected: Show both stringData and stored data Secret manifests | makes users think two Secret definitions are required

Confidence: high

Scope-risk: narrow

Tested: yarn lint
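
To illustrate why only the `stringData` manifest is needed, the sketch below mimics the conversion Kubernetes performs when a Secret is created: each `stringData` value is stored under `data` as the base64 encoding of its UTF-8 bytes. This is an illustration under stated assumptions, not the platform's code: the helper `encode_string_data`, the Secret name, the namespace, and the password value are hypothetical placeholders.

```python
import base64


def encode_string_data(string_data: dict) -> dict:
    """Mimic how stringData values end up under data: base64 of the UTF-8 bytes."""
    return {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in string_data.items()
    }


# Hypothetical Secret manifest modeled as a dict; names are placeholders.
secret_manifest = {
    "apiVersion": "v1",
    "kind": "Secret",
    "metadata": {"name": "example-db-password", "namespace": "example-namespace"},
    "type": "Opaque",
    "stringData": {"password": "example-password"},
}

stored = encode_string_data(secret_manifest["stringData"])

# Decoding the stored value returns the plain text the user wrote.
roundtrip = base64.b64decode(stored["password"]).decode("utf-8")
print(stored["password"], roundtrip)
```

The user writes the plain value once, and the encoded form seen later in the stored object is the same Secret, not a second definition.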
Document the cluster-level Modelcar UID tradeoff for Ascend 910 vLLM-ascend deployments so users can distinguish the platform default non-root path from the root compatibility path required by some multi-card HCCL scenarios.

Constraint: Alauda AI defaults Modelcar to UID 1000 for the platform security baseline and common GPU workloads
Constraint: Community vLLM-ascend single-card deployments validate under non-root mode, while multi-card HCCL paths may need root or a UID 1000 compatible image
Rejected: Make root mode the default | it would overstate a compatibility workaround and obscure the cluster-wide security impact
Confidence: medium
Scope-risk: narrow
Directive: Do not present root-mode Modelcar as a per-service switch; uidModelcar is cluster-level
Tested: yarn lint
Tested: git diff --check on changed docs
Not-tested: Runtime validation on Ascend 910 hardware

coderabbitai Bot commented May 26, 2026

Walkthrough

This PR updates documentation across cluster installation, custom inference runtime guidance, and troubleshooting to improve clarity on Kubernetes configuration. Changes include a Knative version bump, clarification of Model Catalog secret setup, callout annotations for resource sizing across multiple runtimes, and comprehensive new guidance on vLLM-ascend security and permission modes for Ascend hardware.

Changes

Documentation Updates: Cluster Installation and Inference Runtime Guidance

| Layer / File(s) | Summary |
| --- | --- |
| Cluster setup and Model Catalog configuration (`docs/en/installation/ai-cluster.mdx`) | Knative KnativeServing version updated to 1.19.6. Model Catalog secret namespace/name fields clarified with guidance to create the secret before instance creation. Kubernetes `stringData.password` encoding behavior documented to clarify that no manual base64-encoding is required. |
| Resource sizing callouts for standard runtimes (`docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx`) | Xinference, MLServer, Triton, and MindIE YAML examples annotated with CPU/memory resource limit callout markers, and corresponding explanation blocks added for each runtime. |
| vLLM-ascend permission modes and security configuration (`docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx`) | New "Modelcar Permission Modes for Ascend 910" subsection added with non-root vs root deployment guidance and a cluster-level warning. vLLM-ascend ClusterServingRuntime YAML updated with HOME/USER environment variables and runAsUser 1000 with callout markers. InferenceService example reordered with updated resource limits and expanded fsGroup/supplementalGroups guidance. Runtime comparison table updated to reflect new requirements. |
| Troubleshooting updates and conclusion alignment (`docs/en/model_inference/inference_service/how_to/using_modelcar.mdx`) | Ascend 910 vLLM-ascend permission mode troubleshooting item added with a link to the permission modes documentation. Conclusion section consolidated into a single continuous paragraph. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • alauda/aml-docs#210: Both PRs modify docs/en/installation/ai-cluster.mdx in the Model Catalog setup section around PostgreSQL password secret configuration.

Poem

🐰 A cluster takes shape, with Knative up high,
Ascend modes unfold beneath the K8s sky—
Resource callouts dance, from Xinference to vLLM,
Secrets encoded smooth, just add them and fill 'em!
Documentation blooms, let troubleshooting be clear,
The inference path brightens—deployment's right here! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly relates to the main changes in the pull request, which focus on documenting vLLM-ascend modelcar permission modes and Ascend-specific security configurations across multiple documentation files. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


@cloudflare-workers-and-pages

Deploying alauda-ai with Cloudflare Pages

Latest commit: 2ad292c
Status: ✅  Deploy successful!
Preview URL: https://4ee37638.alauda-ai.pages.dev
Branch Preview URL: https://docs-vllm-ascend-modelcar-pe.alauda-ai.pages.dev


@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx (1)

1010-1010: 💤 Low value

Consider improving table cell readability.

The "Special Requirements" cell for vLLM-ascend is comprehensive but quite long (103 words). For better readability in the comparison table, consider either:

  1. Breaking this into bullet points within the cell (if your MDX table renderer supports it), or
  2. Using a shorter summary with a reference link to the detailed section, such as:
    **Must** set `HOME` and `USER` environment variables and configure security context according to the [Modelcar permission mode](#modelcar-permission-modes-for-ascend-910) (UID 1000 group settings for non-root mode only)

The current content is accurate and complete, so this is an optional readability improvement.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx`
at line 1010, Shorten the long "Special Requirements" cell for vLLM-ascend in
the MDX table by either replacing the 103-word sentence with a concise summary
and a link to the detailed section (e.g., reference the "Modelcar permission
mode" anchor) or by splitting it into 2–3 bullet points if the table renderer
supports inline lists; keep key symbols referenced exactly
(ClusterServingRuntime, HOME, USER, Modelcar permission mode, UID 1000) so
readers can find the detailed instructions in the longer section.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0eeccdd4-401a-43aa-b31c-b62754a2ff18

📥 Commits

Reviewing files that changed from the base of the PR and between c766bc1 and 2ad292c.

📒 Files selected for processing (3)
  • docs/en/installation/ai-cluster.mdx
  • docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx
  • docs/en/model_inference/inference_service/how_to/using_modelcar.mdx
