Docs/vllm ascend modelcar permissions by fyuan1316 · Pull Request #234 · alauda/aml-docs

fyuan1316 · 2026-05-26T00:33:02Z

Summary by CodeRabbit

Documentation

Updated installation guidance with latest Knative version and clarified Model Catalog password configuration steps.
Enhanced custom inference runtime documentation with detailed resource sizing guidance and Ascend 910 deployment mode specifications.
Expanded troubleshooting section with Ascend 910 vLLM-ascend permission mode guidance and cross-references.

The vLLM Ascend runtime needs USER and HOME at the ClusterServingRuntime layer so services inherit the non-root UID workaround consistently. The InferenceService example now only carries the device-access security context, and each YAML block has its own callout explanations.

Constraint: torch_npu auto-load must remain enabled for Ascend NPU runtimes
Rejected: Keep HOME on each InferenceService | duplicates runtime-owned behavior and leaves USER undocumented
Confidence: high
Scope-risk: narrow
Tested: yarn lint
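As a rough illustration of the runtime-owned environment described above, the following Python sketch checks whether a container spec defines both HOME and USER. The helper name, the container name, and the environment values are hypothetical placeholders rather than values taken from the changed docs; only the `HOME`/`USER` requirement and `runAsUser` 1000 come from this PR.

```python
def missing_runtime_env(container: dict, required=("HOME", "USER")) -> list[str]:
    """Return the required env var names that the container spec does not define."""
    defined = {entry.get("name") for entry in container.get("env", [])}
    return [name for name in required if name not in defined]


# Hypothetical ClusterServingRuntime container entry.
# The container name and env values are placeholders, not taken from the docs.
runtime_container = {
    "name": "kserve-container",
    "env": [
        {"name": "HOME", "value": "/tmp/home"},
        {"name": "USER", "value": "vllm"},
    ],
    "securityContext": {"runAsUser": 1000},
}

# Hypothetical container entry that sets HOME only.
incomplete_container = {
    "name": "kserve-container",
    "env": [{"name": "HOME", "value": "/tmp/home"}],
}

if __name__ == "__main__":
    print(missing_runtime_env(runtime_container))
    print(missing_runtime_env(incomplete_container))
```

A check like this could run against the runtime manifest alone, since the InferenceService is no longer expected to carry these variables.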

The runtime examples use fixed CPU and memory values only as placeholders. Each limit now has a callout so readers know to size the values for their model, runtime engine, hardware, and workload instead of copying the sample values blindly.

Constraint: Keep the change documentation-only and avoid changing request values or runtime semantics
Confidence: high
Scope-risk: narrow
Tested: yarn lint

The installation guide should present the current Knative Serving default and one clear Model Catalog password Secret workflow. The KnativeServing example now defaults to 1.19.6, and the Model Catalog section keeps only the user-created stringData Secret example while explaining that Kubernetes encodes it on creation.

Constraint: Keep ACP 4.0 guidance as a note while defaulting the YAML to ACP 4.1+
Rejected: Show both stringData and stored data Secret manifests | makes users think two Secret definitions are required
Confidence: high
Scope-risk: narrow
Tested: yarn lint
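To make the stringData behavior concrete, here is a minimal Python sketch of the encoding Kubernetes applies when a Secret is created: each `stringData` value is stored base64-encoded under `data`. The function name, Secret name, namespace, and password are hypothetical placeholders; the sketch only mimics the encoding step and does not call the Kubernetes API.

```python
import base64


def encode_string_data(string_data: dict[str, str]) -> dict[str, str]:
    """Mimic how stringData values end up base64-encoded in a Secret's data field."""
    return {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in string_data.items()
    }


# Hypothetical user-created Secret; name, namespace, and password are placeholders.
secret_manifest = {
    "apiVersion": "v1",
    "kind": "Secret",
    "metadata": {"name": "example-db-secret", "namespace": "example-namespace"},
    "stringData": {"password": "change-me"},
}

# What the stored Secret would hold under "data" after creation.
stored_data = encode_string_data(secret_manifest["stringData"])

if __name__ == "__main__":
    print(stored_data)
```

Because the encoding happens on creation, only the stringData manifest needs to be written by hand, which is why the guide keeps a single Secret example.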

Document the cluster-level Modelcar UID tradeoff for Ascend 910 vLLM-ascend deployments so users can distinguish the platform default non-root path from the root compatibility path required by some multi-card HCCL scenarios.

Constraint: Alauda AI defaults Modelcar to UID 1000 for the platform security baseline and common GPU workloads
Constraint: Community vLLM-ascend single-card deployments validate under non-root mode, while multi-card HCCL paths may need root or a UID 1000 compatible image
Rejected: Make root mode the default | it would overstate a compatibility workaround and obscure the cluster-wide security impact
Confidence: medium
Scope-risk: narrow
Directive: Do not present root-mode Modelcar as a per-service switch; uidModelcar is cluster-level
Tested: yarn lint
Tested: git diff --check on changed docs
Not-tested: Runtime validation on Ascend 910 hardware

coderabbitai · 2026-05-26T00:33:13Z

Walkthrough

This PR updates documentation across cluster installation, custom inference runtime guidance, and troubleshooting to improve clarity on Kubernetes configuration. Changes include a Knative version bump, clarification of Model Catalog secret setup, callout annotations for resource sizing across multiple runtimes, and comprehensive new guidance on vLLM-ascend security and permission modes for Ascend hardware.

Changes

Documentation Updates: Cluster Installation and Inference Runtime Guidance

Layer / File(s)	Summary
Cluster setup and Model Catalog configuration `docs/en/installation/ai-cluster.mdx`	Knative `KnativeServing` version updated to 1.19.6. Model Catalog secret namespace/name fields clarified with guidance to create the secret before instance creation. Kubernetes `stringData.password` encoding behavior documented to clarify no manual base64-encoding is required.
Resource sizing callouts for standard runtimes `docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx`	Xinference, MLServer, Triton, and MindIE YAML examples annotated with CPU/memory resource limit callout markers and corresponding explanation blocks added for each runtime.
vLLM-ascend permission modes and security configuration `docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx`	New "Modelcar Permission Modes for Ascend 910" subsection added with non-root vs root deployment guidance and cluster-level warning. vLLM-ascend ClusterServingRuntime YAML updated with HOME/USER environment variables and `runAsUser` 1000 with callout markers. InferenceService example reordered with updated resource limits and expanded fsGroup/supplementalGroups guidance. Runtime comparison table updated to reflect new requirements.
Troubleshooting updates and conclusion alignment `docs/en/model_inference/inference_service/how_to/using_modelcar.mdx`	Ascend 910 vLLM-ascend permission mode troubleshooting item added with link to permission modes documentation. Conclusion section consolidated to single continuous paragraph.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

alauda/aml-docs#210: Both PRs modify docs/en/installation/ai-cluster.mdx in the Model Catalog setup section around PostgreSQL password secret configuration.

Poem

🐰 A cluster takes shape, with Knative up high,
Ascend modes unfold beneath the K8s sky—
Resource callouts dance, from Xinference to vLLM,
Secrets encoded smooth, just add them and fill 'em!
Documentation blooms, let troubleshooting be clear,
The inference path brightens—deployment's right here! ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly relates to the main changes in the pull request, which focus on documenting vLLM-ascend modelcar permission modes and Ascend-specific security configurations across multiple documentation files.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.

cloudflare-workers-and-pages · 2026-05-26T00:38:29Z

Deploying alauda-ai with Cloudflare Pages

Latest commit:	`2ad292c`
Status:	✅ Deploy successful!
Preview URL:	https://4ee37638.alauda-ai.pages.dev
Branch Preview URL:	https://docs-vllm-ascend-modelcar-pe.alauda-ai.pages.dev

coderabbitai

🧹 Nitpick comments (1)

docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx (1)
1010-1010: 💤 Low value

Consider improving table cell readability.

The "Special Requirements" cell for vLLM-ascend is comprehensive but quite long (103 words). For better readability in the comparison table, consider either:
- Breaking this into bullet points within the cell (if your MDX table renderer supports it), or
- Using a shorter summary with a reference link to the detailed section, such as:
  **Must** set `HOME` and `USER` environment variables and configure security context according to the [Modelcar permission mode](#modelcar-permission-modes-for-ascend-910) (UID 1000 group settings for non-root mode only)
The current content is accurate and complete, so this is an optional readability improvement.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx`
at line 1010, Shorten the long "Special Requirements" cell for vLLM-ascend in
the MDX table by either replacing the 103-word sentence with a concise summary
and a link to the detailed section (e.g., reference the "Modelcar permission
mode" anchor) or by splitting it into 2–3 bullet points if the table renderer
supports inline lists; keep key symbols referenced exactly
(ClusterServingRuntime, HOME, USER, Modelcar permission mode, UID 1000) so
readers can find the detailed instructions in the longer section.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0eeccdd4-401a-43aa-b31c-b62754a2ff18

📥 Commits

Reviewing files that changed from the base of the PR and between c766bc1 and 2ad292c.

📒 Files selected for processing (3)

docs/en/installation/ai-cluster.mdx
docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx
docs/en/model_inference/inference_service/how_to/using_modelcar.mdx

fyuan1316 added 4 commits May 26, 2026 08:01

coderabbitai Bot reviewed May 26, 2026

fyuan1316 wants to merge 4 commits into master from docs/vllm-ascend-modelcar-permissions
