Add SageMaker AI Knowledge Base Power #80
Open
Description
Proposal: Add sagemaker-ai Knowledge Base Power
What
A Knowledge Base Power (no MCP servers) that provides guidance, based on real deployments, for deploying and training ML models on Amazon SageMaker AI.
Coverage
- Inference endpoints: container selection (DJL LMI, vLLM, Hugging Face DLCs), CUDA compatibility, SDK v3 deployment patterns
- Model training: serverless customization, Training Jobs (QLoRA/LoRA), HyperPod clusters, GPU vs. Trainium instance sizing
- HyperPod: cluster setup (EKS, Training Operator, Task Governance) and inference deployment (JumpStart, custom models, autoscaling)
- Model Monitor: all four monitor types (Data Quality, Model Quality, Bias, Explainability) with SDK v3 patterns and workarounds for known bugs
- AutoML: AutoGluon (tabular, time series, multimodal) with SageMaker Pipelines integration
- SDK v3 reference: correct imports and a guardrails table that prevents common v2→v3 migration mistakes
Structure
sagemaker-ai/
├── POWER.md
└── steering/
    ├── inference-endpoints.md
    ├── training-jobs.md
    ├── hyperpod.md
    ├── hyperpod-inference.md
    ├── model-monitor.md
    ├── automl-autogluon.md
    └── sdk-v3-reference.md
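
The layout above could be checked mechanically before the PR is opened. The following is a minimal sketch that assumes only the file names listed in the tree; the `missing_files` helper is hypothetical and is not part of any existing Power tooling.

```python
from pathlib import Path

# Relative paths taken from the proposed sagemaker-ai/ tree.
EXPECTED_FILES = [
    "POWER.md",
    "steering/inference-endpoints.md",
    "steering/training-jobs.md",
    "steering/hyperpod.md",
    "steering/hyperpod-inference.md",
    "steering/model-monitor.md",
    "steering/automl-autogluon.md",
    "steering/sdk-v3-reference.md",
]


def missing_files(root):
    """Return the expected relative paths that do not exist under root.

    Hypothetical helper for illustration only.
    """
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling `missing_files("sagemaker-ai")` from the repository root would return an empty list when every listed file is present.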
Why
SageMaker Python SDK v3 introduced breaking changes (new import paths, removed classes like JumpStartModel, new Pipeline APIs). Without explicit steering, AI agents consistently generate v2 code that fails at runtime. This power prevents those mistakes with an always-on guardrails table and scenario-specific steering files.
All content is derived from real deployment experience and includes troubleshooting for known SDK bugs and DLC compatibility issues.
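
One way to picture what the guardrails are meant to catch is a small scan of agent-generated code for v2-only names. This is a sketch under stated assumptions: only `JumpStartModel` comes from this proposal as a removed class; the `V2_ONLY_NAMES` table and the `find_v2_usage` helper are hypothetical, and the actual guardrails table in POWER.md may be structured quite differently.

```python
import ast

# Hypothetical guardrail table. Only JumpStartModel is named in the proposal
# as removed in SDK v3; further entries would come from the real table.
V2_ONLY_NAMES = {
    "JumpStartModel": "removed in SageMaker Python SDK v3 (per the proposal)",
}


def find_v2_usage(source):
    """Return sorted (line, name, note) tuples for v2-only names in source.

    The code is parsed, not executed, so the SDK need not be installed.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.Name):
            names = [node.id]
        elif isinstance(node, ast.Attribute):
            names = [node.attr]
        else:
            continue
        for name in names:
            if name in V2_ONLY_NAMES:
                hits.append((node.lineno, name, V2_ONLY_NAMES[name]))
    return sorted(hits)
```

Running `find_v2_usage` on a generated snippet flags each import or reference by line number, which could feed a review step before the code reaches runtime. A static scan like this complements the steering files rather than replacing them, since it only detects names that are already in the table.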
PR
See #81 (will update after PR creation)