AWS Deep Learning Containers MCP Server

A Model Context Protocol (MCP) server for AWS Deep Learning Containers (DLC) that provides tools for discovering, building, deploying, and troubleshooting DLC images.

Features

Dynamic DLC Image Discovery: Automatically fetches latest images from AWS DLC GitHub - always up-to-date
Image Building: Create custom Dockerfiles and build images based on DLC base images
Multi-Platform Deployment: Deploy to SageMaker, EC2, ECS, and EKS
Instance Recommendations: Get GPU instance recommendations based on model size and budget
Upgrade Support: Analyze upgrade paths and generate migration Dockerfiles
Troubleshooting: Diagnose common DLC issues with actionable solutions
Best Practices: Security, cost optimization, and deployment guidance
No AWS Credentials Required: Discovery tools work without AWS credentials

Quick Start

Option 1: Run with uv (Recommended)

# Clone the repo
git clone https://github.com/aws-samples/sample-dlc-mcp-server.git
cd sample-dlc-mcp-server

# Run directly with uv
uv run dlc-mcp-server

Option 2: Run with Docker

# Build the image
docker build -t dlc-mcp-server .

# Run the container
docker run -it --rm \
  -v ~/.aws:/root/.aws:ro \
  dlc-mcp-server

Option 3: Install locally

pip install -e .
dlc-mcp-server

MCP Client Configuration

For Amazon Q CLI

Add to ~/.aws/amazonq/mcp.json:

{
  "mcpServers": {
    "dlc-mcp-server": {
      "command": "uv",
      "args": ["--directory", "/path/to/sample-dlc-mcp-server", "run", "dlc-mcp-server"],
      "timeout": 120000
    }
  }
}

For Kiro

Add to .kiro/settings/mcp.json:

{
  "mcpServers": {
    "dlc-mcp-server": {
      "command": "uv",
      "args": ["--directory", "/path/to/sample-dlc-mcp-server", "run", "dlc-mcp-server"],
      "timeout": 120000
    }
  }
}

Using Docker

{
  "mcpServers": {
    "dlc-mcp-server": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "-v", "~/.aws:/root/.aws:ro", "dlc-mcp-server"],
      "timeout": 120000
    }
  }
}

Available Tools

DLC Discovery

Tool	Description
`search_dlc_images`	Search DLC images by framework, version, accelerator, platform
`get_dlc_recommendation`	Get image recommendations based on model type and size
`list_dlc_frameworks`	List all available frameworks with versions
`get_llm_serving_options`	Compare vLLM, SGLang, DJL, NeuronX options
`compare_dlc_images`	Side-by-side image comparison
`refresh_dlc_catalog`	Force refresh image catalog from GitHub

Image Building

Tool	Description
`create_custom_dockerfile`	Generate Dockerfile with custom packages
`build_custom_dlc_image`	Build and optionally push to ECR

Deployment

Tool	Description
`deploy_to_sagemaker`	Deploy to SageMaker endpoint
`deploy_to_ec2`	Launch EC2 instance with DLC
`deploy_to_ecs`	Deploy to ECS cluster
`deploy_to_eks`	Deploy to EKS cluster
`get_sagemaker_endpoint_status`	Check endpoint status

Instance Advisor

Tool	Description
`get_instance_recommendation`	GPU instance recommendations by model size
`list_gpu_instances`	List available GPU instances with pricing
`estimate_training_cost`	Estimate training job costs

Troubleshooting

Tool	Description
`analyze_dlc_error`	Analyze error logs with root cause analysis
`diagnose_common_issues`	Diagnose common DLC problems
`get_framework_compatibility_info`	Check framework version compatibility

Best Practices

Tool	Description
`get_security_best_practices`	Security guidelines
`get_cost_optimization_tips`	Cost reduction strategies
`get_deployment_best_practices`	Platform-specific guidance
`get_framework_specific_best_practices`	Framework optimization tips

Supported Frameworks

Framework	Latest Version	Use Cases
PyTorch	2.9.0	Training, Inference
TensorFlow	2.19.0	Training, Inference
vLLM	0.15.1	LLM Inference
SGLang	0.5.8	LLM Inference
HuggingFace PyTorch	2.6.0	NLP Training/Inference
AutoGluon	1.5.0	AutoML
DJL	0.36.0	Large Model Inference
PyTorch NeuronX	2.9.0	Trainium/Inferentia

Example Usage

Find vLLM images

Search for vLLM images for SageMaker inference

Deploy LLM to SageMaker

Deploy Qwen2.5-32B using vLLM on SageMaker with the right instance type

Get instance recommendations

What instance should I use for a 35GB model?

Troubleshoot errors

Help me fix this CUDA out of memory error: [paste error]

Configuration

Environment variables:

Variable	Description	Default
`ALLOW_WRITE`	Enable build/deploy operations	`false`
`ALLOW_SENSITIVE_DATA`	Enable detailed logs access	`false`
`FASTMCP_LOG_LEVEL`	Logging level	`ERROR`
`FASTMCP_LOG_FILE`	Log file path	None

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
python -m pytest tests/ -v

# Run linting
ruff check .

See DEVELOPMENT.md for more details.

License

This library is licensed under the MIT-0 License.

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
aws_samples		aws_samples
examples		examples
tests		tests
.gitignore		.gitignore
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
DEVELOPMENT.md		DEVELOPMENT.md
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
pyrightconfig.json		pyrightconfig.json
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AWS Deep Learning Containers MCP Server

Features

Quick Start

Option 1: Run with uv (Recommended)

Option 2: Run with Docker

Option 3: Install locally

MCP Client Configuration

For Amazon Q CLI

For Kiro

Using Docker

Available Tools

DLC Discovery

Image Building

Deployment

Instance Advisor

Troubleshooting

Best Practices

Supported Frameworks

Example Usage

Find vLLM images

Deploy LLM to SageMaker

Get instance recommendations

Troubleshoot errors

Configuration

Development

License

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

AWS Deep Learning Containers MCP Server

Features

Quick Start

Option 1: Run with uv (Recommended)

Option 2: Run with Docker

Option 3: Install locally

MCP Client Configuration

For Amazon Q CLI

For Kiro

Using Docker

Available Tools

DLC Discovery

Image Building

Deployment

Instance Advisor

Troubleshooting

Best Practices

Supported Frameworks

Example Usage

Find vLLM images

Deploy LLM to SageMaker

Get instance recommendations

Troubleshoot errors

Configuration

Development

License

About

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages