ROCm/device-metrics-exporter

AMD Device Metrics Exporter

AMD Device Metrics Exporter enables real-time collection of telemetry data in Prometheus format from AMD GPUs in HPC and AI environments. It provides comprehensive metrics including temperature, utilization, memory usage, power consumption, and more.

Features

  • Prometheus-compatible metrics endpoint
  • Rich GPU telemetry data including:
    • Temperature monitoring
    • Utilization metrics
    • Memory usage statistics
    • Power consumption data
    • PCIe bandwidth metrics
    • Performance metrics
  • Kubernetes integration via Helm chart
  • Slurm integration support
  • Configurable service ports
  • Container-based deployment
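The metrics endpoint uses the Prometheus text exposition format, so a client can read it without a Prometheus server. The following is a minimal Python sketch that uses only the standard library. It fetches and parses that format into `(name, labels, value)` tuples. The URL `http://localhost:5000/metrics`, the metric name `gpu_power_usage`, and the `gpu_id`/`hostname` labels are hypothetical placeholders, not values taken from this README. Check the documentation for the actual port and metric names.

```python
import re
import urllib.request

# One sample line: metric name, optional {labels}, value, optional integer timestamp.
_SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<ts>-?\d+))?\s*$'
)
# One label pair; label values may contain backslash-escaped characters.
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def _unescape(value):
    """Undo Prometheus label escaping: \\\\ -> \\, \\" -> ", \\n -> newline."""
    return re.sub(r'\\(.)', lambda m: "\n" if m.group(1) == "n" else m.group(1), value)


def parse_prometheus_text(text):
    """Parse Prometheus text exposition into a list of (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" / comment lines.
        if not line or line.startswith("#"):
            continue
        m = _SAMPLE_RE.match(line)
        if m is None:
            continue  # this simple sketch ignores lines it cannot parse
        labels = {k: _unescape(v) for k, v in _LABEL_RE.findall(m.group("labels") or "")}
        # float() accepts Prometheus special values such as "NaN", "+Inf", "-Inf".
        samples.append((m.group("name"), labels, float(m.group("value"))))
    return samples


def scrape(url="http://localhost:5000/metrics", timeout=5.0):
    """Fetch and parse the endpoint. The default URL is an assumption, not a documented value."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))


# Hypothetical sample payload; real metric names and labels may differ.
SAMPLE = """# HELP gpu_power_usage Hypothetical sample metric
# TYPE gpu_power_usage gauge
gpu_power_usage{gpu_id="0",hostname="node1"} 112.0
gpu_power_usage{gpu_id="1",hostname="node1"} 98.5
"""

if __name__ == "__main__":
    for name, labels, value in parse_prometheus_text(SAMPLE):
        print(name, labels["gpu_id"], value)
```

In practice, a Prometheus server scrapes the endpoint directly. A small parser like this is mainly useful for quick checks or ad hoc scripts.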

Documentation

For detailed documentation including installation guides, configuration options, and metric descriptions, see the documentation.

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

About

Device Metrics Exporter exports metrics from AMD devices (GPUs) to collectors like Prometheus.