English | 简体中文



🚀 TensorRT-YOLO is an easy-to-use, extremely efficient inference deployment tool for the YOLO series, designed specifically for NVIDIA devices. The project not only integrates TensorRT plugins to enhance post-processing but also utilizes CUDA kernels and CUDA graphs to accelerate inference. TensorRT-YOLO provides both C++ and Python inference support, aiming to deliver a 📦 out-of-the-box deployment experience. It covers task scenarios such as object detection, instance segmentation, image classification, pose estimation, oriented object detection, and video analysis, meeting developers' deployment needs across multiple scenarios.

🌠 Recent updates

✨ Key Features

🎯 Diverse YOLO Support

  • Comprehensive Compatibility: Supports YOLOv3 to YOLOv11 series models, as well as PP-YOLOE and PP-YOLOE+, meeting diverse needs.
  • Flexible Switching: Provides simple and easy-to-use interfaces for quick switching between different YOLO versions. 🌟 NEW
  • Multi-Scenario Applications: Offers rich example codes covering Detect, Segment, Classify, Pose, OBB, and more.

🚀 Performance Optimization

  • CUDA Acceleration: Optimizes pre-processing through CUDA kernels and accelerates inference using CUDA graphs.
  • TensorRT Integration: Deeply integrates TensorRT plugins to significantly speed up post-processing and improve overall inference efficiency.
  • Multi-Context Inference: Supports multi-context parallel inference to maximize hardware resource utilization. 🌟 NEW
  • Memory Management Optimization: Applies memory optimization strategies adapted to different architectures (e.g., Zero Copy mode on Jetson) to improve memory efficiency. 🌟 NEW

🛠️ Usability

  • Out-of-the-Box: Provides comprehensive C++ and Python inference support to meet different developers' needs.
  • CLI Tools: Built-in command-line tools for quick model export and inference, improving development efficiency.
  • Docker Support: Offers one-click Docker deployment solutions to simplify environment configuration and deployment processes.
  • No Third-Party Dependencies: Apart from CUDA and TensorRT, all functionality is implemented with standard libraries, eliminating the need for additional dependencies and simplifying deployment.
  • Easy Deployment: Can be built as a dynamic library for straightforward integration and deployment.

🌐 Compatibility

  • Multi-Platform Support: Fully compatible with various operating systems and hardware platforms, including Windows, Linux, ARM, and x86.
  • TensorRT Compatibility: Perfectly adapts to TensorRT 10.x versions, ensuring seamless integration with the latest technology ecosystem.

🔧 Flexible Configuration

  • Customizable Preprocessing Parameters: Supports flexible configuration of various preprocessing parameters, including channel swapping (SwapRB), normalization parameters, and border padding. 🌟 NEW

🚀 Performance

| Model | Official + trtexec (ms) | trtyolo + trtexec (ms) | TensorRT-YOLO Inference (ms) |
|:------|:------------------------|:-----------------------|:-----------------------------|
| YOLOv11n | 1.611 ± 0.061 | 1.428 ± 0.097 | 1.228 ± 0.048 |
| YOLOv11s | 2.055 ± 0.147 | 1.886 ± 0.145 | 1.687 ± 0.047 |
| YOLOv11m | 3.028 ± 0.167 | 2.865 ± 0.235 | 2.691 ± 0.085 |
| YOLOv11l | 3.856 ± 0.287 | 3.682 ± 0.309 | 3.571 ± 0.102 |
| YOLOv11x | 6.377 ± 0.487 | 6.195 ± 0.482 | 6.207 ± 0.231 |

Note

Testing Environment

  • GPU: NVIDIA RTX 2080 Ti 22GB
  • Input Size: 640×640 pixels

Testing Tools

  • Official: Using the ONNX model exported by Ultralytics.
  • trtyolo: Using the CLI tool (trtyolo) provided by TensorRT-YOLO to export the ONNX model with the EfficientNMS plugin.
  • trtexec: Using NVIDIA's trtexec tool to build the ONNX model into an engine and perform inference testing.
    • Build Command: trtexec --onnx=xxx.onnx --saveEngine=xxx.engine --fp16
    • Test Command: trtexec --avgRuns=1000 --useSpinWait --loadEngine=xxx.engine
  • TensorRT-YOLO Inference: Using the TensorRT-YOLO framework to measure the latency (including pre-processing, inference, and post-processing) of the engine obtained through the trtyolo + trtexec method.
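
As a point of reference, end-to-end latency (pre-processing + inference + post-processing) can be measured with a short script. The sketch below uses the Python API from the Quick Start section and assumed file names (yolo11n-with-plugin.engine, test_image.jpg), so it is illustrative rather than the exact benchmark script behind the table above.

    import time

    import cv2
    from tensorrt_yolo.infer import InferOption, DetectModel

    option = InferOption()
    option.enable_swap_rb()
    model = DetectModel(engine_path="yolo11n-with-plugin.engine", option=option)
    img = cv2.imread("test_image.jpg")

    # Warm up so engine initialization does not skew the measurement
    for _ in range(50):
        model.predict(img)

    runs = 1000
    start = time.perf_counter()
    for _ in range(runs):
        model.predict(img)
    avg_ms = (time.perf_counter() - start) * 1000.0 / runs
    print(f"Average end-to-end latency: {avg_ms:.3f} ms")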

🔮 Documentation

💨 Quick Start

1. Prerequisites

  • CUDA: Recommended version ≥ 11.0.1
  • TensorRT: Recommended version ≥ 8.6.1
  • Operating System: Linux (x86_64 or ARM) (recommended); Windows is also supported

2. Installation

3. Model Export

  • Refer to the 🔧 Model Export documentation to export an ONNX model suitable for inference in this project and build it into a TensorRT engine.

4. Inference Example

Note

ClassifyModel, DetectModel, OBBModel, SegmentModel, and PoseModel correspond to image classification (Classify), detection (Detect), oriented bounding box (OBB), segmentation (Segment), and pose estimation (Pose) models, respectively.
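
The task-specific model classes are used the same way, so switching tasks mostly means swapping the class and the engine file. A minimal sketch, assuming the other classes mirror DetectModel's constructor (the segmentation and pose engine file names here are placeholders):

    from tensorrt_yolo.infer import InferOption, DetectModel, SegmentModel, PoseModel

    option = InferOption()
    option.enable_swap_rb()

    # Same constructor pattern; only the class and the engine file change
    det_model = DetectModel(engine_path="yolo11n-with-plugin.engine", option=option)
    seg_model = SegmentModel(engine_path="yolo11n-seg-with-plugin.engine", option=option)    # placeholder engine name
    pose_model = PoseModel(engine_path="yolo11n-pose-with-plugin.engine", option=option)     # placeholder engine name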

  • Inference using Python:

    import cv2
    from tensorrt_yolo.infer import InferOption, DetectModel, generate_labels, visualize

    def main():
        # -------------------- Initialization --------------------
        # Configure inference settings
        option = InferOption()
        option.enable_swap_rb()  # Convert OpenCV's default BGR format to RGB
        # Special model configuration example (uncomment for PP-YOLOE series)
        # option.set_normalize_params([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])

        # -------------------- Model Initialization --------------------
        # Load TensorRT engine file (ensure the path is correct)
        # Note: Initial engine loading may take longer due to optimization
        model = DetectModel(engine_path="yolo11n-with-plugin.engine",
                            option=option)

        # -------------------- Data Preprocessing --------------------
        # Load test image (add file existence check)
        input_img = cv2.imread("test_image.jpg")
        if input_img is None:
            raise FileNotFoundError("Failed to load test image. Check the file path.")

        # -------------------- Inference Execution --------------------
        # Perform object detection (returns bounding boxes, confidence scores, and class labels)
        detection_result = model.predict(input_img)
        print(f"==> Detection Result: {detection_result}")

        # -------------------- Result Visualization --------------------
        # Load class labels (ensure labels.txt matches the model)
        class_labels = generate_labels(labels_file="labels.txt")
        # Generate visualized result
        visualized_img = visualize(
            image=input_img,
            result=detection_result,
            labels=class_labels,
        )
        cv2.imwrite("vis_image.jpg", visualized_img)

        # -------------------- Model Cloning Demo --------------------
        # Clone model instance (for multi-threaded scenarios)
        cloned_model = model.clone()  # Create an independent copy to avoid resource contention
        # Verify cloned model inference consistency
        cloned_result = cloned_model.predict(input_img)
        print(f"==> Cloned Result: {cloned_result}")

    if __name__ == "__main__":
        main()
  • Inference using C++:

    #include <iostream>
    #include <memory>
    #include <stdexcept>
    #include <opencv2/opencv.hpp>

    // For ease of use, the module uses only CUDA and TensorRT, with the rest implemented in standard libraries
    #include "deploy/model.hpp"   // Contains model inference class definitions
    #include "deploy/option.hpp"  // Contains inference option configuration class definitions
    #include "deploy/result.hpp"  // Contains inference result definitions

    int main() {
        try {
            // -------------------- Initialization --------------------
            deploy::InferOption option;
            option.enableSwapRB();  // BGR->RGB conversion

            // Special model parameter setup example
            // const std::vector<float> mean{0.485f, 0.456f, 0.406f};
            // const std::vector<float> std{0.229f, 0.224f, 0.225f};
            // option.setNormalizeParams(mean, std);

            // -------------------- Model Initialization --------------------
            auto detector = std::make_unique<deploy::DetectModel>(
                "yolo11n-with-plugin.engine",  // Model path
                option                         // Inference settings
            );

            // -------------------- Data Loading --------------------
            cv::Mat cv_image = cv::imread("test_image.jpg");
            if (cv_image.empty()) {
                throw std::runtime_error("Failed to load test image.");
            }

            // Encapsulate image data (no pixel data copying)
            deploy::Image input_image(
                cv_image.data,  // Pixel data pointer
                cv_image.cols,  // Image width
                cv_image.rows   // Image height
            );

            // -------------------- Inference Execution --------------------
            deploy::DetResult result = detector->predict(input_image);
            std::cout << result << std::endl;

            // -------------------- Result Visualization (Example) --------------------
            // Implement visualization logic in actual development, e.g.:
            // cv::Mat vis_image = visualize_detections(cv_image, result);
            // cv::imwrite("vis_result.jpg", vis_image);

            // -------------------- Model Cloning Demo --------------------
            auto cloned_detector = detector->clone();  // Create an independent instance
            deploy::DetResult cloned_result = cloned_detector->predict(input_image);

            // Verify result consistency
            std::cout << cloned_result << std::endl;

        } catch (const std::exception& e) {
            std::cerr << "Program Exception: " << e.what() << std::endl;
            return EXIT_FAILURE;
        }
        return EXIT_SUCCESS;
    }
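
Both examples end by cloning the model; the clone is intended for multi-threaded use, where each worker thread owns an independent instance. Below is a minimal Python sketch of that pattern, assuming the API shown above (the image paths are placeholders):

    import cv2
    from concurrent.futures import ThreadPoolExecutor
    from tensorrt_yolo.infer import InferOption, DetectModel

    option = InferOption()
    option.enable_swap_rb()
    base_model = DetectModel(engine_path="yolo11n-with-plugin.engine", option=option)

    def worker(image_path, model):
        # Each thread uses its own cloned model instance to avoid resource contention
        img = cv2.imread(image_path)
        if img is None:
            raise FileNotFoundError(f"Failed to load {image_path}")
        return model.predict(img)

    image_paths = ["img_0.jpg", "img_1.jpg", "img_2.jpg", "img_3.jpg"]  # placeholder inputs
    models = [base_model.clone() for _ in image_paths]                  # one independent copy per thread

    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        results = list(pool.map(worker, image_paths, models))

    for path, res in zip(image_paths, results):
        print(f"{path}: {res}")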

5. Inference Flowchart

Below is the flowchart of the predict method, which illustrates the complete process from input image to output result:

Simply pass the image to be inferred to the predict method. The predict method will automatically complete preprocessing, model inference, and post-processing internally, and output the inference results. These results can be further applied to downstream tasks (such as visualization, object tracking, etc.).
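
For example, a video stream can be processed by calling predict on each frame and passing the result to a downstream step such as visualization. The sketch below assumes the Python API from the Quick Start; the video file names are placeholders.

    import cv2
    from tensorrt_yolo.infer import InferOption, DetectModel, generate_labels, visualize

    option = InferOption()
    option.enable_swap_rb()
    model = DetectModel(engine_path="yolo11n-with-plugin.engine", option=option)
    labels = generate_labels(labels_file="labels.txt")

    cap = cv2.VideoCapture("input_video.mp4")  # placeholder video file
    writer = None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        result = model.predict(frame)  # pre-processing, inference, post-processing handled internally
        vis = visualize(image=frame, result=result, labels=labels)
        if writer is None:
            h, w = vis.shape[:2]
            writer = cv2.VideoWriter("output_video.mp4",
                                     cv2.VideoWriter_fourcc(*"mp4v"), 30, (w, h))
        writer.write(vis)  # downstream task: save visualized frames
    cap.release()
    if writer is not None:
        writer.release()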

For more deployment examples, please refer to the Model Deployment Examples section.

🖥️ Model Support List

Example visualizations: Detect, Segment, Pose, and OBB.

Symbol legend: (1) ✅ : Supported; (2) ❔ : In progress; (3) ❎ : Not supported; (4) 🟢 : Inference supported after self-implemented export.

| Task Scenario | Model | CLI Export | Inference Deployment |
|:--------------|:------|:-----------|:---------------------|
| Detect | ultralytics/yolov3 | ✅ | ✅ |
| Detect | ultralytics/yolov5 | ✅ | ✅ |
| Detect | meituan/YOLOv6 | ❎ Refer to official export tutorial | ✅ |
| Detect | WongKinYiu/yolov7 | ❎ Refer to official export tutorial | ✅ |
| Detect | WongKinYiu/yolov9 | ❎ Refer to official export tutorial | ✅ |
| Detect | THU-MIG/yolov10 | ✅ | ✅ |
| Detect | ultralytics/ultralytics | ✅ | ✅ |
| Detect | PaddleDetection/PP-YOLOE+ | ✅ | ✅ |
| Segment | ultralytics/yolov3 | ✅ | ✅ |
| Segment | ultralytics/yolov5 | ✅ | ✅ |
| Segment | meituan/YOLOv6-seg | ❎ Implement yourself referring to tensorrt_yolo/export/head.py | 🟢 |
| Segment | WongKinYiu/yolov7 | ❎ Implement yourself referring to tensorrt_yolo/export/head.py | 🟢 |
| Segment | WongKinYiu/yolov9 | ❎ Implement yourself referring to tensorrt_yolo/export/head.py | 🟢 |
| Segment | ultralytics/ultralytics | ✅ | ✅ |
| Classify | ultralytics/yolov3 | ✅ | ✅ |
| Classify | ultralytics/yolov5 | ✅ | ✅ |
| Classify | ultralytics/ultralytics | ✅ | ✅ |
| Pose | ultralytics/ultralytics | ✅ | ✅ |
| OBB | ultralytics/ultralytics | ✅ | ✅ |

🌟 Sponsorship & Support

Open-source projects thrive on support. If this project has been helpful to you, consider sponsoring the author. Your support is the greatest motivation for continued development!


๐Ÿ™ A Heartfelt Thank You to Our Supporters and Sponsors:

Note

The following is a list of sponsors automatically generated by GitHub Actions, updated daily ✨.

📄 License

TensorRT-YOLO is licensed under the GPL-3.0 License, an OSI-approved open-source license that is ideal for students and enthusiasts, fostering open collaboration and knowledge sharing. Please refer to the LICENSE file for more details.

Thank you for choosing TensorRT-YOLO. We encourage open collaboration and knowledge sharing, and we hope you comply with the relevant provisions of the open-source license.

📞 Contact

For bug reports and feature requests regarding TensorRT-YOLO, please visit GitHub Issues!

๐Ÿ™ Thanks

Featured｜HelloGitHub