
WIP: Add computer_hw package (copied, improved from pr2_computer_monitor) #21

Open: wants to merge 18 commits into base: kinetic


130s (Member) commented Feb 24, 2022

This PR contains the same changes as #20, re-opened from a different branch.

Issue aimed at

Changes

  • Add the computer_hw package (a renamed copy of pr2_computer_monitor, taken from the pr2_robot repo).
  • Add a .launch file so that downstream packages can start all monitoring processes in one batch.

Review items

Test

Dev-tested on an Ubuntu 16.04 host with an NVIDIA GeForce GTX 1060:
# roslaunch computer_hw monitor.launch
... logging to /root/.ros/log/1b44b418-1846-11ec-b2b0-c400ad2d8cb0/roslaunch-rabbitdeer-3380.log
Checking log directory for disk usage. This may take awhile.
Press Ctrl-C to interrupt
Done checking log file disk usage. Usage is <1GB.

started roslaunch server http://rabbitdeer:38343/

SUMMARY
========

PARAMETERS
 * /rosdistro: kinetic
 * /rosversion: 1.12.13

NODES
  /
    diag_agg (diagnostic_aggregator/aggregator_node)
    libsensors_monitor (libsensors_monitor/libsensors_monitor)
    nvidia_temperature_monitor (computer_hw/nvidia_temp.py)

auto-starting new master
process[master]: started with pid [3390]
ROS_MASTER_URI=http://localhost:11311

setting /run_id to 1b44b418-1846-11ec-b2b0-c400ad2d8cb0
process[rosout-1]: started with pid [3403]
started core service [/rosout]
process[libsensors_monitor-2]: started with pid [3410]
[ INFO] [1631944994.889260052]: Got system hostname: rabbitdeer
[ INFO] [1631944994.896585316]: Found sensor coretemp-isa-0000 with features: temp1, temp2, temp3, temp4, temp5
[ INFO] [1631944994.896702034]: Found sensor acpitz-virtual-0 with features: temp1, temp2, temp3
[ INFO] [1631944994.896749535]: Found sensor pch_skylake-virtual-0 with features: temp1
process[nvidia_temperature_monitor-3]: started with pid [3421]
[INFO] [1631944995.775560]: card_out: 
==============NVSMI LOG==============

Timestamp                           : Sat Sep 18 06:03:15 2021
Driver Version                      : 440.64
CUDA Version                        : 10.2

Attached GPUs                       : 1
GPU 00000000:01:00.0
    Product Name                    : GeForce GTX 1060 6GB
    Product Brand                   : GeForce
    Display Mode                    : Enabled
    Display Active                  : Enabled
    Persistence Mode                : Enabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 4000
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : N/A
    GPU UUID                        : GPU-7f9b4a72-68fe-e2a9-8907-4590704d3431
    Minor Number                    : 0
    VBIOS Version                   : 86.06.45.00.60
    MultiGPU Board                  : No
    Board ID                        : 0x100
    GPU Part Number                 : N/A
    Inforom Version
        Image Version               : G001.0000.01.04
        OEM Object                  : 1.1
        ECC Object                  : N/A
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    GPU Virtualization Mode
        Virtualization Mode         : None
        Host VGPU Mode              : N/A
    IBMNPU
        Relaxed Ordering Mode       : N/A
    PCI
        Bus                         : 0x01
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x1C0310DE
        Bus Id                      : 00000000:01:00.0
        Sub System Id               : 0x61633842
        GPU Link Info
            PCIe Generation
                Max                 : 3
                Current             : 1
            Link Width
                Max                 : 16x
                Current             : 16x
        Bridge Chip
            Type                    : N/A
            Firmware                : N/A
        Replays Since Reset         : 0
        Replay Number Rollovers     : 0
        Tx Throughput               : 0 KB/s
        Rx Throughput               : 0 KB/s
    Fan Speed                       : 5 %
    Performance State               : P8
    Clocks Throttle Reasons
        Idle                        : Active
        Applications Clocks Setting : Not Active
        SW Power Cap                : Not Active
        HW Slowdown                 : Not Active
            HW Thermal Slowdown     : Not Active
            HW Power Brake Slowdown : Not Active
        Sync Boost                  : Not Active
        SW Thermal Slowdown         : Not Active
        Display Clock Setting       : Not Active
    FB Memory Usage
        Total                       : 6077 MiB
        Used                        : 114 MiB
        Free                        : 5963 MiB
    BAR1 Memory Usage
        Total                       : 256 MiB
        Used                        : 5 MiB
        Free                        : 251 MiB
    Compute Mode                    : Default
    Utilization
        Gpu                         : 0 %
        Memory                      : 2 %
        Encoder                     : 0 %
        Decoder                     : 0 %
    Encoder Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    FBC Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    Ecc Mode
        Current                     : N/A
        Pending                     : N/A
    ECC Errors
        Volatile
            Single Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
            Double Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
        Aggregate
            Single Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
            Double Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
    Retired Pages
        Single Bit ECC              : N/A
        Double Bit ECC              : N/A
        Pending Page Blacklist      : N/A
    Temperature
        GPU Current Temp            : 51 C
        GPU Shutdown Temp           : 102 C
        GPU Slowdown Temp           : 99 C
        GPU Max Operating Temp      : N/A
        Memory Current Temp         : N/A
        Memory Max Operating Temp   : N/A
    Power Readings
        Power Management            : Supported
        Power Draw                  : 5.91 W
        Power Limit                 : 120.00 W
        Default Power Limit         : 120.00 W
        Enforced Power Limit        : 120.00 W
        Min Power Limit             : 60.00 W
        Max Power Limit             : 140.00 W
    Clocks
        Graphics                    : 139 MHz
        SM                          : 139 MHz
        Memory                      : 405 MHz
        Video                       : 544 MHz
    Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Default Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Max Clocks
        Graphics                    : 2012 MHz
        SM                          : 2012 MHz
        Memory                      : 4004 MHz
        Video                       : 1708 MHz
    Max Customer Boost Clocks
        Graphics                    : N/A
    Clock Policy
        Auto Boost                  : N/A
        Auto Boost Default          : N/A
    Processes


gpu_stat: header: 
  seq: 0
  stamp: 
    secs: 0
    nsecs:         0
  frame_id: ''
product_name: "GeForce GTX 1060 6GB"
pci_device_id: ''
pci_location: ''
display: ''
driver_version: "440.64"
temperature: 51
fan_speed: 23.5619449019
gpu_usage: 0
memory_usage: 2

process[diag_agg-4]: started with pid [3435]
[ERROR] [1631944995.896812050]: No analyzers initialized in AnalyzerGroup /diag_agg/analyzers
[ERROR] [1631944995.896856468]: Analyzer group for diagnostic aggregator failed to initialize!
^C[diag_agg-4] killing on exit
[nvidia_temperature_monitor-3] killing on exit
[libsensors_monitor-2] killing on exit
[INFO] [1631944996.825916]: card_out: 
gpu_stat: header: 
  seq: 0
  stamp: 
    secs: 0
    nsecs:         0
  frame_id: ''
product_name: ''
pci_device_id: ''
pci_location: ''
display: ''
driver_version: ''
temperature: 0.0
fan_speed: 0.0
gpu_usage: 0.0
memory_usage: 0.0


[Screenshot] Sample of the diagnostic GUI with GPU monitoring output.
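The gpu_stat message in the log is derived from the card_out text. A minimal sketch of that kind of conversion follows. It assumes a flat, case-insensitive `label : value` lookup where the first matching line wins. `find_val`, `fan_percent_to_rads`, `parse_smi_output` and `MAX_FAN_RPM` are hypothetical names, not necessarily what computer_hw uses. The fan conversion assumes a full-scale speed of 4500 RPM: 5 % of 4500 RPM is 225 RPM, and 225 × 2π / 60 = 7.5π ≈ 23.56 rad/s. That is consistent with the logged `fan_speed: 23.5619449019`.

```python
import math

# Hypothetical full-scale fan speed; 4500 RPM reproduces the logged fan_speed.
MAX_FAN_RPM = 4500.0


def find_val(output, label):
    """Return the value of the first 'label : value' line whose label matches
    case-insensitively, or '' if none does. Section nesting is ignored."""
    for line in output.splitlines():
        name, sep, val = line.partition(':')  # split on the first ':' only
        if sep and name.strip().lower() == label.lower():
            return val.strip()
    return ''


def fan_percent_to_rads(percent):
    """Convert a fan duty cycle in percent to rad/s, going through RPM."""
    rpm = percent * 0.01 * MAX_FAN_RPM
    return rpm * 2.0 * math.pi / 60.0


def parse_smi_output(output):
    """Build a dict resembling the gpu_stat fields shown in the log."""
    stat = {'product_name': find_val(output, 'Product Name'),
            'driver_version': find_val(output, 'Driver Version'),
            'temperature': 0.0, 'fan_speed': 0.0,
            'gpu_usage': 0.0, 'memory_usage': 0.0}
    temp = find_val(output, 'GPU Current Temp')  # e.g. "51 C"
    if temp:
        stat['temperature'] = float(temp.split()[0])
    fan = find_val(output, 'Fan Speed')  # e.g. "5 %"
    if fan:
        stat['fan_speed'] = fan_percent_to_rads(float(fan.rstrip('%').strip()))
    gpu = find_val(output, 'Gpu')  # first match is the Utilization entry
    if gpu:
        stat['gpu_usage'] = float(gpu.rstrip('%').strip())
    mem = find_val(output, 'Memory')  # first match is the Utilization entry
    if mem:
        stat['memory_usage'] = float(mem.rstrip('%').strip())
    return stat


# Excerpt of the nvidia-smi -q output from the log above.
SAMPLE = """\
Driver Version                      : 440.64
    Product Name                    : GeForce GTX 1060 6GB
    Fan Speed                       : 5 %
    Utilization
        Gpu                         : 0 %
        Memory                      : 2 %
    Temperature
        GPU Current Temp            : 51 C
    Clocks
        Memory                      : 405 MHz
"""

if __name__ == '__main__':
    print(parse_smi_output(SAMPLE))
```

Because the lookup takes the first match, `Memory` resolves to the Utilization entry (2 %) only because it precedes the Clocks entry (405 MHz) in nvidia-smi's output. With empty input every lookup returns '' and every numeric field stays at 0.0. This mirrors the zeroed gpu_stat printed after Ctrl-C.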

130s changed the title from "WIP: (re-opening) Add computer_hw package (copied, improved from pr2_computer_monitor)" to "WIP: Add computer_hw package (copied, improved from pr2_computer_monitor)" on Mar 12, 2022
130s force-pushed the feature-computer-monitor branch 3 times, most recently from 5361360 to f06bfe4, February 22, 2023 12:46
130s force-pushed the feature-computer-monitor branch 7 times, most recently from ae82cde to 87b00b4, February 24, 2023 20:44
130s force-pushed the feature-computer-monitor branch from 87b00b4 to deb5cec, February 24, 2023 20:52
130s (Member, Author) commented Feb 24, 2023

The unit test is failing, and I think this is a bug in the unit test case itself.

https://github.com/kinu-garage/linux_peripheral_interfaces/actions/runs/4266170296/jobs/7426371093#step:4:467

  ======================================================================
  FAIL: test_parse (parse_test.TestNominalParser)
  ----------------------------------------------------------------------
  Traceback (most recent call last):
    File "/root/target_ws/src/linux_peripheral_interfaces/computer_hw/test/parse_test.py", line 70, in test_parse
      self.assert_(gpu_stat.pci_device_id, "No PCI Device ID found")
  AssertionError: '' is not true : No PCI Device ID found
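In the sample nvidia-smi -q output from the first comment, the PCI device ID is not a top-level line. It appears as `Device Id : 0x1C0310DE`, nested under the `PCI` section. The logged gpu_stat also shows `pci_device_id: ''`. Whether the fix belongs in the test or in the parser, a lookup keyed on the section path makes the expected field explicit. The following is a minimal sketch that assumes nvidia-smi indents child entries consistently with spaces. `parse_sections` and the tuple paths are hypothetical, not part of computer_hw.

```python
def parse_sections(output):
    """Map indentation paths, e.g. ('PCI', 'Device Id'), to their values.

    A line with no value after ':' (e.g. 'PCI') opens a section; the lines
    that follow with deeper indentation are its children. If a path repeats,
    the first value is kept.
    """
    result = {}
    stack = []  # (indent, name) for each currently open section
    for raw in output.splitlines():
        if not raw.strip():
            continue
        indent = len(raw) - len(raw.lstrip(' '))
        name, sep, val = raw.strip().partition(':')  # first ':' only
        name, val = name.strip(), val.strip()
        # Close every open section this line is not nested inside.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        path = tuple(n for _, n in stack) + (name,)
        if sep and val:
            result.setdefault(path, val)
        else:
            stack.append((indent, name))
    return result


# Excerpt of the nvidia-smi -q output from the first comment.
SAMPLE = """\
    Product Name                    : GeForce GTX 1060 6GB
    PCI
        Bus                         : 0x01
        Device Id                   : 0x1C0310DE
        Bus Id                      : 00000000:01:00.0
    Utilization
        Memory                      : 2 %
    Clocks
        Memory                      : 405 MHz
"""

if __name__ == '__main__':
    print(parse_sections(SAMPLE).get(('PCI', 'Device Id'), ''))  # prints 0x1C0310DE
```

Paths also disambiguate repeated labels such as `Memory`, which appears under both Utilization and Clocks.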
