hddtemp_smartctl: configure warning and critical temps per device #1560
Open: ap-wtioit wants to merge 1 commit into munin-monitoring:master from ap-wtioit:master-hddtemp_smartctl_config
@@ -15,16 +15,20 @@ the harddrive devices.

 The following environment variables are used

-smartctl      - path to smartctl executable
-drives        - List drives to monitor. E.g. "env.drives hda hdc".
-type_$dev     - device type for one drive, e.g. "env.type_sda 3ware,0"
-                or more typically "env.type_sda ata" if sda is a SATA disk.
-args_$dev     - additional arguments to smartctl for one drive,
-                e.g. "env.args_hda -v 194,10xCelsius". Use this to make
-                the plugin use the --all or -a option if your disk will
-                not return its temperature when only the -A option is
-                used.
-dev_$dev      - monitoring device for one drive, e.g. twe0
+smartctl      - path to smartctl executable
+drives        - List drives to monitor. E.g. "env.drives hda hdc".
+type_$dev     - device type for one drive, e.g. "env.type_sda 3ware,0"
+                or more typically "env.type_sda ata" if sda is a SATA disk.
+args_$dev     - additional arguments to smartctl for one drive,
+                e.g. "env.args_hda -v 194,10xCelsius". Use this to make
+                the plugin use the --all or -a option if your disk will
+                not return its temperature when only the -A option is
+                used.
+dev_$dev      - monitoring device for one drive, e.g. twe0
+$dev.warning  - set warning temperature for $dev, default 57 (°C)
+                e.g. "env.nvme0n1.warning 70"
+$dev.critical - set critical temperature for $dev, default 60 (°C),
+                e.g. "env.nvme0n1.critical 80"

 If the "smartctl" environment variable is not set the plugin will
 search your $PATH, /usr/bin, /usr/sbin, /usr/local/bin and
@@ -46,6 +50,9 @@ All rights reserved.
 2016-08-27, Gabriele Pohl ([email protected])
 Fix for github issue #690

+2023-07-23, Andreas Perhab, WT-IO-IT GmbH
+enable configuring warning and critical temps
+
 =head1 LICENSE

 Redistribution and use in source and binary forms, with or without
@@ -94,6 +101,11 @@ parameter. If this parameter isn't supported by your version of
 smartctl then hdparm will be used. Note that hdparm isn't available
 on all platforms.

+For NVMe disks, you can get the warning and critical temperatures with
+the following command, which should report wctemp and cctemp:
+
+    sudo nvme id-ctrl -H /dev/nvme0
+
 =cut

 use File::Spec::Functions qw(splitdir);
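The wctemp and cctemp fields reported by `nvme id-ctrl` are the controller's warning and critical composite temperature thresholds. The NVMe specification defines them in whole kelvins, while the plugin's warning/critical values are in degrees Celsius, so they need converting before use. The Perl sketch below does that conversion and formats the result as plugin environment settings. It assumes output lines of the form `wctemp : 358` (the exact layout of `nvme id-ctrl -H` differs between nvme-cli versions) and assumes the drive is listed as `nvme0n1` in `env.drives`; the helpers `nvme_thresholds` and `env_lines` are hypothetical and not part of the plugin.

```perl
use strict;
use warnings;

# Hypothetical helper: pull wctemp/cctemp out of `nvme id-ctrl` text.
# Assumes each appears as "name : <integer kelvins>" at the start of a line.
# K - 273 equals K - 273.15 rounded to the nearest whole degree Celsius.
sub nvme_thresholds {
    my ($text) = @_;
    my %celsius;
    while ($text =~ /^\s*(wctemp|cctemp)\s*:\s*(\d+)/mg) {
        my ($field, $kelvin) = ($1, $2);
        # A value of 0 is treated as "threshold not reported by the controller".
        next if $kelvin == 0;
        $celsius{$field} = $kelvin - 273;
    }
    return \%celsius;
}

# Hypothetical helper: turn the thresholds into plugin configuration lines.
sub env_lines {
    my ($dev, $thresholds) = @_;
    my @lines;
    push @lines, "env.$dev.warning $thresholds->{wctemp}"
        if defined $thresholds->{wctemp};
    push @lines, "env.$dev.critical $thresholds->{cctemp}"
        if defined $thresholds->{cctemp};
    return @lines;
}

# Sample input standing in for real `nvme id-ctrl` output.
my $sample = <<'END';
wctemp    : 358
cctemp    : 363
END

print "$_\n" for env_lines('nvme0n1', nvme_thresholds($sample));
```

With the sample input this prints `env.nvme0n1.warning 85` and `env.nvme0n1.critical 90`, which would go into the `[hddtemp_smartctl]` section of the munin-node plugin configuration. Note that the thresholds belong to the controller (`/dev/nvme0`), whereas the setting names use the drive name.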
@@ -227,8 +239,16 @@ if (defined $ARGV[0]) {
         my @dirs = splitdir($_);
         print $d . ".label " . $dirs[-1] . "\n";
         print $d . ".max 100\n";
-        print $d . ".warning 57\n";
-        print $d . ".critical 60\n";
+        my $warning = "57";
+        if (defined($ENV{$d . ".warning"})) {
+            $warning = $ENV{$d . ".warning"};
+        }
+        print $d . ".warning $warning\n";
+        my $critical = "60";
+        if (defined($ENV{$d . ".critical"})) {
+            $critical = $ENV{$d . ".critical"};
+        }
+        print $d . ".critical $critical\n";
         my $id = get_drive_id($_, device_for_drive($_), $use_nocheck);
         print $d . ".info $id\n";
     }
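The two `if (defined(...))` blocks in this hunk differ only in the suffix and the default value. As an alternative sketch (not what the PR contains), Perl's defined-or operator `//` (Perl 5.10 and later) expresses the same fallback in one expression. The helper name `threshold` is hypothetical, and the environment is passed in as a hash reference only so the example runs on its own; in the plugin it would be `\%ENV`.

```perl
use strict;
use warnings;

# Hypothetical helper: return the per-device setting "$field.$kind" if it is
# defined, otherwise the built-in default. Same behaviour as the patch's
# if (defined(...)) blocks, including accepting a defined-but-empty value.
sub threshold {
    my ($env, $field, $kind, $default) = @_;
    return $env->{"$field.$kind"} // $default;
}

# Stand-in for %ENV: only nvme0n1 has overrides configured.
my %env = ('nvme0n1.warning' => 70, 'nvme0n1.critical' => 80);
for my $d ('sda', 'nvme0n1') {
    print "$d.warning ",  threshold(\%env, $d, 'warning',  57), "\n";
    print "$d.critical ", threshold(\%env, $d, 'critical', 60), "\n";
}
```

Here `sda` keeps the defaults 57/60 while `nvme0n1` reports 70/80, matching the documented `env.nvme0n1.warning 70` / `env.nvme0n1.critical 80` examples.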
Hi, thanks. This is good.
Could you please do
my $cfn = clean_fieldname($_);
at your line 237 and use that in place of clean_fieldname($_) in the rest of the patch?
I'm not sure that we should introduce default temperature warning/critical levels here. The temperatures you chose are sort of sane but too narrow compared to the operating temperature of some of my disks. The first of my disks I checked has an "operating" envelope from 0 to 65 °C and a non-operating envelope from -40 to 70 °C. I don't know whether other disks are more or less temperature tolerant.
Having an env.warning and env.critical to use as defaults is entirely sane.
Munin::Plugin has an API to support this, print_thresholds, but there is no need to use it.
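The plugin-wide `env.warning` / `env.critical` defaults suggested above would add one more level to the lookup: per-device value first, then the plugin-wide value, then the built-in 57/60. A minimal Perl sketch of that precedence follows. It assumes the plugin-wide settings arrive as `$ENV{warning}` and `$ENV{critical}` (names taken from the comment; the PR does not implement them), chains Perl's defined-or operator `//`, and uses a hypothetical helper `resolve_threshold` with the environment passed as a hash reference.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the PR: resolve a threshold with the
# precedence per-device ("$field.$kind") > plugin-wide ("$kind") > built-in.
sub resolve_threshold {
    my ($env, $field, $kind, $builtin) = @_;
    return $env->{"$field.$kind"} // $env->{$kind} // $builtin;
}

# Stand-in for %ENV: a plugin-wide warning of 65 plus one NVMe override.
my %env = ('warning' => 65, 'nvme0n1.warning' => 70);
for my $d ('sda', 'nvme0n1') {
    printf "%s.warning %s\n",  $d, resolve_threshold(\%env, $d, 'warning',  57);
    printf "%s.critical %s\n", $d, resolve_threshold(\%env, $d, 'critical', 60);
}
```

With this layering, existing setups keep 57/60 unless something is configured, a site can raise the limits for all disks at once, and individual SSDs can still be tuned from their reported wctemp/cctemp.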
Will update the PR later today with the suggestion of `$cfn`. I will also add documentation on how to get the warning and critical temperatures with nvme-cli (`nvme`).
The warning 57 and critical 60 were not chosen by me; they are already present in munin, and I just kept them for backwards compatibility.
We mainly use this to fix the warning and critical values for SSDs (which can be checked with `sudo nvme id-ctrl -H /dev/nvme0`), as they often go past the 60 °C munin currently uses as the critical temperature (which I guess was chosen for spinning disks).
Updated to use the `$d` variable already present in the current munin master (I missed that because I started the patch from the munin version in Debian).