Opened 3 months ago
Last modified 3 months ago
#1850 new enhancement
Ignore specific NVMe temperature sensor
Reported by: | Matalonder | Owned by: | |
---|---|---|---|
Priority: | minor | Milestone: | undecided |
Component: | smartd | Version: | |
Keywords: | nvme | Cc: | |
Description
I have a Kingston Fury Renegade NVMe SSD, SFYRDK4000G. It reports two temperature sensors:

    SMART/Health Information (NVMe Log 0x02)
    Critical Warning:                   0x00
    Temperature:                        62 Celsius
    ...
    Temperature Sensor 2:               67 Celsius

and in the sensors output:

    nvme-pci-0100
    Adapter: PCI adapter
    Composite:    +61.9°C  (low  = -20.1°C, high = +83.8°C)
                           (crit = +88.8°C)
    Sensor 2:     +66.8°C

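Both tools most likely start from the same raw readings, which NVMe reports in whole kelvin. The sketch below assumes raw values of 335 K and 340 K (inferred from the two outputs above, not read from the drive) and assumes that smartctl subtracts 273 while the Linux NVMe hwmon driver converts to millidegrees Celsius with an offset of 273.15. The function names are illustrative only.

```python
# Sketch: one raw NVMe temperature (kelvin) shown two ways.
# ASSUMPTIONS: raw readings of 335 K / 340 K (inferred, not measured);
# smartctl subtracts 273; the Linux hwmon driver subtracts 273.15.

def smartctl_celsius(kelvin: int) -> int:
    """Integer Celsius, assuming smartctl's offset of 273."""
    return kelvin - 273


def hwmon_millicelsius(kelvin: int) -> int:
    """Millidegrees Celsius, assuming hwmon's offset of 273.15."""
    return kelvin * 1000 - 273150


raw = {"Composite": 335, "Sensor 2": 340}  # assumed raw kelvin readings
for name, k in raw.items():
    print(f"{name}: smartctl {smartctl_celsius(k)} C, "
          f"hwmon {hwmon_millicelsius(k)} millidegrees C")
```

Under these assumptions smartctl shows 62 and 67, while sensors shows 61850 and 66850 millidegrees rounded to one decimal place, so the two outputs describe the same readings.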
The problem is that only Composite is an actual temperature sensor. Sensor 2 seems to be just a "highest temperature ever seen" tracking value: it always reads 66.8, even when Composite is around 25.
I track this drive with -W 5,55,65 because I want to get desktop notifications when it goes over 65, and I found that passing a notification-creating script to -M works well enough.
This, however, now causes me to get the notification on every boot, because Sensor 2 is stuck at the highest-ever-seen 66.8 and smartd always uses its value:

    Jun 30 15:38:29 hostname smartd[18016]: Device: /dev/disk/by-id/nvme-KINGSTON_SFYRDK4000G_..., Temperature 67 Celsius reached critical limit of 65 Celsius (Min/Max 67/67)

This effectively makes the whole -W flag useless.
So it seems like this behaviour, described in the man page, is messing with me:
For NVMe devices, smartd checks the maximum of the Composite Temperature value and all Temperature Sensor values reported by SMART/Health Information log.
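The man page rule can be illustrated with a simplified model. This is a sketch, not smartd's actual source: it only takes the maximum of Composite and all sensor values and compares it with the INFO and CRIT limits of -W DIFF,INFO,CRIT (DIFF-based change reporting is not modeled). The function names are hypothetical; the numbers are the ones from this report.

```python
# Simplified model (NOT smartd's source) of the NVMe temperature check
# described in the man page: max(Composite, all sensors) vs. -W limits.
from typing import Sequence


def checked_temperature(composite: int, sensors: Sequence[int]) -> int:
    """Temperature compared against -W: max of Composite and all sensors."""
    return max([composite, *sensors])


def classify(temp: int, info: int, crit: int) -> str:
    """Which -W limit a temperature reaches; a limit of 0 is disabled."""
    if crit and temp >= crit:
        return "critical"
    if info and temp >= info:
        return "info"
    return "ok"


# Composite 62 C, Temperature Sensor 2 stuck at 67 C, -W 5,55,65:
temp = checked_temperature(62, [67])
print(temp, classify(temp, info=55, crit=65))
```

With these values the stuck sensor alone pushes the checked temperature to 67, past CRIT=65, even though Composite (62) would only reach the INFO limit.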
Is there a way to instruct smartd to ignore certain temperature sensor values, or to use only the Composite one?
If there isn't, could you consider this enhancement? It seems like a valid use case with no other solution. For now, I'll have to pass -W 0,0,0 for this SSD to avoid useless notifications and monitor it manually.
Change History (3)
comment:1 by , 3 months ago
Keywords: | nvme added |
---|---|
Milestone: | → undecided |
comment:2 by , 3 months ago
Thank you for the quick answer!
> Note that an over-temperature event should be reported by bit 1 of the Critical Warning byte which is checked if -H is set.
Is the temperature level used for that set in the device firmware, or can it be customized?
I kind of don't trust the device on this. Its spec says the "max work temp" is 70°C, but it seems to report that it's happy with up to 85°C (which is the "max storage temp" per the spec). And I'd like an earlier warning anyway, which is why I set it to 65°C.
But it's good to know I'll get a warning if it decides to fry itself, even without -W!
comment:3 by , 3 months ago
> Is the temperature level used for that set in the device firmware, or can it be customized?
The current threshold for the composite temperature is reported by smartctl -c as:

    Warning Comp. Temp. Threshold: 85 Celsius

According to NVMe Base Specification 2.0c, a drive may support customizing the thresholds for both the composite temperature and individual sensors via the NVMe Get/Set Features command with Feature ID 0x04. This is not yet supported by smartctl, but the Linux tool nvme-set-feature, for example, should support it.
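A minimal sketch of what such a Set Features 0x04 (Temperature Threshold) request carries, assuming the Command Dword 11 layout from the NVMe Base Specification: TMPTH (threshold in kelvin) in bits 15:0, TMPSEL (0 = Composite, 1 to 8 = Temperature Sensor 1 to 8) in bits 19:16, and THSEL (0 = over-temperature, 1 = under-temperature) in bits 21:20. The function name is hypothetical, and the Celsius-to-kelvin offset of 273 is an assumption chosen to match smartctl's display. Whether a given drive accepts a custom threshold is drive-specific.

```python
# Sketch: encode Command Dword 11 for Set Features, Feature ID 0x04.
# Field layout assumed from the NVMe Base Specification:
#   bits 15:00  TMPTH  - threshold in kelvin
#   bits 19:16  TMPSEL - 0 = Composite, 1..8 = Temperature Sensor 1..8
#   bits 21:20  THSEL  - 0 = over-temperature, 1 = under-temperature

OVER_TEMPERATURE = 0
UNDER_TEMPERATURE = 1


def temp_threshold_dword11(celsius: int, sensor: int = 0,
                           kind: int = OVER_TEMPERATURE) -> int:
    """Encode a temperature threshold given in Celsius (hypothetical helper)."""
    if not 0 <= sensor <= 8:
        raise ValueError("sensor must be 0 (Composite) or 1..8")
    if kind not in (OVER_TEMPERATURE, UNDER_TEMPERATURE):
        raise ValueError("kind must be OVER_TEMPERATURE or UNDER_TEMPERATURE")
    kelvin = celsius + 273  # assumed offset, matching smartctl's display
    if not 0 <= kelvin <= 0xFFFF:
        raise ValueError("threshold out of range for TMPTH")
    return (kind << 20) | (sensor << 16) | kelvin


# Composite over-temperature threshold at 65 C (338 K):
print(hex(temp_threshold_dword11(65)))            # 0x152
# The same threshold for Temperature Sensor 2:
print(hex(temp_threshold_dword11(65, sensor=2)))  # 0x20152
```

With nvme-cli the value could then be passed with something like nvme set-feature /dev/nvme0 -f 4 -v 0x152; the device name is an assumption, and the options should be checked against the nvme-set-feature man page for the installed version. If the drive honours a lowered composite over-temperature threshold, reaching it should set Critical Warning bit 1, which smartd checks with -H.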
Sorry, no. I don't remember any similar report in the 8+ years since the first NVMe-capable version of smartmontools (6.5, May 2016).
Will be decided later. Always using only the composite temperature would be an easier solution.
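The two alternatives discussed in this ticket can be compared in a short sketch. Neither exists as a smartd option: ignore is a hypothetical set of sensor numbers to exclude (the reporter's request), and composite_only=True models the simpler "always use the composite temperature" idea. The function name and parameters are illustrative only.

```python
# Sketch of two HYPOTHETICAL alternatives (not existing smartd options):
# excluding chosen sensors, or using only the Composite temperature.
from typing import Dict, Optional, Set


def selected_temperature(composite: int, sensors: Dict[int, int],
                         ignore: Optional[Set[int]] = None,
                         composite_only: bool = False) -> int:
    """Temperature that would be compared against -W."""
    if composite_only:
        return composite
    ignore = ignore or set()
    kept = [t for n, t in sensors.items() if n not in ignore]
    return max([composite, *kept])


readings = {2: 67}  # Temperature Sensor 2 stuck at its highest value
crit = 65
for label, t in [
    ("all sensors (max)", selected_temperature(62, readings)),
    ("ignore sensor 2", selected_temperature(62, readings, ignore={2})),
    ("composite only", selected_temperature(62, readings, composite_only=True)),
]:
    print(f"{label}: {t} C, critical={t >= crit}")
```

For this drive both alternatives give 62 C, below the CRIT limit of 65; the composite-only variant needs no per-device configuration, while an ignore list would still allow other sensors to be monitored.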
Note that an over-temperature event should be reported by bit 1 of the Critical Warning byte, which is checked if -H is set.
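For reference, a small sketch of decoding the Critical Warning byte. The bit descriptions are paraphrased from the NVMe Base Specification; later revisions define further bits, which this sketch only reports as unknown. The function names are illustrative and not part of smartmontools.

```python
# Sketch: decode the NVMe SMART/Health "Critical Warning" byte.
# Bit names paraphrased from the NVMe Base Specification (bits 0-4 only).
from typing import List

CRITICAL_WARNING_BITS = {
    0: "available spare below threshold",
    1: "temperature above over-temperature or below under-temperature threshold",
    2: "NVM subsystem reliability degraded",
    3: "media placed in read-only mode",
    4: "volatile memory backup device failed",
}
TEMPERATURE_BIT = 1


def decode_critical_warning(value: int) -> List[str]:
    """Describe every bit set in the Critical Warning byte."""
    if not 0 <= value <= 0xFF:
        raise ValueError("Critical Warning is a single byte")
    return [CRITICAL_WARNING_BITS.get(bit, f"unknown bit {bit}")
            for bit in range(8) if value & (1 << bit)]


def temperature_warning(value: int) -> bool:
    """True if bit 1 (the temperature warning) is set."""
    return bool(value & (1 << TEMPERATURE_BIT))


print(decode_critical_warning(0x00))  # value reported in this ticket
print(temperature_warning(0x02))
```

The 0x00 value in the report decodes to no warnings, whereas any value with bit 1 set (for example 0x02) would indicate the over-temperature event mentioned above.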