Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#1538 closed defect (invalid)

smartd gets confused about which drive is reporting temperature changes?

Reported by: Jimmie
Owned by:
Priority: minor
Milestone:
Component: smartd
Version: 7.0
Keywords:
Cc:

Description

OS: OpenSUSE Leap 15.3 (x86-64)
smartmontools version 7.0-6.1

I've received a number of smartd email notifications over the past few weeks. Every time, the system journal suggests that smartd detected a temperature change on /dev/sda, but that the error was logged against /dev/nvme0.

My system has two SSDs:
/dev/sda: Samsung 850 EVO
/dev/nvme0: SK Hynix P31

The following warning/error was logged by the smartd daemon today:

   Device: /dev/nvme0, number of Error Log entries increased from 11 to 12
   Device info:
   SHGP31-1000GM-2, S/N:XXXXXX, FW:41060C20

The system journal from that timestamp shows the following:

smartd[1511]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 74 [Raw 26] to 73 [Raw 27]
smartd[1511]: Device: /dev/nvme0, number of Error Log entries increased from 11 to 12
smartd[1511]: Sending warning via <mail> to xxxxxxxx@xxxxxxxx ...
smartd[1511]: Warning via <mail> to xxxxxx@xxxxxxxx: successful

For some reason, it looks like smartd logged the error against /dev/nvme0?

Change History (5)

comment:1 by Jimmie, 3 years ago

Component: all → smartd

comment:2 by Christian Franke, 3 years ago

Milestone: undecided

The ATA and NVMe log entries are unrelated:

smartd[1511]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 74 [Raw 26] to 73 [Raw 27]

By default, each change of this ATA attribute is reported. This never results in a warning email. Use something like -I 190 -W 0,0,50 to receive over-temperature warning emails only.
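As a rough sketch of how those directives might look in /etc/smartd.conf: the file path, the explicit device entries (instead of a DEVICESCAN line), the -a default checks and the -m root mail address are assumptions for illustration, not taken from the reporter's setup. Only -I 190 and -W 0,0,50 come from the suggestion above.

```
# Sketch only: adjust device names and the mail address to the local setup.
# ATA SSD: -a enables the default checks, -I 190 stops change reports for
# attribute 190, and -W 0,0,50 turns off temperature change and info reports
# while still warning once the temperature reaches 50 degrees Celsius.
/dev/sda -a -I 190 -W 0,0,50 -m root
# NVMe SSD: default monitoring, including the Error Log entry count.
/dev/nvme0 -a -m root
```

smartd has to be restarted or told to reload its configuration before the new directives take effect.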

smartd[1511]: Device: /dev/nvme0, number of Error Log entries increased from 11 to 12
smartd[1511]: Sending warning via <mail> to xxxxxxxx@xxxxxxxx ...

An increase of the NVMe Error Information Log Entries value of SMART/Health Information results in a warning email. Some SSDs apparently increase this value without adding actual errors to the error log.

comment:3 by Jimmie, 3 years ago

Thanks for the clarification.

It just seems odd that they tend to accompany each other. This machine's journal only goes back a month, but in that time there are 3 incidents where a temperature change was detected on /dev/sda and an error was logged on /dev/nvme0 together.

I'll withdraw the ticket (assuming I can)...

comment:4 by Jimmie, 3 years ago

Resolution: invalid
Status: new → closed

comment:5 by Christian Franke, 3 years ago

Milestone: undecided