Opened 16 months ago

Closed 16 months ago

Last modified 9 months ago

#1017 closed enhancement (duplicate)

make smartd usable as temperature monitor / replace hddtemp

Reported by: calestyo
Owned by:
Priority: major
Milestone:
Component: smartd
Version: 6.6
Keywords:
Cc: Nathan Stratton Treadway

Description (last modified by Christian Franke)

Hi.

It seems to me that hddtemp is more or less dead and unmaintained. Since (AFAIK) its temperature reading is also based on SMART, there is not much sense in having both smartd and hddtemp.

For many of my newer devices like SSDs, hddtemp reports "no temp sensor found" or so, while smartmontools works perfectly on them (and displays the temperature).

smartd already seems to have some limited functionality to monitor a device's temperature, namely via:
-W DIFF[,INFO[,CRIT]]
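
For reference, this is roughly how I read the three values: DIFF reports a change of at least that many degrees since the last report, INFO and CRIT are absolute limits in °C, and 0 disables the respective check. The following is only a rough Python sketch of that reading, not smartd's actual code; `parse_w` and `classify` are made-up names, and the new min/max reporting is left out.

```python
def parse_w(arg):
    """Parse 'DIFF[,INFO[,CRIT]]' into a 3-tuple of ints; 0 means disabled."""
    parts = [int(p) for p in arg.split(",")]
    if not 1 <= len(parts) <= 3:
        raise ValueError("expected DIFF[,INFO[,CRIT]]")
    # Missing INFO/CRIT values are treated as disabled (assumption).
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def classify(temp, last_reported, diff, info, crit):
    """Return 'crit', 'info', 'diff' or None for a single reading in °C."""
    if crit and temp >= crit:
        return "crit"
    if info and temp >= info:
        return "info"
    if diff and abs(temp - last_reported) >= diff:
        return "diff"
    return None


diff, info, crit = parse_w("0,50,55")
print(classify(52, 40, diff, info, crit))
```

With `-W 0,50,55`, a reading of 52°C would fall into the INFO range and 55°C or more into the CRIT range.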

There are a number of problems with it:

1) Most importantly, it seems that warnings are not re-sent as they occur (but only once a day?).
For example, I have had a line in smartd.conf like:
/dev/disk/by-id/ata-Samsung_SSD_850_PRO_1TB_S252NXAG910017F -d auto -d removable -n standby,4 -a -W 0,50,55 -m root -M exec /usr/share/smartmontools/smartd-runner
For testing purposes I changed that to -W 0,20,25 and got an alert. Changed it back to -W 0,50,55 (which is fine for that device) and restarted... and then I repeated this (i.e. going back to something that should trigger a warning).
However, no further warning.

This behaviour may be reasonable for other smart values, e.g. things like:

  • Wear_Leveling_Count
  • Uncorrectable_Error_Cnt

would typically get only worse and not better again.
And things like:

  • ECC_Error_Rate

may increase pretty fast on some devices (one such value does on Seagate) and it's perfectly fine for them.

But for temperature monitoring it's IMHO bad:
My Samsung SSD, for example, supports up to 70°C, I think.
So I'd like to get a warning at, say, 50°C, and not only the first time per day: the temperature may drop again (or I just reduce the IO load on the device), only to rise again shortly after, which I would no longer notice because no further warning is sent.

Especially on mobile devices like laptops, temperatures can easily go up and down quite regularly.
Therefore it makes sense to send temperature errors every time they occur (i.e. once per check interval).
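
To illustrate the difference, here is a small Python sketch comparing "warn only once" with "warn at every check interval where the limit is reached". It only models the behaviour as I understand it from the test above, not what smartd really does internally; `warnings_to_send` and the readings are invented for the example.

```python
def warnings_to_send(readings, limit, once_only):
    """Return the indices of the check intervals at which a warning is sent."""
    sent = []
    already_warned = False
    for i, temp in enumerate(readings):
        if temp < limit:
            continue  # below the limit: nothing to report
        if once_only and already_warned:
            continue  # suppressed: a warning was already sent earlier
        sent.append(i)
        already_warned = True
    return sent


# One reading per check interval (invented values, in °C).
readings = [45, 52, 48, 53, 54]
print(warnings_to_send(readings, 50, once_only=True))   # [1]
print(warnings_to_send(readings, 50, once_only=False))  # [1, 3, 4]
```

In the once-only case, the second rise above 50°C (intervals 3 and 4) goes unnoticed, which is exactly the problem on devices whose temperature goes up and down.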

2) Devices typically also have a minimum operating temperature
This is typically pretty low, so I'm not sure whether it can even be monitored properly (=> do the temp sensors of the disks give reasonable values for such low temps?)... but if they can, it would be nice if smartd would also monitor for a minimum temperature.

3) smartmontools should know the max[/min] temperatures of the devices
*if* smartd would become a replacement / alternative to hddtemp, it would of course be nice if it comes with a database of max[/min] temperatures for known devices.
For example, my Samsung SSD (according to Samsung) operates in a range of 0-70°C. My HDDs tolerate much less (~50°C or so? I would need to look it up).
So it would be nice if there were a DB that automatically selects reasonable values, e.g. for the SSD in my case: INFO at 60°C, CRIT at 70°C
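
What I have in mind could look roughly like the Python sketch below. Everything in it is hypothetical: the `LIMITS` table, the function names and the "INFO = max minus 10°C" rule are assumptions for illustration only, and the single entry just reuses the 0-70°C figure from above. Nothing here comes from the existing smartmontools drive database.

```python
import re

# Hypothetical table: model regex -> (min_c, max_c) operating range in °C.
LIMITS = [
    (re.compile(r"Samsung SSD 850 PRO"), (0, 70)),
]


def lookup_limits(model, info_margin=10):
    """Return (min_c, info_c, crit_c) for a known model, else None."""
    for pattern, (min_c, max_c) in LIMITS:
        if pattern.search(model):
            # Assumed rule: INFO a fixed margin below the maximum, CRIT at it.
            return (min_c, max_c - info_margin, max_c)
    return None


def check(temp, limits):
    """Classify one reading against (min_c, info_c, crit_c)."""
    min_c, info_c, crit_c = limits
    if temp >= crit_c:
        return "crit"
    if temp >= info_c:
        return "info"
    if temp < min_c:
        return "below-min"
    return "ok"


limits = lookup_limits("Samsung SSD 850 PRO 1TB")
print(limits)             # (0, 60, 70)
print(check(65, limits))  # info
```

The `below-min` case would also cover point 2), provided the sensors report usable values at such low temperatures.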

Cheers,
Chris.

Change History (5)

comment:1 Changed 16 months ago by Christian Franke

Milestone: undecided

comment:2 Changed 16 months ago by Christian Franke

Description: modified (diff)

comment:3 Changed 16 months ago by Christian Franke

> 1) Most importantly, it seems that warnings are not re-sent as they occur (but only once a day?).
> For example, ...

This is already addressed in ticket #1018.

> 2) Devices typically also have a minimum operating temperature
> This is typically pretty low, so I'm not sure whether it can even be monitored properly (=> do the temp sensors of the disks give reasonable values for such low temps?)

At least negative temperatures may not work (see ticket #291 for a rare use case).

> ... but if they can, it would be nice if smartd would also monitor for a minimum temperature.

May make sense in some use cases. Please create a separate ticket.

> 3) smartmontools should know the max[/min] temperatures of the devices
> *if* smartd would become a replacement / alternative to hddtemp, it would of course be nice if it comes with a database of max[/min] temperatures for known devices.

This is unrelated to hddtemp. Its hddtemp.db file does not contain any temperature limits. Please create a separate ticket.

comment:4 Changed 16 months ago by Christian Franke

Milestone: undecided
Resolution: duplicate
Status: new → closed

For 1), see ticket #1018. For 2) and 3), please create separate tickets.

comment:5 Changed 9 months ago by Nathan Stratton Treadway

Cc: Nathan Stratton Treadway added