Opened 13 years ago

Closed 13 years ago

#121 closed defect (worksforme)

smartd fails to report disk failure if a disk doesn't respond anymore

Reported by: kaluscha
Owned by: Christian Franke
Priority: major
Milestone:
Component: smartd
Version: 5.39
Keywords:
Cc:

Description

I had a self test running on disk hdb:

smartd: Device: /dev/hdb, self-test in progress, 10% remaining

The disk encountered problems, see /var/log/messages:

kernel: hdb: dma_timer_expiry: dma status == 0x61
kernel: hdb: DMA timeout error
kernel: hdb: dma timeout error: status=0xd0 { Busy }
kernel: ide: failed opcode was: unknown
kernel: hda: DMA disabled
kernel: hdb: DMA disabled
kernel: ide0: reset: success

There were several kernel IDE resets until the drive didn't respond anymore:

kernel: hdb: drive not ready for command

smartd logged messages like this:

smartd: Device: /dev/hdb, failed to read Temperature

However, smartd had been configured to send e-mail in case of trouble (smartd.conf entry: /dev/hdb -a -I 194 -W 4,40,42 -R 5 -m myamil). In this case, no e-mail was sent.

In my opinion this is a major problem, as smartd should inform the admins that a disk is completely offline, i.e. no longer responds to requests on the IDE bus.

Change History (3)

comment:1 by kaluscha, 13 years ago

Keywords: linux disk failure added

comment:2 by Christian Franke, 13 years ago

Keywords: linux disk failure removed
Milestone: Release 5.41
Owner: changed from somebody to Christian Franke
Status: new → accepted

comment:3 by Christian Franke, 13 years ago

Milestone: Release 5.41
Resolution: worksforme
Status: accepted → closed

smartd sends a warning email "failed to read SMART Attribute Data" in the above situation, see smartd.cpp. No additional email "failed to read Temperature" is sent because temperature info is part of the attribute data.

If the smartd option -s (--savestates) is used, see also ticket #35.
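
When a warning e-mail like the one discussed in this ticket does not arrive, it can help to first rule out the mail path itself. A minimal sketch of a smartd.conf entry for that, assuming the directives quoted in the description: the -M test directive makes smartd send a single test e-mail when it starts. The address admin@example.org is a placeholder and does not come from the ticket.

```
# smartd.conf (sketch; admin@example.org is a placeholder address)
# Same directives as in the report, plus -M test, which makes smartd
# send one test e-mail at startup so mail delivery can be checked.
/dev/hdb -a -I 194 -W 4,40,42 -R 5 -m admin@example.org -M test
```

If the test e-mail arrives, mail delivery works and -M test can be removed again. That check says nothing about whether a later warning is triggered, only that smartd can deliver mail to the configured address.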
