Opened 5 weeks ago

Last modified 5 weeks ago

#1245 new defect

smartd continues to write identical disk data for a failed drive

Reported by: Ulrich Owned by:
Priority: minor Milestone: undecided
Component: smartd Version:
Keywords: scsi Cc:

Description

We had a SCSI drive fail in an HP Smart Array (cciss) during a long self-test. The start of the test and a temperature change were logged, but from then on every SMART request failed because the disk had died during the self-test.
Surprisingly, smartd continues to write identical SMART values to the CSV file, so the file gives no indication that the drive is actually dead.
The smartmontools version in use is 6.6 as shipped with SLES 12 SP4.
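
As a workaround until the logging behaviour is clarified, the repeated values can be spotted after the fact. The following is a minimal sketch, not part of smartmontools: it assumes each CSV record is one line whose first `;`-separated field is the timestamp, and the function name `find_stale_runs` and the `min_repeats` threshold are hypothetical. Identical consecutive records are only a hint, since a healthy idle drive can also report unchanged values.

```python
from itertools import groupby


def find_stale_runs(lines, min_repeats=3):
    """Find runs of consecutive records with an identical payload.

    Returns a list of (first_timestamp, last_timestamp, count) tuples.
    Assumption: the timestamp is the first ';'-separated field and
    everything after it is the payload to compare.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        timestamp, _, payload = line.partition(";")
        records.append((timestamp.strip(), payload.strip()))

    runs = []
    # groupby only merges *adjacent* records with the same payload.
    for _payload, group in groupby(records, key=lambda rec: rec[1]):
        group = list(group)
        if len(group) >= min_repeats:
            runs.append((group[0][0], group[-1][0], len(group)))
    return runs
```

Calling `find_stale_runs(open(path))` on the decompressed attachment would list the time spans in which nothing changed, which can then be compared against the "failed to read" messages in syslog.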

Attachments (1)

csv.gz (837 bytes) - added by Ulrich 5 weeks ago.
Compressed extract of the CSV file


Change History (3)

comment:1 Changed 5 weeks ago by Christian Franke

Keywords: scsi added
Milestone: undecided

Please provide the smartd syslog and CSV output from around the time of the failure. If the drive is still accessible, please also provide the output of smartd -r ioctl,2 -q onecheck.

Changed 5 weeks ago by Ulrich

Attachment: csv.gz added

Compressed extract of the CSV file

comment:2 Changed 5 weeks ago by Ulrich

For the easier part, I'm afraid this is not the information you are after:

"smartd -r ioctl,2 -q onecheck" seems to output an endless number of zeros:

smartd 6.6 2017-11-05 r4594 [x86_64-linux-4.12.14-95.32-default] (SUSE RPM)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org 

Opened configuration file /etc/smartd.conf
Configuration file /etc/smartd.conf parsed.

===== [LUN DATA] DATA START (BASE-16) =====
000-015: 00 00 00 18 00 00 00 00 00 00 00 c0 00 00 00 01
016-031: 00 00 00 c0 00 00 01 01 00 00 00 c0 00 00 fa 01
032-047: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
048-063: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
064-079: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
080-095: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
096-111: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
112-127: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
128-143: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
144-159: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
160-175: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
176-191: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
192-207: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
208-223: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
224-239: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
240-255: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
256-271: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
272-287: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
288-303: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
304-319: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
320-335: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
336-351: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
352-367: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
368-383: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
384-399: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
400-415: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
416-431: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
432-447: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
448-463: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
464-479: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
480-495: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
496-511: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
512-527: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
528-543: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
544-559: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
560-575: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
...
7184-7199: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7200-7215: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7216-7231: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7232-7247: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7248-7263: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7264-7279: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7280-7295: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7296-7311: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7312-7327: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7328-7343: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7344-7359: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
...

Note on the CSV: The failure occurred on 2019-10-01 between 5 and 6 o'clock. After the server was rebooted on the morning of 2019-10-08, the disk became "alive" again.

Syslog:

2019-10-01T05:48:25.750970+02:00 h06 smartd[3212]: Device: /dev/cciss/c0d0 [cciss_disk_01] [SCSI], self-test in progress
2019-10-01T05:48:25.751243+02:00 h06 smartd[3212]: Device: /dev/cciss/c0d0 [cciss_disk_01] [SCSI], Temperature changed +2 Celsius to 28 Celsius (Min/Max 22/29)
2019-10-01T06:18:25.793639+02:00 h06 smartd[3212]: Device: /dev/cciss/c0d0 [cciss_disk_01] [SCSI], failed to read SMART values
2019-10-01T06:18:25.793958+02:00 h06 smartd[3212]: Device: /dev/cciss/c0d0 [cciss_disk_01] [SCSI], failed to read Temperature
2019-10-01T06:48:26.020312+02:00 h06 smartd[3212]: Device: /dev/cciss/c0d0 [cciss_disk_01] [SCSI], failed to read Temperature

# after reboot

2019-10-08T09:57:26.433922+02:00 h06 smartd[2932]: Device: /dev/cciss/c0d0 [cciss_disk_01] [SCSI], initial Temperature is 24 Celsius (Min/Max 22/29)
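
Since the syslog does record the failed reads, the affected device and time span can be pulled out of it mechanically. This is a rough sketch under the assumption that every smartd line has the shape shown above (`<timestamp> <host> smartd[<pid>]: Device: <device> ..., <message>`); the pattern `LINE_RE` and the function `failed_reads` are illustrative names, not anything smartd provides.

```python
import re
from collections import defaultdict

# Matches smartd syslog lines of the shape quoted in this ticket:
# "<timestamp> <host> smartd[<pid>]: Device: <device> [...] [...], <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+\S+\s+smartd\[\d+\]:\s+"
    r"Device:\s+(?P<dev>\S+)\s.*?,\s+(?P<msg>.*)$"
)


def failed_reads(lines):
    """Collect (timestamp, message) pairs per device for 'failed to read' lines."""
    failures = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("msg").startswith("failed to read"):
            failures[match.group("dev")].append(
                (match.group("ts"), match.group("msg"))
            )
    return dict(failures)
```

Feeding it the lines above would report three failed reads for /dev/cciss/c0d0, the first at 2019-10-01T06:18:25, which is the point from which the CSV values should no longer be trusted.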
