Opened 10 months ago

Closed 2 weeks ago

#1245 closed defect (worksforme)

smartd continues to write identical disk data for a failed drive

Reported by: Ulrich
Owned by:
Priority: minor
Milestone:
Component: smartd
Version:
Keywords: scsi
Cc:

Description

We had a SCSI drive fail in an HP Smart Array (cciss) during a long self-test. The start of the test was logged, and so was a temperature change, but from then on every S.M.A.R.T. request failed (because the disk died during the self-test).
Surprisingly, smartd continues to write identical SMART values to the CSV file, and nothing indicates that the drive is actually dead.
The smartmontools version in use is 6.6, as shipped with SLES 12 SP4.
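
The symptom described above (records that keep repeating after the drive stopped answering) can be looked for in the CSV with a small script. The following is only a minimal sketch: it assumes each record is one line whose first ';'-separated field is the timestamp and whose remainder is the logged data, and `find_stale_runs` and `min_run` are hypothetical names, not part of smartmontools. A healthy idle drive can also log identical values for a while, so a hit is a hint, not proof of a dead drive.

```python
from itertools import groupby


def find_stale_runs(lines, min_run=3):
    """Return (first_timestamp, last_timestamp, count) for each run of at
    least min_run consecutive records whose logged data is identical."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Assumption: the first ';'-separated field is the timestamp and
        # everything after it is the logged attribute data.
        timestamp, _, payload = line.partition(";")
        records.append((timestamp.strip(), payload.strip()))

    runs = []
    # groupby() only merges adjacent records, which is what is wanted here:
    # the same data seen again later, after a change, starts a new run.
    for _, group in groupby(records, key=lambda rec: rec[1]):
        group = list(group)
        if len(group) >= min_run:
            runs.append((group[0][0], group[-1][0], len(group)))
    return runs
```

It could be fed the lines of the uncompressed CSV extract, for example via `open(path)`; cross-checking the reported time ranges against the "failed to read" messages in syslog would show whether the repeated records coincide with the failure.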

Attachments (1)

csv.gz (837 bytes) - added by Ulrich 10 months ago.
Compressed extract of the CSV file


Change History (4)

comment:1 Changed 10 months ago by Christian Franke

Keywords: scsi added
Milestone: undecided

Please provide the syslog and CSV output of smartd from around the time of the failure. If the drive is still accessible, please also provide the output of smartd -r ioctl,2 -q onecheck.

Changed 10 months ago by Ulrich

Attachment: csv.gz added

Compressed extract of the CSV file

comment:2 Changed 10 months ago by Ulrich

For the easier part, I'm afraid this is not the information you are after:

"smartd -r ioctl,2 -q onecheck" seems to output an endless number of zeros:

smartd 6.6 2017-11-05 r4594 [x86_64-linux-4.12.14-95.32-default] (SUSE RPM)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org 

Opened configuration file /etc/smartd.conf
Configuration file /etc/smartd.conf parsed.

===== [LUN DATA] DATA START (BASE-16) =====
000-015: 00 00 00 18 00 00 00 00 00 00 00 c0 00 00 00 01
016-031: 00 00 00 c0 00 00 01 01 00 00 00 c0 00 00 fa 01
032-047: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
048-063: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
064-079: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
080-095: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
096-111: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
112-127: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
128-143: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
144-159: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
160-175: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
176-191: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
192-207: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
208-223: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
224-239: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
240-255: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
256-271: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
272-287: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
288-303: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
304-319: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
320-335: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
336-351: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
352-367: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
368-383: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
384-399: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
400-415: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
416-431: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
432-447: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
448-463: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
464-479: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
480-495: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
496-511: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
512-527: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
528-543: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
544-559: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
560-575: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
...
7184-7199: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7200-7215: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7216-7231: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7232-7247: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7248-7263: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7264-7279: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7280-7295: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7296-7311: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7312-7327: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7328-7343: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7344-7359: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
...
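
To avoid scrolling through the zeros, a dump like the one above can be reduced to the position of its last non-zero byte. This is a rough sketch that assumes the line layout shown here (a decimal "start-end:" offset label followed by two-digit hex bytes); `last_nonzero_offset` is a hypothetical helper, not a smartmontools function.

```python
import re

# Matches lines such as "016-031: 00 00 00 c0 ...": decimal offsets,
# then one or more space-separated two-digit hex bytes.
DUMP_RE = re.compile(r"^(\d+)-(\d+):((?: [0-9a-fA-F]{2})+)\s*$")


def last_nonzero_offset(lines):
    """Return the offset of the last non-zero byte in the dump,
    or None if every dumped byte is zero (or no dump line was found)."""
    last = None
    for line in lines:
        match = DUMP_RE.match(line.rstrip())
        if not match:
            continue  # banner lines, "..." markers, blank lines
        start = int(match.group(1))
        for index, byte in enumerate(match.group(3).split()):
            if byte != "00":
                last = start + index
    return last
```

Applied to the output quoted above, everything after the returned offset is zero padding within the lines that were pasted; lines elided with "..." are not covered.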

Note on the CSV: The failure occurred on 2019-10-01 between 5 and 6 o'clock. After a reboot of the server on the morning of 2019-10-08, the disk became "alive" again.

Syslog:

2019-10-01T05:48:25.750970+02:00 h06 smartd[3212]: Device: /dev/cciss/c0d0 [cciss_disk_01] [SCSI], self-test in progress
2019-10-01T05:48:25.751243+02:00 h06 smartd[3212]: Device: /dev/cciss/c0d0 [cciss_disk_01] [SCSI], Temperature changed +2 Celsius to 28 Celsius (Min/Max 22/29)
2019-10-01T06:18:25.793639+02:00 h06 smartd[3212]: Device: /dev/cciss/c0d0 [cciss_disk_01] [SCSI], failed to read SMART values
2019-10-01T06:18:25.793958+02:00 h06 smartd[3212]: Device: /dev/cciss/c0d0 [cciss_disk_01] [SCSI], failed to read Temperature
2019-10-01T06:48:26.020312+02:00 h06 smartd[3212]: Device: /dev/cciss/c0d0 [cciss_disk_01] [SCSI], failed to read Temperature

# after reboot

2019-10-08T09:57:26.433922+02:00 h06 smartd[2932]: Device: /dev/cciss/c0d0 [cciss_disk_01] [SCSI], initial Temperature is 24 Celsius (Min/Max 22/29)
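
Since the CSV gave no hint of the failure, the "failed to read" messages in syslog are the only record of it. As a sketch, they can be collected per device as follows; the pattern is derived only from the log lines quoted above, and `read_failures` is a hypothetical name.

```python
import re

# Pattern derived from the syslog lines in this ticket, e.g.
# "<timestamp> <host> smartd[<pid>]: Device: <dev> [...] [...], failed to read ..."
LINE_RE = re.compile(
    r"^(?P<ts>\S+) (?P<host>\S+) smartd\[(?P<pid>\d+)\]: "
    r"Device: (?P<dev>\S+) .*?, (?P<msg>failed to read .+)$"
)


def read_failures(lines):
    """Map each device to a list of (timestamp, message) tuples for every
    'failed to read ...' message that smartd logged for it."""
    failures = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            failures.setdefault(match.group("dev"), []).append(
                (match.group("ts"), match.group("msg"))
            )
    return failures
```

The first timestamp per device marks the first polling cycle in which a read failed; the drive may have died at any point after the previous successful cycle.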

comment:3 Changed 2 weeks ago by Christian Franke

Milestone: undecided
Resolution: worksforme
Status: new → closed

Root of the problem is unknown. Problem could not be reproduced.
