Opened 3 years ago
Closed 22 months ago
#1245 closed defect (worksforme)
smartd continues to write identical disk data for a failed drive
Reported by: Ulrich
Owned by:
Priority: minor
Milestone:
Component: smartd
Version:
Keywords: scsi
Cc:
Description
We had a SCSI drive failure in an HP Smart Array (cciss) during a long self-test. The start of the test was logged, and a temperature change was logged too, but from then on every SMART request failed (because the disk died during the self-test).
Amazingly, smartd continues to write identical SMART values to the CSV file, and there is no indication that the drive is actually dead.
The smartmontools version in use is 6.6, as shipped with SLES 12 SP4.
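The duplicated-rows symptom described above can be checked mechanically. Below is a minimal Python sketch; the tab-separated "timestamp + payload" line layout is an assumption for illustration, not smartd's documented attribute-log format, and find_stale_runs is a hypothetical helper name:

```python
# Sketch: flag runs of consecutive attribute-log lines whose data payload
# (everything after the timestamp) repeats verbatim -- a possible sign that
# stale values are being re-logged for a dead drive.
# ASSUMPTION: each line is "<timestamp><TAB><payload>" (illustrative format).
def find_stale_runs(lines, min_run=3):
    runs = []                      # list of (start_index, run_length)
    prev_payload, start, count = None, None, 0
    for i, line in enumerate(lines):
        _ts, _, payload = line.rstrip("\n").partition("\t")
        if payload == prev_payload:
            count += 1             # payload identical to previous line
        else:
            if count >= min_run:
                runs.append((start, count))
            prev_payload, start, count = payload, i, 1
    if count >= min_run:           # flush the final run
        runs.append((start, count))
    return runs
```

Run against the real CSV, a non-empty result would point at the time window where smartd started repeating values.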
Attachments (1)
Change History (4)
comment:1 Changed 3 years ago by
Keywords: scsi added
Milestone: → undecided
comment:2 Changed 3 years ago by
For the easier part: I'm afraid this is not the information you are after.
"smartd -r ioctl,2 -q onecheck" seems to output an endless number of zeros:
smartd 6.6 2017-11-05 r4594 [x86_64-linux-4.12.14-95.32-default] (SUSE RPM)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
Opened configuration file /etc/smartd.conf
Configuration file /etc/smartd.conf parsed.
===== [LUN DATA] DATA START (BASE-16) =====
000-015: 00 00 00 18 00 00 00 00 00 00 00 c0 00 00 00 01
016-031: 00 00 00 c0 00 00 01 01 00 00 00 c0 00 00 fa 01
032-047: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
048-063: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
...
7344-7359: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
...
(every row from offset 032 through 7359 is all 00)
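For what it's worth, the "endless zeros" observation can be verified mechanically instead of by eye. A Python sketch; parse_dump and all_zero_after are hypothetical helper names, and the "NNN-NNN: hh hh ..." row layout is taken from the dump above:

```python
import re

# Matches hex-dump rows of the form "000-015: 00 00 00 18 ..." as seen in
# the -r ioctl,2 output above; banner and header lines are skipped.
ROW = re.compile(r"^\s*(\d+)-(\d+):((?:\s+[0-9a-fA-F]{2})+)\s*$")

def parse_dump(text):
    """Collect the hex bytes of all dump rows into a single bytes object."""
    data = bytearray()
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            data.extend(int(tok, 16) for tok in m.group(3).split())
    return bytes(data)

def all_zero_after(data, offset):
    """True if every byte from `offset` onward is 0x00."""
    return all(b == 0 for b in data[offset:])
```

Applied to the dump above, this would confirm that only the first 32 bytes carry any data.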
Note on the CSV: the failure occurred on 2019-10-01 between 5 and 6 o'clock. After a reboot of the server on the morning of 2019-10-08, the disk became "alive" again.
Syslog:
2019-10-01T05:48:25.750970+02:00 h06 smartd[3212]: Device: /dev/cciss/c0d0 [cciss_disk_01] [SCSI], self-test in progress
2019-10-01T05:48:25.751243+02:00 h06 smartd[3212]: Device: /dev/cciss/c0d0 [cciss_disk_01] [SCSI], Temperature changed +2 Celsius to 28 Celsius (Min/Max 22/29)
2019-10-01T06:18:25.793639+02:00 h06 smartd[3212]: Device: /dev/cciss/c0d0 [cciss_disk_01] [SCSI], failed to read SMART values
2019-10-01T06:18:25.793958+02:00 h06 smartd[3212]: Device: /dev/cciss/c0d0 [cciss_disk_01] [SCSI], failed to read Temperature
2019-10-01T06:48:26.020312+02:00 h06 smartd[3212]: Device: /dev/cciss/c0d0 [cciss_disk_01] [SCSI], failed to read Temperature
# after reboot
2019-10-08T09:57:26.433922+02:00 h06 smartd[2932]: Device: /dev/cciss/c0d0 [cciss_disk_01] [SCSI], initial Temperature is 24 Celsius (Min/Max 22/29)
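As an aside, the relevant failure messages can be pulled out of syslog with a small script. A Python sketch; the regex is fitted to the log lines quoted above and is an assumption, not a general smartd log grammar, and `failures` is a hypothetical helper name:

```python
import re

# Fitted to the syslog lines above (ASSUMPTION: message layout
# "smartd[pid]: Device: <dev> [<name>] [SCSI], failed to read ...").
PAT = re.compile(
    r"smartd\[\d+\]: Device: (\S+) \[.*?\] \[SCSI\], (failed to read .+)$"
)

def failures(lines):
    """Group 'failed to read ...' messages by device path."""
    out = {}
    for line in lines:
        m = PAT.search(line)
        if m:
            out.setdefault(m.group(1), []).append(m.group(2))
    return out
```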
comment:3 Changed 22 months ago by
Milestone: undecided
Resolution: → worksforme
Status: new → closed
The root of the problem is unknown, and the problem could not be reproduced.
Please provide the related syslog and CSV output of smartd from around the time of the failure. If the drive is still accessible, please also provide the output of
smartd -r ioctl,2 -q onecheck