Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#608 closed defect (invalid)

Long self-test hangs on HGST drives

Reported by: janardhan
Owned by:
Priority: major
Milestone:
Component: all
Version:
Keywords: scsi
Cc:

Description (last modified by Christian Franke)

We have HGST drives in our server. When we started a long self-test on these drives, it hung, and the smartctl output still shows the long test as running.

Sample outputs:

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Self test in progress ...   1     NOW                 - [-   -    -]
# 2  Background long   Aborted (device reset ?)    8       0                 - [-   -    -]

We contacted the HGST support team. They reproduced the issue and ran their own tools on the drive. They found that no background test is running, but the smartctl output still shows the test as in progress.

We already tried smartctl -X to abort the test, but it fails with an error.

smartctl -X /dev/sdf
smartctl 6.0 2012-10-10 r3643 [x86_64-linux-2.6.30.5-43.ami26.fc11.x86_64] (local build)
Copyright 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org
Abort self test failed [unsupported field in scsi command]

Please help us stop this test.

Attachments (1)

sdf.txt (15.3 KB) - added by janardhan 4 years ago.


Change History (8)

comment:1 Changed 4 years ago by Christian Franke

Description: modified (diff)
Keywords: scsi added
Milestone: undecided

Please provide output of smartctl -r ioctl,2 -a /dev/sdf as an attachment.

Changed 4 years ago by janardhan

Attachment: sdf.txt added

comment:2 Changed 4 years ago by janardhan

Please find attached the output of "smartctl -r ioctl,2 -a /dev/sdf".

comment:3 Changed 4 years ago by Christian Franke

The drive returns the following self-test log entry #2 (at offset 0x18...0x2b):

 ...
  Incoming data, len=404 [only first 256 bytes shown]:
 00     10 00 01 90 00 01 03 10  20 00 25 b4 ff ff ff ff
 10     ff ff ff ff 00 00 00 01 [00 02 03 10 4f 01 00 00
  NUMBER = 2 --------------------^^^^^       || || |||||
  TYPE<<1 = 2<<1 (Background long) ----------^| || |||||
  STATUS = 0xf (Self test in progress ...) ---^ || |||||
  SEGMENT = 1 ----------------------------------^^ |||||
  HOURS = 0 ---------------------------------------^^^^^ 
 20     ff ff ff ff ff ff ff ff  00 00 00 00]00 03 03 10
...

This and the other entries are properly printed by smartctl:

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -    9652                 - [-   -    -]
# 2  Background long   Self test in progress ...   1     NOW                 - [-   -    -]
# 3  Background long   Aborted (device reset ?)    8       0                 - [-   -    -]

This is probably a harmless HGST firmware bug: Entries from previously running self-tests may not be cleaned up properly if the self-test was aborted by power loss or similar.
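
The byte layout annotated in this comment can be checked with a short script. The following is a minimal Python sketch, not smartmontools code: it hard-codes the first 44 bytes of the dump quoted here (the 4-byte page header plus the first two 20-byte entries) and decodes only the fields annotated in the dump. The names `PAGE`, `parse_entries`, `TEST_NAMES`, and `STATUS_NAMES` are hypothetical, and the two lookup tables cover only the values that appear in this ticket.

```python
import struct

# First 44 bytes of the dump quoted in comment 3:
# 4-byte page header, then two 20-byte self-test log entries.
PAGE = bytes.fromhex(
    "10 00 01 90 "
    "00 01 03 10 20 00 25 b4 ff ff ff ff ff ff ff ff 00 00 00 01 "
    "00 02 03 10 4f 01 00 00 ff ff ff ff ff ff ff ff 00 00 00 00"
)

# Only the values seen in this ticket; other codes are left unnamed.
TEST_NAMES = {1: "Background short", 2: "Background long"}
STATUS_NAMES = {0x0: "Completed", 0xF: "Self test in progress ..."}


def parse_entries(page):
    """Decode the annotated fields of each complete 20-byte entry."""
    entries = []
    for off in range(4, len(page) - 19, 20):
        # number (2 bytes), two bytes not decoded here, type/status byte,
        # segment number, hours (2 bytes, big-endian)
        number, _b2, _b3, type_status, segment, hours = struct.unpack_from(
            ">HBBBBH", page, off
        )
        entries.append({
            "number": number,
            "test": type_status >> 5,       # upper bits: test type
            "status": type_status & 0x0F,   # lower nibble: status
            "segment": segment,
            "hours": hours,
        })
    return entries


for e in parse_entries(PAGE):
    test = TEST_NAMES.get(e["test"], "type %d" % e["test"])
    status = STATUS_NAMES.get(e["status"], "status 0x%x" % e["status"])
    print("# %d  %s  %s  segment=%d  hours=%d"
          % (e["number"], test, status, e["segment"], e["hours"]))
```

Entry #1 decodes to 0x25b4 = 9652 hours with status 0, and entry #2 to test type 2 with status 0xf, segment 1, and 0 hours, which agrees with the table printed by smartctl.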

comment:4 Changed 4 years ago by janardhan

Thanks for your response.

This disk is running in a server, and there has been no power loss.
Other than power loss, is there any other way the self-test could have been aborted?

comment:5 in reply to:  4 Changed 4 years ago by Alex Samorukov

Milestone: undecided
Resolution: invalid
Status: new → closed
Version: 6.0

Replying to janardhan:

Thanks for your response.

This disk is running in a server, and there has been no power loss.
Other than power loss, is there any other way the self-test could have been aborted?

This could be caused by SCSI-related issues: the OS may try to reset the controller in that case, which typically aborts the test. Check the dmesg log. I am closing this ticket because it does not look like a smartmontools issue.
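
One way to scan the kernel log for such events is sketched below in Python. This is only an illustration under stated assumptions: the keyword list is a guess, since the wording of reset and abort messages depends on the kernel version and the controller driver, and the function names are hypothetical.

```python
import subprocess

# Assumed keywords; real kernel messages differ by driver and kernel version.
KEYWORDS = ("reset", "abort")


def filter_reset_lines(lines, keywords=KEYWORDS):
    """Keep log lines that mention any keyword, case-insensitively."""
    return [ln for ln in lines if any(k in ln.lower() for k in keywords)]


def read_dmesg():
    """Return the kernel log as a list of lines (may need root on some systems)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

Calling `filter_reset_lines(read_dmesg())` on the affected server returns the candidate lines, which still have to be matched by hand against the controller and the time the long test was started.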

comment:6 Changed 4 years ago by janardhan

We got confirmation from the HGST team that no background test is running. But the smartctl output still shows the test as running now. Can you explain why this is happening?

comment:7 in reply to:  3 Changed 4 years ago by Christian Franke

Because smartctl simply prints what the drive returns in its self-test log. This was already explained in detail, see comment 3 above.

Note that the "Self test in progress ..." entry moved from #1 to #2 and therefore is no longer the most recent entry. This is evidence that the #2 "Background long" self-test was actually aborted before the successful #1 "Background short" test was started, which happened before "9652" hours of lifetime.

Please ask the HGST team why their firmware did not change the state of this entry from "Self test in progress ..." to "Aborted" or similar.
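
The reasoning in this comment can be expressed as a simple check. This is a hedged sketch in Python, not something smartctl does: it assumes the log is ordered newest first (as in the tables above) and that the drive runs at most one self-test at a time, so an "in progress" entry anywhere but the first position must be left over from an earlier test. The function name and the dictionary layout are hypothetical.

```python
IN_PROGRESS = 0xF  # status nibble printed as "Self test in progress ..."


def stale_in_progress(entries):
    """Return the numbers of entries that claim to be running but are not the newest."""
    return [e["number"] for e in entries[1:] if e["status"] == IN_PROGRESS]


# Entries from the log quoted in comment 3, newest first.
# 0x0 and 0xF come from the dump; 0x2 is assumed from the printed
# "Aborted (device reset ?)" status, since the dump is cut off before entry 3.
log = [
    {"number": 1, "status": 0x0},  # Background short, Completed
    {"number": 2, "status": 0xF},  # Background long, Self test in progress ...
    {"number": 3, "status": 0x2},  # Background long, Aborted (device reset ?)
]

print(stale_in_progress(log))
```

For the log in this ticket the check flags entry #2. For the log in the original description, where the "in progress" entry was still #1, it flags nothing.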
