Opened 6 months ago

Last modified 6 months ago

#1153 new defect

Command timeout occurred when I used the command "smartctl -C -t short" on HDD test

Reported by: jerrytw168 Owned by:
Priority: critical Milestone: undecided
Component: smartctl Version: 6.6
Keywords: Cc: linjerrytw@…

Description

Hi,

When I ran the command "smartctl -C -t short /dev/sdb" to test an SSD, the self-test log (shown by "smartctl -a") reported "Interrupted (host reset)", as follows.

# 6 Short captive Interrupted (host reset) 70% 2423 -
# 7 Short captive Interrupted (host reset) 70% 2408 -

/var/log/dmesg also showed the error messages below.
However, when I removed -C (captive mode), these issues disappeared. I tried many SSDs (Intel, Samsung, HGST) and got the same symptom on all of them.

Could you please advise whether the "-C" option cannot be combined with "-t" in the same test? Or is this a bug in smartctl? I would be grateful for any help you can provide.

/var/log/dmesg
=================================================================
[166867.098164] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[166867.098172] ata2.00: failed command: SMART
[166867.098180] ata2.00: cmd b0/d4:00:81:4f:c2/00:00:00:00:00/00 tag 25
         res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

[166867.098184] ata2.00: status: { DRDY }
[166867.098189] ata2: hard resetting link
[166867.403151] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[166867.403678] ata2.00: supports DRM functions and may not be fully accessible
[166867.404715] ata2.00: supports DRM functions and may not be fully accessible
[166867.405200] ata2.00: configured for UDMA/133
[166867.405218] ata2: EH complete

Change History (3)

comment:1 Changed 6 months ago by Christian Franke

The kernel log shows that the SMART command which runs the captive test was aborted by the driver with "timeout". Then the driver resets the device. The device reset aborts the running self test. This is then recorded as "host reset" in the self-test log.
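As an illustration, the "cmd" line in the kernel log can be decoded to confirm that the timed-out command is the captive short self-test. The following is a minimal Python sketch, not part of smartctl. It assumes the usual libata log layout (command/feature:count:LBA low:LBA mid:LBA high/...), and the function name and table are made up for this example. The subcommand values are the ATA SMART EXECUTE OFF-LINE IMMEDIATE codes; only a few are listed.

```python
import re

# Subset of SMART EXECUTE OFF-LINE IMMEDIATE subcommands (passed in LBA low).
SELF_TEST_SUBCOMMANDS = {
    0x00: "off-line data collection (background)",
    0x01: "short self-test (background)",
    0x02: "extended self-test (background)",
    0x7F: "abort self-test",
    0x81: "short self-test (captive)",
    0x82: "extended self-test (captive)",
}

# Assumed libata layout: cmd CC/FF:NN:LL:MM:HH/... (all fields are hex bytes).
TASKFILE_RE = re.compile(
    r"cmd\s+([0-9a-f]{2})/([0-9a-f]{2}):([0-9a-f]{2}):"
    r"([0-9a-f]{2}):([0-9a-f]{2}):([0-9a-f]{2})/"
)


def decode_smart_taskfile(line):
    """Decode the first six taskfile bytes of a libata 'cmd' log line."""
    match = TASKFILE_RE.search(line)
    if match is None:
        raise ValueError("no taskfile found in line")
    command, feature, _count, lba_low, lba_mid, lba_high = (
        int(field, 16) for field in match.groups()
    )
    return {
        "command": command,
        "is_smart": command == 0xB0,                # ATA SMART command
        "is_execute_offline": feature == 0xD4,      # EXECUTE OFF-LINE IMMEDIATE
        "signature_ok": (lba_mid, lba_high) == (0x4F, 0xC2),
        "test": SELF_TEST_SUBCOMMANDS.get(lba_low, "unknown subcommand"),
    }


if __name__ == "__main__":
    log_line = "[166867.098180] ata2.00: cmd b0/d4:00:81:4f:c2/00:00:00:00:00/00 tag 25"
    print(decode_smart_taskfile(log_line))
```

For the line from this ticket, the LBA low byte is 0x81, which corresponds to a short self-test in captive mode.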

The problem is that smartctl does not pass a sufficiently long command timeout to the driver in this case. Some drivers don't even support long timeouts.

Why do you need captive tests?

PS: For future submissions, please do not set a milestone.

comment:2 Changed 6 months ago by jerrytw168

Thanks for your prompt reply.

As for your question "Why do you need captive tests?":
According to the description under "SMART RUN/ABORT OFFLINE TEST AND self-test OPTIONS", the '-C' option can be used in conjunction with a short or long self-test. That is why I ran the test in captive mode.

To be honest, I don't understand the purpose of captive mode. If possible, could you explain when I would need to run an SSD self-test in captive mode?

Thanks for your help.

comment:3 Changed 6 months ago by Christian Franke

Summary changed from 'Some issues occurred when I used the command "smartctl -C -t short" on HDD test' to 'Command timeout occurred when I used the command "smartctl -C -t short" on HDD test'

There is no need to use the captive mode. I never use it.

  • Off-line (Background) test: The test command returns immediately and the test itself continues in background. The drive is accessible during the test (see also the FAQ).
  • Captive (Foreground) test: The test command waits until the test has finished. The drive is not accessible during the test. Captive tests are aborted if the device driver times out the command and resets the link. Therefore smartctl -C -t ... should set a sufficiently long timeout when issuing the test command. It currently does not do so for ATA/SATA devices.
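The difference between the two modes on the command line, and a way to spot aborted runs in the self-test log, can be sketched as follows. This is a hedged Python example, not part of smartctl: the function names are hypothetical, only the smartctl options already mentioned in this ticket (-C, -t) are used, and the log parser assumes the line layout quoted in the description and recognises only the test names listed in the pattern.

```python
import re


def selftest_argv(device, test="short", captive=False):
    """Build a smartctl argument list; background mode is the default."""
    argv = ["smartctl"]
    if captive:
        argv.append("-C")  # captive (foreground) mode, as used in this ticket
    argv += ["-t", test, device]
    return argv


# Assumed layout: "# N <test name> <status> NN% <hours> <first error LBA>"
LOG_LINE_RE = re.compile(
    r"^#\s*(?P<num>\d+)\s+"
    r"(?P<desc>(?:Short|Extended|Conveyance|Selective)\s+(?:captive|offline))\s+"
    r"(?P<status>.+?)\s+"
    r"(?P<remaining>\d+)%\s+"
    r"(?P<hours>\d+)\s+"
    r"(?P<lba>\S+)\s*$"
)


def interrupted_tests(log_text):
    """Return (number, test name, status) for every interrupted self-test."""
    hits = []
    for line in log_text.splitlines():
        match = LOG_LINE_RE.match(line.strip())
        if match and match.group("status").startswith("Interrupted"):
            hits.append(
                (int(match.group("num")), match.group("desc"), match.group("status"))
            )
    return hits


if __name__ == "__main__":
    print(selftest_argv("/dev/sdb"))                # background test
    print(selftest_argv("/dev/sdb", captive=True))  # captive test
    sample = (
        "# 6 Short captive Interrupted (host reset) 70% 2423 -\n"
        "# 7 Short captive Interrupted (host reset) 70% 2408 -\n"
    )
    print(interrupted_tests(sample))
```

The argument list from selftest_argv could be passed to subprocess.run; with the background default, the command returns immediately and the result appears in the self-test log once the drive has finished the test.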

Leaving ticket open as we should either fix the timeout setting or remove the -C option.
