Opened 5 years ago

Closed 19 months ago

Last modified 18 months ago

#1153 closed defect (wontfix)

Command timeout occurred when I used the command "smartctl -C -t short" on HDD test

Reported by: jerrytw168 Owned by:
Priority: critical Milestone:
Component: smartctl Version: 6.6
Keywords: Cc: linjerrytw@…

Description

Hi,

When I ran the command "smartctl -C -t short /dev/sdb" to test an SSD, the self-test log (shown by "smartctl -a") reported "Interrupted (host reset)", as follows.

# 6 Short captive Interrupted (host reset) 70% 2423 -
# 7 Short captive Interrupted (host reset) 70% 2408 -

/var/log/dmesg also contained the error messages shown below.
However, when I removed -C (captive mode), the problem disappeared. I tried many SSDs (Intel, Samsung, HGST) and saw the same symptom on all of them.

Could you please advise whether the "-C" option cannot be used together with "-t" in the same test, or whether this is a bug in smartctl? I would be grateful for any help you can provide.

/var/log/dmesg
=================================================================
[166867.098164] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[166867.098172] ata2.00: failed command: SMART
[166867.098180] ata2.00: cmd b0/d4:00:81:4f:c2/00:00:00:00:00/00 tag 25
         res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

[166867.098184] ata2.00: status: { DRDY }
[166867.098189] ata2: hard resetting link
[166867.403151] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[166867.403678] ata2.00: supports DRM functions and may not be fully accessible
[166867.404715] ata2.00: supports DRM functions and may not be fully accessible
[166867.405200] ata2.00: configured for UDMA/133
[166867.405218] ata2: EH complete

Change History (7)

comment:1 by Christian Franke, 5 years ago

The kernel log shows that the SMART command which runs the captive test was aborted by the driver with "timeout". Then the driver resets the device. The device reset aborts the running self test. This is then recorded as "host reset" in the self-test log.
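As an illustration of how the log identifies the aborted command, the following is a minimal Python sketch (not part of smartmontools) that decodes the "cmd" line from the excerpt above. It assumes the field order libata prints for a failed command (command/feature:count:LBA low:LBA mid:LBA high) and the standard ATA subcommand values for SMART EXECUTE OFF-LINE IMMEDIATE. The function name `describe_failed_command` is hypothetical.

```python
import re

# Assumed libata layout: cmd COMMAND/FEATURE:COUNT:LBA_LOW:LBA_MID:LBA_HIGH/...
CMD_RE = re.compile(
    r"cmd (?P<command>[0-9a-f]{2})/(?P<feature>[0-9a-f]{2}):"
    r"(?P<count>[0-9a-f]{2}):(?P<lba_low>[0-9a-f]{2}):"
    r"(?P<lba_mid>[0-9a-f]{2}):(?P<lba_high>[0-9a-f]{2})/"
)

SMART_COMMAND = 0xB0                     # ATA SMART command opcode
SMART_EXECUTE_OFFLINE_IMMEDIATE = 0xD4   # SMART feature that starts a self-test

# LBA low selects the test; values with bit 7 set are the captive variants.
SUBCOMMANDS = {
    0x01: "short self-test, off-line mode",
    0x02: "extended self-test, off-line mode",
    0x81: "short self-test, captive mode",
    0x82: "extended self-test, captive mode",
}


def describe_failed_command(line):
    """Return a description of the self-test a failed SMART command started.

    Returns None if the line is not a SMART EXECUTE OFF-LINE IMMEDIATE command.
    """
    match = CMD_RE.search(line)
    if match is None:
        return None
    fields = {name: int(value, 16) for name, value in match.groupdict().items()}
    if fields["command"] != SMART_COMMAND:
        return None
    if fields["feature"] != SMART_EXECUTE_OFFLINE_IMMEDIATE:
        return None
    return SUBCOMMANDS.get(
        fields["lba_low"], "unknown subcommand 0x%02x" % fields["lba_low"]
    )


line = "[166867.098180] ata2.00: cmd b0/d4:00:81:4f:c2/00:00:00:00:00/00 tag 25"
print(describe_failed_command(line))  # short self-test, captive mode
```

The decoded subcommand 0x81 is consistent with the "Short captive" entries in the self-test log quoted in the description.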

The problem is that smartctl does not pass a sufficiently long command timeout to the driver in this case. Some drivers don't even support long timeouts.

Why do you need captive tests?

PS: For future submission, please do not set a milestone.

comment:2 by jerrytw168, 5 years ago

Thanks for your prompt reply.

Regarding your question "Why do you need captive tests?":
According to the description of "SMART RUN/ABORT OFFLINE TEST AND self-test OPTIONS", the '-C' option can be used in conjunction with a short or long self-test. That is why I ran the test in captive mode.

To be honest, I don't understand what the purpose of captive mode is. If possible, could you explain when I would need to run an SSD self-test in captive mode?

Thanks for your help.

comment:3 by Christian Franke, 5 years ago

Summary: Some issues occurred when I used the command "smartctl -C -t short" on HDD test → Command timeout occurred when I used the command "smartctl -C -t short" on HDD test

There is no need to use the captive mode. I never use it.

  • Off-line (Background) test: The test command returns immediately and the test itself continues in background. The drive is accessible during the test (see also the FAQ).
  • Captive (Foreground) test: The test command waits until the test has finished. The drive is not accessible during the test. Captive tests are aborted if the device driver times out the command and resets the link. Therefore smartctl -C -t ... should set a sufficiently long timeout when issuing the test command. This is not the case for ATA/SATA devices.
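On the command line, the two modes above differ only in the -C option. A minimal Python sketch of that difference, assuming smartctl is installed and on PATH. The helper names (`selftest_command`, `start_background_selftest`, `read_selftest_log`) are hypothetical and not part of smartmontools.

```python
import subprocess


def selftest_command(device, test="short", captive=False):
    """Build the smartctl argument list for a self-test.

    With captive=False the command starts an off-line (background) test.
    With captive=True it adds -C, which is the mode this ticket reports
    being interrupted by a driver timeout on ATA/SATA devices.
    """
    cmd = ["smartctl"]
    if captive:
        cmd.append("-C")
    cmd += ["-t", test, device]
    return cmd


def start_background_selftest(device, test="short"):
    """Start a background self-test; smartctl returns while the test runs."""
    # check=False: smartctl's exit status is a bit mask, so a non-zero
    # value is not necessarily a failure to start the test.
    return subprocess.run(
        selftest_command(device, test),
        capture_output=True, text=True, check=False,
    )


def read_selftest_log(device):
    """Return the self-test log text, to be inspected once the test is done."""
    result = subprocess.run(
        ["smartctl", "-l", "selftest", device],
        capture_output=True, text=True, check=False,
    )
    return result.stdout
```

For example, `selftest_command("/dev/sdb")` yields the background form `smartctl -t short /dev/sdb`, while passing `captive=True` yields `smartctl -C -t short /dev/sdb`.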

Leaving ticket open as we should either fix the timeout setting or remove the -C option.

comment:4 by Alex Samorukov, 4 years ago

Christian, I think this timeout comes not from smartmontools but from the system itself, if the drive is mounted and in use. We could warn the user that a captive test is dangerous and will take the device offline for some time, and require --force for it.

comment:5 by Ch.Ris, 3 years ago

Could this breakage be avoided if smartctl temporarily increased the relevant timeouts, e.g. the one set in /sys/block/${disk}/device/timeout on Linux?
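A minimal Python sketch of what temporarily raising and then restoring such a timeout file could look like. This is only an illustration of the idea: the ticket does not establish that this sysfs value governs the command smartctl issues for a captive test, and the function name `raised_command_timeout` is hypothetical. Writing to sysfs requires root.

```python
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def raised_command_timeout(seconds, timeout_file):
    """Temporarily write a new timeout (in seconds) to timeout_file.

    The previous value is restored on exit, even if the body raises.
    Yields the original timeout as an int.
    """
    path = Path(timeout_file)
    original = path.read_text().strip()
    path.write_text(f"{seconds}\n")
    try:
        yield int(original)
    finally:
        path.write_text(f"{original}\n")


# Hypothetical usage for a disk named sdb (not executed here):
#
#   with raised_command_timeout(180, "/sys/block/sdb/device/timeout"):
#       ...  # run the captive self-test while the larger timeout is in effect
```

Taking the path as a parameter keeps the sketch testable against an ordinary file and avoids hard-coding a device name.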

The drive is not accessible during the [captive "-C"] test.

This is quite a limitation.
So, for me the keep-awake solution for reliable offline (background) selftests from https://www.smartmontools.org/ticket/1443 seems preferable.

comment:6 by Alex Samorukov, 19 months ago

Resolution: wontfix
Status: new → closed

Workaround provided in the ticket

comment:7 by Christian Franke, 18 months ago

Milestone: undecided