Opened 6 years ago

Closed 6 years ago

Last modified 6 years ago

#303 closed defect (wontfix)

In smart test captive mode, extend the timeout as described by the ATA device

Reported by: gwendal1
Owned by: Christian Franke
Priority: minor
Milestone:
Component: smartctl
Version: 5.42
Keywords:
Cc:

Description

When we run smartctl -C -t long /dev/sdX, the ATA SMART command that is sent uses the usual 20s timeout.
This is not enough: in captive mode the command does not complete until the self-test has finished, and the drive usually needs several minutes for that.

On the command line:
smartctl -C -t long /dev/sda

smartctl 5.42 2011-10-20 r3458 [x86_64-linux-3.8.11] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in captive mode".
Drive command "Execute SMART Extended self-test routine immediately in captive mode" successful.
Testing has begun.
Please wait 10 minutes for test to complete.
Test will complete after Tue Oct 22 16:44:49 2013

In /var/log/messages, with SCSI debug log enabled:

 [  122.121623] sd 0:0:0:0: [sda] sd_ioctl: disk=sda, cmd=0x2285
 [  122.121640] scsi_block_when_processing_errors: rtn: 1
 [  122.121655] sd 0:0:0:0: [sda] Send:
 [  122.121662] 0xffff88015c86d300
 [  122.121672] sd 0:0:0:0: [sda] CDB:
 [  122.121679] ATA command pass through(16): 85 06 0c 00 d4 00 00 00 82 00 4f 00 c2 00 b0 00
 [  122.121772] buffer = 0x          (null), bufflen = 0, queuecommand 0xffffffff9df1d70c
 [  122.121785] leaving scsi_dispatch_cmnd()
 [  142.735081] sd 0:0:0:0: [sda] Done:
 [  142.735102] 0xffff88015c86d300 TIMEOUT 
 [  142.735121] sd 0:0:0:0: [sda]
 [  142.735134] Result: hostbyte=DID_OK driverbyte=DRIVER_OK
 [  142.735150] sd 0:0:0:0: [sda] CDB:
 [  142.735162] ATA command pass through(16): 85 06 0c 00 d4 00 00 00 82 00 4f 00 c2 00 b0 00
 [  142.735267] sd 0:0:0:0: [sda] scsi host busy 1 failed 0
 [  142.735287] Waking error handler thread
 [  142.735329] Error handler scsi_eh_0 waking up
 [  142.735365] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
 [  142.735386] ata1.00: failed command: SMART
 [  142.735407] ata1.00: cmd b0/d4:00:82:4f:c2/00:00:00:00:00/00 tag 0
 [  142.735407]          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
 [  142.735436] ata1.00: status: { DRDY }
 [  142.735459] ata1: hard resetting link
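
For readers decoding the log above: the 16 CDB bytes follow the ATA PASS-THROUGH (16) layout from the SAT standard, and they match the "cmd b0/d4:00:82:4f:c2" line printed by libata. The sketch below pulls out the fields that identify the command. The struct and function names are made up for illustration and are not part of smartmontools.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of interest in an ATA PASS-THROUGH (16) CDB. */
struct ata_pt16_fields {
    uint8_t protocol;  /* byte 1, bits 4:1 (3 = non-data) */
    uint8_t features;  /* byte 4  (0xd4 = SMART EXECUTE OFF-LINE IMMEDIATE) */
    uint8_t lba_low;   /* byte 8  (0x82 = extended self-test, captive mode) */
    uint8_t lba_mid;   /* byte 10 (0x4f, SMART signature) */
    uint8_t lba_high;  /* byte 12 (0xc2, SMART signature) */
    uint8_t command;   /* byte 14 (0xb0 = SMART) */
};

/* Hypothetical helper: extract the low-order register bytes from the CDB. */
struct ata_pt16_fields decode_ata_pt16(const uint8_t cdb[16])
{
    struct ata_pt16_fields f;

    assert(cdb != NULL && cdb[0] == 0x85); /* 0x85 = ATA PASS-THROUGH (16) opcode */
    f.protocol = (uint8_t)((cdb[1] >> 1) & 0x0f);
    f.features = cdb[4];
    f.lba_low  = cdb[8];
    f.lba_mid  = cdb[10];
    f.lba_high = cdb[12];
    f.command  = cdb[14];
    return f;
}
```

Feeding it the bytes from the log (85 06 0c 00 d4 00 00 00 82 00 4f 00 c2 00 b0 00) gives protocol 3, features 0xd4, LBA low 0x82 and command 0xb0, i.e. a non-data SMART command that starts the extended self-test in captive mode.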

The command times out after about 20s (it is sent at 122.12 and reported as TIMEOUT at 142.74). Instead, the command should have a timeout of 10 + x minutes, so that the device can complete the command before the error handler kicks in. We know from the SMART data that the test will last 10 minutes.
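
As an illustration of the "10 + x minutes" idea, here is a minimal sketch that turns the polling time reported by the drive into a timeout in milliseconds. The function name and the choice of margin are assumptions for the example, not smartmontools code.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper: timeout for a captive self-test.
 * polling_minutes: test duration as reported in the SMART data (10 here).
 * margin_minutes:  the extra "x" minutes of slack, chosen by the caller.
 * The result is clamped to UINT_MAX so it fits an unsigned int field.
 */
unsigned int captive_timeout_ms(unsigned int polling_minutes,
                                unsigned int margin_minutes)
{
    unsigned long long ms;

    assert(margin_minutes > 0); /* a zero margin defeats the purpose */
    ms = ((unsigned long long)polling_minutes + margin_minutes) * 60ULL * 1000ULL;
    return ms > UINT_MAX ? UINT_MAX : (unsigned int)ms;
}
```

For the drive in this report, captive_timeout_ms(10, 2) yields 720000 ms (12 minutes) instead of the 20s seen in the log.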

However, looking at the code, there seems to be no way today to pass the desired timeout along with an ATA pass-through command.

Change History (3)

comment:1 Changed 6 years ago by Christian Franke

Component: changed from all to smartctl
Owner: changed from somebody to Christian Franke
Priority: changed from major to minor
Status: changed from new to accepted

ATA pass-through I/O-controls are platform and controller-specific. There is no portable way to set the command timeout. Even if an I/O-control supports this parameter, the implementation may ignore it or set a command-specific timeout itself.

Why do you need the captive mode?

comment:2 Changed 6 years ago by gwendal1

Resolution: set to wontfix
Status: changed from accepted to closed

I can use the offline mode, but I just wanted to point out that captive mode will not work if the test takes longer than 20s.
From your explanation, I understand that this is too difficult to fix properly.

comment:3 Changed 6 years ago by Christian Franke

Thanks for the info. I will probably address this in a future release for Linux SG_IO and other frequently used I/O-controls which actually support extended timeouts.
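
For the Linux SG_IO case mentioned above, the timeout is passed in milliseconds in the timeout field of struct sg_io_hdr, so an extended timeout could look roughly like the sketch below. The function names and the 32-byte sense buffer are assumptions for illustration. This is not how smartctl is implemented, and as noted in comment:1 the driver may still ignore or override the value. The CDB bytes are copied from the log in the description.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

/* Hypothetical helper: fill hdr and cdb for a captive extended self-test. */
void prepare_captive_long_test(struct sg_io_hdr *hdr, uint8_t cdb[16],
                               uint8_t sense[32], unsigned int timeout_ms)
{
    assert(hdr && cdb && sense);

    memset(cdb, 0, 16);
    cdb[0]  = 0x85;   /* ATA PASS-THROUGH (16) */
    cdb[1]  = 3 << 1; /* protocol 3: non-data */
    cdb[2]  = 0x0c;   /* flags byte as seen in the log above */
    cdb[4]  = 0xd4;   /* features: SMART EXECUTE OFF-LINE IMMEDIATE */
    cdb[8]  = 0x82;   /* LBA low: extended self-test, captive mode */
    cdb[10] = 0x4f;   /* LBA mid: SMART signature */
    cdb[12] = 0xc2;   /* LBA high: SMART signature */
    cdb[14] = 0xb0;   /* command: SMART */

    memset(sense, 0, 32);
    memset(hdr, 0, sizeof(*hdr));
    hdr->interface_id    = 'S';
    hdr->dxfer_direction = SG_DXFER_NONE; /* no data phase */
    hdr->cmd_len         = 16;
    hdr->cmdp            = cdb;
    hdr->mx_sb_len       = 32;
    hdr->sbp             = sense;
    hdr->timeout         = timeout_ms;    /* SG_IO timeout, in milliseconds */
}

/* Hypothetical wrapper: issue the command on an already opened device fd.
 * Returns the ioctl() result (-1 on error, with errno set). */
int start_captive_long_test(int fd, unsigned int timeout_ms)
{
    struct sg_io_hdr hdr;
    uint8_t cdb[16];
    uint8_t sense[32];

    prepare_captive_long_test(&hdr, cdb, sense, timeout_ms);
    return ioctl(fd, SG_IO, &hdr);
}
```

With a 10 minute test and a 2 minute margin, the caller would pass 720000 as timeout_ms. A successful ioctl() return only means the request was processed; the status fields in hdr and the sense data would still have to be checked.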
