Opened 3 years ago

Last modified 3 years ago

#1523 new enhancement

Schrodinger's test - Captive tests fail when Smart is checked

Reported by: Kevin C
Owned by:
Priority: minor
Milestone: undecided
Component: smartctl
Version: 6.5
Keywords: ata
Cc:

Description

I am trying to run Captive tests on my Synology drives, because Short Offline (background) tests don't show any problems, and when I tried Extended Offline tests, they ran for months without finishing. I have several sectors pending relocation, but they never seem to relocate.

However, I've found that a Short Captive test will report the LBA once it hits a bad block. I can then force the issue with hdparm on that LBA, and repeat the process to clear the sectors pending relocation.

https://i.imgur.com/CUQDXok.png

The commands I am using are these:

# Start the test
smartctl -t short -C /dev/sda -d ata

Wait a while (generally about 10 minutes to an hour), and then check whether it is done with:

# Check disk information
smartctl -a /dev/sda -d ata

If the test has finished, the output includes the LBA of the bad block. If it has not, then presumably the commands requesting SMART data fail to return in time and cause the host reset that the log reports as interrupting the test.

Use hdparm to forcibly fix or relocate the sector:

hdparm --repair-sector 218492585 --yes-i-know-what-i-am-doing /dev/sda
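
To avoid copying the LBA by hand, the lookup step could be scripted roughly as follows. This is only a sketch: it assumes the self-test log layout printed by `smartctl -l selftest`, where the newest entry comes first and the last column is LBA_of_first_error. The function name `first_error_lba` is hypothetical, and the sample line is illustrative (its hour count is made up; the LBA is the one quoted above).

```shell
#!/bin/sh
# Print the LBA_of_first_error of the newest self-test log entry that
# reports a read failure. Expects `smartctl -l selftest` output on stdin.
first_error_lba() {
    # Log entries start with "# <num>"; the LBA is the last field.
    awk '/^# *[0-9]+ / && /read failure/ && $NF ~ /^[0-9]+$/ { print $NF; exit }'
}

# Illustrative log entry, not real output from the reporter's drive.
sample='# 1  Short captive       Completed: read failure       90%     20000         218492585'
printf '%s\n' "$sample" | first_error_lba   # prints 218492585
```

On a real system the input would come from `smartctl -l selftest -d ata /dev/sda | first_error_lba`, run only after the test has finished, and the printed LBA should be reviewed before it is passed to hdparm.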

My main uncertainty is this: as you can see from the screenshot, I often accidentally check whether the test is done before it has finished, which seems to cause the driver or drive to interrupt the captive test and prevent it from finishing. Is there a better way to check whether a Captive test is still in progress?

Until then, I am calling this Schrodinger's Test: until we check, we don't know whether it is still alive or has errored, and checking may be what kills it.

Change History (4)

comment:1 by Kevin C, 3 years ago

In general, if an entry says "interrupted", assume I started a fresh test shortly afterward. So you can see that, for example, # 6 Extended Captive had been running for 92 hours before I interrupted it, and # 2 Short Captive had been running for ~20 hours before I checked on it.

So that Short Captive test was still running after 20 hours, although the program had estimated it would be done in ~2 minutes.
https://i.imgur.com/4SmnMRZ.png

comment:2 by Christian Franke, 3 years ago

This is the behavior of a specific (which?) drive model in conjunction with a 5 year old version of smartctl.

During captive tests, the drive does not accept any ATA command except a possible device reset performed by the OS driver after some configured timeout. A captive test may therefore be interrupted by the OS at any time before completion (related: ticket #1066). It is dangerous to use captive tests on drives with mounted partitions.

Non-captive tests should perform the same checks as captive tests, but without these problems. If not, this is a disk firmware bug.
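
A rough sketch of how a non-captive test could be polled from a script, for illustration only. It assumes that `smartctl -c` prints "Self-test routine in progress" in its "Self-test execution status" field while a test is running. The function names `selftest_in_progress` and `wait_for_selftest` are hypothetical, and the 60-second default interval is an arbitrary choice.

```shell
#!/bin/sh
# Exit status 0 if the `smartctl -c` output on stdin reports a running self-test.
selftest_in_progress() {
    grep -q 'Self-test routine in progress'
}

# Poll a NON-captive test until the drive no longer reports it as running.
# Not suitable for captive tests, where the drive does not accept the query.
wait_for_selftest() {
    dev=$1
    interval=${2:-60}   # seconds between polls (assumed default)
    while smartctl -c -d ata "$dev" | selftest_in_progress; do
        sleep "$interval"
    done
}
```

A possible use would be `smartctl -t short -d ata /dev/sda` followed by `wait_for_selftest /dev/sda` and then `smartctl -l selftest -d ata /dev/sda` to read the result.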

If long tests do not work, I would recommend running a regular read test with badblocks or ddrescue. See also the FAQ and the Bad Block HOWTO.

If you have any related enhancement request, please be more specific and describe it here.

For future support questions, please use the smartmontools-support mailing list instead. Thanks.

PS: Please do not use screen shots. Please do not paste smartctl output unchanged to tickets. Use plain-text attachments or wiki markup instead.

comment:3 by Christian Franke, 3 years ago

Milestone: undecided

comment:4 by Christian Franke, 3 years ago

Component: all → smartctl
Keywords: ata added