Opened 18 months ago

Closed 18 months ago

Last modified 12 months ago

#1207 closed defect (invalid)

WD Red 6TB - WDC WD60EFAX-68SHWN0 reports wrong self-test polling time recommendation

Reported by: Bear_
Owned by:
Priority: minor
Milestone:
Component: smartctl
Version: 7.0
Keywords: ata
Cc:

Description

The duration of the long (extended) self-test is reported as 7 minutes.
Running the long self-test with smartctl (under Linux) does indeed take 7 minutes. Running the extended self-test under W7 with Data Lifeguard Diagnostics took about 4 hours and 30 minutes (same disk, of course).

I also see that the conveyance test has the same recommended duration as the short self-test, which is unusual. I didn't run these tests on either system.

Is this a bug in smartctl or in the disk?

Attachments (2)

smartctl-WDC-WD60EFAX-68SHWN0.txt (16.6 KB) - added by Bear_ 18 months ago.
result of smartctl -q noserial -x /dev/sdc > smartctl-VENDOR-MODEL.txt
WD60EFAX-68SHWN0_ataioctl_2.txt (6.9 KB) - added by Bear_ 18 months ago.
result of smartctl -r ataioctl,2 -q noserial -c /dev/sdc


Change History (11)

Changed 18 months ago by Bear_

result of smartctl -q noserial -x /dev/sdc > smartctl-VENDOR-MODEL.txt

comment:1 Changed 18 months ago by Christian Franke

Component: all → smartctl
Keywords: ata added
Milestone: undecided

ATA-8 introduced a new field for drives with extended self-test polling time > 0xff. Either the drive does not set it correctly or smartctl does not interpret it correctly.

Please provide output of:
smartctl -r ataioctl,2 -q noserial -c /dev/sdc

Changed 18 months ago by Bear_

result of smartctl -r ataioctl,2 -q noserial -c /dev/sdc

comment:2 Changed 18 months ago by Christian Franke

Spec for Device SMART data structure from T13/1699-D Revision 6a (ATA8-ACS) up to T13/2161-D Revision 5 (ACS-3):

Offset    Description
372       Short self-test routine recommended polling time (in minutes).
373       Extended self-test routine recommended polling time in minutes. If FFh, use bytes 375 and 376 for the polling time.
374       Conveyance self-test routine recommended polling time in minutes.
375..376  Extended self-test routine recommended polling time in minutes (word).

(ACS-4 and later removed SMART spec and refer to ACS-3)

Observed values:

...
REPORT-IOCTL: Device=/dev/sdc Command=SMART READ ATTRIBUTE VALUES
 Input:   FR=0xd0, SC=0x01, LL=...., LM=0x4f, LH=0xc2, DEV=...., CMD=0xb0 IN
 [Duration: 0.006s]
REPORT-IOCTL: Device=/dev/sdc Command=SMART READ ATTRIBUTE VALUES returned 0
...
368-383: 03 00 01 00 02 07 02 00 00 00 00 00 00 00 00 00 |................|
                              ^^-^^ Extended (word)
                           ^^------ Conveyance
                        ^^--------- Extended < 0xff, if 0xff see above
                     ^^------------ Short
...
General SMART Values:
...
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (   7) minutes.
Conveyance self-test routine
recommended polling time: 	 (   2) minutes.

Conclusion: smartctl prints the values as returned by the device.
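
For future readers, the decoding rule from the spec table can be sketched roughly as follows. This is only an illustration, not smartctl's actual code: the function name is made up, the word at 375..376 is assumed to be little-endian (as ATA words generally are), and only bytes 368-383 come from the dump in this ticket, with the rest of the 512-byte sector zero-filled.

```python
import struct


def self_test_polling_minutes(smart_data: bytes) -> dict:
    """Decode recommended polling times (minutes) from a 512-byte
    Device SMART data structure. Hypothetical helper for illustration."""
    if len(smart_data) != 512:
        raise ValueError("expected a 512-byte SMART data structure")
    short = smart_data[372]
    extended = smart_data[373]
    conveyance = smart_data[374]
    if extended == 0xFF:
        # Byte 373 is saturated: use the word at offsets 375..376 instead.
        # Assumption: the word is stored little-endian.
        (extended,) = struct.unpack_from("<H", smart_data, 375)
    return {"short": short, "extended": extended, "conveyance": conveyance}


# Bytes 368-383 as reported for this drive; everything else is zero-filled
# here purely so the example is self-contained.
sector = bytearray(512)
sector[368:384] = bytes.fromhex("03 00 01 00 02 07 02 00 00 00 00 00 00 00 00 00")
print(self_test_polling_minutes(bytes(sector)))
```

With the observed bytes, byte 373 is 0x07, so the word at 375..376 is never consulted and the extended polling time comes out as 7 minutes, matching what smartctl prints.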

comment:3 Changed 18 months ago by Christian Franke

Running the long self-test with smartctl (under Linux) does indeed take 7 minutes. Running the extended self-test under W7 with Data Lifeguard Diagnostics took about 4 hours and 30 minutes (same disk, of course).

Are you sure that Data Lifeguard Diagnostics actually uses SMART self-tests?
Does smartctl -c report Self-test routine in progress... during such tests?
Do such tests appear in the self-test logs (-l selftest -l xselftest) after completion?

comment:4 Changed 18 months ago by Bear_

Data Lifeguard Diagnostics (DLG) offers two tests, QUICK TEST and EXTENDED TEST (strangely enough no conveyance test). The description in the tool is:

QUICK TEST performs SMART drive quick self-test to gather and verify the Data Lifeguard information contained on the drive.

EXTENDED TEST performs a Full Media Scan to detect bad sectors. This test may take hours for a large drive.

So, previously it escaped my attention that the extended test is not even claimed to be a SMART self-test.

I started both scans. The quick test showed up in smartctl -c, which reported "Self-test routine in progress ... 90% of test remaining" at some point. It completed after 2 minutes, and the test did appear in the self-test log.

The DLG extended test showed up neither in smartctl -c nor in the self-test log. I canceled the test after a minute, but the earlier DLG extended test had run to completion, and it did not show up in the log either.

Edit: I also did the self-tests with PassMark's DiskCheckup. Both the short and the extended self-test show up in smartctl (both -c and -l selftest). The short test took 2 minutes, and the extended test 7 minutes.

I am not a drive expert, but as far as I understand the self-test polling time is indeed not set(?) correctly. I am confused, because the drive seems to do what it says, but I can't imagine that the extended (SMART) self-test does a full surface scan (as I think is usual for extended self-tests) in 7 minutes.

Before I opened a ticket here, I contacted WD, and they said that if the DLG extended test completes successfully, then the drive is okay.

What can I do, what should I do? Try to convince the WD help desk, or return the drive? I don't feel comfortable with the idea that the SMART functionality is not implemented correctly in a drive that I use to store a lot of data.

Last edited 18 months ago by Bear_

comment:5 in reply to: 4 Changed 18 months ago by Christian Franke

The DLG extended test showed up neither in smartctl -c nor in the self-test log. I canceled the test after a minute, but the earlier DLG extended test had run to completion, and it did not show up in the log either.

This likely means that DLG does the read scan itself. Then the host read counters from device statistics (smartctl -l devstat or -x) should increase quickly during the test:

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
...
0x01  0x028  6     11721284557  ---  Logical Sectors Read
0x01  0x030  6        45788458  ---  Number of Read Commands

SMART self-tests should not affect read counters because no host I/O is done.
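
One way to check this is to capture the devstat output before and during the test and compare the counters. The following is a rough sketch only: the helper names and the regular expression are my own, the "before" values are the ones quoted above, and the "after" values are made up for illustration.

```python
import re

# Matches data rows such as:
# 0x01  0x028  6     11721284557  ---  Logical Sectors Read
# Header and separator rows do not match and are skipped.
_ROW = re.compile(
    r"^\s*0x[0-9a-fA-F]{2}\s+0x[0-9a-fA-F]{3}\s+\d+\s+(\d+)\s+\S+\s+(.+?)\s*$"
)


def parse_devstat(text: str) -> dict:
    """Map statistic description -> integer value (hypothetical helper)."""
    stats = {}
    for line in text.splitlines():
        match = _ROW.match(line)
        if match:
            stats[match.group(2)] = int(match.group(1))
    return stats


def sectors_read_delta(before: str, after: str) -> int:
    """Difference in 'Logical Sectors Read' between two captures."""
    key = "Logical Sectors Read"
    return parse_devstat(after)[key] - parse_devstat(before)[key]


BEFORE = """\
Page  Offset Size        Value Flags Description
0x01  0x028  6     11721284557  ---  Logical Sectors Read
0x01  0x030  6        45788458  ---  Number of Read Commands
"""

# Hypothetical second capture, NOT taken from a real drive.
AFTER = """\
Page  Offset Size        Value Flags Description
0x01  0x028  6     11723332557  ---  Logical Sectors Read
0x01  0x030  6        45790458  ---  Number of Read Commands
"""

print(sectors_read_delta(BEFORE, AFTER))
```

A delta that grows steadily while the DLG extended test runs would suggest the tool reads the disk from the host side. A delta near zero during a SMART self-test would be consistent with the test running inside the drive.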

Edit: I also did the self-tests with PassMark's DiskCheckup. Both short and extended self-test show up in smartctl (both -c and -l selftest). The short took 2 minutes, and the extended 7 minutes.

This likely means that WD decided to implement an "extended" self-test that does not do a full read scan. So the polling time is set correctly, but the test is implemented in a way that is at least "unusual".

What can I do, what should I do? Try to convince the WD help desk, or return the drive?

Try the selective self-test, which lets you specify LBA ranges (see the man page). This command should perform a read scan of the full LBA range: smartctl -t select,0-max /dev/sdc

comment:6 Changed 18 months ago by Christian Franke

Resolution: invalid
Status: new → closed

The self-test recommended polling times are printed correctly by smartctl.

The extended self-test implemented by the firmware of this drive is far too short for a full read scan. This cannot be fixed by smartctl.

Note that the ATA standards do not specify what an extended self-test should do. Only a selective self-test is required to do a read scan.

comment:7 Changed 18 months ago by Christian Franke

Milestone: undecided

comment:8 Changed 13 months ago by Bren

Just wanted to drop a comment here for future seekers. I just picked up a replacement WD Red 6TB. I too noticed that the extended self-test polling interval was showing 7 minutes (and the test itself would indeed only take several minutes).

This confused me considering that all of my other WD Reds took 10+ hours to test and showed a polling time of 700+ minutes. I figured it was either a defective drive or something new they were doing.

I proceeded with my drive burn-in testing procedure because the drive seemed fine otherwise. I run short, long, and conveyance tests if available. Then I run a badblocks write test, then another short and long test.

badblocks writes several patterns on every block of the device then reads them back to compare. As the test progressed, the self-test polling time increased. Now that the full drive has been written to, extended self-test polling time is showing 710 minutes like the others.

Perhaps these drives now wait until blocks have been written to before testing them in the extended test.

comment:9 Changed 12 months ago by Bear_

I can confirm this. I am also using the disk in question, and it is about half full now. The reported polling time for the extended self-test is 353 minutes. It does indeed look like the extended test only tests the parts that have actually been written.
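
As a quick sanity check on the numbers in the last two comments, here is a small sketch. It assumes the polling time scales linearly with the amount of written data, which nothing in this ticket confirms, and it uses the 710 minutes reported for a different (fully written) 6TB drive as the reference.

```python
def estimated_fraction_written(current_minutes: float, full_minutes: float) -> float:
    """Estimate the written fraction of the drive from polling times.

    Assumption (unverified): polling time grows linearly with written data.
    """
    if full_minutes <= 0:
        raise ValueError("full_minutes must be positive")
    return current_minutes / full_minutes


# 353 min: reported for the half-full disk; 710 min: reported for a fully
# written 6TB WD Red in the earlier comment (a different drive).
fraction = estimated_fraction_written(353, 710)
print(f"{fraction:.1%}")
```

The ratio comes out just under one half, which is at least consistent with a disk that is "about half full".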
