Opened 10 years ago

Closed 9 years ago

Last modified 9 years ago

#82 closed enhancement (wontfix)

danger of high Load_Cycle_Count and WD 'Intelli-park' self-destruction "feature"

Reported by: virtuousfox
Owned by: somebody
Priority: minor
Milestone:
Component: smartd
Version: 5.39.1
Keywords:
Cc:

Description

recently i was quite unpleasantly introduced to the issue of drives destroying themselves through endless, damaging attempts to "save some energy". it is best described in these links:
№1) http://community.wdc.com/t5/Desktop/Green-Caviar-High-Load-Cycle-Cout-after-short-operation-time/td-p/15731
№2) http://wdc.custhelp.com/cgi-bin/wdc.cfg/php/enduser/std_adp.php?p_faqid=5357
№3) http://forum.synology.com/enu/viewtopic.php?f=124&t=11682&sid=2aed763304351bcebd7e7ec49beda77e
№4-1) http://forum.synology.com/enu/viewtopic.php?f=124&t=10504&start=60
№4-2) http://home.arcor.de/ghostadmin/wdidle3_1_00.zip
№5) http://support.wdc.com/product/download.asp?groupid=609&sid=113

long story... longer

i'm a "proud" owner of 5 WD drives, 3 of which had this "feature" enabled from the factory, and those happen to be the most recently acquired ones:
1) WDC WD15EARS-00Z5B1 [80.00A80], acquired approximately winter 2010:
   2608 Power_On_Hours, 104288 Load_Cycle_Count, zero bad sectors
2) WDC WD10EADS-00M2B0 [01.00A01], acquired ~autumn 2009:
   6195 Power_On_Hours, 54748 Load_Cycle_Count, 1 uncorrectable and 1 pending sector
3) WDC WD10EADS-65L5B1 [01.01A01], acquired ~spring 2009:
   8838 Power_On_Hours, 273 Load_Cycle_Count, 1 uncorrectable and 1 pending sector
4) WDC WD3000JS-00PDB0 [21.00M21], acquired sometime 2006-2007:
   27332 Power_On_Hours, 688 Power_Cycle_Count (no LCC counter), zero bad sectors but 20 reallocations
5) WDC WD2500AAJS-00VTA0 [01.01B01], acquired sometime ~2008:
   16499 Power_On_Hours, 350 Load_Cycle_Count (same as Power_Cycle_Count), 6 uncorrectable and 6 pending sectors
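
To put the counters for drives 1-3 on a common scale, they can be converted into average load cycles per power-on hour and an average gap between cycles. This is only a sketch of the arithmetic using the figures listed above; the helper names are made up for illustration, and the averages cover the whole life of each drive, not its current behaviour.

```python
# Power_On_Hours and Load_Cycle_Count as reported in this ticket.
drives = {
    "WD15EARS-00Z5B1": (2608, 104288),
    "WD10EADS-00M2B0": (6195, 54748),
    "WD10EADS-65L5B1": (8838, 273),
}


def cycles_per_hour(power_on_hours, load_cycle_count):
    """Average number of load cycles per power-on hour."""
    return load_cycle_count / power_on_hours


def seconds_between_cycles(power_on_hours, load_cycle_count):
    """Average number of power-on seconds between two load cycles."""
    return power_on_hours * 3600 / load_cycle_count


for model, (hours, lcc) in drives.items():
    print(
        f"{model}: {cycles_per_hour(hours, lcc):.1f} cycles/hour, "
        f"one cycle every {seconds_between_cycles(hours, lcc):.0f} s on average"
    )
```
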

as you may see, the newest drives have a ridiculous number of LCCs, but i wasn't paying any attention to them until about 1-2 months ago, when drives 2 and 3 (same model) began to stop answering the kernel, which started resetting them very often (always at times of low but non-zero r/w activity, like using torrents, watching low-bitrate videos or answering hddtemp/smartd queries) [ http://pastebin.ca/1873324 ].

yesterday it escalated to bad sectors on both of them, so i started digging and dug up the links at the top of this ticket.

the program from link №4-2 (version 1.00) showed that the "Intelli-Park feature" was:
enabled on drives 1,2,3 and set to the default of 8 seconds
but disabled on drive 4
and didn't exist on drive 5.

instead of letting me disable it, the 1.00 version of the utility set the 'idle timer' to its minimum of 6 seconds on all 4 (there is no way to select one drive at a time), so on the second try i had to set all four to the maximum of 25.5 seconds.

then i used version 1.05 from link №5. it said that drives 1,2,3 are "newest drives" whose timer can be set from 30 to 300 seconds or properly disabled, but it gave 'busy' errors on drives 4,5.
so i issued the 'disable' command. it reported that the 'idle timer' was disabled for drives 1,2,3 but hung completely on drive 4, and i had to hard reset DOS along with it.

before i tried either program i watched LCC via smartctl for a while, and it was growing approximately once per 10-60 seconds, which was not good at all.
after the changes made with the programs it increases only once per complete shutdown/startup (like Power_Cycle_Count).

i haven't had any reset issues since then either (but that was just yesterday, so we'll see).

strange thing: drives 2 and 3 failed identically and most of the time they were reset by the kernel simultaneously, but drive 3 has newer firmware and a normal number of LCCs.
my guess is that WD did the same thing with it as some people think they did earlier with the EACS drive series, for which they acknowledged the issue at first and allegedly "fixed" it with newer revisions/firmware.
after all - we know how they "fixed" the issue of unaligned partitions (and you already know what "nice" idea caused that) on EARS series 4K-block drives.

all in all

i'm writing all this here because this issue must not be ignored by people whose drives are not badly damaged yet - they need to know about it and prevent it.

after reading the links at the top i do not think that WD is going to notify anyone:
on link №2 they blame "Linux":
"Some utilities, operating systems, and applications, such as some implementations of Linux, for example, are not optimized for low power storage devices and can cause our drives to wake up at a higher rate than normal."
but not only do Windows(tm) users suffer from it too (and their kernel cannot reset the drive without dying), 2 of WD's 3 suggested fixes are useless (even without logging there's no way a system can go without any r/w activity for more than a minute, and most of those drives do not support APM to begin with).

the only effective way of preventing damage that i see is for SMART monitoring software such as smartmontools/smartd to alert the user as soon as LCC rises too fast within some short interval, and to hope that they can get hold of the 'wdidle3' program or at least tune the kernel (dirty_writeback_centisecs, dirty_expire_centisecs, etc.) so that it issues writes at least once per 7 seconds or so.
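
As a rough illustration of the kernel-tuning idea, the two writeback tunables named above can be read from /proc/sys/vm on Linux and compared with the 8-second default idle timer mentioned earlier. This is a sketch under assumptions: the helper names are hypothetical, and a short flush interval only produces disk activity while there is dirty data to write, so it is no guarantee that the heads stay loaded.

```python
from pathlib import Path

VM_DIR = Path("/proc/sys/vm")
TUNABLES = ("dirty_writeback_centisecs", "dirty_expire_centisecs")
IDLE_TIMER_SECONDS = 8.0  # factory default reported by wdidle3 in this ticket


def centisecs_to_seconds(centisecs):
    """The kernel expresses these tunables in hundredths of a second."""
    return centisecs / 100.0


def beats_idle_timer(centisecs, idle_timer_seconds=IDLE_TIMER_SECONDS):
    """True if the interval is non-zero and shorter than the idle timer.

    For dirty_writeback_centisecs a value of 0 turns periodic writeback off,
    so 0 is never treated as "short enough".
    """
    return centisecs > 0 and centisecs_to_seconds(centisecs) < idle_timer_seconds


def report(vm_dir=VM_DIR):
    """Print each tunable in seconds, skipping files that do not exist."""
    for name in TUNABLES:
        path = Path(vm_dir) / name
        if not path.is_file():
            print(f"{name}: not available on this system")
            continue
        value = int(path.read_text().strip())
        verdict = "shorter" if beats_idle_timer(value) else "not shorter"
        print(f"{name} = {centisecs_to_seconds(value):.2f} s "
              f"({verdict} than the {IDLE_TIMER_SECONDS:.0f} s idle timer)")


if __name__ == "__main__":
    report()
```
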

being able to tune the timer settings via something other than an obscure and glitchy DOS program would be nice too, but help from WD on that cannot be expected and reverse-engineering is unlikely.

Attachments (3)

smartctl (25.8 KB) - added by virtuousfox 10 years ago.
'smartctl -a' output
smartctl-after_stopstart (25.8 KB) - added by virtuousfox 10 years ago.
'smartctl -a' output after one system shutdown
hdparm (16.7 KB) - added by virtuousfox 10 years ago.
'hdparm -I' output


Change History (11)

Changed 10 years ago by virtuousfox

Attachment: smartctl added

'smartctl -a' output

Changed 10 years ago by virtuousfox

Attachment: smartctl-after_stopstart added

'smartctl -a' output after one system shutdown

Changed 10 years ago by virtuousfox

Attachment: hdparm added

'hdparm -I' output

comment:1 Changed 10 years ago by Christian Franke

Keywords: needinfo added

Any suggestions what could be added to smartd to handle this drive firmware issue?

comment:2 Changed 10 years ago by virtuousfox

honestly, i don't know what can be done beyond smartd complaining about LCC count if it rises with certain threshold (like 50 per hour, 500 per 24 or something) as it complains about uncorrectable sectors and such. this "feature" is built into affected drives as deeply as 512b sector emulation (most of those, except deprecated ones, also have 4K sectors).
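
A threshold check of that kind could look roughly like the sketch below. It is not smartd code and not a proposed patch: the function names are hypothetical, the parser assumes the usual ten-column attribute table printed by `smartctl -A` with the raw value in the last column, and the 50-per-hour limit is simply the example figure from this comment.

```python
import subprocess


def parse_load_cycle_count(smartctl_output):
    """Return the raw Load_Cycle_Count from `smartctl -A` text, or None."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Assumed columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED
        # WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[1] == "Load_Cycle_Count":
            return int(fields[9])
    return None


def read_load_cycle_count(device):
    """Run smartctl on a device and extract Load_Cycle_Count.

    The exit status is not checked because smartctl uses it as a bitmask
    that can be non-zero even when the attribute table was printed.
    """
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return parse_load_cycle_count(result.stdout)


def exceeds_rate(previous, current, elapsed_seconds, max_per_hour=50):
    """True if LCC grew faster than max_per_hour between two samples."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    per_hour = (current - previous) * 3600.0 / elapsed_seconds
    return per_hour > max_per_hour
```

A monitor would call `read_load_cycle_count()` on a timer, remember the previous sample and its timestamp, and log a warning whenever `exceeds_rate()` returns True.
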

the default timeout of 8 seconds (or 30, for newer drives) can be controlled only via a proprietary SATA extension with their closed DOS-only utility, and even that performs poorly. doing anything with it at all is probably outside the scope of smartd. reading and interpreting (in seconds) the value of this "feature" setting would be nice, but someone with a clue would probably have to gut this DOS program for that.

comment:3 in reply to:  2 Changed 10 years ago by Christian Franke

Keywords: needinfo removed
Priority: major → minor

Replying to virtuousfox:

... smartd complaining about LCC count if it rises with certain threshold (like 50 per hour, 500 per 24 or something) as it complains about uncorrectable sectors and such.

Would IMO be a useful smartd feature. We might add this in the future, so I keep the ticket open.

comment:4 Changed 9 years ago by wintrmute

I note that one of my drives (WD15EARS) has 45000 load cycles in 6400 hours of use. That's a relatively low number compared to some, but still seems rather high.

Has anyone ported the wdidle.exe application to Linux? Surely it's just a few custom SATA commands that could be replicated?

comment:5 in reply to:  4 Changed 9 years ago by Christian Franke

Has anyone ported the wdidle.exe application to Linux? Surely it's just a few custom SATA commands that could be replicated?

AFAIK custom commands are not needed. Setting Advanced Power Management mode to 254 (255?) and/or disabling the standby timer may help: hdparm -B 254 -S 0 /dev/ice

comment:6 Changed 9 years ago by wintrmute

I can't set the APM level on the device with -B, it says:
APM_level = not supported

-S 0 doesn't seem to help - still seeing the load cycle going up regularly.

I've previously tried other -S values and they seem to get ignored. (I have hdparm setting it to 30 minutes at boot, but it has no effect)

:(

comment:7 Changed 9 years ago by Christian Franke

Resolution: wontfix
Status: new → closed

There is no easy way to add a related warning feature to smartd.

comment:8 Changed 9 years ago by virtuousfox

first of all, that has nothing to do with APM or a standby timer. it's a proprietary WD "feature".

second, there is "no easy way" to add a syslog warning just like the one for an increase in the number of bad sectors and such? riiight...
