Changes between Initial Version and Version 1 of Ticket #658, comment 12


Timestamp:
Feb 26, 2016, 3:32:19 PM (8 years ago)
Author:
Ch.Ris

  • Ticket #658, comment 12

    initial  v1
    9        9    Hard disks with error recovery control (ERC), known as time-limited error recovery (TLER) at Western Digital and as command completion time limit (CCTL) at Samsung/Hitachi, let you configure how long a drive's firmware may spend attempting to recover from a read or write error.
    10       10
    11            The error recovery (ERC) time of a drive *must* be shorter than the system's controller timeout. Otherwise, errors will cause a controller reset and the loss of all unwritten data.
             11   The error recovery (ERC) time of a drive *must* be shorter than the system's controller timeout. Otherwise, errors will cause a controller reset and the loss of all unwritten data. Unfortunately, many drives ship with the ERC timeout set very long or disabled.
    12       12
    13       13   This is equally important with redundant hardware or software RAID configurations. Here, resetting an entire drive instead of just retrying the failed block causes the whole drive to be marked as unusable, reducing redundancy and performance. Furthermore, errors are likely to occur during the re-sync of a drive (seldom-used areas are read), and a drive reset during the re-sync can render the entire array unusable. Limiting the drives' recovery timeout also improves error handling in hardware or software RAID environments: instead of waiting for one drive to recover the requested data, the data can quickly be read from another (redundant) drive.
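
The rule in the comment (ERC time shorter than the controller timeout) can be sketched as a small check. This is only an illustration under stated assumptions: the function name `erc_is_safe`, the `margin_s` safety margin, and the use of `None` for a disabled ERC are hypothetical and not part of the ticket. It also assumes the ERC time is given in tenths of a second and the controller timeout in seconds; adjust the units to whatever your tools report.

```python
from typing import Optional


def erc_is_safe(erc_deciseconds: Optional[int],
                controller_timeout_s: float,
                margin_s: float = 1.0) -> bool:
    """Return True if the drive gives up on a bad sector before the
    controller timeout would expire and trigger a reset.

    erc_deciseconds: ERC limit in tenths of a second (assumed unit);
                     None or 0 is treated as "ERC disabled".
    controller_timeout_s: controller/driver command timeout in seconds.
    margin_s: hypothetical safety margin between the two timeouts.
    """
    if erc_deciseconds is None or erc_deciseconds <= 0:
        # ERC disabled: the drive may retry for an unbounded time,
        # so the controller timeout can always fire first.
        return False
    erc_seconds = erc_deciseconds / 10.0
    return erc_seconds + margin_s <= controller_timeout_s


if __name__ == "__main__":
    # Example values chosen for illustration only.
    print(erc_is_safe(70, 30))    # 7 s ERC against a 30 s timeout
    print(erc_is_safe(None, 30))  # ERC disabled
    print(erc_is_safe(350, 30))   # 35 s ERC against a 30 s timeout
```

A check like this only flags the mismatch; whether to shorten the drive's ERC time or lengthen the controller timeout depends on what the drive and controller allow.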