Changes between Version 26 and Version 27 of BadBlockHowto


Timestamp:
May 21, 2023, 11:20:24 AM (12 months ago)
Author:
Artoria2e5
Comment:

Formatting: make the steps bold so they are easier to find & 1 comment

  • BadBlockHowto

    v26 v27  
    5252}}}
    5353
    54 First Step: We need to locate the partition on which this sector of the disk lives:
     54**First Step**: We need to locate the partition on which this sector of the disk lives:
    5555
    5656{{{
     
    7777You can see that this is an `ext2` file system, mounted at `/data`.
    7878
    79 Second Step: we need to find the block size of the file system (normally 4096 bytes for `ext2`):
     79**Second Step:** we need to find the block size of the file system (normally 4096 bytes for `ext2`):
    8080
    8181{{{
     
    8585}}}
    8686
    87 In this case the block size is 4096 bytes. Third Step: we need to determine which File System Block contains this LBA. The formula is:
     87In this case the block size is 4096 bytes.
     88
     89**Third Step**: we need to determine which File System Block contains this LBA. The formula is:
    8890
    8991{{{
     
    109111Note: the fractional part of `0.125` indicates that this problem LBA is actually the second of the eight sectors that make up this file system block.
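As a rough illustration of the note above: the position of the bad sector inside its file system block is the remainder of the sector offset divided by the number of sectors per block. The sketch below uses made-up numbers (LBA `1009`, partition start `1000`) purely to show the arithmetic; `sector_in_block` is a hypothetical helper name, and 512-byte sectors are assumed.

```shell
# sector_in_block LBA PARTITION_START BLOCK_SIZE
# Hypothetical helper: prints the zero-based index of the sector inside
# its file system block. Assumes 512-byte sectors.
sector_in_block() {
    sectors_per_block=$(( $3 / 512 ))
    echo $(( ($1 - $2) % sectors_per_block ))
}

# Made-up values for illustration only: offset 9 sectors, 8 sectors per block.
sector_in_block 1009 1000 4096   # prints 1
```

A result of `1` corresponds to a fractional part of `0.125` (one eighth), i.e. the second of the eight sectors in the block.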
    110112
    111 Fourth Step: we use `debugfs` to locate the inode stored in this block, and the file that contains that inode:
     113**Fourth Step:** we use `debugfs` to locate the inode stored in this block, and the file that contains that inode:
    112114
    113115{{{
     
    161163}}}
    162164
    163 Fifth Step NOTE: '''This last step will permanently and irretrievably destroy the contents of the file system block that is damaged''': if the block was allocated to a file, some of the data that is in this file is going to be overwritten with zeros. You will not be able to recover that data unless you can replace the file with a fresh or correct version.
     165**Fifth Step** NOTE: '''This last step will permanently and irretrievably destroy the contents of the file system block that is damaged''': if the block was allocated to a file, some of the data that is in this file is going to be overwritten with zeros. You will not be able to recover that data unless you can replace the file with a fresh or correct version.
    164166
    165167To force the disk to reallocate this bad block we'll write zeros to the bad block, and sync the disk:
     
    414416creates the file. Leave it running until the partition/file system is full. This will make the disk reallocate those sectors which do not belong to a file. Check the `smartctl -a` output after that and make sure that the sectors are reallocated. If any remain, use the debugfs method. Of course the usual caveats apply - back it up first, and so on.
    415417
     418Comment by Mingye Wang: wouldn't it be easier to skip to step 5 and do the `dd` or `hdparm`?
     419
    416420=== ReiserFS example ===
    417421
     
    427431}}}
    428432
    429 [Step 0] The SMART selftest/error log (see `smartctl -l selftest`) indicated there was a problem with block address (i.e. the 512 byte sector at) `58656333`. The partition table (e.g. see `sfdisk -luS /dev/hda` or `fdisk -ul /dev/hda`) indicated that this block was in the `/dev/hda3` partition which contained a `ReiserFS` file system. That partition started at block address `54781650`.
     433**[Step 0]** The SMART selftest/error log (see `smartctl -l selftest`) indicated there was a problem with block address (i.e. the 512 byte sector at) `58656333`. The partition table (e.g. see `sfdisk -luS /dev/hda` or `fdisk -ul /dev/hda`) indicated that this block was in the `/dev/hda3` partition which contained a `ReiserFS` file system. That partition started at block address `54781650`.
    430434
    431435While doing the initial analysis it may also be useful to take a copy of the disk attributes returned by `smartctl -A /dev/hda`. Specifically the values associated with the `Reallocated_Sector_Ct` and `Reallocated_Event_Count` attributes (for `ATA` disks, the grown list (`GLIST`) length for SCSI disks). If these are incremented at the end of the procedure it indicates that the disk has re-allocated one or more sectors.
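One possible way to record these values before and after the procedure is to pull the raw value out of the attribute table. This is only a sketch: `raw_value` is a hypothetical helper, and it assumes the usual ten-column ATA attribute table printed by `smartctl -A`, where the attribute name is the second column and the raw value is the tenth.

```shell
# raw_value ATTRIBUTE_NAME
# Hypothetical helper: reads `smartctl -A` output on stdin and prints the
# RAW_VALUE column (assumed to be field 10) of the named attribute.
raw_value() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Usage (needs root and a real disk, so it is left commented out):
#   smartctl -A /dev/hda | raw_value Reallocated_Sector_Ct
#   smartctl -A /dev/hda | raw_value Reallocated_Event_Count
```

Running it once before the repair and once after makes it easy to see whether either counter has increased.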
    432436
    433 [Step 1] Get the file system's block size:
     437**[Step 1]** Get the file system's block size:
    434438
    435439{{{
     
    438442}}}
    439443
    440 [Step 2] Calculate the block number:
     444**[Step 2]** Calculate the block number:
    441445
    442446{{{
     
    447451It is reassuring that the calculated 4 KB damaged block address in `/dev/hda3` is less than the `Count of blocks on the device` value in the `debugreiserfs` output shown above.
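The calculation in [Step 2] can be double-checked with shell integer arithmetic. This is a minimal sketch using the numbers from [Step 0] and the 4 KB block size; `fs_block` is a hypothetical helper name, and 512-byte sectors are assumed.

```shell
# fs_block LBA PARTITION_START BLOCK_SIZE
# Hypothetical helper: prints the file system block that contains the LBA.
# Assumes 512-byte sectors; integer division drops the fractional part.
fs_block() {
    sectors_per_block=$(( $3 / 512 ))
    echo $(( ($1 - $2) / sectors_per_block ))
}

# Values from this example: bad LBA, start of /dev/hda3, 4096-byte blocks.
fs_block 58656333 54781650 4096   # prints 484335
```

Dividing by the number of sectors per block, rather than multiplying by 512 first, keeps the intermediate values small.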
    448452
    449 [Step 3] Try to get more info about this block => reading the block fails as expected but at least we see now that it seems to be unused. If we do not get the `Cannot read the block` error we should check if our calculation in [Step 2] was correct ;)
     453**[Step 3]** Try to get more info about this block => reading the block fails as expected but at least we see now that it seems to be unused. If we do not get the `Cannot read the block` error we should check if our calculation in [Step 2] was correct ;)
    450454
    451455{{{
     
    465469So it looks like we have the right (i.e. faulty) block address.
    466470
    467 [Step 4] Try then to find the affected file ^[#footnote3 [3]]^:
     471**[Step 4]** Try then to find the affected file ^[#footnote3 [3]]^:
    468472
    469473{{{
     
    473477If you do not find any unreadable files, then the block may be free or located in some metadata of the file system.
    474478
    475 [Step 5] Try your luck: bang the affected block with `badblocks -n` (non-destructive read-write mode; unmount first). If you are very lucky, the failure is transient and you can provoke reallocation ^[#footnote4 [4]]^:
     479**[Step 5]** Try your luck: bang the affected block with `badblocks -n` (non-destructive read-write mode; unmount first). If you are very lucky, the failure is transient and you can provoke reallocation ^[#footnote4 [4]]^:
    476480
    477481{{{
     
    483487check success with `debugreiserfs -1 484335 /dev/hda3`. Otherwise:
    484488
    485 [Step 6] Perform this step only if Step 5 has failed to fix the problem: overwrite that block to force reallocation:
     489**[Step 6]** Perform this step only if Step 5 has failed to fix the problem: overwrite that block to force reallocation:
    486490
    487491{{{
     
    492496}}}
    493497
    494 [Step 7] If you can't rule out the bad block being in metadata, do a file system check:
     498**[Step 7]** If you can't rule out the bad block being in metadata, do a file system check:
    495499
    496500{{{
     
    500504This could take a long time, so you may want to go for lunch ...
    501505
    502 [Step 8] Proceed as stated earlier. For example, sync disk and run a long selftest that should succeed now.
     506**[Step 8]** Proceed as stated earlier. For example, sync disk and run a long selftest that should succeed now.
    503507
    504508== Repairs at the disk level ==
     
    540544The SCSI disk command set and associated disk architecture are assumed in this section. SCSI disks have their own logical to physical mapping allowing a damaged sector (usually carrying 512 bytes of data) to be remapped irrespective of the operating system, file system or software RAID being used.
    541545
    542 The terms ''block and sector'' are used interchangeably, although block tends to get used in higher level or more abstract contexts such as a ''logical block''.
     546The terms ''block'' and ''sector'' are used interchangeably, although block tends to get used in higher level or more abstract contexts such as a ''logical block''.
    543547
    544548When a SCSI disk is formatted, defective sectors identified during the manufacturing process (the so-called primary list: PLIST), those found during the format itself (the certification list: CLIST), those given explicitly to the format command (the DLIST) and optionally the previous grown list (GLIST) are not used in the logical block map. The number (and low-level addresses) of the unmapped sectors can be found with the SCSI `READ DEFECT DATA` command.
     
    561565SCSI disks expect unrecoverable errors to be fixed manually using the SCSI `REASSIGN BLOCKS` command since loss of data is involved. It is possible that an operating system or a file system could issue the `REASSIGN BLOCKS` command itself but the authors are unaware of any examples. The `REASSIGN BLOCKS` command will reassign one or more blocks, attempting to (partially?) recover the data (a forlorn hope at this stage), fetch an unused spare sector from the current zone while adding the damaged old sector to the GLIST (hence the name ''grown'' list). The contents of the GLIST may not be that interesting but `smartctl` prints out the number of entries in the grown list and if that number grows quickly, the disk may be approaching the end of its useful life.
    562566
    563 Here is an alternate brute force technique to consider: if the data on the SCSI or ATA disk has all been backed up (e.g. is held on the other disks in a RAID 5 enclosure), then simply reformatting the disk may be the least cumbersome approach.
     567Here is an alternate brute force technique to consider: if the data on the SCSI or ATA disk has all been backed up (e.g. is held on the other disks in a RAID 5 enclosure), then simply reformatting the disk may be the least cumbersome approach. Make sure to disable "quick format" so that the format actually writes to the entire disk!
    564568
    565569==== Example ====
     
    10621066
    10631067|| Date || Author || Description ||
     1068||2023-05-21||Artoria2e5||Formatting: make the steps bold so they are easier to find & 1 comment||
    10641069||2021-06-12||ttsiodras||Added a note about LUKS-encrypted partitions hosting ext filesystems||
    10651070||2017-03-29||chrfranke||Add section Case Studies with a real-life example using ddrescue and sleuthkit on Windows||