Changes between Version 28 and Version 29 of BadBlockHowto


Timestamp:
May 22, 2023, 9:49:57 AM (11 months ago)
Author:
Artoria2e5
Comment:

Add some links to the sg_* utilities. Do a swap on footnote #5. There's a lot of repeated stuff on this page that I honestly don't know what to do with. Ideally we would introduce the topic, talk about what disks do, talk about how to fix it on the disk level, then build up to fixing what could be there (filesystems & partition tables). It sounds a little riskier, but then what's lost is already lost.

  • BadBlockHowto

    v28 v29  
    480480
    481481{{{
    482 # badblocks -b 4096 -p 3 -s -v -n /dev/hda3 `expr 484335 + 100` `expr 484335 - 100`
    483 }}}
    484 
    485 [#footnote5 [5]]
     482# badblocks -b 4096 -p 3 -s -v -n /dev/hda3 $((484335 + 100)) $((484335 - 100))
     483}}}
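
The block range passed to `badblocks` above can be computed ahead of time with the same arithmetic expansion. The following is only a minimal sketch: the variable names `block`, `margin`, `last` and `first` are illustrative and do not appear in the original text, and the command is printed rather than executed because the `-n` mode writes to the device.

```shell
#!/bin/sh
# Hypothetical values: the suspect block and the margin tested on each side.
block=484335
margin=100

# POSIX arithmetic expansion; no external expr process is needed.
last=$((block + margin))
first=$((block - margin))

# badblocks expects the last block before the first block on its command line.
# Print the command instead of running it, since -n writes to the device.
echo "badblocks -b 4096 -p 3 -s -v -n /dev/hda3 $last $first"
```

Printing the command first gives a chance to check the range before anything touches the disk.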
    486484
    487485check success with `debugreiserfs -1 484335 /dev/hda3`. Otherwise:
     
    568566In either case, remapping will fetch an unused spare sector from the current zone while adding the damaged old sector to the GLIST (hence the name ''grown'' list). The difference is in the `REASSIGN STATUS` field from Background Scan Results, which describes how a reassignment happened. The contents of the GLIST may not be that interesting but `smartctl` prints out the number of entries in the grown list and if that number grows quickly, the disk may be approaching the end of its useful life.
    569567
    570 In the ATA command set, the OS is not given access to such fine-grained control as in SCSI. The equivalent of AWRE nearly always happens, so all you do is write over the defect.
    571 
    572 Here is an alternate brute force technique to consider: if the data on the SCSI or ATA disk has all been backed up (e.g. is held on the other disks in a RAID 5 enclosure), then simply reformatting the disk may be the least cumbersome approach. Make sure to disable "quick format" so the formatting actually writes through the entire disk!
     568Here is an alternate brute force technique to consider: if the data on the SCSI disk has all been backed up (e.g. is held on the other disks in a RAID 5 enclosure), then simply reformatting the disk with [`sg_format` https://manned.org/sg_format.8] may be the least cumbersome approach.
     569
     570(In the ATA command set, the OS is not given access to such fine-grained control as in SCSI. The equivalent of AWRE nearly always happens, so all you do is write over the defect. You also don't get an LBA-level format operation, only "security erase".)
    573571
    574572==== Example ====
     
    576574Given a ''bad block'', it still may be useful to look at the `fdisk` command (if the disk has multiple partitions) to find out which partition is involved, then use `debugfs` (or a similar tool for the file system in question) to find out which, if any, file or other part of the file system may have been damaged. This is discussed in section [#Repairsinafilesystem Repairs in a file system].
    577575
    578 Then a program that can execute the `REASSIGN BLOCKS SCSI` command is required. In Linux (2.4 and 2.6 series), FreeBSD, Tru64(OSF) and Windows the author's `sg_reassign` utility in the `sg3_utils` package can be used. Also found in that package is `sg_verify` which can be used to check that a block is readable.
     576Then a program that can execute the `REASSIGN BLOCKS` SCSI command is required. In Linux (2.4 and 2.6 series), FreeBSD, Tru64(OSF) and Windows, Douglas Gilbert's `sg_reassign` utility in the `sg3_utils` package can be used. Also found in that package is `sg_verify`, which can be used to check that a block is readable.
    579577
    580578Assume that `logical block address 1193046` (which is `123456` in hex) is corrupt ^[#footnote10 [10]]^ on the disk at `/dev/sdb`. A long selftest command like `smartctl -t long /dev/sdb` may result in log results like this:
     
    619617}}}
    620618
    621 The GLIST length has grown by one as expected. If the disk was unable to recover any data, then the ''new'' block at lba `0x123456` has vendor specific data in it. The `sg_reassign` utility can also do bulk reassigns, see `man sg_reassign` for more information.
     619The GLIST length has grown by one as expected. If the disk was unable to recover any data, then the ''new'' block at LBA `0x123456` has vendor-specific data in it. The `sg_reassign` utility can also do bulk reassigns; see [`man sg_reassign` https://manned.org/sg_reassign.8] for more information.
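
The example switches between the decimal (`1193046`) and hexadecimal (`0x123456`) forms of the same logical block address. As a small sketch, the shell's `printf` can convert between the two; the variable names `lba_dec`, `lba_hex` and `back` are illustrative only.

```shell
#!/bin/sh
# Decimal LBA as quoted in the example text.
lba_dec=1193046

# Decimal to hexadecimal.
lba_hex=$(printf '0x%x' "$lba_dec")

# Hexadecimal back to decimal; printf accepts C-style 0x constants.
back=$(printf '%d' "$lba_hex")

echo "$lba_dec decimal is $lba_hex; $lba_hex is $back decimal"
```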
    622620
    623621The `dd` command could be used to read the contents of the ''new'' block:
     
    635633More work may be needed at the file system level, especially if the reassigned block held critical file system information such as a superblock or a directory.
    636634
    637 Even if a full backup of the disk is available, or the disk has been ''ejected'' from a RAID, it may still be worthwhile to reassign the bad block(s) that caused the problem (or simply format the disk (see `sg_format` in the `sg3_utils package`)) and re-use the disk later (not unlike the way a replacement disk from a manufacturer might be used).
    638 
     635Even if a full backup of the disk is available, or the disk has been ''ejected'' from a RAID, it may still be worthwhile to reassign the bad block(s) that caused the problem (or simply format the disk (see `sg_format`)) and re-use the disk later (not unlike the way a replacement disk from a manufacturer might be used).
    639636
    640637== Case Studies ==
     
    10461043[=#footnote4 [4]] Important: the block range chosen is arbitrary, but do not test only a single block, as bad blocks tend to occur in clusters. Do not make the range too large either, as this test is probably not risk-free.
    10471044
    1048 [=#footnote5 [5]] The rather awkward {{{`expr 484335 + 100`}}} (note the back quotes) can be replaced with `$((484335+100))` if `bash` or any [http://pubs.opengroup.org/onlinepubs/007908799/xcu/chap2.html#tag_001_006_004 POSIX 1997+] compatible shell is being used. Similarly the last argument can become `$((484335-100))`. See also ^[#footnote11 11.]^ below.
     1045[=#footnote5 [5]] We use the [http://pubs.opengroup.org/onlinepubs/007908799/xcu/chap2.html#tag_001_006_004 POSIX 1997+] arithmetic expansion here. If you are using an ancient or unusual shell, try the slightly more awkward {{{`expr 484335 + 100`}}} and {{{`expr 484335 - 100`}}} (the backticks are mandatory). See also ^[#footnote11 11.]^ below.
    10491046
    10501047[=#footnote6 [6]] `testdisk` scans the media for the beginning of file systems that it recognizes. It can be tricked by data that looks like the beginning of a file system or an old file system from a previous partitioning of the media (disk). So care should be taken. Note that file systems should not overlap apart from the fact that extended partitions lie wholly within an extended partition table allocation. Also if the root partition of a !Linux/Unix installation can be found then the `/etc/fstab` file is a useful resource for finding the partition numbers of other partitions.