Changes between Version 5 and Version 6 of BadBlockHowto


Timestamp:
Mar 25, 2017, 2:24:44 PM (3 years ago)
Author:
Gabriele Pohl
Comment:

ReiserFS example

  • BadBlockHowto

    v5 v6  
    369369creates the file. Leave it running until the partition/file system is full. This will make the disk reallocate those sectors which do not belong to a file. Check the `smartctl -a` output after that and make sure that the sectors are reallocated. If any remain, use the debugfs method. Of course the usual caveats apply - back it up first, and so on.
    370370
     371=== ReiserFS example ===
     372
     373This section was written by Joachim Jautz with additions from Manfred Schwarb.
     374
     375The following problems were reported during a scheduled test:
     376
     377{{{
     378smartd[575]: Device: /dev/hda, starting scheduled Offline Immediate Test.
     379[... 1 hour later ...]
     380smartd[575]: Device: /dev/hda, 1 Currently unreadable (pending) sectors
     381smartd[575]: Device: /dev/hda, 1 Offline uncorrectable sectors
     382}}}
     383
     384[Step 0] The SMART selftest/error log (see `smartctl -l selftest`) indicated there was a problem with block address (i.e. the 512 byte sector at) `58656333`. The partition table (e.g. see `sfdisk -luS /dev/hda` or `fdisk -ul /dev/hda`) indicated that this block was in the `/dev/hda3` partition which contained a `ReiserFS` file system. That partition started at block address `54781650`.
     385
     386While doing the initial analysis it may also be useful to take a copy of the disk attributes returned by `smartctl -A /dev/hda`, specifically the values of the `Reallocated_Sector_Ct` and `Reallocated_Event_Count` attributes for ATA disks, or the grown list (`GLIST`) length for `SCSI` disks. If these have increased at the end of the procedure, the disk has re-allocated one or more sectors.
     387
     388[Step 1] Get the file system's block size:
     389
     390{{{
     391# debugreiserfs /dev/hda3 | grep '^Blocksize'
     392Blocksize: 4096
     393}}}
     394
     395[Step 2] Calculate the block number:
     396
     397{{{
     398# echo "(58656333-54781650)*512/4096" | bc -l
     399484335.37500000000000000000
     400}}}
     401
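     402It is reassuring that the calculated block number is less than the `Count of blocks on the device` value reported by `debugreiserfs /dev/hda3`. The fractional part (`.375`, i.e. 3/8) only locates the bad 512 byte sector inside the 4 KB block; the file system block number is the integer part, `484335`.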
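The same calculation can be sketched with integer shell arithmetic, which yields the block number directly without a fractional part. The variable names are illustrative assumptions; the values are the ones from this example.

```shell
# Values from the example above; variable names are hypothetical.
sector=58656333        # failing LBA from the SMART selftest log
part_start=54781650    # first sector of /dev/hda3
sector_size=512        # bytes per sector
block_size=4096        # file system block size reported in Step 1

# Byte offset of the bad sector relative to the start of the partition.
offset_bytes=$(( (sector - part_start) * sector_size ))

# Integer division gives the file system block number.
block=$(( offset_bytes / block_size ))

# Zero-based position of the bad sector inside that block.
sector_in_block=$(( (offset_bytes % block_size) / sector_size ))

echo "block=$block sector_in_block=$sector_in_block"
```

With the values above this prints `block=484335 sector_in_block=3`, matching the `.375` (3/8) fraction from `bc`.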
     403
     404[Step 3] Try to get more information about this block. Reading the block fails as expected, but at least we now see that it appears to be unused. If we do not get the `Cannot read the block` error, we should check whether our calculation in [Step 2] was correct ;)
     405
     406{{{
     407# debugreiserfs -1 484335 /dev/hda3
     408debugreiserfs 3.6.19 (2003 http://www.namesys.com)
     409484335 is free in ondisk bitmap
     410The problem has occurred looks like a hardware problem.
     411}}}
     412
     413If you have bad blocks, we advise you to get a new hard drive, because once you get one bad block that the disk drive internals cannot hide from your sight, the chances of getting more are generally said to become much higher (precise statistics are unknown to us), and this disk drive is probably not expensive enough for you to risk your time and data on it. If you don't want to follow that advice then if you have just a few bad blocks, try writing to the bad blocks and see if the drive remaps the bad blocks (that means it takes a block it has in reserve and allocates it for use for of that block number). If it cannot remap the block, use badblock option (`-B`) with reiserfs utils to handle this block correctly.
     414
     415{{{
     416bread: Cannot read the block (484335): (Input/output error).
     417Aborted
     418}}}
     419
     420So it looks like we have the right (i.e. faulty) block address.
     421
     422[Step 4] Try then to find the affected file [#footnote3 [3]]:
     423
     424{{{
     425tar -cO /mydir | cat >/dev/null
     426}}}
     427
     428If you do not find any unreadable files, then the block may be free or located in some metadata of the file system.
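As an alternative to watching `tar` for read errors, the following is a minimal sketch that reads every regular file and prints only the names of those that fail. The function name `list_unreadable` is a hypothetical helper, not part of any package, and the sketch assumes a POSIX `find` and `sh`.

```shell
# list_unreadable DIR
# Print the path of every regular file under DIR that cannot be read
# completely. -xdev keeps find on the file system that contains DIR.
list_unreadable() {
    find "$1" -xdev -type f -exec sh -c '
        for f in "$@"; do
            # cat fails with an I/O error if any block of the file is unreadable
            cat "$f" >/dev/null 2>&1 || printf "%s\n" "$f"
        done' sh {} +
}
```

For example, `list_unreadable /mydir` prints nothing if all files can be read. Note that files which are unreadable for other reasons (such as permissions) are listed too, so run it as root.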
     429
     430[Step 5] Try your luck: bang the affected block with `badblocks -n` (non-destructive read-write mode; unmount the file system first). If you are very lucky, the failure is transient and you can provoke reallocation [#footnote4 [4]]:
     431
     432{{{
     433# badblocks -b 4096 -p 3 -s -v -n /dev/hda3 `expr 484335 + 100` `expr 484335 - 100`
     434}}}
     435
     436[#footnote5 [5]]
     437
     438Check success with `debugreiserfs -1 484335 /dev/hda3`. Otherwise:
     439
     440[Step 6] Perform this step only if Step 5 has failed to fix the problem: overwrite that block to force reallocation:
     441
     442{{{
     443# dd if=/dev/zero of=/dev/hda3 count=1 bs=4096 seek=484335
     4441+0 records in
     4451+0 records out
     4464096 bytes transferred in 0.007770 seconds (527153 bytes/sec)
     447}}}
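Because a mistyped `seek=` or `of=` value destroys good data, it may help to assemble the command from variables and review it before running it. The sketch below only prints the command; the variable names are hypothetical, and `oflag=direct` assumes a GNU `dd` that supports it (see [#footnote2 [2]]).

```shell
# Values from the example above; variable names are hypothetical.
dev=/dev/hda3     # partition containing the bad block
block=484335      # file system block number from Step 2
bs=4096           # file system block size from Step 1

# Build the command as a string and print it for review; nothing is written.
# oflag=direct bypasses the Linux block buffering layer (GNU dd only).
cmd="dd if=/dev/zero of=$dev bs=$bs count=1 seek=$block oflag=direct"
echo "$cmd"
```

Once the printed command has been checked against the numbers from Steps 1 and 2, it can be run by hand as root.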
     448
     449[Step 7] If you can't rule out the bad block being in metadata, do a file system check:
     450
     451{{{
     452# reiserfsck --check /dev/hda3
     453}}}
     454
     455This could take a long time, so you had probably better go for lunch ...
     456
     457[Step 8] Proceed as stated earlier. For example, sync disk and run a long selftest that should succeed now.
     458
    371459== Footnotes ==
    372460
    373461[=#footnote1 [1]] Self-Monitoring, Analysis and Reporting Technology -> SMART
    374462
    375 [=#footnote2 [2]] Starting with GNU coreutils release 5.3.0, the `dd` command in Linux includes the options 'iflag=direct' and 'oflag=direct'. Using these with the `dd` commands should be helpful, because adding these flags should avoid any interaction with the block buffering IO layer in Linux and permit direct reads/writes from the raw device. Use `dd --help` to see if your version of dd supports these options. If not, the latest code for dd can be found at https://www.gnu.org/software/coreutils/.
     463[=#footnote2 [2]] Starting with GNU coreutils release 5.3.0, the `dd` command in Linux includes the options 'iflag=direct' and 'oflag=direct'. Using these with the `dd` commands should be helpful, because adding these flags should avoid any interaction with the block buffering IO layer in Linux and permit direct reads/writes from the raw device. Use `dd --help` to see if your version of dd supports these options. If not, the latest code for dd can be found at https://www.gnu.org/software/coreutils/.
     464
     465[=#footnote3 [3]] Do not use `tar -c -f /dev/null` or `tar -cO /mydir >/dev/null`. GNU tar does not actually read the files if `/dev/null` is used as archive path or as standard output; see `info tar`.
     466
     467[=#footnote4 [4]] Important: the block range chosen is arbitrary, but do not test only a single block, as bad blocks often come in clusters. Do not make the range too large either, as this test is probably not entirely risk-free.
     468
     469[=#footnote5 [5]] The rather awkward `expr 484335 + 100` (note the back quotes) can be replaced with `$((484335+100))` if the `bash` shell is being used. Similarly the last argument can become `$((484335-100))`.