Changes between Version 13 and Version 14 of BadBlockHowto

Timestamp: Mar 29, 2017, 10:52:46 PM
Author: Christian Franke
Comment: Add section Case Studies with a real-life example using ddrescue and sleuthkit on Windows

BadBlockHowto

-              v13
+              v14
+== Case Studies ==
+This section is intended to collect step-by-step descriptions of some real-life use cases.
+=== Recovering a (mostly) unreadable sector of a Notebook HDD ===
+This was done in March 2016 under Windows 7 using ''Cygwin''^[#footnote12 12.]^ ports of ''GNU ddrescue''^[#footnote13 13.]^ and ''The Sleuth Kit (TSK)''^[#footnote14 14.]^. All commands shown should work similarly on other platforms and with other filesystems.
+==== Determine Logical Block Address of unreadable sector ====
+Examine smartctl output:
+{{{
+root:~# smartctl -x /dev/sdb
+smartctl 6.5 2016-02-29 r4227 [x86_64-w64-mingw32-win7-sp1] (daily-20160229)
+...
+Model Family:     SAMSUNG SpinPoint MP5
+Device Model:     SAMSUNG HM640JJ
+...
+Firmware Version: 2AK10001
+User Capacity:    640.135.028.736 bytes [640 GB]
+Sector Size:      512 bytes logical/physical
+Rotation Rate:    7200 rpm
+Form Factor:      2.5 inches
+...
+ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
+...
+  5 Reallocated_Sector_Ct   PO--CK   252   252   010    -    0
+...
+  9 Power_On_Hours          -O--CK   100   100   000    -    251  <=== See Self-test Log below
+...
+197 Current_Pending_Sector  -O--CK   100   100   000    -    1    <=== At least 1 bad sector
+...
+SMART Extended Comprehensive Error Log Version: 1 (2 sectors)
+Device Error Count: 351 (device log contains only the most recent 8 errors)
+...
+Error 351 [6] occurred at disk power-on lifetime: 251 hours (10 days + 11 hours)
+  When the command that caused the error occurred, the device was active or idle.
+  After command completion occurred, registers were:
+  ER -- ST COUNT  LBA_48  LH LM LL DV DC
+  -- -- -- == -- == == == -- -- -- -- --
+  40 -- 51 00 01 00 00 33 3f d8 a6 40 00  Error: UNC 1 sectors at LBA = 0x333fd8a6 = 859822246  <=== Its LBA
+  Commands leading to the command that caused the error were:
+  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
+  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
+  25 00 00 00 01 00 00 33 3f d8 a6 40 00     00:00:06.924  READ DMA EXT
+...
+SMART Extended Self-test Log Version: 1 (2 sectors)
+Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
+# 1  Short offline       Completed: read failure       90%       176         859822246  <=== Detected 75 power on hours ago
+}}}
+A read scan helps to verify the LBA and to check for other bad sectors
+(to create a disk image at the same time, replace `/dev/null` with a file path):
+{{{
+root:~# ddrescue --ask --verbose --binary-prefixes --idirect --force /dev/sdb /dev/null disk.map
+GNU ddrescue 1.21-rc2
+About to copy 610480 MiBytes from /dev/sdb [SAMSUNG HM640JJ::...] to /dev/null [0].
+Proceed (y/N)? y
+...
+non-tried:        0 B,     errsize:      512 B,      run time:          2h
+  rescued: 610480 MiB,      errors:        1,  remaining time:         n/a
+percent rescued:  99.99%      time since last successful read:         20s
+Finished
+}}}
+The `ddrescue` map file now shows byte ranges of good and bad disk areas:
+{{{
+root:~# cat disk.map
+...
+#      pos        size      status
+0x00000000  0x667FB14C00  +
+0x667FB14C00  0x00000200  -  <=== 512 bytes unreadable
+0x667FB14E00  0x2E8B541200  +
+}}}
+Translate the byte position to the LBA:
+{{{
+root:~# echo $((0x667FB14C00/512))
+859822246
+}}}
+Or convert the map file to a `badblocks`-like list with `ddrescuelog` (part of recent versions of the ''ddrescue'' package):
+{{{
+root:~# ddrescuelog --list-blocks=- disk.map
+859822246
+}}}
+Both match the LBA reported by `smartctl`.
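Where `ddrescuelog` is not available, the same translation can be scripted. The following is only a sketch: `map_bad_lbas` is a hypothetical helper name, and it assumes that data lines of the map file have the form `pos size status` with `0x`-prefixed hexadecimal numbers and that `-` marks a bad area.

```shell
# Print "first_LBA sector_count" for every bad ('-') area of a ddrescue map file.
# Usage: map_bad_lbas MAPFILE [SECTOR_SIZE]   (hypothetical helper, not part of ddrescue)
map_bad_lbas() {
    map_path=$1
    sector_size=${2:-512}
    while read -r pos size status _; do
        # Skip comment lines and empty lines
        case $pos in '#'*|'') continue ;; esac
        # Keep only areas marked as bad; this also skips the current-status line
        [ "$status" = "-" ] || continue
        # Shell arithmetic accepts the 0x-prefixed values directly (integer division)
        echo "$(($pos / $sector_size)) $(($size / $sector_size))"
    done < "$map_path"
}
```

With the map file from this case, `map_bad_lbas disk.map` would print `859822246 1`.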
+==== Find affected file ====
+Get start offset of affected partition:
+{{{
+root:~# fdisk --list /dev/sdb
+...
+Device     Boot Start        End    Sectors   Size Id Type
+/dev/sdb1          63 1250258624 1250258562 596.2G  7 HPFS/NTFS/exFAT
+}}}
+Get filesystem block (cluster) size if unknown (4096 in many cases):
+{{{
+root:~# fsstat /dev/sdb1
+...
+File System Type: NTFS
+...
+Sector Size: 512
+Cluster Size: 4096
+...
+}}}
+Calculate the number of the bad cluster as `(BAD_LBA - START_LBA) / SECTORS_PER_CLUSTER`, rounded down (here 4096 / 512 = 8 sectors per cluster):
+{{{
+root:~# echo $(((859822246-63)/8))
+107477772
+}}}
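The integer division above discards the position of the bad sector inside its cluster. A small sketch that reports both values; `lba_to_cluster` is a hypothetical helper name, and the default of 8 sectors per cluster is an assumption that matches this case only.

```shell
# Print "cluster sector_in_cluster" for a disk LBA inside a partition.
# Usage: lba_to_cluster BAD_LBA START_LBA [SECTORS_PER_CLUSTER]   (hypothetical helper)
lba_to_cluster() {
    bad_lba=$1
    start_lba=$2
    sectors_per_cluster=${3:-8}   # assumed: 4096-byte clusters, 512-byte sectors
    rel=$((bad_lba - start_lba))  # sector number relative to the partition start
    echo "$((rel / sectors_per_cluster)) $((rel % sectors_per_cluster))"
}
```

`lba_to_cluster 859822246 63` prints `107477772 7`, so the unreadable sector is the last of the eight sectors in that cluster.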
+Find inode (here: MFT entry) used by this cluster:
+{{{
+root:~# ifind -d 107477772 /dev/sdb1
+663-128-2
+}}}
+Print some info about this inode:
+{{{
+root:~# istat /dev/sdb1 663-128-2
+...
+Name: Backup_2015-12-17.zip
+Parent MFT Entry: 30    Sequence: 1
+Allocated Size: 4660039680      Actual Size: 4660039516
+Created:        2015-12-17 13:43:30.460000000 (CET)
+File Modified:  2015-12-17 13:46:19.647000000 (CET)
+...
+Type: $DATA (128-2)   Name: N/A   Non-Resident   size: 4660039516  init_size: 4660039516
+106950180 106950181 ...
+...
+107477772  <=== The bad cluster
+...
+108087884
+}}}
+Find full path of affected file:
+{{{
+root:~# ffind /dev/sdb1 663-128-2
+/Backups/2015/Backup_2015-12-17.zip
+}}}
+If the file is no longer needed, it can be overwritten in place and then removed. This is easy with `shred` from ''GNU coreutils'': `shred --iterations=1 --remove /PATH/TO/FILE`. In most cases this makes the firmware reallocate the bad sector.
+==== Try to recover the bad sector ====
+Start with 100 read retries of the bad sector, write to `recovered.bin` if successful:
+{{{
+root:~# ddrescue --ask --verbose --binary-prefixes --idirect --retry=100 \
+                 --input-position=859822246s --output-position=0 --size=1s \
+                 /dev/sdb recovered.bin recovered.map
+...
+Current status
+     ipos: 419835 MiB, non-trimmed:        0 B,  current rate:      32 B/s
+     opos:        0 B, non-scraped:        0 B,  average rate:       4 B/s
+non-tried:        0 B,     errsize:        0 B,      run time:      1m 49s
+  rescued:      512 B,      errors:        0,  remaining time:         n/a
+percent rescued: 100.00%      time since last successful read:          0s
+Finished
+}}}
+We were very lucky:
+{{{
+root:~# cat recovered.map
+...
+#      pos        size      status
+x00000000  0x667FB14C00  ?
+x667FB14C00  0x00000200    +  <=== Now OK!
+x667FB14E00  0x2E8B541200  ?
+}}}
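The `s` suffix in `--input-position=859822246s` counts sectors, whereas the map file and the `ipos` display count bytes. A short sketch to cross-check the two; `lba_to_bytes`, `bytes_to_hex` and `bytes_to_mib` are hypothetical helper names and a 512-byte sector size is assumed.

```shell
# Convert an LBA to a byte offset (hypothetical helpers, sector size defaults to 512).
lba_to_bytes() { echo "$(($1 * ${2:-512}))"; }
# Print a byte offset in the hexadecimal notation used in ddrescue map files.
bytes_to_hex() { printf '0x%X\n' "$1"; }
# Print a byte offset in whole MiB (rounded down).
bytes_to_mib() { echo "$(($1 / 1048576))"; }
```

For this case, `lba_to_bytes 859822246` gives `440228989952`, which `bytes_to_hex` prints as `0x667FB14C00` and `bytes_to_mib` as `419835`, in agreement with the map file position and the `ipos` value.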
+Check whether the disk firmware took the chance to reallocate the sector using the recovered data:
+{{{
+root:~# dd skip=859822246 count=1 iflag=direct if=/dev/sdb of=test.bin
+dd: error reading ‘/dev/sdb’: Input/output error
+0+0 records in
+0+0 records out
+0 bytes (0 B) copied, 23.5006 s, 0.0 kB/s
+}}}
+No luck in this case. So overwrite the sector manually:
+{{{
+root:~# dd seek=859822246 count=1 oflag=direct if=recovered.bin of=/dev/sdb
+1+0 records in
+1+0 records out
+512 bytes (512 B) copied, 1.05331 s, 0.5 kB/s
+}}}
+Read data back and check:
+{{{
+root:~# dd skip=859822246 count=1 iflag=direct if=/dev/sdb of=test.bin
+1+0 records in
+1+0 records out
+512 bytes (512 B) copied, 0.0211745 s, 24.2 kB/s
+root:~# diff -s recovered.bin test.bin
+Files recovered.bin and test.bin are identical
+}}}
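When this check is scripted, it may be worth confirming the file sizes as well, so that a truncated or empty file is not mistaken for a good sector. A minimal sketch; `same_sector` is a hypothetical helper name and the 512-byte default is an assumption.

```shell
# Succeed only if both files hold exactly one sector and are byte-identical.
# Usage: same_sector EXPECTED ACTUAL [SECTOR_SIZE]   (hypothetical helper)
same_sector() {
    expected=$1
    actual=$2
    sector_size=${3:-512}
    # The command substitutions are left unquoted on purpose:
    # some wc implementations pad the count with blanks
    [ $(wc -c < "$expected") -eq "$sector_size" ] || return 1
    [ $(wc -c < "$actual") -eq "$sector_size" ] || return 1
    cmp -s "$expected" "$actual"
}
```

Usage: `same_sector recovered.bin test.bin && echo identical`.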
+Finally, run a SMART self-test and check its result:
+{{{
+root:~# smartctl -t short /dev/sdb
+...
+Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
+...
+Please wait 2 minutes for test to complete.
+root:~# sleep 120 # :-)
+root:~# smartctl -x /dev/sdb
+...
+ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
+...
+Reallocated_Sector_Ct   PO--CK   252   252   010    -    0   <=== Interesting...
+...
+Power_On_Hours          -O--CK   100   100   000    -    252
+...
+Current_Pending_Sector  -O--CK   100   100   000    -    0   <=== As expected
+...
+SMART Extended Self-test Log Version: 1 (2 sectors)
+Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
+# 1  Short offline       Completed without error       00%       252         -          <=== Works again!
+# 2  Short offline       Completed: read failure       90%       176         859822246
+}}}
+Interestingly, `Reallocated_Sector_Ct` did not increase. Either the firmware did not record the reallocation, or it decided to reuse the original sector.
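To compare such counters before and after a repair without reading the whole report, the raw values can be filtered out of the `smartctl -x` output. This is a sketch under assumptions: `smart_raw` is a hypothetical helper name, and it expects attribute rows that start with the numeric ID followed by the name, with the raw value in the eighth column.

```shell
# Print the RAW_VALUE of one SMART attribute from `smartctl -x` output on stdin.
# Assumed row layout: ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
# Usage: smartctl -x /dev/sdb | smart_raw Current_Pending_Sector   (hypothetical helper)
smart_raw() {
    awk -v name="$1" '$2 == name { print $8 }'
}
```

Multi-word raw values, as printed for some other attributes, would be cut at the first blank.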
+Done!
 == Footnotes ==
 …
 [=#footnote11 [11]] Most window managers have a handy calculator that will do hex to decimal conversions.
+[=#footnote12 [12]] See [https://cygwin.com/].
+[=#footnote13 [13]] See [https://www.gnu.org/software/ddrescue/]. Note that on Debian and Ubuntu the package is named [https://packages.debian.org/stable/gddrescue gddrescue] because the (no longer available) package ''ddrescue'' provided the tool [http://www.garloff.de/kurt/linux/ddrescue/ dd_rescue].
+[=#footnote14 [14]] See [https://www.sleuthkit.org/sleuthkit/].
 == Changelog ==
 || Date || Author || Description ||
+||2017-03-29||chrfranke||Add section Case Studies with a real-life example using ddrescue and sleuthkit on Windows||
 ||2009-08-11||dipohl||Add documentation improvements by Francesco Potorti` (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=540359)||
 ||2009-01-28||ballen4705||Incorporated suggestion from Danie Marais (https://sourceforge.net/p/smartmontools/mailman/message/21437469/)||