== Case Studies ==

This section collects step-by-step descriptions of real-life use cases.

=== Recovering a (mostly) unreadable sector of a Notebook HDD ===

This was done in March 2016 under Windows 7 using ''Cygwin''^[#footnote12 12.]^ ports of ''GNU ddrescue''^[#footnote13 13.]^ and ''The Sleuth Kit (TSK)''^[#footnote14 14.]^. All commands shown should work similarly on other platforms and with other filesystems.

==== Determine Logical Block Address of unreadable sector ====

Examine the `smartctl` output:
{{{
root:~# smartctl -x /dev/sdb
smartctl 6.5 2016-02-29 r4227 [x86_64-w64-mingw32-win7-sp1] (daily-20160229)
...
Model Family: SAMSUNG SpinPoint MP5
Device Model: SAMSUNG HM640JJ
...
Firmware Version: 2AK10001
User Capacity: 640.135.028.736 bytes [640 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7200 rpm
Form Factor: 2.5 inches
...
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
...
5 Reallocated_Sector_Ct PO--CK 252 252 010 - 0
...
9 Power_On_Hours -O--CK 100 100 000 - 251 <=== See Self-test Log below
...
197 Current_Pending_Sector -O--CK 100 100 000 - 1 <=== At least 1 bad sector
...
SMART Extended Comprehensive Error Log Version: 1 (2 sectors)
Device Error Count: 351 (device log contains only the most recent 8 errors)
...
Error 351 [6] occurred at disk power-on lifetime: 251 hours (10 days + 11 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 01 00 00 33 3f d8 a6 40 00 Error: UNC 1 sectors at LBA = 0x333fd8a6 = 859822246 <=== Its LBA

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
25 00 00 00 01 00 00 33 3f d8 a6 40 00 00:00:06.924 READ DMA EXT
...

SMART Extended Self-test Log Version: 1 (2 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed: read failure 90% 176 859822246 <=== Detected 75 power on hours ago
}}}
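
The `LBA_48` column of the error log lists the six address bytes from most to least significant, and `smartctl` prints the decoded value next to them. As a cross-check, the bytes can be joined and converted by hand. This is a minimal sketch; the helper name `lba48_to_dec` is made up for this illustration.

```sh
# Hypothetical helper: join the six LBA_48 register bytes (most significant
# first, as printed by smartctl) and print the LBA in decimal.
lba48_to_dec() {
    hex=$(printf '%s' "$@")        # "00 00 33 3f d8 a6" -> "0000333fd8a6"
    echo $((0x$hex))
}

lba48_to_dec 00 00 33 3f d8 a6     # bytes taken from the error log above
```

With the bytes from the log this prints `859822246`, the same LBA that the self-test log reports.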

A read scan verifies the LBA and checks for other bad sectors
(alternatively, replace `/dev/null` with a file path to create a disk image):

{{{
root:~# ddrescue --ask --verbose --binary-prefixes --idirect --force /dev/sdb /dev/null disk.map
GNU ddrescue 1.21-rc2
About to copy 610480 MiBytes from /dev/sdb [SAMSUNG HM640JJ::...] to /dev/null [0].
Proceed (y/N)? y
...
non-tried: 0 B, errsize: 512 B, run time: 2h
rescued: 610480 MiB, errors: 1, remaining time: n/a
percent rescued: 99.99% time since last successful read: 20s
Finished
}}}

The `ddrescue` map file now shows the byte ranges of good and bad disk areas:

{{{
root:~# cat disk.map
...
# pos size status
0x00000000 0x667FB14C00 +
0x667FB14C00 0x00000200 - <=== 512 bytes unreadable
0x667FB14E00 0x2E8B541200 +
}}}
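
Each data line of the map file holds a byte position, a size and a status character, where `-` marks a bad-sector area. The following sketch shows how such lines can be turned into sector numbers with plain shell arithmetic. The function name `bad_areas` is hypothetical, and a sector size of 512 bytes is assumed unless a second argument is given.

```sh
# Hypothetical helper: list every bad-sector ('-') area of a ddrescue map file
# as "<first LBA> <number of sectors>". Sector size defaults to 512 bytes.
bad_areas() {
    map=$1
    sector_size=${2:-512}
    while read -r pos size status rest; do
        case $pos in
            0x*) ;;                # data line or current-status line
            *) continue ;;         # comment or blank line
        esac
        # The current-status line has no '-' in its third field, so it is skipped.
        [ "$status" = "-" ] || continue
        echo "$(($pos / $sector_size)) $(($size / $sector_size))"
    done < "$map"
}
```

For the map shown above, `bad_areas disk.map` prints `859822246 1`: one bad sector at that LBA.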

Translate the byte position to the LBA:

{{{
root:~# echo $((0x667FB14C00/512))
859822246
}}}

Or convert the map file to a `badblocks`-like list with `ddrescuelog` (part of recent versions of the ''ddrescue'' package):

{{{
root:~# ddrescuelog --list-blocks=- disk.map
859822246
}}}

Both match the LBA reported by `smartctl`.

==== Find affected file ====

Get the start offset of the affected partition:

{{{
root:~# fdisk --list /dev/sdb
...
Device Boot Start End Sectors Size Id Type
/dev/sdb1 63 1250258624 1250258562 596.2G 7 HPFS/NTFS/exFAT
}}}

Get the filesystem block (cluster) size if it is unknown (4096 in many cases):

{{{
root:~# fsstat /dev/sdb1
...
File System Type: NTFS
...
Sector Size: 512
Cluster Size: 4096
...
}}}

Calculate the number of the bad cluster as `(BAD_LBA - START_LBA) / SECTORS_PER_CLUSTER`, rounded down. Here a cluster holds 4096 / 512 = 8 sectors:

{{{
root:~# echo $(((859822246-63)/8))
107477772
}}}
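
The same calculation can be wrapped in a small function that also checks that the LBA lies inside the partition and reports which sector of the cluster is affected. This is only a sketch: the name `lba_to_cluster` and its argument order are made up here, and it assumes that cluster 0 starts at the first sector of the partition, as the formula above does.

```sh
# Hypothetical helper: map an absolute LBA to a filesystem cluster number and
# the sector offset inside that cluster.
# Arguments: BAD_LBA PART_START PART_END SECTOR_SIZE CLUSTER_SIZE
lba_to_cluster() {
    bad_lba=$1; part_start=$2; part_end=$3; sector_size=$4; cluster_size=$5
    if [ "$bad_lba" -lt "$part_start" ] || [ "$bad_lba" -gt "$part_end" ]; then
        echo "LBA $bad_lba is outside the partition" >&2
        return 1
    fi
    sectors_per_cluster=$((cluster_size / sector_size))
    rel=$((bad_lba - part_start))  # sector number relative to partition start
    echo "cluster=$((rel / sectors_per_cluster)) sector_in_cluster=$((rel % sectors_per_cluster))"
}

lba_to_cluster 859822246 63 1250258624 512 4096
```

With the values from `fdisk` and `fsstat` this prints `cluster=107477772 sector_in_cluster=7`, so the unreadable sector is the last of the eight sectors in that cluster.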

Find the inode (here: MFT entry) that uses this cluster:

{{{
root:~# ifind -d 107477772 /dev/sdb1
663-128-2
}}}

Print some information about this inode:

{{{
root:~# istat /dev/sdb1 663-128-2
...
Name: Backup_2015-12-17.zip
Parent MFT Entry: 30 Sequence: 1
Allocated Size: 4660039680 Actual Size: 4660039516
Created: 2015-12-17 13:43:30.460000000 (CET)
File Modified: 2015-12-17 13:46:19.647000000 (CET)
...
Type: $DATA (128-2) Name: N/A Non-Resident size: 4660039516 init_size: 4660039516
106950180 106950181 ...
...
107477772 <=== The bad cluster
...
108087884
}}}

Find the full path of the affected file:

{{{
root:~# ffind /dev/sdb1 663-128-2
/Backups/2015/Backup_2015-12-17.zip
}}}

If the file is no longer needed, it can be overwritten in place and then removed. This is easy with `shred` from ''GNU coreutils'': `shred --iterations=1 --remove /PATH/TO/FILE`. In most cases this should make the drive rewrite or reallocate the bad sector.

==== Try to recover the bad sector ====

Start with up to 100 read retries of the bad sector. The data is written to `recovered.bin` if a read succeeds:

{{{
root:~# ddrescue --ask --verbose --binary-prefixes --idirect --retry=100 \
--input-position=859822246s --output-position=0 --size=1s \
/dev/sdb recovered.bin recovered.map
...
Current status
ipos: 419835 MiB, non-trimmed: 0 B, current rate: 32 B/s
opos: 0 B, non-scraped: 0 B, average rate: 4 B/s
non-tried: 0 B, errsize: 0 B, run time: 1m 49s
rescued: 512 B, errors: 0, remaining time: n/a
percent rescued: 100.00% time since last successful read: 0s
Finished
}}}
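
The `s` suffix in `--input-position=859822246s` and `--size=1s` multiplies the number by the sector size, so the `ipos` value in the status display should correspond to the same LBA. A quick way to confirm this is sketched below; the helper name `sector_to_pos` is hypothetical and 512-byte sectors are assumed by default.

```sh
# Hypothetical helper: print the byte offset and the whole MiB of a sector.
sector_to_pos() {
    sector=$1
    sector_size=${2:-512}
    bytes=$((sector * sector_size))
    echo "$bytes $((bytes / 1048576))"
}

sector_to_pos 859822246
```

This prints `440228989952 419835`, which agrees with the `ipos: 419835 MiB` shown by `ddrescue`.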

We were very lucky:

{{{
root:~# cat recovered.map
...
# pos size status
0x00000000 0x667FB14C00 ?
0x667FB14C00 0x00000200 + <=== Now OK!
0x667FB14E00 0x2E8B541200 ?
}}}

Check whether the disk firmware took the chance to reallocate the sector using the recovered data:

{{{
root:~# dd skip=859822246 count=1 iflag=direct if=/dev/sdb of=test.bin
dd: error reading ‘/dev/sdb’: Input/output error
0+0 records in
0+0 records out
0 bytes (0 B) copied, 23.5006 s, 0.0 kB/s
}}}

No luck in this case, so overwrite the sector manually (`dd` uses 512-byte blocks by default, so `seek` and `skip` count sectors here):

{{{
root:~# dd seek=859822246 count=1 oflag=direct if=recovered.bin of=/dev/sdb
1+0 records in
1+0 records out
512 bytes (512 B) copied, 1.05331 s, 0.5 kB/s
}}}

Read the data back and compare:

{{{
root:~# dd skip=859822246 count=1 iflag=direct if=/dev/sdb of=test.bin
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.0211745 s, 24.2 kB/s

root:~# diff -s recovered.bin test.bin
Files recovered.bin and test.bin are identical
}}}
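
Writing to the wrong LBA of a real disk destroys data, so it can be worth rehearsing the `seek`/`skip` arithmetic on a scratch file first. The sketch below mirrors the write-back and read-back steps on a temporary 16-sector image. The function name and file names are made up for this illustration, and `conv=notrunc` is added because `dd` would otherwise truncate a regular output file.

```sh
# Hypothetical dry run: overwrite one 512-byte sector of a scratch image file
# and verify it, mirroring the dd write-back and read-back steps above.
rehearse_sector_write() {
    lba=$1
    work=$(mktemp -d) || return 1
    # 16-sector scratch "disk" and a random 512-byte stand-in for recovered.bin
    dd if=/dev/zero    of="$work/disk.img"      bs=512 count=16 2>/dev/null
    dd if=/dev/urandom of="$work/recovered.bin" bs=512 count=1  2>/dev/null
    # Write the sector; conv=notrunc keeps the rest of the image file intact
    dd if="$work/recovered.bin" of="$work/disk.img" bs=512 seek="$lba" count=1 \
       conv=notrunc 2>/dev/null
    # Read the same sector back and compare it with what was written
    dd if="$work/disk.img" of="$work/test.bin" bs=512 skip="$lba" count=1 2>/dev/null
    if cmp -s "$work/recovered.bin" "$work/test.bin"; then
        echo "sector $lba: identical"
    else
        echo "sector $lba: different"
    fi
    rm -rf "$work"
}

rehearse_sector_write 5
```

If `seek` and `skip` address the same sector, this prints `sector 5: identical`.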

Finally, run a SMART self-test and check its result:

{{{
root:~# smartctl -t short /dev/sdb
...
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
...
Please wait 2 minutes for test to complete.

root:~# sleep 120 # :-)

root:~# smartctl -x /dev/sdb
...
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
...
5 Reallocated_Sector_Ct PO--CK 252 252 010 - 0 <=== Interesting...
...
9 Power_On_Hours -O--CK 100 100 000 - 252
...
197 Current_Pending_Sector -O--CK 100 100 000 - 0 <=== As expected
...
SMART Extended Self-test Log Version: 1 (2 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 252 - <=== Works again!
# 2 Short offline Completed: read failure 90% 176 859822246
}}}

Interestingly, `Reallocated_Sector_Ct` did not increase. Either the firmware did not record the reallocation, or it decided to reuse the original sector.

Done!