Custom Query (1469 matches)
Results (151 - 153 of 1469)
Ticket | Resolution | Summary | Owner | Reporter |
---|---|---|---|---|
#1715 | fixed | Allow to ignore certain bits of NVMe Critical Warning byte | ||
Description |
I have a Samsung SSD 960 EVO 250GB with 424TB written. The drive stores large amounts of RRD files for an SNMP-type monitoring system and gets re-written constantly. The manufacturer warranty for this drive is 100TB, so we are at 424% of the warranted write volume, hence the "Percentage Used" value. However, the drive works fine and shows no other signs of wearing out.

I recently did an OS update on this Debian host, from Debian 10 to 11, and along with it came a new version of smartmontools. Unfortunately it now complains every 24 hours with the following error:

    The following warning/error was logged by the smartd daemon:
    Device: /dev/nvme0, Critical Warning (0x04): Reliability

I have been forced to add a "/dev/nvme0 -d ignore" to my /etc/smartd.conf file, but this prevents me from being alerted to any other possible problems, including any reduction of the "Available Spare" value, or thermal warnings. With ATA drives, it's possible to ignore certain specific attributes with the -i or -I arguments, but I'm not aware of any similar feature which might be helpful here.

Would it be possible to ignore such warnings, while still monitoring the device for other problems? What do you advise?

I suspect this problem will occur more regularly as time goes on and more too-reliable-for-their-own-damned-good drives begin to annoy their administrators. The only crime this drive has committed is its failure to fail! Please end this unfair persecution of my poor, abused but reliable, NVMe drive!

This request on Serverfault is similar to mine and might be worth a read (I am not the author):
My searching found that this request is somewhat similar to bug 1434:

Below is full output from a smartctl -a.

    > sudo smartctl -a /dev/disk/by-id/nvme-Samsung_SSD_960_EVO_250GB_xxxxxxxxxxxxxxx
    smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-21-amd64] (local build)
    Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

    === START OF INFORMATION SECTION ===
    Model Number: Samsung SSD 960 EVO 250GB
    Serial Number: xxxxxxxxxxxxxxx
    Firmware Version: 2B7QCXE7
    PCI Vendor/Subsystem ID: 0x144d
    IEEE OUI Identifier: 0x002538
    Total NVM Capacity: 250,059,350,016 [250 GB]
    Unallocated NVM Capacity: 0
    Controller ID: 2
    NVMe Version: 1.2
    Number of Namespaces: 1
    Namespace 1 Size/Capacity: 250,059,350,016 [250 GB]
    Namespace 1 Utilization: 191,818,444,800 [191 GB]
    Namespace 1 Formatted LBA Size: 512
    Namespace 1 IEEE EUI-64: xxxxxxxxxxxxxxxxx
    Local Time is: Thu Apr 6 20:23:49 2023 MST
    Firmware Updates (0x16): 3 Slots, no Reset required
    Optional Admin Commands (0x0007): Security Format Frmw_DL
    Optional NVM Commands (0x001f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
    Log Page Attributes (0x03): S/H_per_NS Cmd_Eff_Lg
    Maximum Data Transfer Size: 512 Pages
    Warning Comp. Temp. Threshold: 77 Celsius
    Critical Comp. Temp. Threshold: 79 Celsius

    Supported Power States
    St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
    0 + 6.04W - - 0 0 0 0 0 0
    1 + 5.09W - - 1 1 1 1 0 0
    2 + 4.08W - - 2 2 2 2 0 0
    3 - 0.0400W - - 3 3 3 3 210 1500
    4 - 0.0050W - - 4 4 4 4 2200 6000

    Supported LBA Sizes (NSID 0x1)
    Id Fmt Data Metadt Rel_Perf
    0 + 512 0 0

    === START OF SMART DATA SECTION ===
    SMART overall-health self-assessment test result: FAILED!
    - NVM subsystem reliability has been degraded

    SMART/Health Information (NVMe Log 0x02)
    Critical Warning: 0x04
    Temperature: 28 Celsius
    Available Spare: 100%
    Available Spare Threshold: 10%
    Percentage Used: 255%
    Data Units Read: 151,762,677 [77.7 TB]
    Data Units Written: 830,060,020 [424 TB]
    Host Read Commands: 6,877,731,354
    Host Write Commands: 51,000,719,462
    Controller Busy Time: 79,390
    Power Cycles: 35
    Power On Hours: 31,418
    Unsafe Shutdowns: 21
    Media and Data Integrity Errors: 0
    Error Information Log Entries: 0
    Warning Comp. Temperature Time: 0
    Critical Comp. Temperature Time: 0
    Temperature Sensor 1: 28 Celsius
    Temperature Sensor 2: 35 Celsius

    Error Information (NVMe Log 0x01, 16 of 64 entries)
    No Errors Logged |
|||
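The request in ticket #1715 comes down to masking individual bits of the NVMe Critical Warning byte before deciding whether to raise an alert. The following is a minimal Python sketch of that idea, not smartd's actual implementation: the names `CRITICAL_WARNING_BITS` and `remaining_warnings` and the `ignore_mask` parameter are hypothetical and chosen only for illustration. The bit meanings follow the NVMe SMART/Health log definition, where 0x04 is the "reliability degraded" bit reported above.

```python
# Illustrative only: these names are hypothetical, not smartmontools code.
# Bit assignments of the NVMe "Critical Warning" byte (SMART/Health log 0x02).
CRITICAL_WARNING_BITS = {
    0x01: "Available spare below threshold",
    0x02: "Temperature outside threshold",
    0x04: "NVM subsystem reliability degraded",
    0x08: "Media placed in read-only mode",
    0x10: "Volatile memory backup device failed",
}


def remaining_warnings(raw, ignore_mask=0):
    """Return descriptions of the warning bits still set after masking.

    raw         -- Critical Warning byte as reported by the drive
    ignore_mask -- bits the administrator has chosen to ignore (assumption)
    """
    effective = raw & ~ignore_mask & 0xFF
    return [text for bit, text in CRITICAL_WARNING_BITS.items() if effective & bit]


if __name__ == "__main__":
    # The drive in this ticket reports 0x04; ignoring that bit leaves nothing.
    print(remaining_warnings(0x04))
    print(remaining_warnings(0x04, ignore_mask=0x04))
    # A spare-capacity warning would still get through with the same mask.
    print(remaining_warnings(0x05, ignore_mask=0x04))
```

With a mask of 0x04, the reliability bit is suppressed while a later drop in "Available Spare" (bit 0x01) or a thermal warning (bit 0x02) would still be reported, which is the behaviour the reporter asks for.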
#1713 | fixed | New database entry: ADATA SU630 | ||
Description |
This is a 1TB USB SSD from ADATA. It has an Unknown USB bridge [0x125f:0xa88a (0x9301)] that uses -d sat.

I ran the short test:

    smartctl pre-7.4 2023-03-21 r5470 [aarch64-linux-6.1.10-v8+] (local build)
    Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

    === START OF INFORMATION SECTION ===
    Device Model: ADATA SU630
    Serial Number: 2K2020057758
    LU WWN Device Id: 0 000000 000000000
    Firmware Version: S1127B0
    User Capacity: 960,197,124,096 bytes [960 GB]
    Sector Size: 512 bytes logical/physical
    Rotation Rate: Solid State Device
    Form Factor: 2.5 inches
    TRIM Command: Available, deterministic, zeroed
    Device is: Not in smartctl database 7.3/5440
    ATA Version is: ACS-2 T13/2015-D revision 3
    SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
    Local Time is: Thu Apr 6 05:12:57 2023 CEST
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled
    AAM feature is: Unavailable
    APM level is: 128 (minimum power consumption without standby)
    Rd look-ahead is: Enabled
    Write cache is: Enabled
    DSN feature is: Unavailable
    ATA Security is: Disabled, NOT FROZEN [SEC1]
    Wt Cache Reorder: Unavailable

    === START OF READ SMART DATA SECTION ===
    SMART Status not supported: Incomplete response, ATA output registers missing
    SMART overall-health self-assessment test result: PASSED
    Warning: This result is based on an Attribute check.

    General SMART Values:
    Offline data collection status: (0x02) Offline data collection activity was completed without error. Auto Offline Data Collection: Disabled.
    Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
    Total time to complete Offline data collection: ( 120) seconds.
    Offline data collection capabilities: (0x11) SMART execute Offline immediate. No Auto Offline data collection support. Suspend Offline collection upon new command. No Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. No Selective Self-test supported.
    SMART capabilities: (0x0002) Does not save SMART data before entering power-saving mode. Supports SMART auto save timer.
    Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
    Short self-test routine recommended polling time: ( 2) minutes.
    Extended self-test routine recommended polling time: ( 10) minutes.

    SMART Attributes Data Structure revision number: 1
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
    1 Raw_Read_Error_Rate -O--CK 100 100 050 - 0
    5 Reallocated_Sector_Ct -O--CK 100 100 050 - 0
    9 Power_On_Hours -O--CK 100 100 050 - 19328
    12 Power_Cycle_Count -O--CK 100 100 050 - 71
    160 Unknown_Attribute -O--CK 100 100 050 - 0
    161 Unknown_Attribute PO--CK 100 100 050 - 100
    163 Unknown_Attribute -O--CK 100 100 050 - 187
    164 Unknown_Attribute -O--CK 100 100 050 - 134919
    165 Unknown_Attribute -O--CK 100 100 050 - 318
    166 Unknown_Attribute -O--CK 100 100 050 - 16
    167 Unknown_Attribute -O--CK 100 100 050 - 76
    168 Unknown_Attribute -O--CK 100 100 050 - 5050
    169 Unknown_Attribute -O--CK 100 100 050 - 99
    175 Program_Fail_Count_Chip -O--CK 100 100 050 - 0
    176 Erase_Fail_Count_Chip -O--CK 100 100 050 - 0
    177 Wear_Leveling_Count -O--CK 100 100 050 - 0
    178 Used_Rsvd_Blk_Cnt_Chip -O--CK 100 100 050 - 0
    181 Program_Fail_Cnt_Total -O--CK 100 100 050 - 0
    182 Erase_Fail_Count_Total -O--CK 100 100 050 - 0
    192 Power-Off_Retract_Count -O--CK 100 100 050 - 66
    194 Temperature_Celsius -O---K 100 100 050 - 41
    195 Hardware_ECC_Recovered -O--CK 100 100 050 - 0
    196 Reallocated_Event_Count -O--CK 100 100 050 - 0
    197 Current_Pending_Sector -O--CK 100 100 050 - 0
    198 Offline_Uncorrectable -O--CK 100 100 050 - 0
    199 UDMA_CRC_Error_Count -O--CK 100 100 050 - 1
    232 Available_Reservd_Space -O--CK 100 100 050 - 100
    241 Total_LBAs_Written ----CK 100 100 050 - 310222
    242 Total_LBAs_Read ----CK 100 100 050 - 58142
    245 Unknown_Attribute -O--CK 100 100 050 - 1021460
    ||||||_ K auto-keep
    |||||__ C event count
    ||||___ R error rate
    |||____ S speed/performance
    ||_____ O updated online
    |______ P prefailure warning

    General Purpose Log Directory Version 1
    SMART Log Directory Version 1 [multi-sector log support]
    Address Access R/W Size Description
    0x00 GPL,SL R/O 1 Log Directory
    0x01 SL R/O 1 Summary SMART error log
    0x02 SL R/O 1 Comprehensive SMART error log
    0x03 GPL R/O 1 Ext. Comprehensive SMART error log
    0x04 GPL,SL R/O 8 Device Statistics log
    0x06 SL R/O 1 SMART self-test log
    0x07 GPL R/O 1 Extended self-test log
    0x10 GPL R/O 1 NCQ Command Error log
    0x11 GPL R/O 1 SATA Phy Event Counters log
    0x30 GPL,SL R/O 9 IDENTIFY DEVICE data log
    0x80-0x9f GPL,SL R/W 16 Host vendor specific log
    0xde GPL VS 8 Device vendor specific log

    SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
    Device Error Count: 42 (device log contains only the most recent 4 errors)
    CR = Command Register
    FEATR = Features Register
    COUNT = Count (was: Sector Count) Register
    LBA_48 = Upper bytes of LBA High/Mid/Low Registers ] ATA-8
    LH = LBA High (was: Cylinder High) Register ] LBA
    LM = LBA Mid (was: Cylinder Low) Register ] Register
    LL = LBA Low (was: Sector Number) Register ]
    DV = Device (was: Device/Head) Register
    DC = Device Control Register
    ER = Error register
    ST = Status register
    Powered_Up_Time is measured from power on, and printed as
    DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
    SS=sec, and sss=millisec. It "wraps" after 49.710 days.

    Error 42 [1] log entry is empty
    Error 41 [0] log entry is empty
    Error 40 [3] log entry is empty
    Error 39 [2] occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
    When the command that caused the error occurred, the device was active or idle.

    After command completion occurred, registers were:
    ER -- ST COUNT LBA_48 LH LM LL DV DC
    -- -- -- == -- == == == -- -- -- -- --
    00 -- 00 00 00 00 00 00 00 00 00 00 00

    Commands leading to the command that caused the error were:
    CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
    -- == -- == -- == == == -- -- -- -- -- --------------- --------------------
    b0 00 d1 01 01 00 00 4f 00 c2 01 00 00 00:00:00.000 SMART READ ATTRIBUTE THRESHOLDS [OBS-4]
    2f 00 00 01 01 00 00 00 00 00 03 00 00 00:00:00.000 READ LOG EXT
    2f 00 00 01 01 00 00 00 00 00 00 00 00 00:00:00.000 READ LOG EXT
    b0 00 d5 01 01 00 00 4f 00 c2 00 00 00 00:00:00.000 SMART READ LOG
    b0 00 da 00 00 00 00 4f 00 c2 00 00 00 00:00:00.000 SMART RETURN STATUS

    SMART Extended Self-test Log Version: 1 (1 sectors)
    Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
    # 1 Short offline Completed without error 00% 19328 -

    Selective Self-tests/Logging not supported

    SCT Commands not supported

    Device Statistics (GP Log 0x04)
    Page Offset Size Value Flags Description
    0x01 ===== = = === == General Statistics (rev 1) ==
    0x01 0x008 4 71 --- Lifetime Power-On Resets
    0x01 0x010 4 19328 --- Power-on Hours
    0x01 0x018 6 3150847013 --- Logical Sectors Written
    0x01 0x020 6 295279703 --- Number of Write Commands
    0x01 0x028 6 3810455378 --- Logical Sectors Read
    0x01 0x030 6 39980173 --- Number of Read Commands
    0x07 ===== = = === == Solid State Device Statistics (rev 1) ==
    0x07 0x008 1 1 --- Percentage Used Endurance Indicator
    |||_ C monitored condition met
    ||__ D supports DSN
    |___ N normalized value

    Pending Defects log (GP Log 0x0c) not supported

    SATA Phy Event Counters (GP Log 0x11)
    ID Size Value Description
    0x0001 4 0 Command failed due to ICRC error
    0x0002 4 0 R_ERR response for data FIS
    0x0005 4 0 R_ERR response for non-data FIS
    0x000a 4 19 Device-to-host register FISes sent due to a COMRESET

    FARM log (GP Log 0xA6) not supported for non-Seagate drives |
|||
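The six-character FLAGS column in the attribute table of ticket #1713 is explained by the legend printed beneath it (P, O, S, R, C, K, in that order, with "-" meaning the flag is not set). As a reading aid, here is a small Python sketch that expands such a string into the legend's wording. The names `FLAG_LEGEND` and `decode_flags` are hypothetical and not part of smartmontools.

```python
# Illustrative helper; the names are hypothetical, not smartmontools code.
# Position-by-position legend of the smartctl attribute FLAGS column.
FLAG_LEGEND = [
    ("P", "prefailure warning"),
    ("O", "updated online"),
    ("S", "speed/performance"),
    ("R", "error rate"),
    ("C", "event count"),
    ("K", "auto-keep"),
]


def decode_flags(flags):
    """Expand a FLAGS string such as '-O--CK' into its legend meanings."""
    if len(flags) != len(FLAG_LEGEND):
        raise ValueError("expected a 6-character flags string, got %r" % flags)
    return [
        meaning
        for char, (letter, meaning) in zip(flags, FLAG_LEGEND)
        if char == letter
    ]


if __name__ == "__main__":
    # Most attributes in the listing above carry '-O--CK'.
    print(decode_flags("-O--CK"))
    # Attribute 161 is the only one flagged as a prefailure attribute.
    print(decode_flags("PO--CK"))
```

Read this way, only attribute 161 in the listing is marked as a prefailure attribute, and attributes 241 and 242 (`----CK`) are not flagged as updated online.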
#1711 | duplicate | Add drive to DB - WD20EFZX-68AWUN0 | ||
Description |
Hi, I have the following drive, and it shows as not being in the database. I have the latest version of the database. Model: WD Red Plus - WD20EFZX-68AWUN0 I would appreciate it if you could add it. |